asb: head /dev/brain > /dev/www

My home, musings, and wanderings on the world wide web.

Inspecting objects at point with ESS

Somewhere in the second half of last year I switched my primary text editor from Vim to Emacs. Calm down! I use Evil. One of my primary languages is R, and I was in love with this plugin in Vim. A snippet of my Vim config that I used to rely on heavily was:

map <LocalLeader>nr :call RAction("rownames")<CR>
map <LocalLeader>nc :call RAction("colnames")<CR>
map <LocalLeader>n2 :call RAction("names")<CR>
map <LocalLeader>nn :call RAction("dimnames")<CR>
map <LocalLeader>nd :call RAction("dim")<CR>
map <LocalLeader>nh :call RAction("head")<CR>
map <LocalLeader>nt :call RAction("tail")<CR>
map <LocalLeader>nl :call RAction("length")<CR>
map <LocalLeader>cc :call RAction("class")<CR>

I came up with these mappings after looking at a similar usage pattern in the plugin’s manual. Being able to inspect the object at point without switching from my editor to the R prompt made me much more productive. After I switched to Emacs and the mighty ESS for programming in R, replicating this was an explicit TODO in my ESS configuration. Ladies & gentlemen, today I bring you the solution. Drumroll!

Python's AST module: Bringing a gun to a knife fight!

So, I’ve been writing unit tests for some statistical code using py.test. One of the sweet things about py.test is that it gives you some cute context-specific comparison assertions that let you check one data structure against another.

The problem I ran into arises when using these assertions with floating-point numbers. A minimal (contrived) example:

>>> import pandas as pd

>>> pd.np.random.seed(3141)
>>> xx = pd.np.random.random(17)

>>> print pd.np.percentile(xx, 25)
0.386739093187

>>> assert 0.386739093187 == pd.np.percentile(xx, 25)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-16-f6262184b4b6> in <module>()
----> 1 assert 0.386739093187 == pd.np.percentile(xx, 25)

AssertionError:
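
The assertion fails because the printed number is not the full value: a Python 2 `print` shows a float rounded to 12 significant digits, so the literal copied from the output differs from the real result in its trailing digits. The sketch below reproduces the effect with the standard library only. The value `full_value` is a hypothetical stand-in (its digits beyond the twelfth are made up), and comparing with a tolerance via `math.isclose` is just one common workaround, not necessarily the approach this post goes on to take.

```python
import math

# Hypothetical stand-in for the percentile result; the digits after the
# twelfth significant digit are invented for illustration.
full_value = 0.38673909318712345

# Emulate a print that rounds to 12 significant digits, then read it back.
printed_text = "%.12g" % full_value
printed_value = float(printed_text)

# Exact equality fails because the trailing digits were dropped.
exact_match = (printed_value == full_value)

# A tolerance-based comparison accepts the rounded literal.
close_match = math.isclose(printed_value, full_value, rel_tol=1e-9)

print(printed_text, exact_match, close_match)
```

The relative tolerance of `1e-9` is an arbitrary choice here; it only needs to be looser than the rounding introduced by printing.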

Boggle solver: Linear and recursive search

I was recently introduced to a Boggle solver problem by a friend of mine. Put simply, given a Boggle board of sixteen characters and a dictionary, the program needs to figure out how many words from the dictionary can be formed on the board. The one deviation from standard Boggle is that the restriction to adjacent moves is relaxed, i.e. the order of the letters on the board is unimportant.

My repository on GitHub discusses the problem, the two solutions implemented, and their tradeoffs in much more detail, along with presenting the code. Do visit.
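
To make the relaxed rule concrete, here is a minimal sketch of a linear scan over the dictionary. It is not taken from the repository. It assumes that each tile may be used at most once per word (as in standard Boggle), and the names `can_form` and `count_words`, the board, and the word list are all hypothetical.

```python
from collections import Counter


def can_form(word, board_counts):
    """True if every letter of word is available often enough on the board."""
    needed = Counter(word)
    return all(board_counts[ch] >= n for ch, n in needed.items())


def count_words(board, dictionary):
    """Count dictionary words that can be built from the board's letters."""
    board_counts = Counter(board)  # letter -> number of tiles with that letter
    return sum(1 for word in dictionary if can_form(word, board_counts))


# Hypothetical 16-letter board and a tiny dictionary.
board = "abcdefghijklmnop"
words = ["bad", "feed", "pig", "zip", "lamp"]
print(count_words(board, words))
```

With adjacency out of the picture, checking a word reduces to comparing letter counts, so each word costs time proportional to its length.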

`timeit` macro for SBCL

I am the sort of person who really likes to know how long the code I have written takes to run. I believe it is important to know what works in your language and what does not, and what efficiency trade-offs need to be made for expressiveness and brevity. Since I am learning Common Lisp a little seriously again, I have been interested in seeing how to profile code in it. I haven’t yet got as far as using the statistical CPU profiler or the statistical allocation profiler, so I am starting out with simply being able to quickly time the code I write.

Since writing macros in Lisp is so easy and timing code is just the sort of thing macros are really useful for, I decided to practice some macro writing by writing a timeit macro. Similar attempts have been made in the past but I wanted to roll my own. After some struggle and a little nudge I was able to write something satisfactory:

(defmacro timeit ((&key
                    ;; Quoted so the stream is looked up when the code runs,
                    ;; not captured as an object at macroexpansion time.
                    (to-stream '*standard-output*)
                    (with-runs 1))
                  &body body)
  "Note that this macro may barf if you are depending on a single evaluation
  and choose with-runs to be greater than one. But I guess that will be the
  caller's mistake instead."
  (let ((start-time (gensym))
        (stop-time (gensym))
        (temp (gensym))
        (retval (gensym))
        (elapsed-time (gensym))
        (runs (gensym))
        (i (gensym)))
    `(let* ((,runs ,with-runs)          ; evaluate with-runs only once
            (,start-time (get-internal-run-time))
            (,retval (let ((,temp))
                       (dotimes (,i ,runs ,temp)
                         ;; progn lets the body contain more than one form
                         (setf ,temp (progn ,@body)))))
            (,stop-time (get-internal-run-time))
            (,elapsed-time (/ (- ,stop-time ,start-time)
                              internal-time-units-per-second)))
       (format ,to-stream
               (concatenate 'string
                            "~CAverage (total) time spent in expression"
                            " over ~:d iterations: ~f (~f) seconds.~C")
               #\linefeed
               ,runs
               (/ ,elapsed-time ,runs)  ; average time per run
               ,elapsed-time            ; total time
               #\linefeed)
       ,retval)))

Hopefully, this is useful for someone not in the mood to write their own code to do this. This was definitely useful for me to write.

R: Converting simple_triplet_matrix to other sparse matrix formats.

When working with the tm package in R, you get a DocumentTermMatrix or TermDocumentMatrix, which is an S3 object of class simple_triplet_matrix from the package slam. R has several different packages for creating sparse matrices, each of which has other packages depending on it. A long time ago, when working with text data, I wrote two functions: one to convert a simple_triplet_matrix to a matrix of class sparseMatrix (package: Matrix), and one to convert that to a matrix of class matrix.csr (package: SparseM). The conversions are simple enough once you have read through the documentation of the three packages, but hopefully this will save someone some time.

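library(Matrix)   # provides sparseMatrix()
library(SparseM)  # provides the matrix.csc class and as.matrix.csr()

# Convert a slam simple_triplet_matrix to a Matrix sparseMatrix.
asSparseMatrix = function (simpleTripletMatrix) {
  retVal = sparseMatrix(i=simpleTripletMatrix[["i"]],
                        j=simpleTripletMatrix[["j"]],
                        x=simpleTripletMatrix[["v"]],
                        dims=c(simpleTripletMatrix[["nrow"]],
                               simpleTripletMatrix[["ncol"]]))
  if (!is.null(simpleTripletMatrix[["dimnames"]]))
    dimnames(retVal) = simpleTripletMatrix[["dimnames"]]
  return(retVal)
}

# Convert a column-compressed sparseMatrix (as returned above) to a
# SparseM matrix.csr.
asMatrixCSR = function (sparseMatrix) {
  warning("Will lose dimnames!")
  # Matrix stores 0-based indices; SparseM expects 1-based, hence the + 1L.
  as.matrix.csr(new("matrix.csc",
                    ra=sparseMatrix@x,
                    ja=sparseMatrix@i + 1L,
                    ia=sparseMatrix@p + 1L,
                    dimension=sparseMatrix@Dim))
}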
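
For readers unfamiliar with the storage layouts involved, the following sketch shows in plain Python what moving from triplet form to compressed-column form amounts to, and why the `+ 1L` shifts appear above. It is an illustration only, not how any of the three R packages implement the conversion. The function name `triplet_to_csc` and the sample data are hypothetical, and the sketch assumes there are no duplicate (row, column) entries.

```python
def triplet_to_csc(i, j, v, ncol):
    """Convert 1-based triplets (i, j, v) to 0-based compressed-column arrays.

    Returns (row_idx, col_ptr, values): entries are ordered column by column,
    and column c occupies positions col_ptr[c] up to col_ptr[c + 1].
    """
    # Order the entries by column, then by row within each column.
    order = sorted(range(len(v)), key=lambda k: (j[k], i[k]))
    row_idx = [i[k] - 1 for k in order]  # shift rows to 0-based
    values = [v[k] for k in order]

    # Count entries per column; the 1-based j doubles as the 0-based slot c + 1.
    col_ptr = [0] * (ncol + 1)
    for col in j:
        col_ptr[col] += 1
    # Turn the per-column counts into running offsets.
    for c in range(ncol):
        col_ptr[c + 1] += col_ptr[c]
    return row_idx, col_ptr, values


# Hypothetical 3x3 matrix with entries (1,1)=10, (3,1)=20, (2,3)=30.
row_idx, col_ptr, values = triplet_to_csc(
    i=[3, 1, 2], j=[1, 1, 3], v=[20, 10, 30], ncol=3)
print(row_idx, col_ptr, values)
```

Adding one to every element of `row_idx` and `col_ptr` gives the 1-based arrays that a 1-based consumer would expect, which is the role the `+ 1L` terms play in `asMatrixCSR`.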