Since I discovered it, I have found logical subsetting in R a very elegant idiom. Recently I’ve had a change of heart due to two observations. Here is why I propose staying away from it.
First of all, there is this question on Stack Overflow which shows that subsetting numerically using which() is faster than logical subsetting.
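To make the two idioms concrete, here is a minimal sketch that subsets the same vector both ways and times each. The vector x, the condition cond, and the repetition count are made up for illustration and are not taken from the Stack Overflow question. Timings depend on the R version, the machine, and the proportion of TRUE values, so measure on your own data rather than assuming a fixed speedup.

```r
# Hypothetical data: one million uniform random numbers.
set.seed(1)
x <- runif(1e6)
cond <- x > 0.5                 # logical vector, no NAs here

# Logical subsetting: index directly with the logical vector.
by_logical <- x[cond]

# Numeric subsetting: convert TRUE positions to integer indices first.
by_which <- x[which(cond)]

# Rough timing of each idiom over repeated runs.
time_logical <- system.time(for (i in 1:50) x[cond])
time_which   <- system.time(for (i in 1:50) x[which(cond)])
print(time_logical)
print(time_which)
```

When the condition contains no NA values, both forms return the same elements, so the choice between them is purely a matter of speed and style.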
I know this may not be a good enough reason for some of you. It wasn’t for me either until I found this bug.[1] It’s the subtle, and therefore sinister, kind. See the code snippet below to reproduce the bug.
And you are left wondering where you called summary in your code.
Even if one discounts efficiency, it may be helpful to give up on the idiom of logical subsetting for reasons of correctness. At least for data.frames.
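As an illustration of how the two idioms can differ in correctness on a data.frame, the sketch below shows one well-known case: NA values in the condition. This is an example chosen for illustration, with a hypothetical data frame df, and it is not necessarily the same bug as the one described above. With a logical index, each NA produces a row filled with NA; which() silently drops those positions.

```r
# Hypothetical data frame with missing scores.
df <- data.frame(id = 1:5, score = c(10, NA, 30, 5, NA))

# The comparison propagates NA: TRUE NA TRUE FALSE NA
keep <- df$score > 8

# Logical subsetting keeps a row for every NA in the index,
# and fills those rows with NA in every column.
by_logical <- df[keep, ]

# which() returns only the positions that are TRUE, so NAs are dropped.
by_which <- df[which(keep), ]

print(by_logical)
print(by_which)
```

Whether dropping the NA rows is the behaviour you want depends on the analysis, but the all-NA rows from the logical form are easy to miss until something downstream misbehaves.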
[1] Or at least I’ll consider this a bug.