asb: head /dev/brain > /dev/www

My home, musings, and wanderings on the world wide web.

Knuth-Fisher-Yates shuffling algorithm

So, yesterday, I was sent this article by a friend. Articles titled X every developer should know sort of titles pique my interest very much. So, I headed over to this article only to find myself too confused. What I found more confusing is that the article links to one of Jeff Atwood’s blogposts that discusses the same concept.

The central idea of the post is to correct a naive implementation of a shuffling algorithm. My confusion arose from the fact that after reading the article, I could only think that the algorithm was blatantly incorrect and not naive. The differences between the two algorithms – the naive one and the one ascribed to Knuth, Fisher, & Yates is what, in sampling 101, is between sampling with replacement and sampling without replacement. This quote essentially highlights the naivete indeed:

How do we know that that above algorithm is biased? On the surface it seems reasonable, and certainly does some shuffling of the items.

Some shuffling does not mean a random permutation at all. This is akin to saying a naive implementation of a queue sometimes inserts incoming items at the head of the queue.

Another shuffling algorithm?

On the other hand, what was fruitful was that the article reminded me of a shuffling algorithm I recently wrote to solve problem number 25 of 99 Lisp Problems. The minor difference between the two algorithms is that the lisp algorithm pops an element from the original array per turn and collects it in a new array which becomes a random permutation (shuffle) of the original array at the end. The KFY algorithm instead swaps two elements at each turn. It can be worked out that, mathematically, the two algorithms are identical. But the KFY algorithm saves you making a copy. Therefore, yes, KFY is an algorithm that every programmer should know. Additionally, every programmer should also know the standard library of their language.