R: Recreating the history of a stock index's membership
If you work in finance, have you ever needed to identify the stocks that were in an index at any given date? Or perhaps a series of dates? Given how central indices are to financial markets and how frequently they change, this is a common problem.
The rub is that this common problem seems trivial until you sit down to solve it. Hopefully, this post will save you a couple of hours the next time you come across it (provided you are a user of R).
Overgeneralize: Time series of membership of a set
In this post I am going to present a general method to recreate a time series of membership of a set using the current membership and a list of changes in it.
There are a few subtle points to it that we’ll discuss shortly. But first let’s start with a very simple example.
# Let's have three time periods.
periods <- 3

# And name the change times as `t-1', `t-2', and `t-3'.
ctime <- paste0("t-", seq_len(periods))

# Define current membership of a set of alphabets.
current <- letters[1:5]

# Let's assume that "a" was added to the set at `t-1' and so on...
added <- setNames(c("a", "b", "c"), ctime)
added

# Similarly the ones that were removed:
removed <- setNames(c("f", "g", "h"), ctime)
removed

# Let's use the power of `wishful thinking' and hope we have something
# which could give us this:
create_mem_ts(ctime, added, removed, current)

$current
[1] "a" "b" "c" "d" "e"

$`t-1`
[1] "b" "c" "d" "e" "f"

$`t-2`
[1] "c" "d" "e" "f" "g"

$`t-3`
[1] "d" "e" "f" "g" "h"

attr(,"index")
[1] "t-1" "t-2" "t-3"
create_mem_ts <- function(ctime, added, removed, current) {
  stopifnot(is.atomic(ctime),
            is.atomic(added) || is.list(added),
            is.atomic(removed) || is.list(removed))

  if (any(is.na(ctime))) stop("NAs not allowed in ctime.")

  stopifnot(length(ctime) == length(added),
            length(added) == length(removed))

  if (any(duplicated(ctime))) {
    ctime.u <- unique(ctime)
    ctime.f <- factor(ctime, levels = as.character(ctime.u))
    added <- split(added, ctime.f)
    removed <- split(removed, ctime.f)
  } else {
    ctime.u <- ctime
  }

  out <- setNames(vector(mode = "list", length = length(ctime.u) + 1),
                  c("current", as.character(ctime.u)))
  out[["current"]] <- current

  for (i in 2:length(out))
    out[[i]] <- union(setdiff(out[[i - 1]], added[[i - 1]]),
                      na.omit(removed[[i - 1]]))

  attr(out, "index") <- ctime.u
  out
}
The function is much more powerful than the simple example quoted above. Here is the gist including this code, documentation and a few examples.
Contextualize: Time series of stock index membership
Let’s take the example of the Nifty index on the National Stock Exchange of India.[1] Nifty is the most important Indian stock index, comprising fifty stocks. The membership is typically shuffled twice a year. The following is the list of NSE symbols for the current members of Nifty:
There are two important differences between this example and the last:
ctime is now a true time class (POSIXt) in R.
There may be more than one record for a given ctime.
The function create_mem_ts has been designed to handle such changes (and others) seamlessly. In fact, having a true time class lends a benefit we’ll soon observe.
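To make the second difference concrete, here is a minimal sketch of how several records sharing one change time can be grouped before the set arithmetic is applied. It mirrors the `split` step inside `create_mem_ts`, but the symbols and dates are made up for illustration; they are not real Nifty changes.

```r
# Hypothetical change log, most recent change first. Symbols are made up.
ctime   <- as.POSIXct(c("2012-03-01", "2012-03-01", "2011-10-01"), tz = "UTC")
added   <- c("AAA", "BBB", "CCC")
removed <- c("XXX", "YYY", "ZZZ")
current <- c("AAA", "BBB", "CCC", "DDD")

# Group the records by change time, keeping the newest-first order.
ctime.u <- unique(ctime)
ctime.f <- factor(as.character(ctime), levels = as.character(ctime.u))
added.by.time   <- split(added, ctime.f)
removed.by.time <- split(removed, ctime.f)

# Undo the most recent batch of changes in one step:
# drop everything that was added, restore everything that was removed.
before.latest <- union(setdiff(current, added.by.time[[1]]),
                       removed.by.time[[1]])
before.latest
```

Both records dated 2012-03-01 end up in the same group, so they are undone together rather than one at a time.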
For now, let’s see how our function performs in this case:
Why another function? Why not just create the whole series at once?
Because it is smart! Essentially, think of this as applying a set of diffs/patches to a text file.
It needs less space to store.
It gives the user flexibility and efficiency (in both time and space) by computing the membership only for the dates one asks for.
It is a much cleaner abstraction (think initializing and querying a database) that can work with any time-series class (or even user-written classes, provided they define a greater-than (>) method for their class).
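The diff/patch idea can be sketched as a small lookup that undoes only the changes made after the requested time. This is an illustration under stated assumptions, not the code from the gist: `members_on` is a hypothetical name, the change log is assumed to be sorted newest first, a change is assumed to take effect at its `ctime`, and the symbols are made up.

```r
# Hypothetical helper: membership of the set as of time `at`.
# Assumes `ctime` is sorted newest first and supports the `>` operator.
members_on <- function(at, ctime, added, removed, current) {
  # Walk back through every change made after `at`, newest first.
  for (i in which(ctime > at)) {
    current <- union(setdiff(current, added[i]), removed[i])
  }
  current
}

# Made-up change log: two changes, most recent first.
ctime   <- as.POSIXct(c("2012-03-01", "2011-10-01"), tz = "UTC")
added   <- c("AAA", "BBB")
removed <- c("XXX", "YYY")
current <- c("AAA", "BBB", "CCC")

# Only the 2012-03-01 change needs to be undone for this date.
mid <- members_on(as.POSIXct("2012-01-15", tz = "UTC"),
                  ctime, added, removed, current)

# Both changes are undone for a date before the whole log.
old <- members_on(as.POSIXct("2011-01-01", tz = "UTC"),
                  ctime, added, removed, current)
```

Only the comparison `ctime > at` touches the time class, which is why any class with a working `>` would do in this sketch.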
All right! Smart, shmart, what do I do with it?
And finally, the answer!
Use a loop or an apply variant to find out the membership on the dates that you want. Suppose I want the membership of Nifty on a weekly basis from January 1, 2012 to March 31, 2012.
I first create a sequence of desired dates like so:
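A minimal sketch of this step, assuming POSIXct dates in UTC and a hypothetical lookup function `members_on` standing in for whichever query function you use. The change log and symbols are made up and are not the Nifty data.

```r
# Weekly dates from January 1, 2012 to March 31, 2012.
# UTC is used so that daylight-saving shifts cannot affect the sequence.
dates <- seq(as.POSIXct("2012-01-01", tz = "UTC"),
             as.POSIXct("2012-03-31", tz = "UTC"),
             by = "week")

# Made-up change log, most recent change first.
ctime   <- as.POSIXct(c("2012-03-01", "2011-10-01"), tz = "UTC")
added   <- c("AAA", "BBB")
removed <- c("XXX", "YYY")
current <- c("AAA", "BBB", "CCC")

# Hypothetical lookup: undo every change made after `at`.
members_on <- function(at) {
  members <- current
  for (i in which(ctime > at))
    members <- union(setdiff(members, added[i]), removed[i])
  members
}

# One membership vector per date, named by the date it refers to.
weekly <- setNames(lapply(seq_along(dates), function(i) members_on(dates[i])),
                   format(dates, "%Y-%m-%d"))
```

Indexing with `dates[i]` keeps each element a POSIXct value, so the comparison inside the lookup behaves the same as for a single date.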
I leave the proof of correctness as an exercise for the reader. No, seriously! Let me know in the comments below if you find a bug. Feel free to clone the gist to tinker, and let me know if you improve it.
[1] The aforementioned gist contains a copy of the data used here. One may replicate this example by cloning the git repository.