Moneyball with meta-classifiers!

Meta-analysis has been widely used in academic research. However, it has gained currency outside the hallowed groves of academia since President Obama’s 2012 election victory, when Nate Silver correctly predicted the outcome in all 50 states. More recently, in India, Chanakya has been making headlines for correctly predicting the results of the past few state elections as well as the national elections held last year. Both Silver and Chanakya use a version of what is known as meta-analysis.

Put simply, meta-analysis is a statistical technique used to combine the findings of independent studies. Wikipedia says that “In its simplest form, meta-analysis is done by identifying a common statistical measure that is shared between studies, such as effect size or p-value, and calculating a weighted average of that common measure.”
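To make the "weighted average of a common measure" concrete, here is a minimal sketch in Python. The effect sizes and sample sizes are made up for illustration, and weighting by sample size is just one possible choice of weights (an assumption here, not something the definition above prescribes).

```python
def weighted_mean(values, weights):
    """Weighted average of a measure that is shared across studies."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical effect sizes reported by three independent studies.
effects = [0.2, 0.5, 0.8]
# Hypothetical sample sizes, used here as the weights.
sample_sizes = [100, 300, 100]

pooled_effect = weighted_mean(effects, sample_sizes)
print(pooled_effect)
```

The larger study pulls the pooled estimate towards its own value, which is the whole point of weighting.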

Let me illustrate the power of meta-analysis with a simple example of churn in the telecom industry. The dataset consists of details of 5000 customers in the United States along 20 different dimensions, including location, plan type, details of morning, evening, night and international calls (such as call times, durations and total call charges), and the number of customer service calls. Since it is known whether each customer churned or not, I can build a classifier to predict the probability of churn for any new customer.

The dataset was randomly partitioned into a training set (3333 customers) and a test set (1667 customers). The idea is to build the model on the training set and then test its efficacy on the test set. I built three different models using Support Vector Machines (SVM), Conditional Forests (CF) and AdaBoost (ADA). Do note that I didn’t really optimize or tune the models in any way. The results from the individual models are shown below:
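As an aside, the random partition described above can be sketched in a few lines of Python. This is not the code behind this post: the function name `split_indices` and the seed are hypothetical, and only the sizes (5000, 3333 and 1667) come from the text.

```python
import random


def split_indices(n_rows, n_train, seed=42):
    """Shuffle the row indices and cut them into a training and a test part."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    indices = list(range(n_rows))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]


# 5000 customers, of which 3333 go to training and the rest to testing.
train_idx, test_idx = split_indices(5000, 3333)
print(len(train_idx), len(test_idx))
```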


In this particular case, it is more important to predict who is going to churn than who isn’t. So although the CF classifier correctly identifies all those not going to churn, it falls short where it matters, because it underpredicts those who will churn.
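One way to put a number on this is to look at the two classes separately: the share of actual churners that the model catches, and the share of non-churners it gets right. The sketch below uses made-up labels (not the results from this post), and the helper name `class_recall` is hypothetical.

```python
def class_recall(actual, predicted, label):
    """Fraction of the rows that truly have `label` which were predicted as `label`."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == label and p == label)
    total = sum(1 for a in actual if a == label)
    if total == 0:
        raise ValueError("no rows with label %r" % (label,))
    return hits / total


# Toy data: 1 = churned, 0 = stayed. Purely illustrative.
actual = [1, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 0, 0, 0]  # never wrong about stayers, misses most churners

churn_recall = class_recall(actual, predicted, label=1)
stay_recall = class_recall(actual, predicted, label=0)
print(churn_recall, stay_recall)
```

A model like this toy one looks perfect on the customers who stay and still misses two out of three churners.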

Now, the simplest way to bring these models together is a majority voting scheme. Let’s say that we go with the prediction on which at least two of the three models agree. The results from our first “meta” classifier then become:
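The voting scheme described above can be sketched as follows. The prediction lists are toy values rather than the output of the actual models, and `majority_vote` is a hypothetical helper name.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """For each customer, return the label that most of the models predicted."""
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*model_predictions)
    ]


# Toy predictions for four customers: 1 = churn, 0 = no churn.
svm_pred = [1, 0, 1, 0]
cf_pred = [0, 0, 1, 0]
ada_pred = [1, 0, 0, 1]

voted = majority_vote(svm_pred, cf_pred, ada_pred)
print(voted)
```

With three models and two classes there is never a tie, so at least two models always agree on the winning label.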


Well, it doesn’t seem to have done much better. So, let’s try a more complex model. What we can do is take the predictions of the three classifiers on the training set and use them to train a new classifier. Then we can send the individual model predictions on the test set through this new model to get the final predictions. I used an SVM to build my meta classifier. Here are the results:
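The two-stage idea described above (often called stacking) can be sketched as below. To keep the sketch free of dependencies, the meta classifier here is a simple lookup table that learns the most common true label for each combination of base predictions; it is a stand-in for the SVM used in this post, not a reproduction of it. All data and function names (`fit_meta`, `predict_meta`) are hypothetical.

```python
from collections import Counter, defaultdict


def fit_meta(base_predictions, labels):
    """Learn the most common true label for each combination of base predictions."""
    counts = defaultdict(Counter)
    for combo, label in zip(zip(*base_predictions), labels):
        counts[combo][label] += 1
    table = {combo: c.most_common(1)[0][0] for combo, c in counts.items()}
    # Fallback for combinations never seen in training: the overall majority label.
    default = Counter(labels).most_common(1)[0][0]
    return table, default


def predict_meta(model, base_predictions):
    """Send the base model predictions through the learned lookup table."""
    table, default = model
    return [table.get(combo, default) for combo in zip(*base_predictions)]


# Toy training-set predictions (1 = churn, 0 = no churn) and true labels.
train_svm = [0, 0, 0, 1, 0, 0, 0]
train_cf = [0, 0, 0, 1, 0, 0, 0]
train_ada = [1, 1, 0, 1, 0, 1, 0]
train_y = [1, 1, 0, 1, 0, 0, 0]

meta_model = fit_meta([train_svm, train_cf, train_ada], train_y)

# Toy test-set predictions from the same three base models.
test_svm = [0, 1, 0]
test_cf = [0, 1, 0]
test_ada = [1, 1, 0]

final_pred = predict_meta(meta_model, [test_svm, test_cf, test_ada])
print(final_pred)
```

In this toy data the third model is usually right when it alone predicts churn, so the meta classifier learns to trust that lone vote, which is something a plain majority vote cannot do.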


Now we are getting somewhere! The number of customers correctly predicted as churned has gone up dramatically! While this is just a quick and dirty example, I believe that it typifies what one can expect to see with meta classifiers that bring together the results of multiple independent analyses.

Now do you wonder why Nate Silver or Chanakya get it so right? ;-)