Analyzing Airtel Scratch Cards

Chris Orwa
Oct 30, 2015

One of my blog readers, Johnson Kamotho, sent me data from his Airtel scratch card collection and asked if I could perform an analysis similar to the one in the Safaricom blog post. I was up to the task and opted to apply a couple of techniques I’ve been trying out to reveal the underlying structure of data sets. The provided data had each digit separated as a variable, which made me breathe a sigh of relief: no data preparation required!

Let the jargon begin. I opted to use PCA (Principal Component Analysis) because it is good at capturing how data points vary. It is used in facial recognition to discard varying aspects of an image, such as lighting and color, and to represent a human face with a small number of features, which makes faces easier to compare.

Principal Component Analysis

Principal Component Analysis (PCA) is a statistical technique used to summarize variation in interdependent variables. It involves four major computations. First, calculate the variance of each variable, which tells you how far a set of numbers is spread out (the standard deviation is its square root). Second, compute the covariance, which measures how two variables change together (in our case, two digit positions). Third, and more importantly, build an N x N matrix holding all the covariances (here a 14 by 14 matrix). Fourth, extract the eigenvectors of that matrix: these are the directions along which the data varies, and the ones with the largest eigenvalues capture the most variation.
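The four computations can be sketched in Python with NumPy. This is only an illustrative sketch, not the code used for the analysis: the `pca` function and the randomly generated `cards` array (200 cards by 14 digit positions) are made-up stand-ins for the real scratch card data, which is not reproduced here.

```python
import numpy as np


def pca(data, n_components=2):
    """Project the rows of `data` onto its top principal components."""
    data = np.asarray(data, dtype=float)

    # Center each column, since variance is measured around the mean.
    centered = data - data.mean(axis=0)

    # Covariance matrix: one row and column per variable.
    # Its diagonal holds the variance of each variable.
    cov = np.cov(centered, rowvar=False)

    # Eigen-decomposition. eigh is meant for symmetric matrices and
    # returns eigenvalues in ascending order, so re-sort descending.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    components = eigenvectors[:, order[:n_components]]

    # Coordinates of every row along the chosen components.
    scores = centered @ components

    # Share of the total variance captured by each kept component.
    explained = eigenvalues[:n_components] / eigenvalues.sum()
    return scores, components, explained


# Hypothetical stand-in data: 200 cards, 14 digit positions each.
rng = np.random.default_rng(0)
cards = rng.integers(0, 10, size=(200, 14))

scores, components, explained = pca(cards)
print(scores.shape)  # (200, 2)
```

Each row of `scores` gives one card's coordinates on the first two principal components, which is what a 2-dimensional PCA plot displays.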

Don’t Get Lost

If you are lost at this point, we are simply getting a compact representation of the data that keeps most of its variation, sort of like getting the bone structure of our data. The good thing is that you don’t have to perform all these steps by hand; you can do this in one line of code in the R programming language.

So, I ran PCA on the data and visualized it in 2 dimensions (the first two principal components), as shown below.

Read full article here
