Selection Bias on Twitter Polls

The Case of Mutahi Ngunyi's Kenyan Election Polls

Chris Orwa
Jun 21, 2022
(Pythia: Oracle and High Priestess of Delphi)

Pythia, the high priestess at the Temple of Apollo in Delphi, answered questions about the future for people from all over Greece and beyond. Themistocles, an Athenian general, received a prediction of doom about the impending invasion by King Xerxes of Persia. When pressed for a second reading, the Oracle reversed the prediction and gave the Athenians a way to escape their demise.

The Kenyan electorate is facing a Themistocles moment of its own, and its Oracle at Delphi is the political scientist Mutahi Ngunyi. In the presidential race, Mr. Ngunyi’s forecast for the winning candidate shifted from the Deputy President to the former Prime Minister. While the priestesses relied on hallucinogenic vapour from the Oleander flower to observe the future, Mr. Ngunyi seems to draw his conclusions from monthly Twitter polls of his 1.9 million followers (see chart below).

If no one knew anything about who is most likely to become Kenya’s next president, each of the two leading candidates would have a 50% chance of winning. On the other hand, if the outcome of the ballot were certain, the prediction would be stable and would become less variable as the election approached. When the prediction fluctuates significantly, however, either it is doctored, like that of the Oracle at Delphi, who was happy to give another prophecy when more gold was provided, or poor sampling is causing the volatility.

Mirroring the Polling Station

It’s impractical to poll an entire population, which is why pollsters select a sample of individuals that represents the whole population. Understanding how respondents come to be selected for a poll is a big step toward determining how well their views and opinions mirror those of the voting population. Polling companies choose from a wide variety of options that fall into two types: those based on probability sampling methods and those based on non-probability sampling techniques.

For more than five decades, probability sampling was the standard method for polls. But in recent years, as fewer people respond to polls and the costs of polling have gone up, pollsters have turned to non-probability sampling methods, such as collecting data from online platforms like Twitter, where people volunteer answers. Journalists and the public need to understand the strengths and weaknesses of both sampling techniques to effectively evaluate the quality of a survey, particularly election polls.

In a probability sample, every person in the target population has a known, nonzero chance of being selected for the survey sample. The major advantage of probability-based sampling is that we can calculate how well the findings from the sample represent the total population, for example by reporting a margin of error. Non-probability sampling methods do not share this feature. Participants are typically not selected at random but come to be included by other means, for instance because they volunteer.
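To make the "calculate how well the findings represent the population" point concrete, here is a minimal sketch of the textbook margin-of-error formula for a proportion. It assumes a simple random sample and a 95% confidence level; the function name and the example numbers (1,000 respondents, 50% support) are hypothetical and not taken from any poll discussed here.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Assumes a simple random (probability) sample; z = 1.96 corresponds
    to a 95% confidence level under the normal approximation.
    """
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Hypothetical poll: 1,000 randomly selected respondents, 50% support.
moe = margin_of_error(0.5, 1000)
print(f"Margin of error: +/- {moe * 100:.1f} percentage points")
```

The formula only holds when respondents are drawn at random. For a volunteer sample such as a Twitter poll, the same arithmetic can be done, but the resulting number says nothing reliable about the wider electorate.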

The Ngunyi Audience

Followers of Mutahi Ngunyi are overwhelmingly male and young. A sample of his 75,000 most recent followers shows that 78 percent are male, with accounts less than 70 days old on Twitter (see diagram below). It is this demographic that is likely causing the swing in the polls Mutahi Ngunyi conducts every month on Twitter. The sample is selected non-probabilistically and creates a selection bias by concentrating on young males, so it is not representative of the voting population in Kenya.
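A short sketch shows how a skewed audience can move a poll result even when nobody's opinion changes. Only the 78 percent male share comes from the follower sample above. The support rates by gender and the 50/50 gender split of the electorate are hypothetical numbers chosen purely for illustration.

```python
def weighted_support(share_by_group: dict, support_by_group: dict) -> float:
    """Overall support as the group-share-weighted average of group support."""
    return sum(share_by_group[g] * support_by_group[g] for g in share_by_group)


# Hypothetical support for one candidate within each group.
support = {"male": 0.60, "female": 0.40}

# Hypothetical electorate: assumed evenly split by gender.
electorate = {"male": 0.50, "female": 0.50}

# Poll audience: 78% male, as in the follower sample described above.
followers = {"male": 0.78, "female": 0.22}

true_support = weighted_support(electorate, support)
poll_support = weighted_support(followers, support)

print(f"Support in the electorate: {true_support:.1%}")
print(f"Support in the poll:       {poll_support:.1%}")
print(f"Bias from the skewed mix:  {poll_support - true_support:+.1%}")
```

Under these assumed numbers the poll overstates the candidate's support by several points. Reweighting the poll to the electorate's gender mix would remove this particular gap, but only for characteristics that are measured, which a Twitter poll does not record.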

As of January 2022, Twitter was more likely to be used by men. Overall, 43.6 percent of Twitter users were female and the remaining 56.4 percent were male. Any poll deployed on Twitter will therefore carry a gender bias, in addition to an age bias, since the platform is more popular with younger people, including 13- to 17-year-olds, who cannot vote.

The Pythia offered practical counsel that could shape future actions, just as we do today — though we’d use modern jargon and call it science.
