My last message, titled "I think Biden will win", was posted on Monday at midday and included the polls published before 11AM. From 11AM Monday to Tuesday morning, 16 new polls were published on various sites. These polls confirmed that support for Trump was increasing. Therefore, I tweeted on Tuesday that I still thought Biden would win, but that the results would likely be closer than expected.
Now that most of the votes are counted, we can compare the polls with the vote. As I write this post, the count gives 50.4% to Biden and 47.9% to Trump. Note that Biden's share is likely to increase as counting continues, so the pollsters' performance will eventually look better than it does now. In all the graphs, I use the proportional support for each candidate -- support for one candidate over the sum of the two -- in order to compare like with like: some pollsters do not publish estimates of support for minor candidates, or even the proportion of undecided voters, so the published figures are not directly comparable. I therefore use the following figures for the reported vote: 51.3% for Biden and 48.7% for Trump, that is, each candidate's share of their combined vote.
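Concretely, the proportional support is each candidate's raw share divided by the two-candidate total. A minimal sketch of this calculation (the function name is mine, for illustration only):

```python
def two_candidate_share(a, b):
    """Each candidate's support as a share of the two-candidate total, in %."""
    total = a + b
    return 100 * a / total, 100 * b / total

# Reported count: Biden 50.4%, Trump 47.9%
biden, trump = two_candidate_share(50.4, 47.9)
print(round(biden, 1), round(trump, 1))  # 51.3 48.7
```

The same transformation is applied to every poll before plotting, so published figures with different treatments of minor candidates and undecided respondents become comparable.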
The first graph shows the trends in support for the two candidates when I integrate the 223 polls published since September 1st. As we can see, collectively, the polls missed the target, even taking into account the late increase in support for Trump. However, as shown by the dots that are close to the election results, some polls seem to have performed better than others.
What about modes?
The following graph shows the estimates according to the main mode of administration used by each pollster. As we can see, not all the polls were wrong. The two upper lines portray the trends traced by IVR polls, and the dotted middle line, the trends traced by live interviewer polls. The trends show that these telephone polls were generally more accurate and within the margin of error. Telephone polls account for only 20% of all polls (14% live interviewer and 6% IVR), but they account for almost all the polls that performed better. Not all the telephone polls performed well, and not all the web polls performed badly. However, there is a rather strong relationship between mode of administration and estimates of support. In addition, the telephone polls captured changes in support that the web polls missed. Therefore, a global average of the polls that does not take mode into account is misleading.
Since IVR pollsters cannot call cell phones, they complete their samples using another mode, usually web, to reach cell-only users. Other pollsters also combine modes, usually live telephone and web. In the preceding graph, I attributed the mode according to the main mode used. The following graph shows the trends according to the use of mixed modes. As the graph shows, there is indeed a tendency for polls using mixed modes -- the upper dotted line -- to be more accurate.
The proportion of web polls went from 50% in 2016 to 80% now. The use of averages that do not take modes into account -- by aggregators, for example -- contributed to the impression that all the polls had been wrong. I do not know whether and how this can be fixed before the next election, but I think it ought to be.
The fact that more pollsters resorted to mixed modes is good news in my view. It seems difficult to represent the whole population of the United States using only one mode. Web polls cannot reach the close to 15% of the population that does not have internet access. IVR polls cannot reach cell-only users without resorting to another mode. Live interviewer polls can reach almost everybody, but they are expensive.
What are the factors that explain differences between modes? It is difficult to partial out the impact of the cluster of methods that comes with each mode and, sometimes, with each pollster. Modes vary in their sampling source, their coverage, and their weighting. They also differ in the length of their questionnaires, the way they word their questions, the placement of the vote intention question, etc. It remains that the modes that resort to probabilistic or quasi-probabilistic recruitment tended to perform better. The reports show that some web pollsters are trying to improve their methods by integrating some randomness, but the estimates for this election show that there is still work to do. The AAPOR committee that will examine the polls' performance will no doubt look at all these features.
Most pollsters provided methodology reports detailed enough to give sufficient information on what they do, which is good news. However, for seven pollsters it was impossible to find any methodological report. Some of them are academic pollsters or work for established media, which is difficult to understand.
Anybody outside the U.S. would be very surprised, I believe, by the proliferation of pollsters. There are 54 pollsters who have published polls since September 1st. Among them, 21 conducted only one poll and eight conducted two. Five of them appeared in the last week of the campaign. I checked and found no difference between their estimates and those of more established pollsters. However, it is more difficult to find information about some of these pollsters and to check their work. This may become a concern when electoral campaigns are heated: some may be tempted to publish biased or fake polls in the belief that it will help their preferred candidate.
I thank Luis Patricio Pena Ibarra for his meticulous work in entering the data and gathering the related information that helped me conduct these analyses.
Methodology for the analyses presented in this post.
1) The trends are produced by a local regression procedure (Loess). This procedure gives more weight to estimates that are close to each other and less to outliers. It is, however, rather sensitive to where the series starts. For example, I started this series with the polls conducted after August 1st, which means that all the polls conducted since then influence the trends. If I start on September 1st instead, the trends are somewhat different because the August polls no longer exert any influence. I try to balance the need to have enough polls for analysis against the importance of not letting old information influence the trends too much.
2) Every poll is positioned on the middle of its fieldwork, not at the end of it. This seems more appropriate since the information has been gathered over a period of time and reflects the variation in opinion during that period.
3) Each tracking poll is included only once per fieldwork period: a poll conducted over three days is included once every three days. In this way, we include only independent data, which is appropriate in statistical terms.
4) The data used comes from the answers to the question about voting intention.
5) For non-American readers, note that in the U.S.A., IVR polls ("robopolls") cannot be used to call cell phones. The U.S.A. seems to be the only country with this restriction.
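To illustrate point 1, here is a minimal local-regression smoother in the spirit of Loess: local linear fits with tricube weights. This is a simplified sketch, not the exact procedure used for the graphs (which would include robustness iterations); the function name and parameters are my own.

```python
import numpy as np

def loess_smooth(x, y, frac=0.5):
    """Simplified LOWESS-style smoother: local linear fits with tricube weights.

    Illustrative sketch only; statistical packages implement the full Loess
    procedure, including robustness iterations against outliers.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    k = max(2, int(np.ceil(frac * n)))    # number of neighbors per local fit
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        idx = np.argsort(d)[:k]           # k nearest polls (by date)
        h = d[idx].max()
        if h == 0:                        # all neighbors share the same date
            fitted[i] = y[idx].mean()
            continue
        w = (1 - (d[idx] / h) ** 3) ** 3  # tricube kernel: nearer polls weigh more
        coef = np.polyfit(x[idx], y[idx], 1, w=np.sqrt(w))
        fitted[i] = np.polyval(coef, x[i])
    return fitted

# Toy usage: a noiseless linear "support" series is recovered exactly
days = np.arange(10)
support = 50 + 0.2 * days
print(loess_smooth(days, support, frac=0.5))
```

Because each fitted point depends only on its nearest neighbors, polls far back in the series gradually lose influence, which is why changing the starting date of the series (August 1st versus September 1st) changes the trends somewhat.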