Ah! les sondages: I think Biden will win

Hi,

I rarely commit myself to predicting election results. This time, I am quite confident. And I know I will feel very bad on the 4th -- or when the final results are known -- if I am wrong... Of course, I examined only the national polls, but if Biden ends up with fewer electoral votes and a larger proportion of the popular vote than Trump, it would mean that the system really needs to be changed.

I first present the graphs and then I explain why I think Biden will win. I included all the polls published before 11 this morning. Here is the first graph with all the polls. It shows that there might have been a slight decline in support for Biden recently. If this decline is confirmed, however, he would end with the same support as at the beginning of September. As we will see in the next graph, not all the modes estimate that there is such a decline. The difference in support between Biden and Trump is still about eight points. Just before the election in 2016, the difference between support for Clinton and for Trump was closer to five points, using the same methodology.

Let us now examine the graph by mode of administration. We still observe a difference in trends by mode. However, with the integration of the recent polls, the difference is not as large. Live telephone and IVR polls see a recent increase in support for Trump. There are not that many IVR polls, so any new poll may modify the trends.

In summary, according to the web polls (close to 80% of the polls), proportional support for Trump has stabilized at around 45% since mid-October. Some of these polls have higher estimates, at around 48%; others have lower estimates, at around 42%. According to live interviewer polls (15% of the polls), support for Trump has increased recently, to around 47%. The only polls that estimate the difference in support to be within the margin of error are the IVR polls (6% of the polls), and they do not estimate that this support is still increasing.
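To make "within the margin of error" concrete, here is a minimal sketch of how the margin of error on the difference between two candidates can be computed when both shares come from the same sample. It assumes simple random sampling (no design effect from weighting) and a hypothetical sample size of 1,000; the function names are mine and do not come from any pollster's method. With proportional support (the two shares add up to 100%), the margin on the difference is twice the margin on a single share.

```python
import math

Z_95 = 1.96  # normal quantile for a 95% confidence level


def share_margin(p: float, n: int) -> float:
    """Margin of error of a single share p estimated from n respondents."""
    return Z_95 * math.sqrt(p * (1 - p) / n)


def lead_margin(p1: float, p2: float, n: int) -> float:
    """Margin of error of the difference p1 - p2 measured in the same sample.

    The two shares are negatively correlated, so
    Var(p1 - p2) = (p1 + p2 - (p1 - p2) ** 2) / n.
    """
    diff = p1 - p2
    return Z_95 * math.sqrt((p1 + p2 - diff ** 2) / n)


def lead_within_margin(p1: float, p2: float, n: int) -> bool:
    """True when the observed lead is not larger than its margin of error."""
    return abs(p1 - p2) <= lead_margin(p1, p2, n)


if __name__ == "__main__":
    n = 1000  # hypothetical sample size, not taken from any specific poll
    # Proportional support of about 55/45 (web polls) and 51/49 (IVR polls).
    for label, biden, trump in [("web", 0.55, 0.45), ("IVR", 0.51, 0.49)]:
        print(label, round(lead_margin(biden, trump, n), 3),
              lead_within_margin(biden, trump, n))
```

With these assumptions, a ten-point lead is outside the margin of error while a two-point lead is inside it, which is the distinction made above between the web polls and the IVR polls.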

Why I think Biden is likely to win

First, the AAPOR report on the 2016 polls has shown that most of the national polls conducted during the last week before the 2016 election were within the margin of error. It identified the lack of weighting by education in the state polls as a possible explanation for their weaker performance. The pollsters seem to have adjusted for that. In addition, some pollsters weight for additional variables like household size, the presence of children, and the vote cast at the last election. The committee concluded that a lower turnout among Clinton supporters was a likely explanation for the difference between the polls and the vote in some states. This does not seem to be happening in this election. The turnout will likely be much higher than in 2016.

Second, a polling miss can come from two main factors: the poll methodology and people changing their minds (or not revealing their preference). In the 1980s and 1990s, the pollsters used very similar methodologies. Therefore, methodological factors could be a likely explanation for polling misses. Right now, the methods used are so diverse that it is unlikely that methods would lead to a polling miss. We can also exclude a last-minute shift -- unless something incredible happens after I finish writing this post. Even then, so many people have already voted and cannot change their vote that a major shift is almost impossible. A last-minute shift in voting intention has very rarely been proven. Sturgis -- Report of the British Polling Council -- checked for a late campaign shift to explain the poll failure in the UK in 2015 and concluded that no such movement occurred. The only case I know of is the Quebec 2018 polling miss, where there was a substantial move from one of the two major parties to the other during the last days of the campaign (Durand and Blais, 2020).

Now, there is one question left. How come IVR polls show such a difference from other polls? Could they be right and all the others wrong? IVR polls have often performed very well. However, in all these cases, their estimates were not far from the estimates produced by the other modes. In this election, after closing the gap with other polls at mid-campaign, they end with estimates that are four points higher than web polls and more than two points higher than live interviewer polls. There are unfortunately not enough polls using mainly this mode to see if it is due to the pollster or to the mode itself.

In summary, since September 1st, we identified 207 national polls conducted by 54 different pollsters with various methodologies. Only 12 of these polls are mainly IVR polls and 30 are mainly live interviewer polls; 16% of the polls combine different modes. It appears to me very unlikely that the large majority of these polls and pollsters are wrong. My only caveat is that we need to keep in mind that web polls seem to have difficulty detecting change.

Conclusion

For Donald Trump to win, it would be necessary that more than 50 different pollsters with different methodologies be wrong. This is highly unlikely. Electoral polls are often the best polls that you can get because pollsters know that their estimates will be compared with the final results (and with their competitors). Their performance testifies to their competence. They are therefore very careful and they try to improve their methods all the time. However, there are many "one-shot" pollsters in this election (four new ones last week). I checked their estimates. They are similar to those of established pollsters. Finally, my previous blog post showed that, even when we "load" the estimates in favor of Trump to simulate a "shy Trump" effect, we are still led to conclude that Biden is sufficiently ahead to win.

I thank Luis Patricio Pena Ibarra for his meticulous work in entering the data and gathering the related information to help me conduct these analyses.

Methodology for the analyses presented in this post.

1) The trends are produced by a local regression procedure (Loess). This procedure gives more weight to estimates that are closer to each other and less to outliers. It is however rather dependent on where the series starts. For example, I started this series with the polls conducted after August 1st. This means that all the polls conducted since then influence the trends. If I start on September 1st, the trends are somewhat different because of the absence of the influence of the polls conducted in August. I try to balance the need to have enough polls for analysis and the importance of not having old information influence the trends too much.

2) Every poll is positioned at the middle of its fieldwork, not at the end of it. This seems more appropriate since the information has been gathered over a period of time and reflects the variation in opinion during that period.

3) All the tracking polls are included only once for each period. A poll conducted over three days is included once every three days. In this way, we only include independent data. This is appropriate in statistical terms.

4) The data used comes from the answers to the question about voting intention.

5) For non-Americans, note that, in the USA, IVR ("robopolls") cannot be used to call cell phones. The USA seems to be the only country that has this restriction.
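For readers who want to experiment, points 1 to 3 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software used for the graphs in this post: the function names, the tuple layout of a poll release, the smoothing span `frac=0.6` and the number of robustness passes are all hypothetical choices of mine. The smoother is a basic robust Loess (local linear fits with tricube distance weights, followed by bisquare reweighting that gives less weight to outliers).

```python
import math
from datetime import date, timedelta
from statistics import median


def fieldwork_midpoint(start: date, end: date) -> date:
    """Point 2: place a poll at the middle of its fieldwork period."""
    return start + timedelta(days=(end - start).days // 2)


def independent_releases(releases):
    """Point 3: keep tracking-poll releases with non-overlapping fieldwork.

    Each release is a (start, end, estimate) tuple (hypothetical layout).
    """
    kept = []
    for rel in sorted(releases, key=lambda r: r[0]):
        if not kept or rel[0] > kept[-1][1]:
            kept.append(rel)
    return kept


def _local_fit(xs, ys, weights, x0):
    """Weighted straight-line fit, evaluated at x0."""
    sw = sum(weights)
    if sw == 0:
        return sum(ys) / len(ys)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    if sxx == 0:
        return my
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)


def loess(xs, ys, frac=0.6, robust_iters=2):
    """Point 1: robust local regression; returns one fitted value per x."""
    n = len(xs)
    k = min(n, max(2, math.ceil(frac * n)))  # size of each neighbourhood
    robust = [1.0] * n
    fitted = list(ys)
    for _ in range(robust_iters + 1):
        for i, x0 in enumerate(xs):
            dists = [abs(x - x0) for x in xs]
            h = sorted(dists)[k - 1] or 1.0  # distance to the k-th nearest point
            # Tricube weights: nearby polls count more, polls beyond h count zero.
            weights = [r * max(0.0, 1 - (d / h) ** 3) ** 3
                       for r, d in zip(robust, dists)]
            fitted[i] = _local_fit(xs, ys, weights, x0)
        resid = [y - f for y, f in zip(ys, fitted)]
        scale = 6 * median(abs(r) for r in resid)
        if scale < 1e-12:
            break
        # Bisquare weights: polls far from the current trend are downweighted.
        robust = [max(0.0, 1 - (r / scale) ** 2) ** 2 for r in resid]
    return fitted
```

To use it, convert each poll's midpoint date to a number of days since the start of the series and pass those numbers as `xs`, with the estimates as `ys`. Changing the start date changes which polls enter the first neighbourhoods, which is why the trends depend on where the series starts.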


Monday, November 2, 2020
