Ah! les sondages: Could we have forecasted accurately?

Hi everybody,

For French speakers, note that this post is in English, but French texts and interviews are planned over the next few days. Here is the link to the article published in La Presse: Les méthodes de sondage et leurs limites. And here is the link to the interview with Anne-Marie Dussault (24/60) last Monday evening, where I presented my predictions (the first graph in this post): Entrevue avec Anne-Marie Dussault 24/60 lundi 7 novembre 2016.

I will start by bragging a bit. Here is the graph that I presented on a TV program on Monday night (in French), so I have proof, mind you: Interview with AMDussault, RDI, November 7, 2016. This graph shows voting intentions when I attribute 67% of the undecideds in each poll to Trump and 33% to Clinton. It forecasts Clinton ahead of Trump by one point, an almost perfect forecast (both candidates are one point too low, with Others too high).
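The 67/33 rule is simple arithmetic applied poll by poll. The sketch below assumes that each poll reports candidate shares as percentages of all respondents, with a separate undecided share. The function name and the poll figures in the usage line are hypothetical and do not come from any actual poll.

```python
def allocate_undecided(shares, undecided, split):
    """Add a fixed fraction of the undecided share to each candidate.

    shares    -- candidate -> % of all respondents (decided respondents only)
    undecided -- % of respondents who did not disclose a preference
    split     -- candidate -> fraction of the undecided they receive; the
                 fractions must sum to 1, and candidates left out get none
    """
    if set(split) - set(shares):
        raise ValueError("split names a candidate missing from shares")
    if abs(sum(split.values()) - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    return {c: pct + undecided * split.get(c, 0.0) for c, pct in shares.items()}


# Hypothetical poll: 44% Clinton, 41% Trump, 8% Others, 7% undecided.
poll = {"Clinton": 44.0, "Trump": 41.0, "Others": 8.0}
print(allocate_undecided(poll, 7.0, {"Trump": 0.67, "Clinton": 0.33}))
# Clinton ≈ 46.31, Trump ≈ 45.69, Others unchanged at 8.0
```

A proportional allocation would instead give each candidate a share of the undecideds matching its share of decided respondents. That leaves the ratio between candidates unchanged. A non-proportional split like this one lets the undecideds narrow the lead. The same function also covers the 1995 Quebec rule (75% of non-disclosers to the No side), with `split={"No": 0.75, "Yes": 0.25}`.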

Why perform this non-proportional distribution of non-disclosers' preferences? I explained it elsewhere. It gave a perfect forecast of the Brexit results when I attributed the bulk of the undecideds to the appropriate side, i.e. the Leave side in that case. This time, however, I also have to thank Brian Breguet from the site "Too close to call", and Tamas Bodor from the University of Wisconsin, who sent me an email saying "this is the perfect storm for a spiral of silence". He wrote about it in IJPOR: The Issue of Timing and Opinion Congruity in Spiral of Silence Research: Why Does Research Suggest Limited Empirical Support for the Theory? If I remember correctly, I borrowed the term "toxic climate" from him.

Now what? Were there other intriguing figures that should have tipped us off?
In the preceding graph, we see that every point lost by "others" is taken by Trump. Others lost a third of their support, from 12% on September 1st to 8% on election day, according to the polls. They finally got 5% of the vote, the remaining three points going mostly to Trump.

What about the modes? I have two remarks about them. In the following graph, you can see that the telephone polls (with or without an interviewer) show a "bump" in support for Clinton in October, when the debates were held. The Internet polls, however, measure no such bump. How come? We do not know the answer yet, but we will have to look into it and see whether it is a recurrent phenomenon. It may be that Internet polls relying on panels use more homogeneous samples. Or it may mean that there was no real bump in support for Clinton.

Finally, the IVR/web polls tended to show Trump ahead almost all the way. No herding here: they stood by their numbers. My conclusion was that they were outliers. But what if, in fact, they had Trump higher because their calls reach only landlines, which make up 80% of their sample? This is consistent with the fact that Trump was quite strong in rural areas, where cell phones are not used as much as in cities. Perhaps the other polls have proportionally too many cell phones in their samples. And since most people who have a cell phone also have a landline, they have a greater chance of being contacted. In short, urban people have a greater chance of being selected.
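The selection argument in the last sentences can be illustrated with a toy calculation. This is a hypothetical model, not any pollster's actual design. It assumes that a person is drawn independently through each phone they own, with made-up per-frame sampling rates. It also assumes no weighting correction for people who can be reached through both frames. Real dual-frame surveys normally adjust for this overlap, so the sketch only shows what happens if that adjustment is incomplete.

```python
def selection_probability(has_landline, has_cell, p_landline, p_cell):
    """Chance that a person is drawn at least once across the frames they belong to.

    Hypothetical model: independent draws from the landline and cell frames,
    and no weighting adjustment for people reachable through both.
    """
    p_missed = 1.0
    if has_landline:
        p_missed *= 1.0 - p_landline
    if has_cell:
        p_missed *= 1.0 - p_cell
    return 1.0 - p_missed


# Dual-frame design with equal, purely illustrative per-frame sampling rates.
p_l = p_c = 0.001
for label, phones in [("landline only", (True, False)),
                      ("cell only", (False, True)),
                      ("both", (True, True))]:
    print(label, selection_probability(*phones, p_l, p_c))

# A landline-only IVR design (p_cell = 0) cannot reach cell-only people at all.
print("cell only, IVR", selection_probability(False, True, p_l, 0.0))
```

With equal sampling rates, someone with both phones is about twice as likely to be selected as someone with a landline only. How that translates into an urban/rural imbalance depends on phone ownership patterns, which the sketch does not model.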

Where have all the margins of error gone?

I have 21 polls in my database for the last week (I included the tracking polls only once per fieldwork period and did not include the LA Times polls). If you compute the margin of error for the difference between two proportions, you will see that 18 of these 21 polls were within the margin of error for such a difference. This means that every time these polls were published, the pollster and/or the media should have flagged it (RED ALERT!) with a statement such as "According to the margin of error (or credibility interval), there is no difference between the two candidates". Right now, the information is there, but nobody talks about what it means. The aggregators and analysts may say that a very substantial majority of the polls had Clinton winning, but most individual polls showed a tie. If this had been stated loud and clear, the population would have been accurately informed.
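The post does not say which formula was used. A standard choice for the lead between two candidates measured in the same sample, assuming simple random sampling and ignoring the design effect of weighting, is Var(p1 - p2) = [p1 + p2 - (p1 - p2)^2] / n. This formula accounts for the negative covariance between the two shares. For two candidates near 45% each, the resulting margin is close to twice the single-proportion margin that is usually published. Here is a minimal sketch; the function names and the example poll are hypothetical.

```python
import math


def moe_difference(p1, p2, n, z=1.96):
    """Margin of error for the lead p1 - p2, two shares from one sample of size n.

    Var(p1 - p2) = [p1 + p2 - (p1 - p2)**2] / n under simple random sampling
    (no design effect from weighting is included).
    """
    return z * math.sqrt((p1 + p2 - (p1 - p2) ** 2) / n)


def is_statistical_tie(p1, p2, n, z=1.96):
    """True when the lead is no larger than its own margin of error."""
    return abs(p1 - p2) <= moe_difference(p1, p2, n, z)


# Hypothetical poll: 46% vs 43% among 1,000 respondents.
print(round(moe_difference(0.46, 0.43, 1000), 3))  # 0.058
print(is_statistical_tie(0.46, 0.43, 1000))        # True
```

In this example, a 3-point lead sits well inside a margin of about 5.8 points. Under this formula, that poll would count as a tie.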

What about the likely voter models? They are a black box. Fortunately, some pollsters publish two estimates, one for registered voters and another for likely voters, which allows comparison. Perhaps we should always ask for both estimates so that we can analyze the impact of using different kinds of models. The current situation reminds me of the 2002 French presidential election, where each pollster had its own recipe (but almost all the recipes gave exactly the same estimate!).

Finally, what about last-minute changes?

This is usually "the explanation" put forward by pollsters. In this case, however, there may indeed have been some last-minute change, with voting intentions for the small candidates going to Trump. This is even more likely since most of this support was for Johnson.

In conclusion

I think we had some means of anticipating what was happening. The problem is that it is easy to say so afterwards. In any case, we need polling errors in order to improve, even though they are no fun at all. The AAPOR ad hoc committee will certainly have the cooperation of all pollsters in its quest to understand what happened and, hopefully, to recommend improvements. One open question, however, concerns the work of aggregators and analysts: with the type of analyses that we use, can we easily detect last-minute changes?

Acknowledgements: Luis Pena Ibarra is responsible for validating and entering the data, conducting the analyses that produce the graphs and editing the graphs.

Methodology for this analysis.

1) Non-proportional attribution of preferences to non-disclosers was used in Quebec in the 1980s and 1990s. It was proposed by Maurice Pinard of McGill University and validated by Pierre Drouilly of UQAM. Pollsters used it at the time to compensate for the fact that the polls always underestimated the PLQ, a center-right party. In the 1995 referendum on sovereignty, attributing 75% of non-disclosers to the No side gave a perfect prediction (50.5-49.5).

2) The estimate presented is not an average, weighted or not. It is produced by a local regression (Loess). This analysis gives more weight to nearby data and less to outliers. It depends rather strongly on where the series starts. I started this series with the polls conducted after September 1st, which means that all the polls conducted since then influence the trend. I try to balance the need for enough polls for the analysis against the importance of not letting old information influence the trend too much.

3) Every poll is positioned at the middle of its fieldwork, not at the end. This seems more appropriate, since the information was gathered over a period of time and reflects the variation in opinion during that period.

4) The data used come from the answers to the question about voting intention for the four candidates (and others).

5) Tracking polls are included only once per fieldwork period: for example, a tracking poll whose fieldwork lasts three days is included once every three days. In this way, we include only independent data, which is appropriate in statistical terms.

6) I do not include the LA Times polls, mostly for two reasons. First, there is only one sample, interviewed again and again. If this sample is biased, all the polls are therefore biased. Second, the question used asks respondents to rate the probability that they will vote for each candidate. It is well known that these probabilities do not usually add up to 100% unless respondents are "forced" to make them do so. We may also suspect that a proportion of these probabilities are close to 100 or 0. We do not have the distribution of these probabilities, only their average. In my view, there is not enough information to integrate this poll. The question asked is too different from those of other polls to be compared with them, and the sample is akin to a sample of professional respondents, which is problematic.

7) For Canadians: note that in the USA, IVR cannot be used to call cell phones. This is why pollsters use Web opt-in for part of their sample (20% in the case of Rasmussen).
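Items 2, 3 and 5 can be sketched together in plain Python. This is an illustrative reconstruction, not the software or settings behind the graphs. The span (`frac=0.5`, the share of polls in each local window), the two robustness passes and all function names are assumptions. The smoother follows Cleveland's robust Loess. At each poll's midpoint, it fits a weighted straight line, with tricube weights that fall off with distance in days. It then reweights polls with a bisquare function of their residuals, so that outliers count less in the next pass.

```python
import math
import statistics
from datetime import date


def fieldwork_midpoint(start, end, origin=date(2016, 9, 1)):
    """Item 3: a poll sits at the middle of its fieldwork, in days since `origin`."""
    return ((start - origin).days + (end - origin).days) / 2


def independent_releases(end_dates, field_days):
    """Item 5: keep tracking-poll releases whose fieldwork windows do not overlap.

    Each release is assumed to cover the `field_days` days ending on its date.
    """
    kept = []
    for end in sorted(end_dates):
        if not kept or (end - kept[-1]).days >= field_days:
            kept.append(end)
    return kept


def _tricube(u):
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def _bisquare(u):
    u = abs(u)
    return (1 - u ** 2) ** 2 if u < 1 else 0.0


def _fit_at(x0, xs, ys, robust, k):
    """Weighted straight-line fit over the k polls nearest x0, evaluated at x0."""
    h = max(sorted(abs(x - x0) for x in xs)[k - 1], 1e-12)
    w = [_tricube((x - x0) / h) * r for x, r in zip(xs, robust)]
    if sum(w) == 0:  # all neighbours flagged as outliers: use distance weights only
        w = [_tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


def loess(xs, ys, frac=0.5, robust_iters=2):
    """Item 2: robust Loess trend of ys (a candidate's %) over xs (midpoints)."""
    if not xs:
        return []
    n = len(xs)
    k = min(n, max(2, math.ceil(frac * n)))  # number of polls in each local window
    robust = [1.0] * n
    for step in range(robust_iters + 1):
        fitted = [_fit_at(x0, xs, ys, robust, k) for x0 in xs]
        if step == robust_iters:
            break
        residuals = [y - f for y, f in zip(ys, fitted)]
        s = statistics.median([abs(r) for r in residuals])
        if s < 1e-12:  # residuals essentially zero: nothing to downweight
            break
        # Polls far from the current trend get less weight in the next pass.
        robust = [_bisquare(r / (6 * s)) for r in residuals]
    return fitted
```

For one candidate, `xs` would hold `fieldwork_midpoint(start, end)` for every retained poll, with tracking polls thinned by `independent_releases`, and `ys` would hold that candidate's share. The window size is a fraction of all the polls, so adding or dropping early polls can change every fitted value. This is consistent with the sensitivity to the start of the series noted in item 2.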

Thursday, November 10, 2016
