Hi,
In this post, after presenting the usual graphs of the likely change in voting intentions since the beginning of September, I will focus on the "undecideds" and show what happens if I allocate them differently from what is usually done.
For French speakers, note that this post is in English but that texts and interviews in French are planned over the coming days. Here is the text published in La Presse: http://plus.lapresse.ca/screens/d6ab6a79-5aa8-40c3-b840-731fdc3e9246%7C_0.html
First, as I explained in the preceding post, the method I use depends on the period included in the analysis. Since there was movement in the polls recently, I dropped the polls conducted in August to ensure that the estimation is sufficiently driven by recent polls. The graphs and the analyses now start on September 1st. I have 141 polls in the database for that period (polls published up to yesterday, November 3). I integrate the tracking polls only once in their respective fieldwork period and I do not include the LA Times polls for a number of reasons (see methodology at the end).
The first graph traces the support for Clinton, Trump and others, excluding the Undecideds. This is equivalent to a proportional attribution of the undecideds, which is what all pollsters seem to do. The graph shows that Clinton is ahead of Trump. However, support for Clinton appears to have been stable since the second debate while support for Trump has increased, at the expense of the other candidates, mostly Johnson. This analysis puts support for Clinton at close to 49% and support for Trump at close to 45%. All the recent polls have Clinton ahead of Trump. The race is tightening, according to the polls, because the decrease in support for others is helping only Trump. Note that support for Clinton is now higher than it was at the beginning of September. This is validated by the regression analyses that I performed.
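The equivalence between excluding the undecideds and attributing them proportionally can be checked with a few lines of arithmetic. The sketch below uses a made-up poll (the figures are illustrative, not taken from the database) and assumes that the candidate shares and the undecideds sum to 100.

```python
def exclude_undecided(shares):
    """Rescale the decided shares so that they sum to 100."""
    decided = sum(shares.values())
    return {name: 100 * share / decided for name, share in shares.items()}


def allocate_proportionally(shares, undecided):
    """Give each candidate a part of the undecideds equal to their share of the decided."""
    decided = sum(shares.values())
    return {name: share + undecided * share / decided for name, share in shares.items()}


# Hypothetical poll, in percent of all respondents (45 + 42 + 7 + 6 = 100).
poll = {"Clinton": 45.0, "Trump": 42.0, "Others": 7.0}
undecided = 6.0

excluded = exclude_undecided(poll)
proportional = allocate_proportionally(poll, undecided)
for name in poll:
    print(name, round(excluded[name], 2), round(proportional[name], 2))
```

Both columns printed for each candidate are identical, which is why a graph "excluding the undecideds" is implicitly a proportional allocation.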
Looking only at support for the two main candidates, we see that the proportion of support for Clinton first increased and then decreased in October, so that her support is now somewhat higher than 52%, slightly more than at the beginning of September. No poll has shown Trump ahead of Clinton in the last two weeks.
Finally, there is still a significant difference between modes of administration. The green line shows that polls using an IVR/online methodology, mostly Rasmussen, give a systematically lower estimate of support for Clinton.
On this question, I performed a series of regression analyses, controlling for time and time squared. They show that, after controlling for change over time, both Web polls and Live phone polls give on average around 2 points more to Clinton than IVR/online polls. However, I remembered that in previous elections, questions were raised about tracking polls, so I also wanted to check for a difference between tracking polls and other polls. The conclusion: all else equal, tracking polls estimate support for Clinton 0.8 point lower than other polls.
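For readers who want to see what such a regression looks like, here is a minimal sketch, not the actual analysis. It assumes an ordinary least squares model with IVR/online as the reference mode. The function name, the variable names and the data are all hypothetical, and the data are built without noise so that the known effects (+2, +2 and -0.8) are recovered exactly.

```python
import numpy as np


def mode_effects(days, mode, tracking, clinton_share):
    """OLS of Clinton's share on time, time squared, mode dummies and a tracking dummy.

    IVR/online is the reference category, so the "web" and "live_phone"
    coefficients are differences from IVR/online polls.
    """
    days = np.asarray(days, dtype=float)
    mode = np.asarray(mode)
    X = np.column_stack([
        np.ones_like(days),                  # intercept
        days,                                # time
        days ** 2,                           # time squared
        (mode == "web").astype(float),       # Web poll dummy
        (mode == "live").astype(float),      # Live phone poll dummy
        np.asarray(tracking, dtype=float),   # tracking poll dummy
    ])
    y = np.asarray(clinton_share, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    names = ["intercept", "day", "day_sq", "web", "live_phone", "tracking"]
    return dict(zip(names, coef))


# Made-up, noise-free data: 60 polls, one per day, cycling through the modes.
idx = np.arange(60)
days = idx.astype(float)
mode = np.array(["ivr_online", "web", "live"])[idx % 3]
tracking = idx % 2
share = (46 + 0.10 * days - 0.0015 * days ** 2
         + 2.0 * (mode == "web") + 2.0 * (mode == "live") - 0.8 * tracking)

effects = mode_effects(days, mode, tracking, share)
print({name: round(float(value), 3) for name, value in effects.items()})
```

With real polls the coefficients come with standard errors, which this sketch does not compute; a statistics package would be the natural choice for that.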
Now, what about the "Undecideds"?
The current situation is that "nobody cares" about the undecideds, and all the pollsters and analysts attribute them proportionally to the candidates. Mark Blumenthal presented an analysis of the Undecideds and Uncertain in the campaign. It shows that undecideds tend to be slightly more Republican or lean Republican and less likely to approve of Obama's job performance. However, this analysis is performed on one series of polls conducted with one methodology. An analysis of all the polls shows that the proportion of undecideds has decreased a little over time: it went from close to 8% at the beginning of September to around 5% last week. However, the most important information is that the proportion of undecideds varies by mode of administration. Since the beginning of September, it has averaged 8.2% for web polls, 5.9% for IVR/online polls and 4.4% for Live Phone polls. It also varies substantially between pollsters: from 2% (AG_GFK) to 13% (Zogby) for web polls, from 5.3% (Rasmussen) to 11% (Survey USA) for IVR/Online polls and from 1.3% (CNN) to 12% (Princeton Surveys) for Live Phone polls. This means that the proportion of Undecideds is a methodological feature more than a "real" proportion of likely voters who do not know whom they are going to vote for.
In previous elections or referendums, I used a non-proportional attribution of non-disclosers (including undecideds and respondents stating that they would not vote). I could rely on what had happened in Quebec in the 1995 referendum and in most elections (see Durand, Blais and Vachon (2001) in POQ about the Quebec election of 1998). In the Scottish referendum and in the Brexit referendum, I showed that attributing 67% of the non-disclosers to the conservative side gave a better forecast than a proportional attribution. Note that this does not mean that I hypothesize that 67% of the non-disclosers are on the conservative side. I use this procedure as a simple and empirically validated way to compensate for differences between methodologies (house effects) and for a possible underrepresentation of the more conservative respondents. These respondents may be less likely to be part of the samples, less likely to cooperate with pollsters and less likely to reveal their vote. This is consistent with the "spiral of silence" hypothesis put forward by Elisabeth Noelle-Neumann a long time ago.
Without any indication of the best allocation in the US case, I opted for consistency and gave 67% of the non-disclosers to Trump, 33% to Clinton and none to the other candidates (it is well known that small candidates are almost never underestimated by the polls). The next graph shows what this means in terms of estimation. I want to stress that this would be the most pessimistic scenario for Clinton's supporters. Clinton still appears to be ahead of Trump, but only by 2 points.
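To make the 67/33 allocation concrete, here is a small sketch applied to a single made-up poll. The function name and the figures are hypothetical; the graph itself is produced from all the polls, not from a calculation like this one on a single poll.

```python
def allocate_non_disclosers(clinton, trump, others, non_disclosers, trump_part=0.67):
    """Give trump_part of the non-disclosers to Trump, the rest to Clinton, none to others."""
    return {
        "Clinton": clinton + (1 - trump_part) * non_disclosers,
        "Trump": trump + trump_part * non_disclosers,
        "Others": others,
    }


# Hypothetical poll, in percent of all respondents: 6% did not disclose a choice.
result = allocate_non_disclosers(clinton=45.0, trump=42.0, others=7.0, non_disclosers=6.0)
lead = result["Clinton"] - result["Trump"]
print({name: round(value, 2) for name, value in result.items()}, round(lead, 2))
```

In this illustrative poll, a 3-point lead among decided respondents shrinks to about 1 point, because Trump receives roughly 4 of the 6 points and Clinton roughly 2. The larger the proportion of non-disclosers in a poll, the more this allocation moves its estimate.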
Conclusion
This election will be very interesting in terms of analysis of electoral polls. We tend to think that the more "toxic" the climate in which an election occurs, the more likely it is that a spiral of silence will occur, in which some respondents with specific characteristics will not participate in polls or will not reveal their preference. It is not clear that we have this situation here. Are Trump supporters less likely to reveal their vote? Perhaps not, but we may think that they are less likely to cooperate with an "institution" like pre-election polls. Participation in the election may also vary. And we should not forget that one mode of administration is pushing down the estimates of Clinton support.
Acknowledgements: Luis Pena Ibarra is responsible for validating and entering the data, conducting the analyses that produce the graphs and editing the graphs.
Methodology for this analysis.
1) The estimation presented is not an average, weighted or not. It is produced by a local regression (Loess). This analysis gives more importance to data that is close and less to outliers. It is however rather dependent on where the series start. I started this series with the polls conducted after September 1st. This means that all the polls conducted since then influence the trend. I try to balance the need to have enough polls for analysis and the importance of not having old information influence the trend too much.
2) Every poll is positioned at the middle of its fieldwork period, not at the end of it. This seems more appropriate since the information has been gathered over a period of time and reflects the variation in opinion during that period.
3) The data used comes from the answer to the question about voting intention for the four candidates.
4) All the tracking polls are included only once for each period: For example, a poll conducted on three days is included once every three days. In this way, we only include independent data, which is appropriate in statistical terms.
5) I do not include the LA Times polls, mostly for two reasons. First, there is only one sample interviewed, always the same. If this sample is biased, all the polls are therefore biased. Second, the question used asks respondents to rate the probability that they will vote for each candidate. It is well known that these probabilities do not usually add up to 100% unless this is "forced". We may also think that a proportion of these so-called probabilities are around 100 or 0. We do not have the distribution of these probabilities, only their average. In my view, there is not enough information to integrate this poll, the question asked is too different from those of other polls to be compared with them, and the sample is akin to a sample of professional respondents, which is problematic.
6) For Canadians, note that, in the USA, IVR cannot be used to call cell phones. This is why pollsters use Web opt-in for a part of their sample (20% in the case of Rasmussen).
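Points 1, 2 and 4 above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure actually used for the graphs: the function names and the data are hypothetical, the local regression is a plain local linear fit with tricube weights (without the extra robustness iterations that some Loess implementations apply to outliers), and the span of 0.5 is arbitrary.

```python
from datetime import date, timedelta

import numpy as np


def midpoint(start, end):
    """Point 2: date at the middle of the fieldwork period (rounded down to a day)."""
    return start + timedelta(days=(end - start).days // 2)


def non_overlapping(polls):
    """Point 4: keep a tracking poll only if it starts after the last kept one ended."""
    kept = []
    for poll in sorted(polls, key=lambda p: p["start"]):
        if not kept or poll["start"] > kept[-1]["end"]:
            kept.append(poll)
    return kept


def loess(x, y, x_eval, span=0.5):
    """Point 1: local linear regression with tricube weights at each point of x_eval."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = max(3, int(np.ceil(span * len(x))))        # number of neighbours in each fit
    fitted = []
    for x0 in np.atleast_1d(np.asarray(x_eval, dtype=float)):
        dist = np.abs(x - x0)
        h = max(np.sort(dist)[k - 1], 1e-12)       # bandwidth: k-th nearest distance
        w = np.clip(1 - (dist / h) ** 3, 0, None) ** 3   # close polls weigh more
        sw = np.sqrt(w)
        X = np.column_stack([np.ones_like(x), x - x0])
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        fitted.append(beta[0])                     # intercept = fitted value at x0
    return np.array(fitted)


# A made-up three-day tracking poll released daily from October 1 to October 7.
releases = [{"start": date(2016, 10, d), "end": date(2016, 10, d + 2)} for d in range(1, 8)]
kept = non_overlapping(releases)
print([midpoint(p["start"], p["end"]).isoformat() for p in kept])

# Sanity check of the smoother on made-up data that follow an exact straight line.
x = np.arange(20.0)
trend = loess(x, 2 * x + 1, [2.5, 7.0])
print(trend)
```

Because each fit only uses the polls nearest to the date being estimated, the curve at the edges leans on fewer polls on one side, which is one way to see why the estimate depends on where the series starts.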




 
 