In 2016, just before the election, I tested the hypothesis of a "Shy Trump" effect. I personally prefer to speak of a possible underestimation of the support for Trump, since I think that several effects may be cumulative: a possible under-representation of Trump supporters in the samples, a possible lower tendency of Trump supporters to answer the polls when they are contacted, and a tendency not to reveal their preference. These effects may not operate in the same way or to the same degree across modes of administration.
In order to simulate a "Shy Trump" effect, I proceed in the same way as I did in 2016. That procedure led me to forecast that the two candidates were closer than we thought at the time and that Trump could win. I compute the sum of all the undecideds, abstainers and those who say that they support other candidates (a very small proportion). I add these three components because it is easy to imagine that "shy respondents" may just as well say that they will not vote or that they will vote for another candidate.
This sum varies from 0 to 23%. It is at 7.5% on average, at 5.3% for IVR polls, 5.9% for live interviews and 8% for web polls. The fact that it is higher for web polls may say something about the plausibility of a shy Trump effect but it could also be due to the composition of the web samples.
I allocate 67% of that sum -- which we will call the non-disclosers -- to Trump and 33% to Biden. Clearly, the hypothesis is heavily "loaded" in favor of Trump. This would probably be the worst-case scenario.
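The arithmetic of this allocation can be sketched in a few lines of Python. This is only an illustration, not the code used for the graphs: the function name `allocate_non_disclosers` and the example poll figures are hypothetical, and the 67/33 split is the one described above.

```python
def allocate_non_disclosers(trump, biden, non_disclosers, trump_share=0.67):
    """Split the non-discloser sum between the two candidates.

    All inputs are percentages of the full sample. `trump_share` is the
    fraction of non-disclosers given to Trump (0.67 in this post); the
    remainder goes to Biden.
    """
    if not 0.0 <= trump_share <= 1.0:
        raise ValueError("trump_share must be between 0 and 1")
    trump_adjusted = trump + trump_share * non_disclosers
    biden_adjusted = biden + (1.0 - trump_share) * non_disclosers
    return trump_adjusted, biden_adjusted


# Hypothetical poll: 42% Trump, 50% Biden, 8% undecided/abstainers/others.
trump_adj, biden_adj = allocate_non_disclosers(42.0, 50.0, 8.0)
print(f"Trump {trump_adj:.1f}%, Biden {biden_adj:.1f}%, "
      f"lead {biden_adj - trump_adj:.1f} points")
```

In this made-up example, an eight-point raw lead shrinks to a bit more than five points once the non-disclosers are allocated. Because the three inputs add up to 100, the two adjusted figures do too.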
Here is the graph that I get after making that allocation. We now have support for Biden at around 53% and for Trump at 47%, so Biden still has a lead of six points. We see, however, that some polls have Trump leading over Biden. Let us look at these estimates by mode.
The next graph shows the estimates of support for the two candidates by mode, with the same non-proportional allocation. As we can see, the web polls estimate the support for Trump at close to 47%, but the live interviewer polls' estimate is closer to 48% and the IVR estimates are at 50%. We can conclude that, if there is a substantial underestimation of the support for Trump, "objects may be closer than they appear". We will still have to understand, however, why the telephone polls estimate that support for Trump has been increasing lately when web polls show, on the contrary, that it has decreased.
These analyses do not take into account the fact that, if there is an underestimation of the support for Trump, whether it is due to a "shy Trump" effect or to other factors, these effects may occur "in context". They may be more substantial in states or areas that are more pro-Biden, for example. One could also argue that a similar "shy Biden" effect is present in states or areas that have a concentration of Trump supporters. This would mean that the two phenomena may cancel each other out. We should also remind ourselves that "shyness" is not that likely to occur in polls conducted without interviewers, that is, web and IVR polls. Those polls account for 85% of the polls published since September 1st.
I thank Luis Patricio Pena Ibarra for his meticulous work in entering the data and the related information to help me conduct these analyses.
Methodology for the analyses presented in this post.
1) The trends are produced by a local regression procedure (Loess). This procedure gives more weight to estimates that are closer to each other and less to outliers. It is, however, rather dependent on where the series starts. For example, I started this series with the polls conducted after August 1st. This means that all the polls conducted since then influence the trends. If I start on September 1st, the trends are somewhat different because the polls conducted in August no longer influence them. I try to balance the need to have enough polls for analysis against the importance of not letting old information influence the trends too much.
2) Every poll is positioned at the midpoint of its fieldwork period, not at the end of it. This seems more appropriate since the information has been gathered over a period of time and reflects the variation in opinion during that period.
3) All the tracking polls are included only once for each period. A poll conducted over three days is included once every three days. In this way, we only include independent data, which is appropriate in statistical terms.
4) The data used comes from the answers to the question about voting intention.
5) For non-Americans, note that, in the USA, IVR polls ("robopolls") cannot be used to call cell phones. The USA seems to be the only country that has this restriction.
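Points 2 and 3 above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the procedure actually used to prepare the data: the function names and the sample dates are hypothetical, and keeping the earliest release of each non-overlapping window is my assumption, since the text only says that each period is included once.

```python
from datetime import date, timedelta


def fieldwork_midpoint(start, end):
    """Return the date halfway through the fieldwork (rounded down)."""
    if end < start:
        raise ValueError("fieldwork cannot end before it starts")
    return start + timedelta(days=(end - start).days // 2)


def non_overlapping_releases(releases):
    """Keep only tracking-poll releases whose fieldwork does not overlap.

    `releases` is a list of (start, end) dates for a single tracker.
    A release is kept only if it starts after the previously kept
    release ended, so no interview is counted twice.
    """
    kept = []
    for start, end in sorted(releases):
        if not kept or start > kept[-1][1]:
            kept.append((start, end))
    return kept


# Hypothetical three-day tracker that publishes a new rolling window daily.
daily = [(date(2020, 10, d), date(2020, 10, d + 2)) for d in range(1, 8)]
for start, end in non_overlapping_releases(daily):
    print(start, end, "plotted at", fieldwork_midpoint(start, end))
```

With a three-day rolling window published every day, this keeps one release out of three, which matches the "once every three days" rule, and each retained release is plotted on the middle day of its window.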