Saturday, October 13, 2012

The Best Measure of Sample Bias

There are two competing interpretations of the polls from the last couple of months whose samples contain many more Democrats than Republicans. Republicans (including me) contend that these samples are not representative of the actual population of American voters, and so are not showing results that will agree with the election. The pollsters (and Democrats) maintain that they don't weight their samples this way on purpose; they weight according to the Census reports by categories of age, sex, race, and assorted other criteria, and the results are the results.

I've looked at the evidence outside of polling, like voter registration and historic patterns, and none of it indicates a big Democratic advantage this year. But it didn't seem like there was any way to test which of these two interpretations was correct within the actual polling results. However, there IS a poll question that I believe can definitively establish whether the polls are biased toward Obama and the Democrats or not. The only problem is that so few pollsters ask the question and/or report the results. In any event, I have found one national pollster that asks the question (Pew), and one state-level pollster that asks it (WMUR/UNH Granite State Poll).

What's the question? It is, "Who did you vote for in the previous presidential election?"

It is a fact that Obama won nationally in 2008 by 7.3 points, 52.9 to 45.6. It is also a fact that Obama won in New Hampshire by 9.6 points, 54.1 to 44.5. ANY FAIR SAMPLE should have respondents who match these numbers, within a couple of points. In fact, I'd argue that a correct sample should have a somewhat smaller margin of Obama voters given the depressed Democratic registrations and Republican enthusiasm advantage across the country.

So, have the samples been biased?


Pew is in fact a terrific example. In September they released a poll showing Obama up by 8 points (the "Who did you vote for in 2008?" question is at the top of page 81), and then this week released a poll showing Romney up by 4, which caused much wailing and consternation among the Democrats (the question is on page 37). Was this mostly due to Romney's sterling debate performance, or was it mostly due to a shift in the sample?

In the September sample, 86% of respondents stated that they had voted in '08. Of this number, 54.7% (47/86) voted for Obama. Only 37.2% (32/86) of their sample of 2008 voters had voted for McCain! This is a +17.5 Obama sample, from an election that produced a 7.3 point Obama win. In other words, among those who had previously voted, this sample was biased towards Obama by more than 10 points! There is no spinning this as 'this is just what the numbers are'. This is proof positive that this poll strongly over-sampled Obama voters.

In October, the sample included a nearly identical percentage of 2008 voters, 85%. This time, however, only 49.4% (42/85) of these reported voting for Obama vs 43.5% (37/85) McCain voters. This is a much more representative sample, with a 5.9 point Obama advantage vs the 7.3 points he won by, a difference of only 1.4 points.
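For anyone who wants to check the arithmetic, here is a minimal Python sketch of the comparison. The function and variable names are my own, not anything from Pew, and the inputs are just the percentages quoted above.

```python
# Actual 2008 national result, as quoted in this post.
ACTUAL_OBAMA, ACTUAL_MCCAIN = 52.9, 45.6
actual_margin = ACTUAL_OBAMA - ACTUAL_MCCAIN  # Obama minus McCain, in points


def recalled_margin(obama_pct, mccain_pct, voted_pct):
    """Obama-minus-McCain margin, in points, among respondents who
    say they voted in 2008 (voted_pct is their share of the sample)."""
    return 100.0 * (obama_pct - mccain_pct) / voted_pct


# (Obama %, McCain %, % of sample who voted in 2008) from the two Pew polls.
pew = {"September": (47, 32, 86), "October": (42, 37, 85)}

for month, (obama, mccain, voted) in pew.items():
    margin = recalled_margin(obama, mccain, voted)
    skew = margin - actual_margin  # positive means tilted toward Obama voters
    print(f"{month}: recalled margin {margin:+.1f}, skew {skew:+.1f}")
```

This prints a recalled margin of +17.4 with a skew of +10.1 for September, and +5.9 with a skew of -1.4 for October. The September margin comes out a tenth below the 17.5 quoted above only because the post subtracts the two shares after rounding them.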

As I said earlier, I believe that given falling Democratic registrations across the country - a symptom of lower Democratic enthusiasm - a small change away from Obama voters makes perfect sense. So this Pew poll's sample looks to accurately reflect the race as it currently sits, while the September poll's sample did not. Pretty much the entire 12 point shift from Obama to Romney was a function of fixing the sample.


The University of New Hampshire released a poll on October 1st showing Obama leading Romney by 15 points, 52-37. They don't label their pages, but about three quarters down they show their 'likely voter' statistics, including the number of respondents for each category. If you just looked at their Democratic vs Republican numbers, they actually look reasonable - 166 Democrats vs 161 Republicans. However, when you look at the breakdown of Obama/McCain voters, the Obama bias pops out. Of the 543 'likely voters' (215+314+14) who voted in 2008, 57.8% of them (314/543) voted for Obama vs 39.6% (215/543) McCain voters. This is an 18.2 point edge in 2008 Obama voters, even though his edge was only 9.6 points - an 8.6 point skew towards Obama.

On Oct 9th, after the debate, they released another poll showing a big shift toward Romney, with Obama leading 50-44 among 'likely voters, with leaners' (again, about 3/4 down). Was this also a result of a shift in sample? Actually, no. The swing appears to be entirely a function of changed minds, because the sample remains just as skewed toward Obama as the previous one. There were 484 reported 2008 voters (289+195), and 59.7% (289/484) were Obama voters while only 40.3% (195/484) were McCain voters, a 19.4 point edge. Again, Obama only won by 9.6 points, so this sample is also skewed towards Obama by almost 10 points.
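The same check works from the raw respondent counts. This is a rough Python sketch with names of my own choosing; it assumes zero 'other' voters for Oct 9, since the only counts I have for that poll are the Obama and McCain ones.

```python
# Actual 2008 New Hampshire result, as quoted in this post.
ACTUAL_MARGIN = 54.1 - 44.5  # Obama minus McCain, in points


def margin_from_counts(obama, mccain, other=0):
    """Obama-minus-McCain margin, in points, among reported 2008 voters."""
    total = obama + mccain + other
    return 100.0 * (obama - mccain) / total


# Likely-voter counts by reported 2008 vote from the two Granite State Polls.
# Assumption: no 'other' count is given for Oct 9, so it is treated as zero.
polls = {
    "Oct 1": {"obama": 314, "mccain": 215, "other": 14},
    "Oct 9": {"obama": 289, "mccain": 195},
}

for name, counts in polls.items():
    margin = margin_from_counts(**counts)
    print(f"{name}: recalled margin {margin:+.1f}, skew {margin - ACTUAL_MARGIN:+.1f}")
```

This prints +18.2 with a skew of +8.6 for Oct 1, and +19.4 with a skew of +9.8 for Oct 9.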

Keep in mind that New Hampshire Republicans have gone from a tiny deficit in registration in 2008 to a comparatively large advantage in the most recent data for 2012, 32.8% of registered voters vs 28.3% for the Democrats.

The Granite State Poll sample simply does not reflect reality. If it were weighted to reflect the actual 2008 vote, Romney would be leading. If it were further weighted to reflect the strong trend in registration toward the Republicans, Romney would be leading comfortably.
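To give a sense of what weighting to the 2008 vote would involve, here is a minimal Python sketch that computes one weight per group as target share divided by sample share. It is my own illustration, not anything UNH does: the function name is hypothetical, it uses the Oct 9 counts, and it renormalizes the 54.1/44.5 result to the two major candidates because the poll counts I have cover only those two.

```python
def recall_weights(sample_counts, target_shares):
    """Weight for each group = target share / sample share,
    with both sets of shares normalized to sum to 1."""
    n = sum(sample_counts.values())
    t = sum(target_shares.values())
    return {
        group: (target_shares[group] / t) / (sample_counts[group] / n)
        for group in sample_counts
    }


# Oct 9 Granite State Poll likely voters by reported 2008 vote.
sample = {"obama_2008": 289, "mccain_2008": 195}
# Actual 2008 New Hampshire result (two-candidate shares, renormalized above).
target = {"obama_2008": 54.1, "mccain_2008": 44.5}

weights = recall_weights(sample, target)
for group, w in weights.items():
    print(f"{group}: weight {w:.2f}")
```

The weights come out to about 0.92 for 2008 Obama voters and 1.12 for 2008 McCain voters. Turning them into a reweighted Obama/Romney number would also require the crosstab of current preference by 2008 vote, which is not reproduced here, so the sketch stops at the weights.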


The evidence from the handful of polls I have found that include the question, "Who did you vote for in 2008?" indicates that those which show Obama doing well have samples that are strongly biased toward 2008 Obama voters. The single poll that did not have this issue showed Romney leading by 4 nationally.