Monday, May 24, 2010

Economist's Notebook: Statistical Inference

Nate Silver has a fascinating blog entry on a problem that modern phone habits are causing for pollsters. He uses this graph from the CDC, which shows the share of US adults who are cell phone-only:

Pollsters apparently dislike cell phones because they are more expensive to call (I don't know why) and because automated calls to mobile phones face more restrictions. This poses a problem of statistical inference, which is about making generalizations about a population from a sample: if cell phone-only adults are left out of the sample, the sample may no longer represent the population.

Here is Silver:

Cellphone-only households are different from their landline-using counterparts. They tend to be younger, poorer, more urban, less white, and more Internet-savvy. All of these characteristics are correlated with political viewpoints and voting behavior.

The pollsters' usual defense mechanism against this is to weight their polls by demogrpahics [sic] -- something which they need to do anyway, since polls are subject to many forms of non-response bias (for instance, it's harder to get men on the phone than women). But this is potentially an inadequate response for several reasons. First, some characteristics that correlate with both cellphone usage and political preferences may not correspond to those that are most commonly used to weight polls. It is somewhat rare, for instance, for pollsters to weight their polls by characteristics like urban/rural location or marital status, which are predictive of both cellphone usage and political beliefs. Being cellphone-dependent also appears to be significantly correlated with media consumption habits (in particular, getting more of one's news from the Internet and less from television), which also seems to be increasingly important in determining one's political views. And there are some characteristics that may be even more subtle. For instance, there are some hints in the CDC data (such as the higher prevalence of binge drinking) that cellphone-only adults are less "domestic" and more "bohemian". I suspect that, in young adults, this is correlated with more liberal political views.

Secondly, even where weighting occurs, one may encounter problems when upweighting from very small subsamples. It is now very difficult, for instance, to get young people on the phone when using a landline-only sample. About half of all adults from age 25-29, for instance, are cellphone-only, and two-thirds are either cellphone-only or cellphone-mostly. (The numbers are actually slightly better for adults aged 18-24, who are more likely to be living in a college dormitory, or still to be living at home, where a landline will usually be available.) Couple this with the fact that young people have grown up in a call-screening culture, and their response rates are often completely inadequate. Say that you're supposed to have 100 people aged 18-29 in a poll of 500 adults, but in fact you only get 30 because of problems with call-screening and cellphone usage. The margin of error on a sample of that size is 18 percent. And yet, you may essentially let each of these young people speak on behalf of two or three of their peers, to compensate for the ones you haven't gotten in contact with.

A new study from Pew, in fact, has found that these weighting schemes may have become inadequate. In their experiment, a weighted landline-only sample produced a generic ballot result of Republicans 47, Democrats 41, whereas a weighted landline-plus-cellphone sample had the generic ballot tied 44-44. That six-point net difference is statistically significant, and needless to say, could have huge implications for where the parties finish in November.
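The arithmetic in Silver's second point can be checked with a few lines of Python. This is a minimal sketch, not any pollster's actual procedure: it assumes simple random sampling, the conventional worst case p = 0.5 and z = 1.96 for a 95% interval, and (my assumption, since Silver doesn't say) that the other 470 respondents fill out the 500-person poll. The function names are hypothetical.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

def poststratification_weight(target_count, reached_count):
    """Weight that scales a group's reached respondents up (or down) to its target size."""
    return target_count / reached_count

# Silver's hypothetical: 100 respondents aged 18-29 wanted in a 500-person poll, only 30 reached.
# Assumption: the remaining 470 respondents are all other adults, targeted at 400.
young_target, young_reached = 100, 30
others_target, others_reached = 400, 500 - young_reached

print(f"young weight:  {poststratification_weight(young_target, young_reached):.2f}")    # 3.33
print(f"others weight: {poststratification_weight(others_target, others_reached):.2f}")  # 0.85
print(f"MOE, n=30:  {margin_of_error(young_reached):.1%}")   # 17.9%
print(f"MOE, n=100: {margin_of_error(young_target):.1%}")    # 9.8%
```

Each young respondent ends up counting as about 3.3 people (themselves plus two or three peers), while the subgroup's margin of error is roughly 18 points instead of about 10. Weighting fixes the subgroup's share of the total but not its noise, and it only corrects for the variables you weight on, which is Silver's first point.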

His suggested solutions include using non-traditional sample weights and larger sample sizes. The second one baffles me: if a sample is biased, just making it larger doesn't help. It seems to me that the solution is to pay up and call cell phones. Still, it is interesting that what has worked for decades, calling landlines, is now starting to fail thanks to new technology.
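A small simulation illustrates why size alone doesn't cure coverage bias. All the numbers are made up for illustration: a hypothetical population where 30% of adults are cell-only with 60% support for some candidate, versus 45% support among landline households. A landline-only poll converges on 45% no matter how large it gets, while a poll that also reaches cell-only adults converges on the true 49.5%.

```python
import random

# Hypothetical population parameters (illustrative only, not from any real survey)
CELL_ONLY_SHARE = 0.30
SUPPORT_CELL_ONLY = 0.60
SUPPORT_LANDLINE = 0.45
TRUE_SUPPORT = CELL_ONLY_SHARE * SUPPORT_CELL_ONLY + (1 - CELL_ONLY_SHARE) * SUPPORT_LANDLINE

def landline_poll(n, rng):
    """Simulate a poll that can only reach landline households."""
    return sum(rng.random() < SUPPORT_LANDLINE for _ in range(n)) / n

def full_frame_poll(n, rng):
    """Simulate a poll that reaches cell-only and landline households in proportion."""
    hits = 0
    for _ in range(n):
        p = SUPPORT_CELL_ONLY if rng.random() < CELL_ONLY_SHARE else SUPPORT_LANDLINE
        hits += rng.random() < p
    return hits / n

rng = random.Random(2010)  # fixed seed so runs are repeatable
for n in (500, 5_000, 50_000):
    print(f"n={n:>6}: landline-only error {landline_poll(n, rng) - TRUE_SUPPORT:+.3f}, "
          f"full-frame error {full_frame_poll(n, rng) - TRUE_SUPPORT:+.3f}")
```

As n grows, the landline-only error settles near -0.045 while the full-frame error shrinks toward zero. This shows only that a larger sample doesn't remove coverage bias; it doesn't rule out Silver's narrower point that larger samples reduce the noise from upweighting small subsamples, which is a variance problem.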

Jeff Alworth said...

Pew has been following this closely for years now. (Here's their report from 2008.) Pew's methodology is the gold standard: they usually sample 2,000 people, they make sure to include cell-only users, and they dial randomly rather than relying on listed numbers.

This allows them to compensate for sample bias in practice. The larger sample size lets them upweight the smaller demographic groups, and the result is more accurate because those groups are drawn from landline-only, cell-only, and mixed-use users.

Interestingly, there's some suggestion that robo-polling may be more accurate than the polling done by some old-time pollsters. Robo-polls also dial randomly, which gives them an advantage over pollsters still relying on listed numbers.

I actually think we're in the middle of the worst of it. Landlines are a dying technology, and within 20 years they will make up a tiny percentage of poll samples. The difficulty arises when one technology is in the middle of supplanting another.

(Unless, of course, we completely abandon phones for other modes of communication, which I'm in the middle of trying to do.)
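Alworth's point that a larger sample makes upweighting small groups more workable can be made concrete with Kish's approximation for the effective sample size of a weighted poll, n_eff = (Σw)² / Σw². The sketch below uses assumed numbers and is not Pew's actual method: it borrows Silver's hypothetical of a 20% target share for adults aged 18-29, assumes they are reached at 30% of target in both cases (30 of 100, or 120 of 400), and compares a 500-person poll with Alworth's 2,000. The helper names are hypothetical.

```python
def kish_effective_n(weights):
    """Kish's approximation: effective sample size under unequal weights."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

def two_group_weights(n_total, young_share, young_reached):
    """Hypothetical two-group weighting: respondents aged 18-29 are weighted up
    to their target share, everyone else is weighted down to fill the rest."""
    young_target = young_share * n_total
    others_reached = n_total - young_reached
    young_w = young_target / young_reached
    others_w = (n_total - young_target) / others_reached
    return [young_w] * young_reached + [others_w] * others_reached

for n_total, young_reached in ((500, 30), (2000, 120)):
    w = two_group_weights(n_total, 0.20, young_reached)
    print(f"n={n_total}: effective n = {kish_effective_n(w):.0f}")  # 371, then 1484
```

Under these assumptions, the weighting costs about a quarter of the nominal sample either way, but the 2,000-person poll still has four times the effective size and builds its young-adult estimate from 120 respondents instead of 30.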