Evaluating the significance of a survey used to be fairly straightforward – for any truly random sample, there is a well-defined margin of error which allows a critical reader to judge how valid any inferences are.

So for a survey of 1,000 respondents, the margin of error is around 3% at the 95% confidence level (this last figure indicates there is only a 5% probability that the findings were the result of chance). If a news story says Ed Miliband is leading David Cameron by 42% to 37% in the polls, we can see that the lead falls within the stated margin of error: Ed Miliband’s true position may be as low as 39% (42% – 3%) while David Cameron’s may be as high as 40% (37% + 3%), so the apparent five-point lead may not exist at all.
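(As an aside for the statistically curious: that “around 3%” isn’t plucked from the air. For a proportion estimated from a simple random sample, the 95% margin of error is roughly 1.96 × √(p(1 − p)/n), and taking the worst case p = 0.5 gives the familiar figure. A quick sketch in Python – my own illustration, not any pollster’s published workings:)

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion from a simple random sample.

    p=0.5 is the worst case (widest interval); z=1.96 is the 95% z-score.
    """
    return z * math.sqrt(p * (1 - p) / n)

print(f"{margin_of_error(1000):.1%}")  # 3.1%, i.e. the "around 3%" above
```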

The difficulty from the pollster’s point of view is that creating a truly random sample is costly and time-consuming. So recently, non-probability sampling has enjoyed favour, particularly in the US and now increasingly in the UK.

Much consternation among the chattering classes about a poll which shows Fox News is the most trusted news channel in the US [Guardian report here].

The shock-horror findings that 49% of respondents said that they trusted the right-wing Murdoch-owned channel, as opposed to 39% for CNN and 32% for CBS, come from research carried out by North Carolina-based firm Public Policy Polling (PPP).

Of course, trust in Fox was most forthcoming from Republicans (74%) as opposed to Democrats (30%).

But I was most interested in comments from Guardian readers which questioned the validity of the poll data: “I sincerely hope this is a statistical anomaly resulting from the location of the voters”; “Only a thousand odd from a population of how many?” (referring to the survey’s base of 1,151 voters).
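That second comment is worth pausing on, because it reflects a common misconception: for any large population, the margin of error is driven almost entirely by the sample size, not by the size of the population being sampled. A quick back-of-envelope sketch – my own arithmetic, assuming a simple random sample, which PPP’s actual methodology may or may not approximate:

```python
import math

def moe(n, N=None, p=0.5, z=1.96):
    """95% margin of error for a proportion; optional finite population correction."""
    m = z * math.sqrt(p * (1 - p) / n)
    if N is not None:
        m *= math.sqrt((N - n) / (N - 1))  # finite population correction
    return m

print(f"{moe(1151):.2%}")                 # ~2.89% with no correction
print(f"{moe(1151, N=230_000_000):.2%}")  # ~2.89% even against the whole US adult population
```

In other words, a base of 1,151 voters is perfectly respectable whatever the population; the sharper questions are how those voters were selected and weighted.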

The results of a recent survey, carried out by KRC Research/YouGov for the Bar Standards Board, which purport to show that journalists rank alongside politicians and estate agents as the least trusted professionals, started me thinking about how reliable online surveys are.

What prompted this thought was the description towards the end of the article of how the data was collected: they “polled a nationally representative sample of 2,044 adults in Great Britain online”.

Now, I’m aware of YouGov and its excellent reputation, and I have no reason to believe KRC is any less reliable; they may well have adopted methodologies which overcome the obvious limitations of web surveys (although I couldn’t find any such detail during a quick trawl of either organisation’s website). If so, I’d be very interested to learn what they are.

My own experience of running online surveys has been in the context of newspaper websites, where third-party sites such as Survey Monkey or Poll Daddy came in handy. But we didn’t present the results as particularly scientific and tended to use the comments left by respondents as much as the numerical data itself.

Some of the shortcomings of online surveys are detailed in a paper discussing this very issue [PDF] by Andrews, Nonnecke and Preece. The fundamental problem, of course, is that unless you use a panel of verifiable individuals, your confidence in the randomness of your sampling is always going to be less than in the case of face-to-face or even mail-based surveys. This is simply because you can’t rely on people on the Internet being who they say they are. While I reckon I’m pretty confident of being able to spot a male aged 40-55 or a female aged 18-25 in the street, I would be a fool to myself if I thought I could identify the gender or age of anyone online (unless they were known to me, of course).

When it comes to the techniques used by most newspapers (which the paper characterises as “self-selection Web-based surveys”), the authors conclude “there is no attempt to statistically sample the online population, although some claims for scientific validity are sometimes made”.

Volunteer panels, whereby individuals provide demographic information on the basis of which they are then invited to take part in the survey, fail to overcome the credibility hurdle – “the base of this approach is still self-selection, not a sound statistical sampling approach”.

It is debatable whether it is even possible to draw a random sample from among Internet users – this study from the Georgia Institute of Technology argues that it is “impossible to draw a random sample from a complete, or nearly complete, list of Web users”.

In addition, the results of an online survey cannot be uncritically extended to the wider population: “To infer for a general population based on a sample drawn from an online population is not as yet possible” (Andrews, Nonnecke and Preece).

The trio suggest that one way round the inherent sampling unreliability is to start off by narrowing the pool of prospective respondents and to be satisfied with the indicative data gathered in this way. The idea is to limit the potential survey candidates to the users of specific websites, discussion groups or bulletin boards and then, within this “artificially defined sampling frame”, to apply the standard rules of random sampling.
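As a minimal sketch of what that might look like in practice – with an entirely hypothetical membership list standing in for the “artificially defined sampling frame” – something like this would do:

```python
import random

# Hypothetical sampling frame: the registered members of one discussion board.
# In reality this list would come from the site's own records.
frame = [f"member{i:05d}" for i in range(8_000)]

rng = random.Random(2024)         # seeded so the draw is reproducible
invited = rng.sample(frame, 500)  # simple random sample within the frame

# Any findings then describe this board's membership,
# not Internet users (still less the public) at large.
```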

Whatever the methodological framework employed, it ought to be possible to give margins of error for online surveys. I certainly would have welcomed seeing this detail in the KRC Research/YouGov survey. Whatever the level of public trust in journalists, it would have increased my level of trust in this survey.
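For what it’s worth, if the KRC/YouGov sample of 2,044 were treated as a simple random one – a generous assumption, given everything above – the arithmetic is straightforward:

```python
import math

# 95% margin of error for n = 2,044 at the worst case p = 0.5
print(f"{1.96 * math.sqrt(0.25 / 2044):.1%}")  # roughly 2.2%
```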