Evaluating the significance of a survey used to be fairly straightforward – for any truly random sample, there is a well-defined margin of error which allows a critical reader to judge how valid any inferences are.

So for a survey of 1,000 respondents, the margin of error is around 3% at the 95% confidence level (this last figure means that if the same poll were repeated many times, roughly 95 in every 100 results would land within the margin of error of the true value). If a news story says Ed Miliband is leading David Cameron by 42% to 37% in the polls, we know this lead falls within the stated margin of error, since Ed Miliband’s true position may be 39% (42% – 3%) and David Cameron’s may be 40% (37% + 3%).
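For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes a simple random sample and uses the standard textbook approximation for a proportion; the function names (`margin_of_error`, `interval`, `overlaps`) are my own, invented for illustration, and real pollsters apply weighting that this ignores.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a simple random sample.

    n is the number of respondents, p the assumed share (0.5 is the
    worst case) and z = 1.96 corresponds to the 95% confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)

def interval(share, margin):
    """Lowest and highest plausible value for a reported share."""
    return (share - margin, share + margin)

def overlaps(a, b):
    """True if two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

# Worst-case margin for 1,000 respondents, in percentage points
print(f"{margin_of_error(1000) * 100:.1f}")  # about 3.1

# The example above: 42% v 37% with a 3-point margin
miliband = interval(42, 3)   # (39, 45)
cameron = interval(37, 3)    # (34, 40)
print(overlaps(miliband, cameron))  # True
```

Because the two ranges overlap, the reported lead on its own cannot tell us who is really ahead.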

The difficulty from the pollster’s point of view is that creating a truly random sample is costly and time-consuming. So recently, non-probability sampling has enjoyed favour, particularly in the US and now increasingly in the UK.


Poor Ed Miliband. Not only does the live TV feed break down just minutes into his speech to the Labour Party conference in Liverpool, but a poll in the Independent reports that “the Tories enjoy a lead for the first time since October last year”.

The Conservatives are on an impressive 37% while Labour languishes behind with a paltry 36%. So that’s that, then!

Except that it isn’t.


Why 4 is a magic number

March 26, 2010

David Cameron has waved goodbye to his chances of becoming Prime Minister, a poll in the Daily Mirror claimed earlier this week.

The poll, carried out by Ipsos Mori, puts the Tories on 35% with Labour on 30% and Lib Dems on 21%. Because of the peculiarities of the British electoral system, this result would give Gordon Brown about 290 MPs, making Labour the largest single party but with no overall majority.

All well and good. But there is one fact missing: the margin of error. The full report from the Ipsos Mori website makes no secret of the fact that the margin of error is 4% (although this figure is nowhere to be found in the Mirror). Since the putative difference between the parties is only 5%, that’s a pretty significant margin. As Ipsos Mori say about the margin of error: “This is especially important to keep in mind when calculating party lead figures”. Seems the Mirror forgot to read this bit.
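To see why a 4-point margin matters so much for a 5-point lead, here is a rough sketch in Python. It simply applies the stated margin to each party's share in opposite directions, which is the reading used in this post rather than a formal confidence interval for the difference; `lead_range` is a hypothetical helper of my own.

```python
def lead_range(leader, trailer, margin):
    """Smallest and largest lead if each share is off by up to `margin` points."""
    lowest = (leader - margin) - (trailer + margin)
    highest = (leader + margin) - (trailer - margin)
    return lowest, highest

# Tories 35%, Labour 30%, stated margin of error 4 points
low, high = lead_range(35, 30, 4)
print(low, high)  # -3 13
```

On this reading, the reported 5-point Tory lead could be anything from a 3-point Labour lead to a 13-point Tory lead.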

Labour is clawing back support and we may end up with a hung Parliament, according to a recent poll in the Observer.

The Ipsos Mori survey puts the Tories on 37% and Labour on 31%, slashing the Conservative lead from 20 points in some polls last year. If the nation voted that way at the election, it would give David Cameron’s party the largest number of seats but still 30 or so short of a majority.

As with most news organisations, the Observer reports some details about the poll itself – 1,006 respondents were interviewed by telephone. But we aren’t told the margin of error. While this is common practice, it does disguise one very important fact – that the poll might be evidence that Labour support is actually ahead of that for the Conservatives.

When you look at most polls, it is the figures behind the headlines which are the most important – chief among these is sample size.

If you’re carrying out a poll which aspires to anything beyond the status of the anecdotal, the reader needs an idea of its scale (clearly, other measures such as margin of error are helpful, too). This is really the minimum requirement of credibility.

The results of a recent survey, carried out by KRC Research/YouGov for the Bar Standards Board, which purport to show that journalists rank alongside politicians and estate agents as the least trusted professionals, started me thinking about how reliable online surveys are.

What prompted this thought was the description towards the end of the article of how the data was collected: they “polled a nationally representative sample of 2,044 adults in Great Britain online”.

Now, I’m aware of YouGov and its excellent reputation, and have no reason to believe KRC is any less reliable, and they may well have adopted methodologies which overcome the obvious limitations of web surveys (although I couldn’t find any such detail during a quick trawl of either organisation’s website). If so, I’d be very interested to learn what they are.

My own experience of running online surveys has been in the context of newspaper websites, where third-party sites such as Survey Monkey or Poll Daddy came in handy. But we didn’t present the results as particularly scientific and tended to use the comments left by respondents as much as the numerical data itself.

Some of the shortcomings of online surveys are detailed in a paper discussing this very issue [PDF] by Andrews, Nonnecke and Preece. The fundamental problem, of course, is that unless you use a panel of verifiable individuals, your confidence in the randomness of your sampling is always going to be less than in the case of face-to-face or even mail-based surveys. This is simply because you can’t rely on people on the Internet being who they say they are. While I reckon I’m pretty confident of being able to spot a male aged 40-55 or a female aged 18-25 in the street, I would be a fool to myself if I thought I could identify the gender or age of anyone online (unless they were known to me, of course).

When it comes to the techniques used by most newspapers (which the paper characterises as “self-selection Web-based surveys”), the authors conclude “there is no attempt to statistically sample the online population, although some claims for scientific validity are sometimes made”.

Volunteer panels, whereby individuals provide demographic information on the basis of which they are then invited to take part in the survey, fail to overcome the credibility hurdle – “the base of this approach is still self-selection, not a sound statistical sampling approach”.

It is debatable whether it is even possible to draw a random sample from among Internet users – this study from the Georgia Institute of Technology argues that it is “impossible to draw a random sample from a complete, or nearly complete, list of Web users”.

In addition, the results of an online survey cannot be uncritically extended to the wider population: “To infer for a general population based on a sample drawn from an online population is not as yet possible” (Andrews, Nonnecke and Preece).

The trio suggest that one way round the inherent sampling unreliability is to start off by narrowing the pool of prospective samplees and to be satisfied with indicative data gathered in this way. The idea is to limit the potential survey candidates to the users of specific websites, discussion groups or bulletin boards and then, within this “artificially defined sampling frame”, to apply the standard rules of random sampling.
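As a rough illustration of what random sampling within a defined frame might look like in practice, here is a small Python sketch. The member list, the helper name `sample_from_frame` and the figures (2,500 members, 200 invitations) are all made up for the example; the paper does not prescribe any particular implementation.

```python
import random

def sample_from_frame(frame, k, seed=None):
    """Draw k distinct members at random from a fully listed sampling frame."""
    rng = random.Random(seed)      # a seed makes the draw repeatable
    return rng.sample(frame, k)    # sampling without replacement

# Hypothetical frame: every registered member of one discussion group
frame = [f"member_{i:04d}" for i in range(1, 2501)]

invited = sample_from_frame(frame, 200, seed=1)
print(len(invited), len(set(invited)))  # 200 200
```

The key difference from a self-selection survey is that the researcher, not the respondent, decides who is invited, and every member of the frame has the same chance of being picked. Any findings still apply only to that frame, not to the wider population.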

Whatever the methodological framework employed, it ought to be possible to give margins of error for online surveys. I certainly would have welcomed seeing this detail in the KRC Research/YouGov survey. Whatever the level of public trust in journalists, it would have increased my level of trust in this survey.

Poll with the hole

March 27, 2009

The BBC is running a story about proposed changes to the Act of Succession, which would allow a future monarch to marry a Catholic and give female royals the same place in the pecking order as male heirs. There is widespread public support for change, the Today programme assured us.

A poll for the BBC found 89% in favour of equal rights for royal women and 81% approved of the heir being allowed to marry a Catholic … but, as presented, the results were meaningless because there was no indication of sample size and hence no way of judging their validity.

In fairness, the BBC website does state the sample size (1,000 polled by ICM between March 20 and 22), but why couldn’t this detail have been given or at least referenced in the broadcast? And while it’s laudable that the website gives the sample size, additional detail such as the margin of error would be useful – why not link to this, too?
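For what it's worth, the sample size alone lets a reader estimate the missing figure. The sketch below is a back-of-envelope calculation in Python, assuming a simple random sample of 1,000 and the usual approximation for a proportion; it is not ICM's published margin, and `margin_at_share` is a name of my own.

```python
import math

def margin_at_share(share, n, z=1.96):
    """Approximate 95% margin of error for an observed share (0 to 1)."""
    return z * math.sqrt(share * (1 - share) / n)

results = [
    ("equal rights for royal women", 0.89),
    ("heir allowed to marry a Catholic", 0.81),
]
for label, share in results:
    points = margin_at_share(share, 1000) * 100
    print(f"{label}: {share:.0%} +/- {points:.1f} points")
```

Under those assumptions the margins come out at roughly 2 points for the 89% figure and a little under 2.5 points for the 81% figure, because results far from 50% carry a smaller margin than the worst case.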