Poll Math Question

mike777 · December 4, 2007

Poll Math Question.

Candidate A=26%

Candidate B=25%

Plus or minus 4pts.

What does this mean?

A has somewhere between 22-30% and B has somewhere between 21-29%?

Does this mean A may be at 22% and B at 29%?

Other?

hrothgar · December 4, 2007

In the case of Candidate A, you are X% sure that A is somewhere between 22% and 30% (where X is the confidence level)

This information is readily available if you Google "poll" and "confidence interval"

mike777 · December 4, 2007

ok and the same thing for B also?

so A could be at 22% and B at 29% or other?

If so, my point is that the media never makes this clear... in this example B could have a huge lead over A, but they never say that...

"Confidence Intervals

The Margin of Error

Even if you aren't familiar with confidence intervals, you've probably unknowingly run across them. You've probably heard the term "Margin of Error" used along with the results of a survey of, say, a presidential poll.

After polling 1000 eligible voters, the Star-Tribune Newspaper reported that 55% of Americans would vote for James Bean and 45% for John F Daniels +/- 3%.

That plus or minus disclaimer is the margin of error. In other words, the margin of error means that James Bean could be favored by as much as 58 to 42 percent (55 + 3) or as low as 52 to 48 percent (55 - 3)-- a six percentage point spread (58-52 = 6). This spread is the confidence interval. "

http://www.measuringusability.com/stats/ci/ci_instr1.php

hrothgar · December 4, 2007

Correct: B is somewhere between 21% and 29%

The easiest way to understand what's going on is to picture one normal distribution centered at 25% and a second at 26%.
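That picture can be put into numbers. The snippet below is only a rough sketch: it assumes a 95% confidence level (so the 4-point margin is about 1.96 standard deviations) and treats the two estimates as independent, which shares taken from the same sample are not. The variable names are made up for the illustration.

```python
from math import sqrt
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)   # about 1.96 for a two-sided 95% interval
margin = 4.0                         # reported margin of error, in points
sd = margin / Z_95                   # assumed standard deviation of each estimate

a = NormalDist(mu=26.0, sigma=sd)    # candidate A
b = NormalDist(mu=25.0, sigma=sd)    # candidate B

# The difference of two independent normals is normal, with the variances added.
diff = NormalDist(mu=a.mean - b.mean, sigma=sqrt(a.variance + b.variance))
p_a_ahead = 1.0 - diff.cdf(0.0)

print(f"sd of each estimate: {sd:.2f} points")
print(f"chance A is really ahead of B: {p_a_ahead:.2f}")
```

Under these assumptions the chance that A is really ahead comes out a little under two in three, so a one-point lead with a 4-point margin is far from decisive.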

Damn media. How dare they presume that anyone has an education...

(In all seriousness, this gets covered in High School - Or at least it is in New York)

kenberg · December 4, 2007

It seems that saying 25% with a 4% margin of error ought to communicate a 21 to 29 point range to the normally competent person. What I think they do not say at all is that they are not certain that the result is in that range. Of course maybe people should realize that it's impossible to be certain of an entire population by sampling a portion of that population but it seems some estimate of their certainty should be part of the statement.

E.g., "We are 95% certain that between 21% and 29% of the population favors candidate whoever." Or, if "95% certain" isn't a clear statement, maybe they should say "We have followed a procedure that 95 times out of 100 will give us an interval containing the true percentage of folks across the population favoring candidate whoever, and that procedure gave us the interval 21-29. We wish to warn our listeners that 5 times out of 100 this procedure will give us an interval that does not contain the true percentage".

That is, of course, assuming that my guess of what they are doing is correct (the percentage certainty might not be 95 of course).
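The "95 times out of 100" reading can be checked with a small simulation. This is a sketch under stated assumptions, not a description of how any real pollster works: the true share, the sample size, and the use of the plain normal-approximation interval are all hypothetical choices.

```python
import random
from math import sqrt

random.seed(2007)          # fixed seed so the run is repeatable

TRUE_P = 0.25              # hypothetical true share for the candidate
N = 600                    # hypothetical sample size (4-point margin in the worst case p = 0.5)
TRIALS = 2000              # number of simulated polls
Z = 1.96                   # two-sided 95% normal quantile

covered = 0
for _ in range(TRIALS):
    # One simulated poll: each respondent favors the candidate with probability TRUE_P.
    votes = sum(random.random() < TRUE_P for _ in range(N))
    p_hat = votes / N
    half_width = Z * sqrt(p_hat * (1 - p_hat) / N)
    if p_hat - half_width <= TRUE_P <= p_hat + half_width:
        covered += 1

coverage = covered / TRIALS
print(f"intervals containing the true share: {coverage:.1%}")
```

The printed share should sit close to 95%, with the remaining polls reporting an interval that misses the true percentage.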

They could be clearer about the meaning of their ranges, but my real gripes about these polls lie elsewhere.

Gerben42 · December 4, 2007

Normally this means that the result of the poll was exactly

A = 26%

B = 25%

but because of the uncertainty in any poll there is a confidence interval (usually 95%), and if that's plus or minus 4% this means that if ALL people were asked rather than a sample, it's also possible that A = 22% and B = 29%.

In fact we don't know what the result of the poll would be if we had asked everyone, which is basically an election. If the confidence interval is 95%, the 4% uncertainty roughly corresponds to 1.96 standard deviations in a normal distribution.

So the probability distribution of the election result of candidate A based on the poll of maybe 1000 people or so is a normal distribution with mean 26% and standard deviation of 2.0%.
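The arithmetic behind those figures can be sketched as follows, assuming a simple random sample and the normal approximation (both are assumptions here, and the variable names are illustrative). It also backs out the sample size that a 4-point margin would imply.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)            # two-sided 95% quantile, about 1.96
margin = 0.04                              # reported margin of error
p_hat = 0.26                               # candidate A's share in the poll

sd = margin / z                            # standard deviation implied by the margin
n_at_p_hat = p_hat * (1 - p_hat) / sd**2   # sample size if the margin was computed at p_hat
n_worst_case = 0.25 / sd**2                # sample size if the margin assumes p = 0.5

# For comparison: the margin a simple random sample of 1000 would give at p_hat.
margin_1000 = z * sqrt(p_hat * (1 - p_hat) / 1000)

print(f"sd = {sd:.4f}")
print(f"n (at p_hat) = {n_at_p_hat:.0f}, n (worst case) = {n_worst_case:.0f}")
print(f"margin for n = 1000: {margin_1000:.3f}")
```

With these inputs the implied sample size lands somewhere around 460 to 600, and a simple random sample of 1000 would give a margin nearer 2.7 points, so the "1000 people or so" should be read loosely.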

Edit: numbers fixed.

Edited December 4, 2007 by Gerben42

hrothgar · December 4, 2007

In fact we don't know the actual result of the poll if we would have asked everyone, which is basically an election. If the confidence interval is 95%, the 4% uncertainty roughly corresponds to 5/3 standard deviations in a normal distributions.

So the probability distribution of the election result of candidate A based on the poll of maybe 1000 people or so is a normal distribution with mean 26% and standard deviation of 2.4%.

Gerben:

Mind explaining where you got the 5/3 value?

It's way early in the morning (and I didn't sleep well) but the 68-95-99.7 rule states that

68% of observations fall within one standard deviation of the mean

95% fall within two standard deviations of the mean

99.7% fall within three standard deviations of the mean
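The rule is easy to check against the standard normal distribution. A minimal sketch using Python's standard library:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Probability of landing within k standard deviations of the mean.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} sd: {share:.4f}")
```

The three shares come out at roughly 68%, 95%, and 99.7%.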

Gerben42 · December 4, 2007

I shouldn't be doing this in the morning, I mixed up the 1-sided and 2-sided number.

For 1-sided tests, 95% is 1.65 standard deviations, for 2-sided tests 95% is 1.96 standard deviations.
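Both values can be read off the inverse normal CDF. The sketch below assumes nothing beyond the standard normal distribution; dividing the 4-point margin by each value shows where the two standard deviation figures in this thread come from.

```python
from statistics import NormalDist

std_normal = NormalDist()

z_one_sided = std_normal.inv_cdf(0.95)    # 5% in a single tail
z_two_sided = std_normal.inv_cdf(0.975)   # 2.5% in each of two tails

print(f"one-sided 95%: {z_one_sided:.3f}")
print(f"two-sided 95%: {z_two_sided:.3f}")
print(f"4-point margin / one-sided z = {4 / z_one_sided:.2f} points")
print(f"4-point margin / two-sided z = {4 / z_two_sided:.2f} points")
```

The one-sided value gives a standard deviation of about 2.4 points and the two-sided value about 2.0 points.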

So your "rule" is correct (for 2-sided tests). All of the tests in my current work are 1-sided so this 5/3 value was in my head all the time.

helene_t · December 4, 2007

"We have followed a procedure that 95 times out of 100 will give us an interval containing the true percentage of folks across the population favoring candidate whoever, and that procedure gave us the interval 21-29. We wish to warn our listeners that 5 times out of 100 this procedure will gives us an interval that does not contain the true percentage".

This is a reasonable explanation which you will find in some textbooks, but if I'm allowed to be pedantic I'll say that it's not correct.

The strict definition of the 95% confidence interval is the range of hypothetical true percentages, under which the empirical percentage which we actually found would have a likelihood above the 5% quantile.

There is an alternative procedure, the Bayesian one, which assumes some prior distribution of the true percentages and then updates this distribution in light of the data. Then you can report a Bayesian confidence interval (more commonly called a credible interval), defined as the range of percentages with a posterior likelihood above the 5% quantile.

If you use a "flat" prior (i.e. all plausible percentages are a priori equally likely) then for most practical purposes it hardly matters which procedure you use.

However, to see the difference, think of the extreme case where a particular candidate got zero votes in the poll. This may translate into a confidence interval of, say, 0% to 4%, but surely you cannot interpret that as if there was a 95% chance that the true percentage is below 4%, unless you specify a prior.
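The zero-votes case can be worked through by hand. The sketch below is an illustration only: the sample size of 75 is a hypothetical value picked so the bound lands near 4%, and a one-sided 95% bound is used for simplicity.

```python
from math import sqrt

n = 75            # hypothetical poll size, chosen so the bound lands near 4%
votes = 0         # the candidate got no votes in the sample
p_hat = votes / n

# Textbook normal-approximation half-width: collapses to zero when p_hat is 0.
wald_half_width = 1.96 * sqrt(p_hat * (1 - p_hat) / n)

# Frequentist exact one-sided 95% upper bound: the largest true share p
# for which seeing zero votes still has probability of at least 5%,
# i.e. the solution of (1 - p)**n = 0.05.
upper_exact = 1 - 0.05 ** (1 / n)

# Bayesian with a flat Beta(1, 1) prior: the posterior is Beta(1, n + 1),
# whose 95th percentile has this closed form.
upper_flat_prior = 1 - 0.05 ** (1 / (n + 1))

print(f"normal-approximation half-width: {wald_half_width:.4f}")
print(f"exact upper bound:               {upper_exact:.4f}")
print(f"flat-prior upper bound:          {upper_flat_prior:.4f}")
```

Both bounds land just under 4% and barely differ, but only the second one is a statement about the probability of the true percentage, and only because a prior was specified.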

kenberg · December 4, 2007

Nothing wrong with a little pedantry! I have to go make some money being a pedant but I'll think this through later. I am sort of aware of what you describe but I'm weak on the details.

Elianna · December 4, 2007

(In all seriousness, this gets covered in High School - Or at least it is in New York)

I teach it to my 12th graders. But I'm not at a public school. I don't believe it's in our standards (CA), but I can look it up.

joshs · December 4, 2007

Basic statistics was really not taught at any level in my education.

E.g.

A. Not taught in High School

B. Not taught as part of a Math Undergrad degree

C. Not taught in any of my official courses for my math PhD (although I did audit 2 stats classes)

I did learn a little bit about the Normal Distribution in my physics courses.

For the record, my experience in aerospace is that less than 10% of engineers ever studied any stats, and a lot fewer actually know any stats. Of the folks I interview for my current work (stats for finance), probably only 50% of the PhDs actually know even these sorts of basic things...

P.S. In the spirit of being pedantic, poll results follow a binomial distribution (a sum of Bernoulli trials), not a normal distribution. The key fact is that as the number of trials gets large, and the sample mean is sufficiently far away from 0 or 1, the binomial gets really close to the Gaussian. Of course a Gaussian is unbounded, and binomial proportions lie between 0 and 1, which is why you clearly have problems using the approximation near 0 and 1...
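A rough way to see this is to compare the exact binomial CDF with its normal approximation for a share in the middle of the range and for one close to zero. The sample size and the two shares below are hypothetical, and `compare` is a made-up helper for the illustration.

```python
from itertools import accumulate
from math import comb, sqrt
from statistics import NormalDist

def compare(n, p):
    """Return (largest CDF gap, normal mass below zero) for Binomial(n, p)."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    exact_cdf = list(accumulate(pmf))
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    # Continuity correction: compare P(X <= k) with the normal CDF at k + 0.5.
    worst_gap = max(abs(exact_cdf[k] - approx.cdf(k + 0.5)) for k in range(n + 1))
    impossible_mass = approx.cdf(-0.5)   # probability the normal gives to negative counts
    return worst_gap, impossible_mass

n = 1000
mid_gap, mid_below_zero = compare(n, 0.26)     # share well away from 0 and 1
edge_gap, edge_below_zero = compare(n, 0.002)  # share very close to 0

print(f"p = 0.26:  worst CDF gap {mid_gap:.4f}, mass below zero {mid_below_zero:.2e}")
print(f"p = 0.002: worst CDF gap {edge_gap:.4f}, mass below zero {edge_below_zero:.4f}")
```

For the mid-range share the two distributions agree closely, while near zero the gap is many times larger and the normal curve puts a few percent of its mass on impossible negative counts.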

BTW, Ian Ayres's book, Super Crunchers, is a good layman's intro to a lot of this stuff and discusses polling and many other applications of statistics.

hrothgar · December 4, 2007

Hey there

Quick point of clarification: When I said that this was covered in High School in New York, I was referring to 11th grade Social Studies rather than any of the Math classes.

As I recall, the Math core was very focused on progressing towards Calculus in an efficient fashion. "Real" Statistics (as opposed to probability) never got covered.

However, the 11th Grade American history course included a combination of history, civics, and current events. Statistical literacy was probably slotted under civics...

kenberg · December 4, 2007

I think 25 plus or minus 4 is understandable without statistics. Confidence intervals are another matter and, as Helene points out, many polls are done with more involved techniques than basic random sampling, so a HS summary of statistics, at least in most schools, won't be adequate to really understand what has been done.

I expect a statement of 25 plus or minus four that is adequately correct for most purposes might be: We are really quite confident that the true percentage lies between 21 and 29. It's true that statistics is such that from time to time we may report intervals that are wrong. It's also true that more often than not the true answer will be somewhat closer to 25 than it will be to either 21 or 29. Probably most listeners just hear the 25, know that polls are not exact, and don't worry much about it.

Really I think a greater problem with polls is that they often ask hypotheticals. E.g., if Hillary were running against Rudy and you were voting today, who would you vote for? Well, Hillary is not yet running against Rudy and I am not voting today, so I have not thought that through very clearly, and I strongly suspect many other people have thought it through even less. So you get numbers but not, imo, much meaning. People get overly fascinated with numbers. In my view it is not so much "what does 25 plus or minus 4 mean?" as "does the whole poll mean anything?". It has consequences, so it has meaning in that sense, but I hate to see much actually depend on it.

Elianna · December 4, 2007

Yea, I was teaching a math class, but it was more a survey course than a pre-Calculus class.

This year I'm teaching more of a pre-calculus class, so I likely won't get near covering statistics (or even probability for that matter) as the girls have not even had standards from Algebra II covered (functions, logarithms, etc). It's very frustrating.

But I bet you that my students last year could have found confidence intervals from a given poll.

matmat · December 5, 2007

proper statistics/probability doesn't seem to be taught. As Josh mentioned above, in my edumacation there was also no mandatory class that would teach this material (high schools in two countries, college and grad school) -- i did take a couple of electives, but yeah... this just isn't taught...

Rossoneri · December 5, 2007

In Singapore it is taught in the GCE 'A' Levels, though I am not too sure about GCSEs in the UK...iirc my friend told me you can get by without any stats, correct me if I am wrong.

I must say I am a bit surprised it isn't taught in some places.

gwnn · December 5, 2007

it's in 10th grade material I think in Romania "de jure", but it's not taught "de facto" cause you don't have to know any in the final exams. no statistics have been taught in the physics uni, even though it would be quite useful e.g. for error calculation. we will have a Statistical Physics course in semester 5, though arguably not quite about the same issues.

helene_t · December 5, 2007

In Denmark it wasn't taught when I went to grammar school in the 80's, it might be now.

There are quite good statistics courses at the business school and several university departments.

Elianna · December 5, 2007

At Harvey Mudd, some statistics (mainly for error analysis) was taught in our physics labs. Also, probability had a very tiny amount of statistics. But I was never required to take a proper statistics class through a math undergrad degree, and would not have been required for a PhD (UNL).

finally17 · December 5, 2007

I would be really, really shocked if there is a single non-accelerated public high school math program in the country that covers standard deviations and confidence intervals. Certainly it's covered somewhere: if I recall correctly there's even a stats AP, and most people would be surprised at how far some programs go given the general impression of Americans and math. But most public programs struggle to get through algebra.

Which poses an interesting question because for the average person a basic stats course would probably be more useful.

mycroft · December 5, 2007

1. Covered in first-year Engineering Math (same place we discussed precision (and the cost of precision, and why you don't over-specify precision when you're planning/purchasing) and significant digits). Covered *again* in Statistics for Engineers in second year, where it came in about the Normal distribution when we were covering all the "usual distributions".

Of course, Engineering is neither Math nor Physics, although it uses a lot of both - the emphasis is very different.

2. Covered - very well, I might add, which of course, the whole book does, even if the math is missing a couple of zeros due to inflation - in "How to Lie With Statistics", by Huff.

Michael.

slothy · December 5, 2007

Re Math education in the UK

A lot of kids in the UK - a country fallaciously perceived (by some) as some educational paradigm, yet is scraping the bottom in european 'League Tables' despite attempts by the Education Minister to massage figures and trump out gaping disparities in the statistics - are leaving school without a basic and 'working' concept of fractions (and its illegitimate 'half-sisters' %ages, decimals, ratios etc) , geometry and basic algebra - never mind more elevated ideas like calculus and advanced trigonometry.

Those that perform well are those that have a succouring environment at home and concerned parents who invest their own time (not necessarily money) and effort.

I have worked with too many young kids who have fallen so far behind that it becomes a Sisyphean task to get them up to speed to barely pass their GCSE. Half the climb is persuading them of the value of having an education, however futile and unaccommodating it may appear to them AT THE TIME. Once this attitude is instilled, they become receptive to learning and achieve beyond their own expectations and those of others more judgemental.

It is a shame that at least here education is becoming undervalued and even sneered at, and this attitude is becoming more and more acceptable.

beatrix45 · December 6, 2007

:P Backward run the sentences until reel the mind. In statistics everything is stated in a sort of backward fashion. The statement you give says:

For A: There are 19 chances out of 20 (assuming the commonly used 95% confidence interval) that candidate A had between 22 and 30% of the vote at the time the survey was taken. There is one chance in 40 that he had more than 30% of the vote and one chance in 40 he had less than 22%.

For B: There are 19 chances out of 20 that B had between 21 and 29% of the vote. One chance in 40 he had more than 29% and a similar probability he had less than 21%.

We are assuming the survey was done correctly with all the statistician's assumptions being met (most surveys, almost all in fact, aren't so pristine, and they fall short at least to some degree). Many political surveys stray so far from the necessary assumptions, that their plus or minus so-and-so statements are pretty much worthless.

That's it folks. That's all. There ain't no more.

jdonn · December 6, 2007

For A: There are 19 chances out of 20 (assuming the commonly used 95% confidence interval) that candidate A had between 22 and 30% of the vote at the time the survey was taken. There is one chance in 40 that he had more than 30% of the vote and one chance in 40 he had less than 22%.

I'm a bit rusty in this stuff, but isn't it NOT (necessarily) true that, given a 95% confidence interval, there is a 2.5% chance the true answer lies below and a 2.5% chance the true answer lies above?
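One way to look at this question, in the frequentist sense of how often the procedure misses on each side, is to compute the miss probabilities exactly. The sketch below assumes the usual normal-approximation interval and uses hypothetical sample sizes and true shares; `miss_probabilities` is a made-up helper, not anything a pollster publishes.

```python
from math import comb, sqrt

def miss_probabilities(n, p, z=1.96):
    """Exact chance that the normal-approximation interval falls
    entirely below, or entirely above, the true share p."""
    too_low = too_high = 0.0
    for k in range(n + 1):
        prob = comb(n, k) * p**k * (1 - p)**(n - k)   # P(exactly k votes)
        p_hat = k / n
        half_width = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat + half_width < p:
            too_low += prob       # interval sits below the truth
        elif p_hat - half_width > p:
            too_high += prob      # interval sits above the truth
    return too_low, too_high

low_26, high_26 = miss_probabilities(600, 0.26)   # hypothetical n, share near the poll's
low_05, high_05 = miss_probabilities(100, 0.05)   # small sample, share near zero

print(f"n=600, p=0.26: below {low_26:.4f}, above {high_26:.4f}")
print(f"n=100, p=0.05: below {low_05:.4f}, above {high_05:.4f}")
```

For the small sample with a share near zero, the misses are very lopsided, so an even 2.5%/2.5% split is not guaranteed by the procedure itself.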
