
Poll Math Question


mike777


ok and the same thing for B also?

 

so A could be at 22% and B at 29% or other?

 

If so, my point is that the media never makes this clear. In this example B could have a huge lead over A, but they never say that.

 

"Confidence Intervals

 

The Margin of Error

Even if you aren't familiar with confidence intervals, you've probably unknowingly run across them. You've probably heard the term "Margin of Error" used along with the results of a survey of, say, a presidential poll.

 

After polling 1000 eligible voters, the Star-Tribune Newspaper reported that 55% of Americans would vote for James Bean and 45% for John F Daniels +/- 3%.

 

That plus or minus disclaimer is the margin of error. In other words, the margin of error means that James Bean could be favored by as much as 58 to 42 percent (55 + 3) or as low as 52 to 48 percent (55 - 3)-- a six percentage point spread (58-52 = 6). This spread is the confidence interval. "

 

 

http://www.measuringusability.com/stats/ci/ci_instr1.php
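
For what it's worth, the +/- 3% in the quoted example is about what the usual normal-approximation formula gives for 1000 respondents. A minimal sketch, assuming a simple random sample and a 95% confidence level (the quoted article states neither); the function name is made up for illustration:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation interval for a proportion.

    p: observed share (0..1), n: sample size, z: critical value.
    z = 1.96 is the two-sided 95% value; the 95% level is an assumption.
    """
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(0.55, 1000)
print(f"margin of error: {moe:.1%}")  # roughly 3 points
print(f"interval for the leader: {0.55 - moe:.1%} to {0.55 + moe:.1%}")
```

The same margin applies to the trailing candidate's 45%, since p * (1 - p) is identical for 0.55 and 0.45.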


Correct: B is somewhere between 21% and 29%

 

The easiest way to understand what's going on is to picture one normal distribution centered at 25% and a second at 26%.

 

Damn media. How dare they presume that anyone has an education...

 

(In all seriousness, this gets covered in High School - Or at least it is in New York)


It seems that saying 25% with a 4% margin of error ought to communicate a 21 to 29 point range to the normally competent person. What I think they do not say at all is that they are not certain that the result is in that range. Of course maybe people should realize that it's impossible to be certain of an entire population by sampling a portion of that population but it seems some estimate of their certainty should be part of the statement.

E.g., "We are 95% certain that between 21% and 29% of the population favors candidate whoever." Or, if "95% certain" isn't a clear statement, maybe they should say "We have followed a procedure that 95 times out of 100 will give us an interval containing the true percentage of folks across the population favoring candidate whoever, and that procedure gave us the interval 21-29. We wish to warn our listeners that 5 times out of 100 this procedure will give us an interval that does not contain the true percentage".

 

That is, of course, assuming that my guess of what they are doing is correct (the percentage certainty might not be 95 of course).
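
The "95 times out of 100" reading can be checked by simulation. This is only a sketch under assumptions that are not in the thread: a true share of 25%, 600 respondents per poll, simple random sampling, and the plain normal-approximation interval. All names are hypothetical.

```python
import math
import random

def wald_interval(successes, n, z=1.96):
    """Normal-approximation 95% interval for a sample proportion."""
    p_hat = successes / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def coverage(true_p, n, trials, seed=0):
    """Fraction of simulated polls whose interval contains true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # One simulated poll: n independent yes/no answers.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_interval(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / trials

rate = coverage(true_p=0.25, n=600, trials=2000)
print(f"intervals containing the true value: {rate:.1%}")
```

The printed rate should land near 95%, which is a statement about the procedure over many polls rather than about any single interval.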

 

They could be clearer about the meaning of their ranges, but my real gripes about these polls lie elsewhere.


Normally this means that the result of the poll was exactly

A = 26%

B = 25%

 

but because of the uncertainty in any poll there is a confidence interval (usually 95%), and if that's plus or minus 4%, this means that if ALL people had been asked rather than a sample, it's possible that A = 22% and B = 29%.

 

In fact we don't know what the result would have been if we had asked everyone, which is basically an election. If the confidence level is 95%, the 4% uncertainty corresponds to roughly 1.96 standard deviations of a normal distribution.

 

So the probability distribution of the election result of candidate A based on the poll of maybe 1000 people or so is a normal distribution with mean 26% and standard deviation of 2.0%.
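
To put rough numbers on this, here is a small sketch that converts the 4-point margin into a standard deviation and asks how likely it is that B is really ahead. It treats the two shares as independent normals, which is a simplification (shares from the same poll are negatively correlated), and the worst-case p = 0.5 used for the implied sample size is also an assumption.

```python
import math
from statistics import NormalDist

MOE = 4.0                          # reported margin of error, in points
Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96

# Standard deviation implied by the margin of error.
sd = MOE / Z95

# Sample size that would give this margin at the worst case p = 0.5.
implied_n = (Z95 / (MOE / 100)) ** 2 * 0.25

# Lead of A over B, modelled as a normal centred on 26 - 25 = 1 point.
# ASSUMPTION: independent shares, so the variances simply add.
lead = NormalDist(mu=26.0 - 25.0, sigma=sd * math.sqrt(2))
p_b_ahead = lead.cdf(0.0)

print(f"standard deviation: {sd:.2f} points")
print(f"implied sample size: about {implied_n:.0f}")
print(f"chance B is actually ahead: {p_b_ahead:.0%}")
```

Under these assumptions a 4-point margin corresponds to roughly 600 respondents rather than 1000, and a 1-point lead is far from decisive.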

 

Edit: numbers fixed.

Edited by Gerben42

In fact we don't know the actual result of the poll if we would have asked everyone, which is basically an election. If the confidence interval is 95%, the 4% uncertainty roughly corresponds to 5/3 standard deviations in a normal distribution.

 

So the probability distribution of the election result of candidate A based on the poll of maybe 1000 people or so is a normal distribution with mean 26% and standard deviation of 2.4%.

Gerben:

 

Mind explaining where you got the 5/3 value?

 

It's way early in the morning (and I didn't sleep well), but the 68-95-99.7 rule states that

 

68% of observations fall within one standard deviation of the mean

95% fall within two standard deviations of the mean

99.7% fall within three standard deviations of the mean
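
The rule is easy to check against the standard normal distribution. A quick sketch using Python's standard library (the helper name is made up):

```python
from statistics import NormalDist

def share_within(k):
    """Probability that a normal observation lies within k sd of the mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.2%}")
```

The three values come out at about 68.3%, 95.4% and 99.7%, so "two standard deviations" is a slight rounding of the 1.96 needed for exactly 95%.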


I shouldn't be doing this in the morning; I mixed up the 1-sided and 2-sided numbers.

 

For 1-sided tests, 95% is 1.65 standard deviations; for 2-sided tests, 95% is 1.96 standard deviations.
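
Both numbers fall out of the inverse normal CDF; the only difference is whether the 5% sits in one tail or is split across two. A short sketch (the function name is hypothetical):

```python
from statistics import NormalDist

def critical_value(confidence, two_sided=True):
    """Number of standard deviations for a given confidence level."""
    tail = 1 - confidence
    if two_sided:
        tail /= 2  # split the leftover probability across both tails
    return NormalDist().inv_cdf(1 - tail)

print(f"1-sided 95%: {critical_value(0.95, two_sided=False):.3f}")
print(f"2-sided 95%: {critical_value(0.95):.3f}")
```

The 1-sided value is about 1.645, which is where the 5/3 figure comes from.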

 

So your "rule" is correct (for 2-sided tests). All of the tests in my current work are 1-sided so this 5/3 value was in my head all the time.


"We have followed a procedure that 95 times out of 100 will give us an interval containing the true percentage of folks across the population favoring candidate whoever, and that procedure gave us the interval 21-29. We wish to warn our listeners that 5 times out of 100 this procedure will give us an interval that does not contain the true percentage".

This is a reasonable explanation which you will find in some textbooks, but if I'm allowed to be pedantic I'll say that it's not correct.

 

The strict definition of the 95% confidence interval is the range of hypothetical true percentages, under which the empirical percentage which we actually found would have a likelihood above the 5% quantile.

 

There is an alternative procedure, the Bayesian one, which assumes some prior distribution of the true percentages and then updates this distribution in light of the data. Then you can report a Bayesian confidence interval (usually called a credible interval), defined as the range of percentages with a posterior likelihood above the 5% quantile.

 

If you use a "flat" prior (i.e. all plausible percentages are a priori equally likely) then for most practical purposes it hardly matters which procedure you use.

 

However, to see the difference, think of the extreme case where a particular candidate got zero votes in the poll. This may translate into a confidence interval of, say, 0% to 4%, but surely you cannot interpret that as if there was a 95% chance that the true percentage is below 4%, unless you specify a prior.
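
The zero-votes case can be made concrete. The sketch below assumes a hypothetical poll of 75 respondents (a size chosen only because it reproduces the "0% to 4%" figure) and compares three upper bounds: the normal approximation, the exact one-sided bound, and the 95% posterior quantile under a flat prior. The function names are illustrative.

```python
import math

def wald_upper(successes, n, z=1.96):
    """Upper end of the normal-approximation interval."""
    p_hat = successes / n
    return p_hat + z * math.sqrt(p_hat * (1 - p_hat) / n)

def exact_upper_zero(n, alpha=0.05):
    """Exact one-sided upper bound when 0 of n respondents pick the candidate.

    It is the largest p for which seeing zero votes still has probability
    at least alpha, i.e. the solution of (1 - p)**n = alpha.
    """
    return 1 - alpha ** (1 / n)

def flat_prior_upper_zero(n, credibility=0.95):
    """Posterior quantile under a flat prior: posterior is Beta(1, n + 1)."""
    return 1 - (1 - credibility) ** (1 / (n + 1))

n = 75  # ASSUMPTION: hypothetical sample size, not from the thread
print(f"normal approximation: 0% to {wald_upper(0, n):.1%}")
print(f"exact bound:          0% to {exact_upper_zero(n):.1%}")
print(f"flat-prior posterior: 0% to {flat_prior_upper_zero(n):.1%}")
```

The normal approximation collapses to a zero-width interval, while the exact and flat-prior bounds both come out near 3.9%. The numbers nearly coincide, but only the last one is a probability statement about the true percentage, and it depends on the prior chosen.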


Nothing wrong with a little pedantry! I have to go make some money being a pedant but I'll think this through later. I am sort of aware of what you describe but I'm weak on the details.


Basic statistics was really not taught at any level in my education.

 

E.g.

A. Not taught in High School

B. Not taught as part of a Math Undergrad degree

C. Not taught in any of my official courses for my math PhD (although I did audit 2 stats classes)

 

I did learn a little bit about the Normal Distribution in my physics courses.

 

For the record, my experience in aerospace is that less than 10% of engineers ever studied any stats, and a lot fewer actually know any stats. Of the folks I interview for my current work (stats for finance), probably 50% of the PhDs actually know even these sorts of basic things...

 

P.S. In the spirit of being pedantic, each poll response follows a Bernoulli distribution (so the totals are binomial), not a normal distribution. The key fact is that as the number of trials gets large, and the sample mean is sufficiently far away from 0 or 1, the binomial gets really close to the Gaussian. Of course a Gaussian is unbounded, and a sample proportion lies between 0 and 1, which is why you clearly have problems using the approximation near 0 and 1...
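
One way to see where the approximation breaks down is to ask how much probability the fitted normal puts on impossible shares below 0%. A rough sketch; the two example polls are invented for illustration:

```python
import math
from statistics import NormalDist

def mass_below_zero(p, n):
    """Probability the normal approximation assigns to negative shares.

    The approximation uses mean p and standard deviation
    sqrt(p * (1 - p) / n) for the sample proportion.
    """
    sd = math.sqrt(p * (1 - p) / n)
    return NormalDist(mu=p, sigma=sd).cdf(0.0)

# Hypothetical polls: a close race with a big sample, and a fringe
# candidate with a small sample.
print(f"p = 50%, n = 1000: {mass_below_zero(0.50, 1000):.3%}")
print(f"p =  1%, n =  100: {mass_below_zero(0.01, 100):.3%}")
```

For the close race the leaked mass is effectively zero, while for the 1% candidate roughly a sixth of the fitted normal sits below 0%, which is the problem near the boundaries described above.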

 

 

 

BTW, Ian Ayres's book Super Crunchers is a good layman's intro to a lot of this stuff and discusses polling and many other applications of statistics.


(In all seriousness, this gets covered in High School - Or at least it is in New York)

I teach it to my 12th graders. But I'm not at a public school. I don't believe it's in our standards (CA), but I can look it up.

Hey there

 

Quick point of clarification: When I said that this was covered in High School in New York, I was referring to 11th grade Social Studies rather than any of the Math classes.

 

As I recall, the Math core was very focused on progressing towards Calculus in an efficient fashion. "Real" Statistics (as opposed to probability) never got covered.

 

However, the 11th Grade American history course included a combination of history, civics, and current events. Statistical literacy was probably slotted under civics...


I think 25 plus or minus 4 is understandable without statistics. Confidence intervals are another matter and, as Helene points out, many polls are done with more involved techniques than basic random sampling, so a HS summary of statistics, at least in most schools, won't be adequate to really understand what has been done.

 

I expect an adequately correct statement, for most purposes, of 25 plus or minus four might be: We are really quite confident that the true percentage lies between 21 and 29. It's true that statistics is such that from time to time we may report intervals that are wrong; it's also true that more often than not the true answer will be somewhat closer to 25 than it will be to either 21 or 29. Probably most listeners just hear the 25, know that polls are not exact, and don't worry much about it.

 

Really I think a greater problem with polls is that they often ask hypotheticals. E.g., if Hillary were running against Rudy and you were voting today, who would you vote for? Well, Hillary is not yet running against Rudy and I am not voting today so I have not thought that through very clearly, and I strongly suspect many other people have thought it through even less. So you get numbers but not, imo, much meaning. People get overly fascinated with numbers. In my view it is not so much "what does 25 plus or minus 4 mean?" as "does the whole poll mean anything?". It has consequences, so it has meaning in that sense, but I hate to see much actually depend on it.


Yea, I was teaching a math class, but it was more a survey course than a pre-Calculus class.

 

This year I'm teaching more of a pre-calculus class, so I likely won't get near covering statistics (or even probability for that matter) as the girls have not even had standards from Algebra II covered (functions, logarithms, etc). It's very frustrating.

 

But I bet you that my students last year could have found confidence intervals from a given poll.


Proper statistics/probability doesn't seem to be taught. As Josh mentioned above, in my edumacation there was also no mandatory class that would teach this material (high schools in two countries, college and grad school). I did take a couple of electives, but yeah... this just isn't taught...

It's in the 10th grade material, I think, in Romania "de jure", but it's not taught "de facto" because you don't have to know any of it for the final exams. No statistics has been taught at the physics uni, even though it would be quite useful, e.g. for error calculation. We will have a Statistical Physics course in semester 5, though arguably not quite about the same issues.

At Harvey Mudd, some statistics (mainly for error analysis) was taught in our physics labs. Also, probability had a very tiny amount of statistics. But I was never required to take a proper statistics class through a math undergrad degree, and would not have been required for a PhD (UNL).

I would be really really shocked if there is a single non-accelerated public high school math program in the country that covers standard deviations and confidence intervals. Certainly it's covered somewhere; if I recall correctly there's even a stats AP, and most people would be surprised at how far some programs go given the general impression of Americans and math, but most public programs struggle to get through algebra.

 

Which poses an interesting question because for the average person a basic stats course would probably be more useful.


1. Covered in first-year Engineering Math (same place we discussed precision, the cost of precision, why you don't over-specify precision when you're planning/purchasing, and significant digits). Covered *again* in Statistics for Engineers in second year, where it came up with the Normal distribution when we were covering all the "usual distributions".

 

Of course, Engineering is neither Math nor Physics, although it uses a lot of both - the emphasis is very different.

 

2. Covered - very well, I might add, which of course, the whole book does, even if the math is missing a couple of zeros due to inflation - in "How to Lie With Statistics", by Huff.

 

Michael.


Re Math education in the UK

 

A lot of kids in the UK - a country fallaciously perceived (by some) as some educational paradigm, yet one that is scraping the bottom in European 'League Tables' despite attempts by the Education Minister to massage figures and trump out gaping disparities in the statistics - are leaving school without a basic and 'working' concept of fractions (and their illegitimate 'half-sisters': %ages, decimals, ratios etc.), geometry and basic algebra - never mind more elevated ideas like calculus and advanced trigonometry.

 

Those that perform well are those that have a succouring environment at home and concerned parents who invest their own time (not necessarily money) and effort.

 

I have worked with too many young kids who have fallen so far behind that it becomes a Sisyphean task to get them up to speed to barely pass their GCSE. Half the climb is persuading them of the value of having an education - however futile and unaccommodating it may appear to them AT THE TIME. Once this attitude is instilled, they become receptive to learning and achieve beyond their own expectations and those of others more judgemental.

 

It is a shame that at least here education is becoming undervalued and even sneered at, and this attitude is becoming more and more acceptable.


:P Backward run the sentences until reel the mind. In statistics everything is stated in a sort of backward fashion. The statement you give says:

 

For A: There are 19 chances out of 20 (assuming the commonly used 95% confidence interval) that candidate A had between 22 and 30% of the vote at the time the survey was taken. There is one chance in 40 that he had more than 30% of the vote and one chance in 40 he had less than 22%.

 

For B: There are 19 chances out of 20 that B had between 21 and 29% of the vote. One chance in 40 he had more than 29% and a similar probability he had less than 21%.

 

We are assuming the survey was done correctly with all the statistician's assumptions being met (most surveys, almost all in fact, aren't so pristine, and they fall short at least to some degree). Many political surveys stray so far from the necessary assumptions, that their plus or minus so-and-so statements are pretty much worthless.

 

That's it folks. That's all. There ain't no more.


For A: There are 19 chances out of 20 (assuming the commonly used 95% confidence interval) that candidate A had between 22 and 30% of the vote at the time the survey was taken. There is one chance in 40 that he had more than 30% of the vote and one chance in 40 he had less than 22%.

I'm a bit rusty in this stuff, but isn't it NOT (necessarily) true that, given a 95% confidence interval, there is a 2.5% chance the true answer lies below and a 2.5% chance the true answer lies above?
