Who will stop Italy?

hrothgar · October 8, 2007

Arend, when you say SA had more luck, isn't that just what Richard was already saying with "this suggested that the Italy v SA was an outlier"?

Your point 1. is well taken though.
No, I think Richard was saying the exact opposite. He was saying that Helene's analysis wrongly assumed that the win by SA vs Italy in the round robin was an outlier (probably implying that the quarter final result may have proven this wrong).

Arend is correct: Italy scored well against a large number of teams that, in turn, scored well against South Africa.

Based on this, Helene's model assumed that Italy was significantly stronger than South Africa (so the actual table result, where South Africa beat Italy, was an outlier).

Sorry if I wasn't clear earlier.

Echognome · October 8, 2007

Seems like this is turning into an econometrics debate (or statistics if you prefer more generally, although I feel this type of modeling is probably best described by the former). I wanted to add that Richard's advocacy of Monte Carlo methods will obviously work much better in his hypothetical tournament question, because he is not using any data for his estimation. Helene offered up a linear model of wins and losses based on actual data to predict future outcomes. She might have bootstrapped the errors rather than assume normality, but I doubt that would have changed much, especially given the small sample size. As such, I feel this is an unfair criticism. With regard to transitivity, I think this is just a matter of how you model it. I suggested an alternative method, but by no means do I feel that my alternative was the only "right" way to do it.

As for Phil's concern, estimated probabilities are normally reported along with standard errors. I'm sure Helene's omission was more for the sake of the readers than because she did not consider it in the model.

As for South Africa winning, it does not make the model invalid. I think most of us felt that Italy were heavy favorites. The fact that 5% or 10% or 20% events happen should come as no surprise.

awm · October 8, 2007

Perhaps something that's not being modeled is that the variance in matches depends on style and methods in use.

People say that playing "weird systems" is "high variance" but that's not really true. If my opponents are all playing forcing pass, the match will be much lower variance if I also play forcing pass. Playing similar methods/style tends to reduce the number of swings generated by bidding, whereas playing very different methods/style tends to increase the number of such swings. If one team is somehow inferior to another, the odds of the inferior team winning are much higher in a "high variance" match than in a "low variance" match, because the swings due to bidding/style tend to be of higher magnitude than the slow accumulation of imps by the team that plays/defends better.
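A minimal sketch of that last claim, assuming the favourite's final margin is roughly normally distributed. The 40-IMP expected margin and the two swing standard deviations are hypothetical numbers picked for illustration, not estimates from any real match.

```python
import math

def underdog_win_probability(expected_margin, swing_sd):
    """P(weaker team wins) if the favourite's margin ~ Normal(expected_margin, swing_sd**2)."""
    z = -expected_margin / swing_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical: the favourite is expected to win the match by 40 IMPs.
low_variance = underdog_win_probability(40.0, 30.0)   # both teams play similar methods
high_variance = underdog_win_probability(40.0, 60.0)  # very different methods, more swings

print(f"similar methods:   {low_variance:.3f}")
print(f"different methods: {high_variance:.3f}")
```

With the same skill gap, doubling the swing size moves the underdog from roughly a 9% chance to roughly a 25% chance.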

Anyways, watching Italy/South Africa I noticed that very different bidding decisions were made on many hands. It seemed that the Italians bid a lot of games the South Africans didn't (failing on more than they made) and that the South Africans preempted on a lot of hands where the Italians didn't. While it's not obvious that one style is "better" than the other, the fact that the auction was almost always different at the two tables would seem to have helped the South Africans' chances, assuming that (as most seem to think) the Italians are generally superior card players.

Echognome · October 8, 2007

Perhaps something that's not being modeled is that the variance in matches depends on style and methods in use.

What are you talking about? It's already modelled in there. It's called the error term.

I mean seriously, how much do you expect to be modelled?

Should we model all of the possible pairings and who has seating rights?

The temperature of the closed vs the open rooms?

Whether one of the players gets laid the night before? (special thanks to JL for this example)

I mean if you can make the case that (1) this is significant and (2) can easily be modelled, that's one thing. But I think that what Helene was trying to accomplish was a simplistic model with readily available data.

awm · October 8, 2007

The point is that it could easily be the case that:

(1) A will virtually always beat B.

(2) B will beat C most of the time.

(3) C has a substantial non-zero chance to beat A.

I think this is a fundamental flaw in any basically transitive model.

Another way to put it is, the result depends on both relative skill and the frequency and magnitude of random swings. The latter depends on which two teams are playing...
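A small sketch of how points (1) to (3) can hold at once, assuming skills are additive but the swing size depends on the pairing. The team labels follow the post; every number below is hypothetical.

```python
import math

def win_probability(skill_diff, swing_sd):
    """P(margin > 0) when margin ~ Normal(skill_diff, swing_sd**2)."""
    return 0.5 * (1.0 + math.erf(skill_diff / (swing_sd * math.sqrt(2.0))))

# Hypothetical skills (expected IMPs against an average team over a match).
skill = {"A": 30.0, "B": 0.0, "C": -10.0}

# Hypothetical swing SDs: A and B play similar methods, A and C very different ones.
pair_sd = {("A", "B"): 10.0, ("B", "C"): 15.0, ("C", "A"): 60.0}

def p_beats(x, y):
    """Probability that team x beats team y under the pair-specific swing SD."""
    sd = pair_sd[(x, y)] if (x, y) in pair_sd else pair_sd[(y, x)]
    return win_probability(skill[x] - skill[y], sd)

for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(f"P({x} beats {y}) = {p_beats(x, y):.3f}")
```

With a single shared swing SD, making A beat B this reliably would push C's chance against A to nearly zero, so the pattern only appears once the variance is allowed to depend on who is playing.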

helene_t · October 9, 2007

My model said that the probability of Ita beating SA in 128 boards without carry-over was ninety-one-point-something percent. In 96 boards with a small negative carry-over it is less than that. So SA's win is not sensational. In fact if all my predictions turned out to be accurate you could have criticized my model for having an inflated error term.
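To see why fewer boards and a negative carry-over lower the favourite's chances, here is a rough sketch. It assumes independent 16-board segments with normally distributed margins; the 9-IMP edge, the 19-IMP residual SD and the 5-IMP carry-over are illustrative guesses, not the fitted values from the model.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def match_win_probability(margin_per_16, sd_per_16, boards, carry_over=0.0):
    """Favourite's win probability over `boards` boards.

    The expected margin grows linearly with the number of 16-board segments,
    while the SD grows only with the square root of that number.
    """
    segments = boards / 16.0
    mean = margin_per_16 * segments + carry_over
    sd = sd_per_16 * math.sqrt(segments)
    return normal_cdf(mean / sd)

# Hypothetical inputs: 9 IMPs per 16 boards edge, 19 IMPs residual SD.
p128 = match_win_probability(9.0, 19.0, boards=128)
p96 = match_win_probability(9.0, 19.0, boards=96, carry_over=-5.0)

print(f"128 boards, no carry-over:     {p128:.3f}")
print(f"96 boards, -5 IMP carry-over:  {p96:.3f}")
```

Under these assumptions the favourite drops from about 0.91 to about 0.85, so an upset in the shorter match is roughly a one-in-seven event.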

I'm not saying this to argue that my model is perfect. Of course it's not.

But I do think that simple models with well-understood mathematical properties are good places to start. The parameters estimated in my model have interpretations: the coefficient of country X is the number of IMPs country X wins against an average team in 16 boards. The residual SD is the typical (root-mean-square) deviation from my prediction found in real 16-board matches.
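For readers who want to see those two interpretations concretely, here is a sketch of such a fit on synthetic data. It assumes a complete round robin in which every pair meets once over 16 boards; in that balanced case the least-squares coefficients (constrained to sum to zero) reduce to each team's net IMPs divided by the number of teams. The strengths and noise level are made up.

```python
import itertools
import math
import random

random.seed(0)
n_teams = 22

# Synthetic "true" strengths, centred so that an average team scores 0.
true_strength = [random.gauss(0.0, 10.0) for _ in range(n_teams)]
centre = sum(true_strength) / n_teams
true_strength = [s - centre for s in true_strength]

# One 16-board match per pair: margin = strength difference + noise.
matches = [(i, j, true_strength[i] - true_strength[j] + random.gauss(0.0, 20.0))
           for i, j in itertools.combinations(range(n_teams), 2)]

# Least-squares fit of margin ~ coef[i] - coef[j] for a complete round robin.
net = [0.0] * n_teams
for i, j, margin in matches:
    net[i] += margin
    net[j] -= margin
coef = [total / n_teams for total in net]

# Residual SD, with n_teams - 1 free team parameters.
residuals = [margin - (coef[i] - coef[j]) for i, j, margin in matches]
dof = len(matches) - (n_teams - 1)
residual_sd = math.sqrt(sum(e * e for e in residuals) / dof)

print(f"{len(matches)} matches, residual SD = {residual_sd:.1f} IMPs")
```

Each `coef[i]` reads as the expected IMP margin of team `i` against an average team over 16 boards, and `residual_sd` is the typical miss of a single 16-board prediction.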

Also my assumptions are bridge-wise meaningful, explicit and can be tested. It has been suggested here (Frances) that the quarter-final results may be over-dispersed relative to my predictions. It has been suggested that my linear response function may in fact be convex. These alternative hypotheses can be tested (I hope to find time to do that shortly. On the basis of the first ten rounds the response function was slightly convex but not significantly so. It might have become significant in the meantime, it might not). Suppose the critics of my model turn out to be right. Then we will have learned something interesting about the nature of such a bridge event.

Models are most successful at the moment when one discovers that they don't conform with reality. This is when they really help us learn something.

helene_t · October 9, 2007

Arend, when you say SA had more luck, isn't that just what Richard was already saying with "this suggested that the Italy v SA was an outlier"?

Your point 1. is well taken though.
No, I think Richard was saying the exact opposite. He was saying that Helene's analysis wrongly assumed that the win by SA vs Italy in the round robin was an outlier (probably implying that the quarter final result may have proven this wrong).

I have to say that in all the quarters I have been watching, the Italians were playing a bit better than the South Africans.

In data analysis, "outliers" are atypical observations that occur too frequently to be considered effects of the normal statistical fluctuations but too infrequently to support a modification of the model. They are typically either removed prior to the analysis, or (better) one uses so-called robust estimators that let the outliers stay in the data set but don't allow them to influence the fitted parameters more than a more typical deviation in the same direction would.

For example, if everyone had adequate defense against everyone's methods except that the Italians were helpless against SA's overcall structure, it could be that the residuals were generally normally distributed except that SA-Ita was some ten standard deviations off, something that is very unlikely to occur even in a single match in a 231-match RR. (It could also be an outlier if one of the Italians missed the bus on his way to the match against SA and was replaced by a kibber who couldn't tell the difference between Bridge and Go Fish, but then it should have no implications for the prediction of the QF outcome. However, a model cannot tell such different kinds of outliers apart because outliers are, by definition, not accounted for by the model. Besides, the available data might be the same in both cases.)

Whether something weird was going on in Ita-SA in the RR I cannot tell because the result was not unusual. Even the QF was too short to be identifiable as an atypical match. Maybe expert kibbers observed something unusual, or maybe something can be learned from the per-board statistics, but from the result of the match alone there is no reason to suspect that it was an unusual match.

As for my model, it's just a standard linear model, I don't consider anything to be outliers, and from the residual histograms it doesn't look like I should. I believe it would be a very bad idea to do outlier removal in this case. I sometimes do use robust estimators but I doubt that they are called for in the analysis of bridge results, and in any case, with only 231 data points and 22 parameters, the standard errors are huge and it is imperative to use the most efficient estimators.

helene_t · October 9, 2007

I don't know about running monte-carlos to simulate outcomes, but it does strike me that this is more of a graph theory problem than anything (and, those who have their nose in math more than I do, can tell me how much I am wrong there).

Simulations could be useful if there were no direct way of estimating the parameters in the model. Then one can try different parameter settings until the simulation results become similar to the observations.

Not to be confused with Markov Chain Monte Carlo, in which one simulates a Markov chain that is known to converge towards the posterior distribution of the parameters.

helene_t · October 9, 2007

Latest news: the additivity assumption holds (at least on average):

http://www.geocities.com/helene_thygesen/residtrend.jpeg

The plot shows cross-validation expected match results (positive, i.e. stronger team vs weaker team) on the x-axis, and cross-validation residuals on the y-axis.

If the true response function were non-linear (e.g. convex), there would have been a trend. In particular, the expected results for extreme-difference matches such as Italy-T&T would have been biased. The p-value for the null hypothesis that the Pearson correlation is zero is 0.86.

The data used are per-round IMP scores from the round robin. The crossvalidation type was leave-one-round-out.
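For anyone wanting to try the same kind of check, here is a rough sketch of a leave-one-round-out test on synthetic data. Everything in it is an assumption for illustration: the schedule generator, the coordinate-descent fit, the strengths and noise level, and the normal approximation used for the p-value. Its output says nothing about the actual Bermuda Bowl results.

```python
import math
import random

def round_robin_rounds(n):
    """Circle method: n - 1 rounds, each pairing all n teams (n must be even)."""
    teams = list(range(n))
    rounds = []
    for _ in range(n - 1):
        rounds.append([(teams[k], teams[n - 1 - k]) for k in range(n // 2)])
        teams = [teams[0], teams[-1]] + teams[1:-1]  # rotate all but the first
    return rounds

def fit_strengths(matches, n, sweeps=30):
    """Least-squares fit of margin ~ b[i] - b[j] (sum of b = 0) by coordinate descent."""
    b = [0.0] * n
    for _ in range(sweeps):
        for t in range(n):
            total, games = 0.0, 0
            for i, j, margin in matches:
                if i == t:
                    total += margin + b[j]
                    games += 1
                elif j == t:
                    total += b[i] - margin
                    games += 1
            b[t] = total / games
        mean = sum(b) / n
        b = [x - mean for x in b]
    return b

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)
n_teams = 22
true_strength = [random.gauss(0.0, 10.0) for _ in range(n_teams)]
rounds = [[(i, j, true_strength[i] - true_strength[j] + random.gauss(0.0, 15.0))
           for i, j in rnd] for rnd in round_robin_rounds(n_teams)]

expected, residuals = [], []
for held_out in range(len(rounds)):
    training = [m for r, rnd in enumerate(rounds) if r != held_out for m in rnd]
    b = fit_strengths(training, n_teams)
    for i, j, margin in rounds[held_out]:
        prediction = b[i] - b[j]
        sign = 1.0 if prediction >= 0 else -1.0  # view from the predicted favourite
        expected.append(sign * prediction)
        residuals.append(sign * (margin - prediction))

r = pearson(expected, residuals)
n = len(expected)
t_stat = r * math.sqrt((n - 2) / (1.0 - r * r))
p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))  # normal approximation to the t-test
print(f"matches={n}  r={r:.3f}  approx p={p_value:.2f}")
```

A clear trend in residuals against expected margin (a correlation well away from zero) would be the signal that a linear, additive response is the wrong shape.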

This is not to say that bridge results are always additive. Add a team of Godzillas from the local coffee-house and maybe Italy will not, on average, beat them by 33 IMPs more than T&T will beat them. But for the purposes of the Bermuda Bowl it seems that additivity is a reasonable assumption.
