Comparing butler scores to score against minimax

bluecalm · January 12, 2010

I think that comparing a given partnership's scores to minimax scores is a better measure of how well the pair plays than Butler scores (although it probably has more variance). Do you know if anybody has done statistical work comparing top partnerships from top events using scores against minimax as the measure?

I am very interested to see whether the Butler winners would come out on top as well, and how big the difference would be.

Free · January 12, 2010

With "minimax scores", do you mean "double dummy results"?

bluecalm · January 12, 2010

do you mean "double dummy results"?

Yes.

Gerardo · January 12, 2010

BBO uses cross-IMPs everywhere it uses IMPs. Butler is not used anywhere.

Jeroen71 · January 13, 2010

I think that comparing a given partnership's scores to minimax scores is a better measure of how well the pair plays than Butler scores (although it probably has more variance). Do you know if anybody has done statistical work comparing top partnerships from top events using scores against minimax as the measure?

...

Before you can start doing some statistical analysis, you would have to define what it means for one measure to be better than another.

Not a trivial task....

bluecalm · January 13, 2010

Before you can start doing some statistical analysis, you would have to define what it means for one measure to be better than another.
Not a trivial task....

That's what I am suggesting: take the scores at the table and compare them to the minimax scores. The pair that beats minimax by more points/IMPs than another pair in the long run is the better pair.

Of course this measure can have a lot of variance, and you need to play a lot of hands for it to be reliable. I have no idea how many. It would be nice to see how, for example, Lauria-Versace fare against Meckwell using this measure.

helene_t · January 13, 2010

I think that comparing a given partnership's scores to minimax scores is a better measure of how well the pair plays than Butler scores (although it probably has more variance). Do you know if anybody has done statistical work comparing top partnerships from top events using scores against minimax as the measure?

...
Before you can start doing some statistical analysis, you would have to define what it means for one measure to be better than another.
Not a trivial task....

Not so difficult. Well, not so difficult to define it. Maybe more difficult to argue that it's the most reasonable definition :)

Here is what I would do: The score when board b is played by pair NSi against EWi is modeled as something like

( strength[NSi] - strength[EWi] ) * a[t] + eps[i,b]

where a[t] is log-normally distributed across tables and eps[i,b] is normally distributed with variance sigma^2.

The best scoring is the one that leads to the lowest estimate of

E(sigma[...]^2) / ( var(strength[...]) * E(a[...]) )

Jeroen71 · January 14, 2010

Not so difficult. Well, not so difficult to define it. Maybe more difficult to argue that it's the most reasonable definition :)

Here is what I would do: The score when board b is played by pair NSi against EWi is modeled as something like
( strength[NSi] - strength[EWi] ) * a[t] + eps[i,b]
where a[t] is log-normally distributed across tables and eps[i,b] is normally distributed with variance sigma^2.

The best scoring is the one that leads to the lowest estimate of
E(sigma[...]^2) / ( var(strength[...]) * E(a[...]) )

Would you care to elaborate a bit on your choice of model + criterion?

helene_t · January 14, 2010

The criterion is that the variance in scores that is due to strength difference between pairs must be as large as possible compared to the random variance in scores.

The model says that the score on a particular board for two particular pairs playing that board is normally distributed with an expectation which is proportional* to the strength difference between the two pairs. Cascade, Gerben and I have all used that model for different IMP data sets and found that it fits well. Of course a radically different scoring could have different statistical characteristics, but as long as it is some kind of IMP scoring we are probably OK.

*I have assumed that the proportionality factor is log-normally distributed across boards. I felt that that is more realistic than assuming that it is constant. But I have no evidence for any particular distribution of this factor. Maybe Gerben has an informed opinion about it.

I have not provided any model for the distribution of sigma[...] across boards. It may be useful to model it, especially for long tournaments with small fields. Usually an inverse-gamma model is used for such purposes but again I have no evidence in support of any particular model. Again, maybe Gerben can say more.
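The model and criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the strength values, and the log-normal parameters are hypothetical, sigma is held constant across boards, and the criterion is evaluated with known parameters rather than estimated from real results.

```python
import random
import statistics

def simulate_scores(strengths, n_boards, sigma, a_mu=0.0, a_sigma=0.25, seed=0):
    """Simulate one table result per board under the model
    score = (strength[NS] - strength[EW]) * a + eps.
    Assumption: one log-normal factor a per board, eps ~ Normal(0, sigma^2)."""
    rng = random.Random(seed)
    pairs = list(strengths)
    rows = []
    for board in range(n_boards):
        a = rng.lognormvariate(a_mu, a_sigma)   # proportionality factor for this board
        ns, ew = rng.sample(pairs, 2)           # two distinct pairs meet at the table
        eps = rng.gauss(0.0, sigma)             # random part of the result
        score = (strengths[ns] - strengths[ew]) * a + eps
        rows.append((board, ns, ew, a, score))
    return rows

def criterion(sigma, strengths, a_values):
    """The ratio as written in the post: E(sigma^2) / (var(strength) * E(a)).
    Lower means strength differences stand out more against the noise."""
    var_strength = statistics.pvariance(list(strengths.values()))
    return sigma ** 2 / (var_strength * statistics.mean(a_values))

# Hypothetical strengths (IMPs per board) for three pairs.
strengths = {"A": 1.0, "B": 0.0, "C": -1.0}
rows = simulate_scores(strengths, n_boards=50, sigma=5.0)
a_values = [row[3] for row in rows]
print(criterion(5.0, strengths, a_values))
```

Comparing two scoring methods would then mean fitting this model to the same results scored both ways and preferring the method with the lower ratio.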

paulg · January 14, 2010

Butlers and minimax for the first Camrose weekend

          Butler  Minimax    Butler    Minimax
           score    score  position   position
Pair 1      1.39     1.10         1          2
Pair 2      1.14     1.15         2          1
Pair 3      1.03     1.01         3          3
Pair 4      0.72     0.02         4         10
Pair 5      0.25     0.55         5          5
Pair 6      0.24     0.09         6          9
Pair 7      0.15     0.24         7          7
Pair 8      0.08    -0.08         8         12
Pair 9      0.07    -0.70         9         16
Pair 10     0.03     0.17        10          8
Pair 11    -0.16     0.71        11          4
Pair 12    -0.18    -0.09        12         13
Pair 13    -0.32    -0.50        13         15
Pair 14    -0.39     0.00        14         11
Pair 15    -0.55    -1.07        15         17
Pair 16    -0.62     0.31        16          6
Pair 17    -0.90    -1.55        17         19
Pair 18    -0.90    -0.33        18         14
Pair 19    -1.76    -1.42        19         18

Paul
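One way to summarise how closely the two rankings in the table agree is a Spearman rank correlation. The sketch below is not part of the original spreadsheet; it simply uses the two position columns as printed (including the table's own ordering of the two pairs tied on -0.90), and the function name is illustrative.

```python
# Positions copied from the table above (index 0 is Pair 1).
butler_pos = list(range(1, 20))
minimax_pos = [2, 1, 3, 10, 5, 9, 7, 12, 16, 8, 4, 13, 15, 11, 17, 6, 19, 14, 18]

def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of equal length."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rankings must have the same length (at least 2)")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))

rho = spearman_rho(butler_pos, minimax_pos)
print(f"Spearman rho = {rho:.2f}")
```

For these 19 pairs the value comes out at roughly 0.73: the rankings agree at the top but diverge noticeably in the middle of the field (for example Pair 11 and Pair 16).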

Fluffy · January 14, 2010

I don't understand this very well. If minimax is double dummy and you can make 7NT on 2 deep finesses and a drop, are you giving every pair who gets those cards -17 or something?

helene_t · January 14, 2010

I don't understand this very well. If minimax is double dummy and you can make 7NT on 2 deep finesses and a drop, are you giving every pair who gets those cards -17 or something?

Exactly. Why do you say you don't understand it very well? :)

bluecalm · January 14, 2010

          Butler  Minimax    Butler    Minimax
           score    score  position   position
Pair 1      1.39     1.10         1          2
Pair 2      1.14     1.15         2          1
Pair 3      1.03     1.01         3          3
Pair 4      0.72     0.02         4         10
Pair 5      0.25     0.55         5          5
Pair 6      0.24     0.09         6          9
Pair 7      0.15     0.24         7          7
Pair 8      0.08    -0.08         8         12
Pair 9      0.07    -0.70         9         16
Pair 10     0.03     0.17        10          8
Pair 11    -0.16     0.71        11          4
Pair 12    -0.18    -0.09        12         13
Pair 13    -0.32    -0.50        13         15
Pair 14    -0.39     0.00        14         11
Pair 15    -0.55    -1.07        15         17
Pair 16    -0.62     0.31        16          6
Pair 17    -0.90    -1.55        17         19
Pair 18    -0.90    -0.33        18         14
Pair 19    -1.76    -1.42        19         18

Looks encouraging (for me :) ). Would it be possible for you to provide the script/tools (even if they are commercial)? I would love to check some matchups like Meckwell versus Lauria-Versace. From that sample, minimax looks quite reasonable, and the good thing about it is that you don't need any other pairs to see who played better (just a lot of hands).

I don't understand this very well. If minimax is double dummy and you can make 7NT on 2 deep finesses and a drop, are you giving every pair who gets those cards -17 or something?

Yes :). It seems unjust but...

The most obvious measure of who is better at bridge is to just choose :

score = total points won.

Unfortunately, this way you would need thousands (tens of thousands?) of hands to have something you can rely on.

Here comes modern scoring. Instead of just counting total points won, you compare your points on a board to what other people scored on that board. This is much more valuable, but you still need many hands to see who is better. Butler scoring is based on this idea.

Unfortunately, when major tournaments reach the playoff stage, Butler is no longer reliable because you don't have enough scores from other tables.

My idea is that comparing to minimax may not be that far from Butler scores, and thus could be used as a reliable measure of who plays better when you have no other scores (or not enough of them) to compare against.

Of course there are "unjust" deals if you use this measure, but there are also unjust deals if you play for total points. Even at IMPs there are many unjust deals (if one pair plays your 7NT they get +17 IMPs, and the other pair gets -17 IMPs).

It's all about variance (how many hands you need to get close to expected value).

Both total points and comparison against minimax are objective (the pair with the better expected value in either of them is better at bridge), and the pair that scores better in the long run is simply the better pair.

My hope is that the variance in "compare to minimax" is much lower than in "count total points won".
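The link between variance and the number of hands needed can be made concrete with a rough rule of thumb: a true per-board edge becomes visible once it exceeds a couple of standard errors of the mean. The sketch below assumes independent boards; the function name and the example figures (a 0.5 IMP edge, standard deviations of 6 and 12 IMPs per board) are hypothetical, not measured values.

```python
import math

def boards_needed(edge_per_board, sd_per_board, z=2.0):
    """Boards required before a true per-board edge exceeds z standard errors.
    Solves edge >= z * sd / sqrt(n) for n, assuming independent boards."""
    if edge_per_board <= 0 or sd_per_board <= 0:
        raise ValueError("edge and standard deviation must be positive")
    return math.ceil((z * sd_per_board / edge_per_board) ** 2)

# Hypothetical numbers: the same 0.5 IMP/board edge under two scoring methods.
print(boards_needed(0.5, 6.0))    # lower-variance method
print(boards_needed(0.5, 12.0))   # method with twice the standard deviation
```

Doubling the per-board standard deviation quadruples the number of boards required, which is why the variance of a scoring method matters so much in practice.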

Fluffy · January 14, 2010

No, the pair that plays 7NT goes 2 down because they won't see ♦Jxxx with LHO and ♣Q10xxx with RHO, although they will be forced to guess ♥KJxx.

No biggie, they will lose 19 when the others lose 17 or so.

BTW, for years I have gotten a lot of help from BBF posters to improve my English. I am glad to be the one who helps for the first time :). The word is "unfair"; "unjust" has no meaning in English.

bluecalm · January 14, 2010

I am glad to be the one who helps for the first time. The word is "unfair"; "unjust" has no meaning in English.

Thanks much :) English is not my native language. I have just learnt it from the Internet :)

No biggie, they will lose 19 when the others lose 17 or so.

Well... what about a hand where someone bids a hopeless game and wins on 3 finesses? +13, other pair -13. Unfair!

What about a grand slam on a pure guess for a queen? You guess right, +14. The other table played a reasonable 6NT and got -14 (or whatever). Unfair!

There are many examples of that; I am sure you have seen more of them than I have :)

In the long run we hope luck will all even out... (it won't, but it will matter less and less per hand as more hands are played, although on average it will be bigger in absolute terms). This is our hope with both duplicate scoring and scoring against minimax...
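The remark in parentheses is the usual square-root behaviour of independent random results: the accumulated luck grows in absolute terms, while the luck per board shrinks. A minimal sketch, assuming independent boards with a common per-board standard deviation (the 6 IMP figure is hypothetical):

```python
import math

def luck_scale(n_boards, sd_per_board):
    """Typical size (one standard deviation) of the luck component after
    n independent boards: (luck in the total, luck in the per-board average)."""
    total = sd_per_board * math.sqrt(n_boards)
    average = sd_per_board / math.sqrt(n_boards)
    return total, average

for n in (100, 400, 1600):
    total, average = luck_scale(n, 6.0)
    print(f"{n:5d} boards: total +/- {total:.0f} IMPs, average +/- {average:.2f} IMPs/board")
```

Each fourfold increase in boards doubles the typical luck in the total and halves it in the per-board average.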

Fluffy · January 14, 2010

I agree with you, but my point is that you already randomize a lot by who your opponents are and how they play; you pay a big price when they become inspired against you.

Bad contracts that make, and contracts that make on a finesse, randomize even more.

If the opponents play well, there are times when you will make a decision that leads to a game that is anywhere from 20% to 90% to make. The fact that you landed in the 20% game and made it doesn't mean you were lucky; you made a good decision earlier that could land you in a bad contract, and maybe a decision nobody else had to face.

There are enough randomizing factors around already, but my point is: if you add to these which seats you happen to sit in on certain deals, you are randomizing the results even more.

paulg · January 14, 2010

I use the BBO records plugged into Double Dummy Solver from Bridge Captain to generate the minimax score.

The rest is a manual input into a complex spreadsheet.

Paul

2.5k · January 15, 2010

BTW, for years I have gotten a lot of help from BBF posters to improve my English. I am glad to be the one who helps for the first time :P. The word is "unfair"; "unjust" has no meaning in English.

Fluffy's English is way better than my Spanish, but anyway:

http://dictionary.reference.com/browse/unjust

:)

Fluffy · January 15, 2010

*****

helene_t · January 15, 2010

The Australian youth selection, now on vugraph, uses an external datum score (which happens to come from real-life bridge rather than DD).

I think using an external datum score, whether based on DD, robot play or some large high-level tournament, has some advantages:

- When organizing an event for weak players, to reduce the randomness of the datum score.

- When playing a very small pairs tourney, same argument.

- Maybe (I haven't thought this through) a kind of Swiss movement. The travelers for the boards that remain on table 7 could say something like "Datum is 420 NS. Winners go to 5 NS, losers to 9 EW". Maybe it would be possible to construct something similar to Swiss which would allow for faster movements, because you don't need to enter the results in the computer and communicate the results from the computer. And you can move as soon as the round finishes, even if some late tables haven't entered their results yet.

OK, the new bridgemates can tell you where to move to in a Swiss movement so the last argument is becoming obsolete.

Anyway, I would prefer robot play to DD. Jack has an option for simulating frequency tables by letting it play a board with a range of systems and styles.

hotShot · January 15, 2010

The IMP scale is not linear.

Compare the range for

1 IMP 20-40,

11 IMPs 500-590 or

21 IMPs 2500-2990

So any type of IMP scoring is sensitive to those scores close to the datum score. Simulating frequency tables will generate noise in the scores close to the datum, while DD results will produce a systematic bias.

I prefer a systematic bias to random noise.
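The non-linearity is easy to see by tabulating the standard IMP scale. The sketch below encodes the lower bound of each band; the names `IMP_THRESHOLDS`, `imps` and `band` are illustrative.

```python
from bisect import bisect_right

# Lower bound (in points) of the bands worth 1, 2, ..., 24 IMPs.
IMP_THRESHOLDS = [20, 50, 90, 130, 170, 220, 270, 320, 370, 430, 500, 600,
                  750, 900, 1100, 1300, 1500, 1750, 2000, 2250, 2500, 3000,
                  3500, 4000]

def imps(diff):
    """Convert a signed point difference into signed IMPs."""
    magnitude = bisect_right(IMP_THRESHOLDS, abs(diff))
    return magnitude if diff >= 0 else -magnitude

def band(n_imps):
    """Return the (low, high) point range worth n_imps, for 1 <= n_imps <= 23."""
    return IMP_THRESHOLDS[n_imps - 1], IMP_THRESHOLDS[n_imps] - 10

for n in (1, 11, 21):
    low, high = band(n)
    print(f"{n:2d} IMPs: {low}-{high} (band width {high - low} points)")
```

Near the datum a swing of 20 or 30 points already changes the IMP result, while far from it several hundred points may change nothing.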

Jeroen71 · January 15, 2010

The IMP scale is not linear.
Compare the range for
1 IMP 20-40,
11 IMPs 500-590 or
21 IMPs 2500-2990

So any type of IMP scoring is sensitive to those scores close to the datum score. Simulating frequency tables will generate noise in the scores close to the datum, while DD results will produce a systematic bias.

I prefer a systematic bias to random noise.

DD results do not produce a systematic bias, for most reasonable definitions of systematic bias. It is just as random as the "noise" from the simulated frequency tables.

If my opps bid a ridiculous 7NT that happens to make on 3 finesses and a squeeze, I shrug and move on to the next table. The other pairs in the room are quite likely to have more normal results. However, the DD result is 7NT=, so now instead of, at most, one pair (+their opps) getting a ridiculous score, now everybody in the room will get a ridiculous score. It's not difficult to imagine that all of the pairs that happen to have 7NT on will have a near-zero chance to win the tournament, as they have to make up 34 imps (2 times 17) in perhaps as few as 24 boards.

Fluffy · January 15, 2010

In Spain it is now becoming popular to play a 20 VP scale with decimals, where a 0 IMP difference is 10-10, but a 1 IMP difference is 10.68 vs 9.32 for example, a 2 IMP difference is 11.12, etc. Every IMP counts (except those over the maximum).

With Butler and computers it seems easy to do the same with datums, so that a 10-point difference against the datum is worth some decimal amount and a 30-point difference is worth another.

hotShot · January 15, 2010

DD results do not produce a systematic bias, for most reasonable definitions of systematic bias. It is just as random as the "noise" from the simulated frequency tables.

If my opps bid a ridiculous 7NT that happens to make on 3 finesses and a squeeze, I shrug and move on to the next table. The other pairs in the room are quite likely to have more normal results. However, the DD result is 7NT=, so now instead of, at most, one pair (+their opps) getting a ridiculous score, now everybody in the room will get a ridiculous score. It's not difficult to imagine that all of the pairs that happen to have 7NT on will have a near-zero chance to win the tournament, as they have to make up 34 imps (2 times 17) in perhaps as few as 24 boards.

Of course you need to score NS and EW separately, as you do in a Mitchell movement anyway.

Let's calculate that correctly:

Let's assume NS play in NT (vul.) and the DD solver can make 13 tricks "on 3 finesses and a squeeze".

Every reasonable human player stops in 3NT, 4NT or 5NT. Against the DD datum of 2220, the NS pairs that score 600-690 lose 17 IMPs, and a pair making 3NT+4 for 720 also loses 17 IMPs (2220 - 720 = 1500, the bottom of the 17-IMP band).

So the scores on the NS side are not "distorted" relative to each other at all; this board is not selective.

A lucky pair that bids and makes 6NT will lose only 13 IMPs and gain a 4 IMP advantage over the other pairs on their axis. A lunatic pair that bids 7NT and fails will lose 20 IMPs, 3 more than the reasonable players.

The same applies to the EW side.
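The arithmetic in this example can be checked against the standard IMP table. A minimal sketch follows; the names are illustrative, the table results are vulnerable NS scores, and the datum is the double-dummy 7NT making (2220). Note that a difference of exactly 1500 points sits at the bottom of the 17-IMP band and 750 at the bottom of the 13-IMP band.

```python
from bisect import bisect_right

# Lower bound (in points) of the bands worth 1, 2, ..., 24 IMPs.
IMP_THRESHOLDS = [20, 50, 90, 130, 170, 220, 270, 320, 370, 430, 500, 600,
                  750, 900, 1100, 1300, 1500, 1750, 2000, 2250, 2500, 3000,
                  3500, 4000]

def imps(diff):
    """Convert a signed point difference into signed IMPs."""
    magnitude = bisect_right(IMP_THRESHOLDS, abs(diff))
    return magnitude if diff >= 0 else -magnitude

DD_DATUM = 2220  # vulnerable 7NT making, as in the example

# Vulnerable NS table results for the contracts discussed in the post.
table_scores = {"3NT=": 600, "3NT+4": 720, "6NT+1": 1470, "7NT-1": -100}

# IMPs for NS when each table result is compared with the DD datum.
ns_imps = {name: imps(score - DD_DATUM) for name, score in table_scores.items()}
for name, value in ns_imps.items():
    print(f"{name:6s} {table_scores[name]:5d} -> {value:+d} IMPs")
```

What separates the NS pairs from one another is only the difference between these figures, which is why a board where nearly everyone lands in the same band contributes almost nothing to the ranking.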

helene_t · January 15, 2010

Of course you need to score NS and EW separately, as you do in a Mitchell movement anyway.

I don't see why. Say the EW pairs are generally weaker than the NS pairs. The table results will then on average give higher NS scores than DD. The DD scoring takes care of this while a normal Mitchell tourney does not.

Suppose you have a small tourney with only two pairs! Then all you can do is to compare the table result to some external datum score, say DD or robot play or historical data.
