Finally revealed: why GIB makes 0% plays

smerriman · June 18, 2022

It is extremely well-known and has been stated many times over (by myself included) that GIB works by performing a very basic algorithm:

1) Generate a bunch of deals roughly consistent with the auction.

2) Calculate the double dummy result for each card you can play.

3) Pick the card that averages the best score.

(On top of this, when advanced GIB is declaring, there is a single-dummy algorithm that kicks in after trick 2 that tries to make a plan to avoid putting off guesses. But this is not relevant to defending, or the cases discussed in this post).

There is course a lot of complexity in step 1 - how does it find hands that match the auction? What if the auction is impossible, or so rare it can't be simulated? If it allows for some variation - as has been stated by barmar in the past - how does it know if a deal is 'close' to matching?

But steps 2 and 3 are trivial.

5 years ago gwnn posted a hand where GIB plays a card at trick 10 that is guaranteed to be strictly worse than any other card, if any nonzero number of hands is simulated:

[hv=http://www.bridgebase.com/tools/handviewer.html?sn=gwnn&s=SAT4HK964DAKC9872&wn=Roboter&w=S983HAJT75DJ4CKT5&nn=Roboter&n=SKQJ65HQ82DQ852CA&en=Roboter&e=S72H3DT9763CQJ643&d=s&v=b&b=7&a=1N(notrump%20opener.%20Could%20have%205M.%20--%202-5%20%21C)P2H!(Jacoby%20transfer%20--%205+%20%21S)P2S(Transfer%20completed%20to%20S%20--%202-5%20%21C%3B%202-5%20%21)P3D(New%20suit%20--%204+%20%21D%3B%205+%20%21S%3B%2010+%20total%20points)P3S(Support%203rd%20S.%20No%204th%20D%20--%202-5%20%21C%3B%202-3%20%21)P4C(Cue%20bid%20--%204+%20%21D%3B%205+%20%21S%3B%20%21CA%3B%2013+%20total%20points)P4S(2-5%20%21C%3B%202-3%20%21D%3B%202-5%20%21H%3B%203%20%21S%3B%2017%20HCP%3B%2018)P4N(Blackwood%20%5BS%5D%20--%204+%20%21D%3B%205+%20%21S%3B%20%21CA%3B%2015+%20total%20points)P5H(Two%20or%20five%20key%20cards%3B%20no%20queen%20--%202-5%20%21)D(6+%20HCP%3B%20rebiddable%20%21H%3B%20%21HKQ%3B%2020-%20total%20points)6S(4+%20%21D%3B%205+%20%21S%3B%20%21CA%3B%2015+%20total%20points)PPP&p=S3S5S2S4CAC3C2C5D2D3DAD4DKDJD5D7C7CTS6C6D8D9SAH5STS8SKS7SQC4C8S9DQDTC9H7H2H3HKHJH4HTHQD6H8CJH6HACKSJCQH9]400|300[/hv]

That blew the usual 'maybe you just got a very very unlucky set of sims' excuse out the window.

It has been bugging me every since - to be honest, it's very rare a week has gone past over the last several years where I don't think "gah, I wish I knew what GIB was doing".

While my attempts to get access to the source code have failed, I can finally announce I know why it made (and continues to make) these mistakes.

And that is because, rather shockingly, GIB does not perform step 2 as everyone believes it does.

--

A couple of weeks ago, Lorand Dali posted about his new AI bridge project. Very interesting stuff and worth a read. I was slightly disappointed to find out it was entirely reliant on a having a pre-existing robot - learning to bid from a huge sample of hands generated by the GIB robot.

Until I discovered how he generated the hands - not via online GIB hand records as I had expected, but by piping them into the bridge.exe program that freely comes with the Windows downloadable version of BBO.

Wait, there's a free command line version of GIB?

Yes - though of course, it's a version from 2012, so extremely out of date in terms of the bidding. If you think GIB has bidding flaws now, BBO did an amazing job of improving it from where it started while they were still working on it.

How does this help? Because Matt Ginsberg added some debugging flags, documented in an archived version of his website. While the BBO version appears to have been altered somewhat, with some of the flags not working and some workarounds needed, there's still one available which outputs a trace of all of the simulated hands, how GIB scored them, and how that averaged out to its choice of play.

For example, when trying to decide what to lead to trick 1 in gwnn's example hand, it generates 100 hands (this was another flag, I used 100 but BBO will be even less) and displays them each in this format:

deal 76: 
                   S  A Q 7 5 4 
                   H  9 
                   D  A K 7 2 
                   C  J 7 6 
S  9 8 3                                S  J T 
H  A J T 7 5                            H  K 6 4 3 2 
D  J 4                                  D  Q T 6 5 
C  K T 5                                C  Q 4 
                   S  K 6 2 
                   H  Q 8 
                   D  9 8 3 
                   C  A 9 8 3 2 

West to lead; S trumps
mismatch 32.00
CK: 100 CT: 100 C5: 200 DJ: 300 D4: 300 HA: 200 HJ: 200 H7: 200 H5: 200 S9: 200 S3: 200

The first bunch of deals don't include the 'mismatch' line - the last group has increasing mismatch scores, which is presumably widening the range of hands it considers acceptable in include in the the simulation.

And then at the end, a conclusion that averaged out of the double dummy results over all of the deals:

S3: -24.70 -> 2.45
DJ: -27.70 -> 2.34
S9: -24.70 -> 2.15
D4: -26.70 -> 2.07
HA: -119.20 -> 0.95
HJ: -270.80 -> -0.83
C5: -275.70 -> -0.99
H5: -289.10 -> -1.10
H7: -289.10 -> -1.10
CT: -280.70 -> -1.45
CK: -339.70 -> -2.39
I play S3

I expect the DJ got a slight boost due to signalling, but so far it's all making sense.

There's just one small catch.

Some of the earlier deals in the set have question marks after some of the double dummy results:

deal 0: 
                   S  A Q T 7 2 
                   H  4 2 
                   D  Q 9 3 
                   C  9 7 2 
S  9 8 3                                S  5 
H  A J T 7 5                            H  9 6 3 
D  J 4                                  D  K T 8 7 6 2 
C  K T 5                                C  A J 6 
                   S  K J 6 4 
                   H  K Q 8 
                   D  A 5 
                   C  Q 8 4 3 

West to lead; S trumps
CK: 300? CT: 400? C5: 400? DJ: 400? D4: 400? HA: 300? HJ: 400? H7: 400? H5: 400? S9: 400? S3: 400?

In this case, the first 32 deals have ? after all results, and the remaining 68 have none - though on other occasions, some deals have ? for some play cards and not for other played cards.

And most importantly - some of the scores with ? are incorrect.

Look at what happens when we get to the crucial card at trick 10 in gwnn's case.

The first simulated deal:

deal 0: 
                   S  J 
                   H  Q 8 
                   D  ---
                   C  ---
S  ---                                  S  ---
H  A J T                                H  4 
D  ---                                  D  6 
C  K                                    C  Q 
                   S  ---
                   H  9 6 
                   D  ---
                   C  J 

West to play to H2, H3, HK; S trumps
N/S have taken 9 tricks
HA: -1430? HJ: 100?

The question-marked figures say that playing the heart Ace will allow the slam to make - and the J will cause it to go down!

In fact, the first 44 simulated hands all have the same conclusion.

On deal #44 (0-indexed!), it gets it right for the first time:

deal 44: 
                   S  J 
                   H  Q 8 
                   D  ---
                   C  ---
S  ---                                  S  ---
H  A J T                                H  9 
D  ---                                  D  6 
C  K                                    C  Q 
                   S  ---
                   H  6 4 
                   D  ---
                   C  J 

West to play to H2, H3, HK; S trumps
N/S have taken 9 tricks
mismatch 16.00
HA: 100 HJ: -1430

Note that there is nothing special about this deal that separates it from the others - the exact same hand with East left with holding 9-6-Q appeared several times in the first 44.

On deal 45 it's also correct, but 8 of the next 19 hands it has the incorrect figures, before all others are correct.

So as the final result, on 52 of the 100 hands, it believes ducking is required to beat the contract - when it isn't true once. When it combines 52*100 and 48*-1430, you get 63440 - which it provides as its final output:

HJ: -634.40 -> 0.68
HA: -695.60 -> -0.68
I play HJ

Oops.

I took a second example, posted by bixby a few months ago, where throws away its high card on trick 10 in a no-win, rarely-tie, mostly-lose scenario.

[hv=https://www.bridgebase.com/tools/handviewer.html?lin=st||pn|bixby,~Mwest,~Mnorth,~Meast|md|2SAQ62H53DQTCAJT92,SJ854HDKJ7654C764,SKT9HAQ76DA832C53,S73HKJT9842D9CKQ8|sv|b|rh||ah|Board%204|mb|P|mb|1D|an|Minor%20suit%20opening%20--%203+%20!D;%2011-21%20HCP;%2012-22%20total%20points|mb|2H|an|Aggressive%20weak%20jump%20overcall%20--%206+%20!H;%204-10%20HCP|mb|D|an|Negative%20double%20--%204+%20!S;%207+%20HCP;%208+%20total%20points|mb|P|mb|2N|an|4+%20!D;%203-%20!S;%2011-14%20HCP;%2012+%20total%20points;%20stop%20in%20!H|mb|P|mb|3N|an|5-%20!H;%204-5%20!S;%2014-21%20HCP|mb|P|mb|P|mb|P|pc|S7|pc|S2|pc|SJ|pc|SK|pc|C5|pc|CQ|pc|CA|pc|C7|pc|CJ|pc|C6|pc|C3|pc|CK|pc|H4|pc|H3|pc|D7|pc|H6|pc|ST|pc|S3|pc|S6|pc|S8|pc|DA|pc|D9|pc|DT|pc|D4|pc|S9|pc|H2|pc|SQ|pc|S5|pc|SA|pc|S4|pc|H7|pc|C8|pc|CT|pc|C4|pc|D2|pc|HJ|pc|C9|pc|D6|pc|D3|pc|HK|mc|12|]400|300[/hv]

Is this because it was unlucky and every hand it simulated resulted in the equals case?

On my run, it found the equals case just 5 times - no question marks:

HK: -690 HT: -690

On 19 occasions, it had a definitive value for the heart T, but thought - with a question mark - that throwing the king would get a *better* score

HK: -630? HT: -660

On the other 76, it came up with the right values:

HK: -690 HT: -660

In this case, that was enough to weight it to making the correct play (it chooses 8 among equals after the analysis):

HT: -661.50 -> 0.57
HK: -678.60 -> -0.57
I play H8

But given it's capable of including completely wrong dummy double analysis scores in its calculations, it's no longer surprising with a smaller / different set of hands that the incorrect ones could end up biasing the results enough to play the wrong card.

--

Note that it's quite possible that BBO have improved the play engine of GIB since v21, which is the one tested here, though all reports have been that they haven't touched it other than forcing it to lead an ace against 7NT.

Conclusion: I don't know why GIB's "double dummy" analysis causes it to give correct scores for some cards, and incorrect scores for others. Clearly, this is a deliberate part of the program, due to the fact it it marking potentially wrong figures with a question mark (they're not all incorrect) - not that it is intentionally making mistakes, but I assume it is running some sort of optimization that speeds up the double dummy calculations rather than guaranteeing correct output.

If this is required for some reason, why it does not at least switch to guaranteed correct output at least for later in the play when this should be fast, I also don't know.

But at least I know more than I did.

LBengtsson · June 19, 2022

Thanks for posting, smerriman. This is the best (computer) bit of forensic number-crunching detective work I have read about in a long time. At least we all now know why GIB makes these 0% plays some times.

pescetom · June 19, 2022

Well done, great work.

This should not be particularly difficult to fix, either.

pilowsky · June 19, 2022

This should not be particularly difficult to fix, either.

Completely straightforward I'm sure.

gwnn · June 24, 2022

Wow, this is so cool!

cherdano · June 24, 2022

Amazing!

If BBO were smart, they'd long have tried to hire smerriman. Then again, if smerriman was adequately paid, they couldn't afford him.

Forrest_ · June 24, 2022

I spent a few hours trying to wrap my head around a Ginsberg paper on partition search for DDS and I never really grasped it, but it seems like that might be what is going on here.

Paper

The idea is that you can reduce the search space significantly by consolidating positions. For example, if the last two tricks are going to be won by AK of trumps, then the order everyone else pitches their final two cards in doesn't matter, nor does which final two cards they have come down to - so if you can identify that you've reached that set of positions (and you've already solved it once), you can stop your tree search. The part I never quite got is how you will know when it is safe to say that no decisions matter.

I don't have any evidence to support this, but I haven't seen any claims that the DDS for GiB is broken when evaluating a single deal (e.g. through the BBO history UI). I wonder if the issue is caused by trying to reuse transposition/partition table entries across multiple deals when attempting to simulate the hand.

smerriman · June 25, 2022

I spent a few hours trying to wrap my head around a Ginsberg paper on partition search for DDS and I never really grasped it, but it seems like that might be what is going on here.

I wonder if the issue is caused by trying to reuse transposition/partition table entries across multiple deals when attempting to simulate the hand.

Heh, yes, I've read that paper a number of times :) I would have said that was a possibility, if it weren't for the question marks in the debugging output. If it were that type of bug, it would be unexpected / unpredictable - potentially appearing anywhere throughout the system. Given they're annotated with question marks, it implies the program *knows* they're not 100% valid results.

I suspect it's more likely to be related to this part of the paper:

The second technical point regarding the algorithm itself involves the fact that it needs to run quickly and that it may need to be terminated before the analysis is complete..

.. The algorithm also uses iterative broadening (Ginsberg & Harvey, 1992) to ensure that a low-width answer is available if a high-width search fails to terminate in time. Results from the low- and high-width searches are combined when time expires.

This would make more sense - if it hits some time constraint, it marks searches that haven't finished with a ? as potentially not 100% correct. That's what I assumed they would be - some sort of bounds on the actual double dummy analysis - just not that they'd be so far off it's not funny. GIB also appears to not be programmed to have a fixed time limit per *decision* - but a total time limit per *hand* - which could be why it's still not able to run in time even for trivial end positions. But if that were the case, there's definitely improvements that can be made. (I do think some of GIBs flaws are down to users simply wanting a fast robot.)

If BBO were smart, they'd long have tried to hire smerriman. Then again, if smerriman was adequately paid, they couldn't afford him.

I have actually had a couple of emails with uday in the past, but it hasn't gone anywhere (yet?). I have a job which adequately pays me already; playing with GIB is far too much fun to be considered work!

thepossum · June 25, 2022

Thx very much for the interesting post. Hadn't thought about much about this stuff for a while so enjoyed reading the thread :)

etha · June 26, 2022

You can make lorand's program make its own system. If you start with a blank or almost blank NN and make it bid some hands and then use this as input for the next NN etc repeat. The only slight problem is for some reason I don't understand if the NN is almost non existent you get auctions like 5!s 6!c 7NT dbl redbl. It almost always starts at the the 5 level. Anyway I tried adding a only a few hands and retraining and currently it opens 4 card majors and doesn't really understand 1NT. Anyway I plan to keep adding and training and see if it goes anywhere.

My guess is it will be mostly natural bidding as for all it knows the bidding might stop any minute. I also can't really see how it can evolve into anything complicated as unlike evolving the eye where any small amount of light you can detect helps bidding one half of stayman is not that great if pard responds 3!d meaning its friday or something. I might try fiddling a bit with the basic NN and bid picker to see if I can get it to make lower opening bids in some way but it is probably beyond my ability.

thepossum · July 4, 2022

Call me old fashioned but I can't see the point of trying to teach an algorithm to "learn" Bridge without giving it the basics of the game and how to bid reasonably to start with

pescetom · July 4, 2022

Maybe those interested in discussing Lorand's AI project could open a specific thread?

This one is about GIB and is too important to be hijacked (or to not receive a reply from BBO).

Finally revealed: why GIB makes 0% plays

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation