
New hand evaluation method



"Evolin" is cool but I'm afraid it won't stick as a new word. "Evelyn" is more promising. Same word already exists and people would just transfer the meaning and remember it easier. What do you think?

I think that either should work. "Evelyn" seems a bit easier but doesn't carry the other ideas as easily as "Evolin". If your system is good, any name will do. People even remember "Cappelletti" ;) . Though in that case misspellings are frequent, understandably.

 

That is the same story again - statistics. I load a pile of games into the machine and it gives out the coefficients it thinks fit best. I may just round them to the nearest whole number, that's all.

No, I disagree here. These coefficients don't come up randomly, they come up for a reason. For example, if you hold AKQ opposite xxx you can expect that those three honors in one hand will cover the three losers in the other hand. But if you hold AKQ opposite x, there is only one loser in this suit to be covered and the other two honors will make a trick only if you have losers in other suits. But sometimes your opponents will take their tricks in those other suits, so your honors become worthless. Or you may have to guess which card will become a loser and you discard the wrong card. This is why AKQ opposite x gets -1 point, it is worth 1/3 trick less than opposite xxx. Even worse opposite a void, you may not be able to access the honors when you need to because you cannot play to them from the other hand. This is why here you get -3 points for AKQ opposite a void. I would never be able to predict the coefficients correctly, of course, but it should usually be possible to predict if they are positive or negative and if they are high or low.

 

There is always a reason. But I admit that sometimes things may be so complex that we cannot understand the reason easily.

 

Now here is my speculation about the K against a singleton. We are talking about duplication of controls.

Not really. We are talking about the trick-taking probability of certain cards under certain circumstances (that is, opposite short suits). That is what the computer calculates. You are calling them "duplications" for good reasons but the computer doesn't know that word.

I know I appear fussy here but it will become clear in a moment why I am doing this.

 

First-round controls are the ace and a void; second-round controls are the king and a singleton. Practically, an A against a singleton is not a duplication, as the A controls the first round and the singleton the second.

Correct. In other words: The ace covers the singleton and the singleton covers the next round(s). This is why the ace opposite a singleton gets its full value while an ace opposite a void gets -1.

 

A Q against a singleton is not a duplication either, as the Q controls neither the first nor the second round while the singleton does. A K against a singleton, though, is a duplication by definition, as both control the second round. Same story with a void: an A against a void is a duplication while a K or Q is not, and you see it in the table.

A void does not only control the first round, it also controls all following rounds. That makes the "duplication" thinking difficult for K/Q opposite a void.

 

Back to that king: 0 opposite a void while -1 opposite a singleton means that the king opposite a void is more likely to take a trick than a king opposite a singleton - and I cannot believe that. I could believe 0 for both and I could believe -1 for both but not the combination. Also +1 for that queen would mean that a queen opposite a void is more likely to generate a trick than a queen opposite more cards, and I cannot believe that either; and I don't think that other people will find that easy to believe.

 

400k boards is a huge amount of data. I honestly respect this. Yet voids are quite rare and you are optimizing many parameters at the same time. I can imagine that these numbers are statistical errors, but I cannot be sure. If you want to make sure, you might divide the boards into 4 packages of 100k each, make 4 separate evaluations and compare the coefficients: do they fluctuate or are they stable? I am aware this is a lot of work. You don't have to do this for me and you don't have to do it fast.
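If it helps to picture the check, here is a rough Java sketch; Board and fitCoefficients are hypothetical stand-ins for the real data type and optimizer, not anything from the document:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class StabilityCheck {

    // Split the boards into equal packages after shuffling.
    static List<List<Board>> split(List<Board> boards, int packages) {
        List<Board> shuffled = new ArrayList<>(boards);
        Collections.shuffle(shuffled);
        List<List<Board>> result = new ArrayList<>();
        int size = shuffled.size() / packages;
        for (int i = 0; i < packages; i++) {
            result.add(shuffled.subList(i * size, (i + 1) * size));
        }
        return result;
    }

    // Fit the coefficients on each package separately and print them;
    // coefficients that fluctuate strongly between the four fits are probably noise.
    static void run(List<Board> boards) {
        for (List<Board> pkg : split(boards, 4)) {
            double[] coeffs = fitCoefficients(pkg); // hypothetical optimizer call
            System.out.println(java.util.Arrays.toString(coeffs));
        }
    }

    // Placeholders so the sketch is self-contained:
    static final class Board {}
    static double[] fitCoefficients(List<Board> boards) { return new double[0]; }
}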


This is my pain. I couldn't find any freely available source anywhere on the internet. There are plenty of tournament results, but they do not include each pair's score! Damn. I got part of it from swan games and part from one of the online bridge web sites. Even there the results are not prepared for download; I had to write my own crawler to scrape screens.

If you find any web source where pair results are at least visible on a page - let me know.

Hm, last month the 53rd European Team Championships took place in Budapest, the results are here:

http://www.eurobridge.org/repository/competitions/16budapest/microsite/Results.htm

 

and if you click on a round and then on a table you get the individual scores such as here:

http://www.eurobridge.org/repository/competitions/16budapest/microsite/Asp/BoardDetails.asp?qmatchid=34985

 

This is high-class bridge, definitely good data, but probably not a sufficient quantity for you, I am afraid, even if you check earlier years as well.

 

Another possibility: perhaps if you write to the BBO people, they might give you the accumulated data of these new "Free Daylong tournaments". That is ~10000 participants @ 8 boards each EVERY DAY. Worth writing another crawler for, I guess ;) . Not all of this is high-class bridge though. (Edit: and three of the players are always robots.)


Hi Tim,

 

On page 3 in the pdf, it says

 

 

 

I can probably estimate the tricks in my own hand... :)

 

But how to estimate the combined tricks with partner's hand?

 

 

This is the method I use for estimating tricks.

 

E(tricks) = Trumps + (HCP-20)/3

 

Usually by the 2nd or 3rd bid of the auction, one or both partners will know the partnership's combined trumps and the combined HCP +/- 1.

 

Then, if the auction proceeds further:

 

 

E(tricks) = Trumps + (HCP-20)/3 + SST + SS

 

Adjust the count for shortness in the side suits and for a source of tricks (usually from the second suit). I know my hand's contribution towards SST and SS; we must find ways to learn partner's contribution to these parameters.

Also it would be nice if it were possible to exchange this information starting from the three level.
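For concreteness, here is a minimal Java sketch of this estimate; treating the SST and source-of-tricks terms as pre-computed trick corrections passed in by the caller is my assumption, not part of the formula as stated:

public final class TrickEstimate {

    // First estimate: E(tricks) = Trumps + (HCP - 20) / 3.
    public static double basic(int combinedTrumps, int combinedHcp) {
        return combinedTrumps + (combinedHcp - 20) / 3.0;
    }

    // Refined estimate once SST and source-of-tricks (SS) info is exchanged.
    public static double refined(int combinedTrumps, int combinedHcp,
                                 double sst, double ss) {
        return basic(combinedTrumps, combinedHcp) + sst + ss;
    }

    public static void main(String[] args) {
        System.out.println(basic(9, 26));         // 9 trumps, 26 HCP -> 11.0 tricks
        System.out.println(refined(9, 26, 1, 0)); // with a +1 shortness correction -> 12.0
    }
}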

 

Playing a natural 5-card major system (due to the importance of combined trump length, I'm convinced 4-card major systems are inferior), play fit jumps even in uncontested auctions.

 

Example:

 

AKxxx Axx x Qxxx // QJxx x xxx AKJxx

 

West deals. Opponents silent.

 

1♠ - 3♣

3♥ - 3♠

4♦ - 6♠

 

3♣ is a fit jump and forcing to game.

3♥ shows the heart ace.

3♠ is forcing.

4♦ shows second-round control of diamonds, usually a singleton.


Hi Tim,

I wrote a program to calculate "Evelyn points" under the various denominations, and applied it to some random hands...

Does this look correct? :)

 

Deal   1:  AT32.KT.975.Q743   NT => 11   S => 11   H =>  9   D =>  9   C => 12
Deal   2:  J872.9.KQJ4.JT83   NT =>  9   S => 10   H =>  4   D => 11   C => 10
Deal   3:  AJ5.J64.T75.AT76   NT => 11   S => 11   H => 11   D => 10   C => 11
Deal   4:  KJ98.AK.864.Q874   NT => 13   S => 15   H => 12   D => 12   C => 15
Deal   5:  Q82.K42.AQJ8.743   NT => 13   S => 11   H => 11   D => 13   C => 10
Deal   6:  965.Q3.72.T97642   NT =>  3   S =>  3   H =>  3   D =>  2   C =>  9
Deal   7:  KQJ4.943.A4.JT86   NT => 12   S => 14   H => 10   D =>  9   C => 13
Deal   8:  A532.JT632.8.A53   NT =>  9   S => 13   H => 15   D =>  8   C => 12
Deal   9:  A8.KT9.Q4.AK8653   NT => 18   S => 14   H => 16   D => 15   C => 22
Deal  10:  54.64.KT9.QJ7632   NT =>  8   S =>  5   H =>  5   D =>  7   C => 13
Deal  11:  K9.AT7.93.QT8754   NT => 12   S =>  9   H =>  9   D =>  8   C => 16
Deal  12:  875.Q8.KQ52.AJ72   NT => 12   S => 11   H => 11   D => 14   C => 14
Deal  13:  A6.Q.T7642.KT753   NT => 10   S =>  9   H =>  8   D => 14   C => 15
Deal  14:  AT73.A3.J965.K73   NT => 13   S => 14   H => 11   D => 15   C => 13
Deal  15:  T9865.J86.A9.T73   NT =>  5   S =>  9   H =>  7   D =>  5   C =>  6
Deal  16:  A74.K32.KJ2.AJ97   NT => 16   S => 15   H => 16   D => 16   C => 17
Deal  17:  73.JT5.AQT843.J3   NT => 10   S =>  6   H =>  8   D => 14   C =>  7
Deal  18:  5.KJ96.AQT2.J742   NT => 12   S =>  8   H => 14   D => 14   C => 14
Deal  19:  A8632.9432.632.8   NT =>  4   S => 10   H =>  9   D =>  8   C =>  4
Deal  20:  K7653.5.T65.A873   NT =>  7   S => 13   H =>  6   D => 10   C => 11

(Voids:)
Deal  94:  Q963.KQ98.KJT54.   NT => 12   S => 15   H => 15   D => 17   C =>  7
Deal 104:  AQ8532..Q987.Q94   NT => 12   S => 18   H =>  6   D => 14   C => 12
Deal 183:  QJ86.J87642..A96   NT =>  9   S => 13   H => 17   D =>  5   C => 10

(8+card suits:)
Deal 539:  AKJT9832.A832.A.   NT => 22   S => 31   H => 22   D => 14   C => 13
Deal 759:  AQJ98763.JT..KJ2   NT => 18   S => 24   H => 11   D =>  7   C => 13
Deal 774:  .87.AQJ87542.T52   NT => 13   S =>  4   H =>  7   D => 21   C =>  9
Deal 816:  A9.A2.AKQT8765.J   NT => 24   S => 17   H => 17   D => 30   C => 16
Deal 861:  QT72.6.KQJT9864.   NT => 15   S => 14   H =>  5   D => 22   C =>  4

Great idea. You don't cease to amaze me. Why didn't I think of it myself? I checked the first three lines and they are correct. I think I'll follow your idea and create a complementary Java program to distribute along with the document, and probably add some description of how each value is calculated.

...

I also checked the first line from each of the void and 8+ card groups. The void line is correct, but for 8+ cards you probably can count the rule "High card combinations in side suit with 8+ cards on line (out of 3 top cards)" too, because you have a known 8 cards on the line.
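If the complementary program materializes, its core might be structured like this minimal Java sketch; the Feature interface, the toy ace-counting rule, and all names here are hypothetical placeholders, not the actual Evelyn rules or coefficients:

import java.util.List;

public final class EvelynSketch {

    // A feature maps a hand and denomination to a point contribution.
    interface Feature {
        int value(String hand, char denomination); // denomination: N, S, H, D or C
    }

    static int evaluate(String hand, char denom,
                        List<Feature> baseFeatures, List<Feature> optionalCorrections,
                        boolean partnerHandKnown) {
        int total = 0;
        for (Feature f : baseFeatures) {
            total += f.value(hand, denom);
        }
        // Corrective features ("Optional. Count only if known."), such as value
        // duplication, are applied only with knowledge of partner's hand.
        if (partnerHandKnown) {
            for (Feature f : optionalCorrections) {
                total += f.value(hand, denom);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Toy feature: +4 per ace in the hand string (placeholder, not a real rule).
        Feature acePoints = (hand, denom) ->
                4 * (int) hand.chars().filter(c -> c == 'A').count();
        System.out.println(evaluate("AT32.KT.975.Q743", 'N',
                List.of(acePoints), List.of(), false)); // prints 4
    }
}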


@m1cha

Thank you for a great post. I'll try to comment on your points one by one.

 

That is the same story again - statistics. I load a pile of games into the machine and it gives out the coefficients it thinks fit best. I may just round them to the nearest whole number, that's all.

No, I disagree here. These coefficients don't come up randomly, they come up for a reason. For example, if you hold AKQ opposite xxx you can expect that those three honors in one hand will cover the three losers in the other hand. But if you hold AKQ opposite x, there is only one loser in this suit to be covered and the other two honors will make a trick only if you have losers in other suits. But sometimes your opponents will take their tricks in those other suits, so your honors become worthless. Or you may have to guess which card will become a loser and you discard the wrong card. This is why AKQ opposite x gets -1 point, it is worth 1/3 trick less than opposite xxx. Even worse opposite a void, you may not be able to access the honors when you need to because you cannot play to them from the other hand. This is why here you get -3 points for AKQ opposite a void. I would never be able to predict the coefficients correctly, of course, but it should usually be possible to predict if they are positive or negative and if they are high or low.

 

There is always a reason. But I admit that sometimes things may be so complex that we cannot understand the reason easily.

 

 

Not really. We are talking about the trick-taking probability of certain cards under certain circumstances (that is, opposite short suits). That is what the computer calculates. You are calling them "duplications" for good reasons but the computer doesn't know that word.

I know I appear fussy here but it will become clear in a moment why I am doing this.

 

 

Correct. In other words: The ace covers the singleton and the singleton covers the next round(s). This is why the ace opposite a singleton gets its full value while an ace opposite a void gets -1.

 

 

A void does not only control the first round, it also controls all following rounds. That makes the "duplication" thinking difficult for K/Q opposite a void.

 

Back to that king: 0 opposite a void while -1 opposite a singleton means that the king opposite a void is more likely to take a trick than a king opposite a singleton - and I cannot believe that. I could believe 0 for both and I could believe -1 for both but not the combination. Also +1 for that queen would mean that a queen opposite a void is more likely to generate a trick than a queen opposite more cards, and I cannot believe that either; and I don't think that other people will find that easy to believe.

 

What I meant about the coefficients is that there are different ways to get them. The computer and people solve the same task of calculating the coefficients (feature values), but they do it differently. Sometimes they converge on a coefficient and people are happy to see two different approaches match in the end. That is all there is to it. We cannot actually judge the way the computer thinks if the coefficients do not converge as well as we would like. Our attempt to "explain" them is just a rationalization of our own model that doesn't actually prove that we are right. The only way to judge is to do it 10 different ways: if 9 of them converge but 1 stands out, then we can presume those 9 are correct. When you have only two, it's inconclusive.

 

I agree with you that we are estimating tricks, not controls. I was just using the term "control" to try to "explain" these results in bridge terms. This explanation is merely a mind game or speculation; it doesn't actually prove whether these values are correct or not.

 

Now, if we continue our speculative mind game :), keep in mind that the "High card combinations in side suit with 8+ cards on line" and "Value duplication" features are corrective ones. You can see the "Optional. Count only if known." note for each of them. That means that even if you do not count them, due to lack of knowledge of partner's hand, the result will still be correct. These two coefficients allow you to do finer tuning in case you have the information to use them. That's why they go to both the positive and the negative side.

 

Back to the K-x. When we analyze the value of the "Value duplication" coefficient, we need to remember that there are other features at play. You and your partner have separately counted the trick-taking potential of the king and of the singleton, respectively. If you don't know each other's hands - that's fine. However, if you do know that your king and his singleton occur in the same suit, you can use this knowledge to fine-tune the result further by applying the "Value duplication" rule and seeing what difference it makes. So K-x = -1 doesn't say anything about the king's trick-taking potential. It says that the king's trick-taking potential and the singleton's trick-taking potential clash, and the result of this clash is that the combined trick-taking potential of king and singleton when they are in the same suit is 1 point less than if they were in different suits. I agree with you that this is a very complex dependency to grasp. That's why I used the term "controls" to try to explain (not to prove) this result. Think of it as a mnemonic rule that helps to remember this irregularity.

 

400k boards is a huge amount of data. I honestly respect this. Yet voids are quite rare and you are optimizing many parameters at the same time. I can imagine that these numbers are statistical errors, but I cannot be sure. If you want to make sure, you might divide the boards into 4 packages of 100k each, make 4 separate evaluations and compare the coefficients: do they fluctuate or are they stable? I am aware this is a lot of work. You don't have to do this for me and you don't have to do it fast.

You are right that some features occur more often than others. That's why I explicitly excluded unstable coefficients with insufficient statistics. Those included in the document are reliable!

In numbers, I excluded features that occur fewer than 100-200 times overall. With 100 results the statistical error for the corresponding coefficient is about 10%. So if its numeric value is less than 5, the absolute error is less than 0.5, which is OK. This applies only to very, very rare features. I can tell you a void is not rare. A 10-card suit is rare. :)
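For those who want the arithmetic behind that estimate, a back-of-the-envelope sketch, assuming roughly independent observations so the relative error of an estimated coefficient scales as 1/SQRT(N):

relative error ≈ 1/SQRT(N) = 1/SQRT(100) = 10%

absolute error ≈ 5 points * 10% = 0.5 points, i.e. about 1/6 of a trick at 3 points per trick.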


Hm, last month the 53rd European Team Championships took place in Budapest, the results are here:

http://www.eurobridge.org/repository/competitions/16budapest/microsite/Results.htm

 

and if you click on a round and then on a table you get the individual scores such as here:

http://www.eurobridge.org/repository/competitions/16budapest/microsite/Asp/BoardDetails.asp?qmatchid=34985

 

This is high-class bridge, definitely good data, but probably not a sufficient quantity for you, I am afraid, even if you check earlier years as well.

 

Another possibility: perhaps if you write to the BBO people, they might give you the accumulated data of these new "Free Daylong tournaments". That is ~10000 participants @ 8 boards each EVERY DAY. Worth writing another crawler for, I guess ;) . Not all of this is high-class bridge though. (Edit: and three of the players are always robots.)

Thanks, m1cha.

The first link is useful: I can see both the deal and the pair results on the same page. I'll see how much I can scrape from there.

The BBO robots are useless, unfortunately. They do not play like humans.


This is the method I use for estimating tricks.

 

E(tricks) = Trumps + (HCP-20)/3

 

Usually by the 2nd or 3rd bid of the auction, one or both partners will know the partnership's combined trumps and the combined HCP +/- 1.

 

Then, if the auction proceeds further:

 

 

E(tricks) = Trumps + (HCP-20)/3 + SST + SS

 

Adjust the count for shortness in the side suits and for a source of tricks (usually from the second suit). I know my hand's contribution towards SST and SS; we must find ways to learn partner's contribution to these parameters.

Also it would be nice if it were possible to exchange this information starting from the three level.

 

Playing a natural 5-card major system (due to the importance of combined trump length, I'm convinced 4-card major systems are inferior), play fit jumps even in uncontested auctions.

 

Example:

 

AKxxx Axx x Qxxx // QJxx x xxx AKJxx

 

West deals. Opponents silent.

 

1♠ - 3♣

3♥ - 3♠

4♦ - 6♠

 

3♣ is a fit jump and forcing to game.

3♥ shows the heart ace.

3♠ is forcing.

4♦ shows second-round control of diamonds, usually a singleton.

There are two different questions mixed in our conversation.

1. Evaluating your hand strength (in tricks or points).

a. By counting the number of tricks you can take in your hand. You can also multiply this by 3 to express hand strength in points, to standardize hand-strength communication in bidding if needed.

b. By using an evaluation method (summing all feature values).

2. Estimating the combined partnership trick-taking potential.

a. Each partner evaluates their own hand and then you add the two numbers. That happens in normal bidding.

b. One player tries to estimate partner's strength by observing the bidding and assigning some average strength to partner. Used with preempts and strong openings to weigh the risk.

 

What I meant by "if you have 10+ cards in a suit ..." is that with extreme distribution you will get a more precise evaluation by counting tricks in your hand directly and then multiplying by 3, rather than by using the evaluation method. The evaluation method helps you when you are not sure how many tricks you can take - when your values are of all different types and scattered across suits, etc. When they are all concentrated in a single suit, it is a no-brainer: you abandon the complex evaluation method and switch back to trick counting. Then you continue bidding as usual to understand the combined strength. Your partner may use the evaluation model or something else depending on your agreements.

In short, my model is about 1-b, which for extreme hands can be replaced with 1-a at the player's discretion. Either way it ends up with 2-a. It has nothing to do with 2-b!


There is one concept common to all valuation methods. I didn't include it in the document description as I thought it was quite obvious, but apparently many of you keep asking about it over and over again in various forms. Let me try to clear up this confusion.

 

There are two ways to understand the trick-taking potential of your pair.

1. Have X-ray vision and just plainly count all the tricks by solving the double dummy problem.

2. Use some sort of valuation method, pass some encoded information to your partner, calculate the combined strength, and evaluate the prospects of a critical game by some rule.

It goes without saying that the first way is superior. It is much easier and absolutely precise; the second way just sucks in comparison. Therefore, if you possess X-ray vision, by all means use it. However, sometimes dark forces cloud your vision of the other hands and your own hand alone doesn't give you a clue (99.99% of hands). Then an evaluation system comes to the rescue. It is complex, cumbersome, and not 100% accurate, but it is doable and better than educated guessing. The bridge battle is a battle of evaluation methods, bidding systems, and card play. Enhancing one of these components gives you an edge over others.

 

So please, please don't ask me why this suit takes 10 tricks in NT while my model gives it only 21 points.

AKQJTxxxxx


There are two different questions mixed in our conversation.

1. Evaluating your hand strength (in tricks or points).

a. By counting the number of tricks you can take in your hand. You can also multiply this by 3 to express hand strength in points, to standardize hand-strength communication in bidding if needed.

b. By using an evaluation method (summing all feature values).

2. Estimating the combined partnership trick-taking potential.

a. Each partner evaluates their own hand and then you add the two numbers. That happens in normal bidding.

b. One player tries to estimate partner's strength by observing the bidding and assigning some average strength to partner. Used with preempts and strong openings to weigh the risk.

 

By summing partner's trumps with my own, and partner's HCP with my own, I can get a first estimate of the partnership trick-taking potential. These are the two general-case parameters. SST and source of tricks are parameters specific to the current board.

----

Also, by estimating tricks, AKQJTxxxxx is 10 tricks. Points are artificial. Tricks are real.

 

E(tricks) = Trumps + (HCP-20)/3

 

Notice that my trick estimate depends on which suit is trumps.
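A quick plug-in illustrates that dependence. Assume, for simplicity, a combined 20 HCP so the HCP term vanishes:

E(tricks) with the 10-card suit as trumps: 10 + (20-20)/3 = 10 tricks

E(tricks) with a 3-3 side-suit fit as trumps: 6 + (20-20)/3 = 6 tricks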


So please, please don't ask me why this suit takes 10 tricks in NT while my model gives it only 21 points.

AKQJTxxxxx

 

Well... since it is the opponents who lead to trick one, this suit might also take 0 tricks in NT?

Perhaps the long-term average actually is 21/3 = 7 tricks for such a suit? :)


No, I disagree here. These coefficients don't come up randomly, they come up for a reason. For example, if you hold AKQ opposite xxx you can expect that those three honors in one hand will cover the three losers in the other hand. But if you hold AKQ opposite x, there is only one loser in this suit to be covered and the other two honors will make a trick only if you have losers in other suits. But sometimes your opponents will take their tricks in those other suits, so your honors become worthless. Or you may have to guess which card will become a loser and you discard the wrong card. This is why AKQ opposite x gets -1 point, it is worth 1/3 trick less than opposite xxx. Even worse opposite a void, you may not be able to access the honors when you need to because you cannot play to them from the other hand. This is why here you get -3 points for AKQ opposite a void.

 

This is why one needs a system that counts both winners and losers. One needs to count those losers carefully in high-level auctions; it is too complex to count losers in low-level auctions.

 

400k boards is a huge amount of data. I honestly respect this.

400K boards sounds like a huge amount of data to study, but it is often an insufficient amount of data to learn anything useful. Take any situation: usually one is lucky if one out of 100 boards is useful for studying that situation.


What I meant about the coefficients is that there are different ways to get them. The computer and people solve the same task of calculating the coefficients (feature values), but they do it differently. Sometimes they converge on a coefficient and people are happy to see two different approaches match in the end. That is all there is to it. We cannot actually judge the way the computer thinks if the coefficients do not converge as well as we would like. Our attempt to "explain" them is just a rationalization of our own model that doesn't actually prove that we are right.

This is true, but these two models are not independent. I mean, if you analyse a situation and you think you understand it, and then a computer analyses the situation by means of a complex model, you expect it to get a similar result, at least qualitatively, right?

 

I believe your model is somewhat similar to what climate scientists do to understand the warming of the earth: use a formula or a set of formulas and determine a multitude of coefficients from a huge set of data. So if simple physics tells us that the temperature should go up when the CO2 content increases, and the model instead told us that the temperature should go down, that would be quite spectacular. On the other hand, the influence of water vapor is too complicated to treat with simple physics because you have opposing effects, so that's where you need the model. But I believe most of the situations in your analysis are simpler than the water vapor example. Although some might not be.

 

Now, if we continue our speculative mind game :), keep in mind that the "High card combinations in side suit with 8+ cards on line" and "Value duplication" features are corrective ones. You can see the "Optional. Count only if known." note for each of them. That means that even if you do not count them, due to lack of knowledge of partner's hand, the result will still be correct. These two coefficients allow you to do finer tuning in case you have the information to use them. That's why they go to both the positive and the negative side.

Yes yes. But still, when one of these coefficients is positive, you expect a higher probability of making more tricks.

 

So K-x = -1 doesn't say anything about the king's trick-taking potential. It says that the king's trick-taking potential and the singleton's trick-taking potential clash, and the result of this clash is that the combined trick-taking potential of king and singleton when they are in the same suit is 1 point less than if they were in different suits.

This is certainly a precise formulation but it does not help to understand the problem of these figures. The problem is this:

You have a hand with an average side suit; you add a king in the opposite hand in this suit, and the combined value of the hand rises by 2 points, equivalent to the probability of making 2/3 of a trick.

You have a hand with a singleton in a side suit; you add a king in the opposite hand in this suit, and the combined value of the hand rises by 1 point, equivalent to the probability of making 1/3 of a trick.

You have a hand with a void in a side suit; you add a king in the opposite hand in this suit, and the combined value of the hand rises by 2 points, equivalent to the probability of making 2/3 of a trick.

This is strange. I can't prove it wrong, but I find it strange.

 

You are right that some features occur more often than others. That's why I explicitly excluded unstable coefficients with insufficient statistics. Those included in the document are reliable!

I see, OK.

 

In numbers, I excluded features that occur fewer than 100-200 times overall. With 100 results the statistical error for the corresponding coefficient is about 10%. So if its numeric value is less than 5, the absolute error is less than 0.5, which is OK.

This is true for independent random events of equal weight on a single variable. Is it correct that you kept the other coefficients constant in this part of the test and only determined the value duplication coefficients? In that case I would accept the test. If you did it within a multi-factor analysis, working with several variables within one test, I wonder if you could have additional noise from other sources.

 

5 points sounds OK but not too much, considering that a contract can make an overtrick or go 2-3 tricks down depending on the play. And an error of 0.5 still means a 32% chance of being wrong (the probability of landing outside one standard deviation). Well okay, those are the borderline figures.

 

This applies only to very, very rare features. I can tell you a void is not rare. A 10-card suit is rare. :)

I never had a 10-card suit. :)

 

Well, with ~400k boards (boards are independent, observations are not), voids occur on ~4.5% of them, voids held by the declaring side in a side suit on ~2%, that is ~8000 observations. For voids opposite some definite combination of honors, you are down to ~1000 events. That seems to be on the safe side. But I started to understand why you need 400k boards :) .

 

If your figures are correct, what could that mean? Could it mean that the opponents, when they know declarer is very short in a suit, lead their aces carelessly, promoting tricks for declarer?


I thought about it myself a lot. Here is my speculation on it.

Let's take, for example, 25 HCP combined, 4333-4333 distribution in both hands, and a 4-4 fit. This is said to be enough for 3NT.

 

4333 // 4333 or a joint pattern of 8666

I suspect that with 25 HCP, 3NT will make less than 40% of the time. You should be able to prove my statement true or false: examine the database for all 25 HCP, 4333-4333 hands played in 3NT, and make a histogram of tricks made for the entire study. Post the mean tricks made, the standard deviation of those tricks, and the median tricks made. Perform the same study with 24 HCP, 26 HCP and 27 HCP hands. TIA
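A minimal Java sketch of the requested study, assuming the trick counts of the matching deals have already been extracted (the sample numbers below are made up, not real results):

import java.util.Arrays;

public final class TrickStats {

    public static void main(String[] args) {
        // Hypothetical trick counts for 25 HCP, 4333-4333 deals played in 3NT.
        int[] tricks = {9, 8, 10, 9, 7, 9, 8, 11, 9, 8};

        // Histogram and mean.
        int[] histogram = new int[14];            // index = tricks made (0..13)
        double sum = 0;
        for (int t : tricks) { histogram[t]++; sum += t; }
        double mean = sum / tricks.length;

        // Standard deviation.
        double sq = 0;
        for (int t : tricks) { sq += (t - mean) * (t - mean); }
        double stdDev = Math.sqrt(sq / tricks.length);

        // Median.
        int[] sorted = tricks.clone();
        Arrays.sort(sorted);
        double median = sorted.length % 2 == 1
                ? sorted[sorted.length / 2]
                : (sorted[sorted.length / 2 - 1] + sorted[sorted.length / 2]) / 2.0;

        System.out.printf("mean=%.2f stdDev=%.2f median=%.1f%n", mean, stdDev, median);
        for (int t = 6; t <= 13; t++) {
            System.out.printf("%2d tricks: %d%n", t, histogram[t]);
        }
    }
}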


You have a hand with an average side suit; you add a king in the opposite hand in this suit, and the combined value of the hand rises by 2 points, equivalent to the probability of making 2/3 of a trick.

You have a hand with a singleton in a side suit; you add a king in the opposite hand in this suit, and the combined value of the hand rises by 1 point, equivalent to the probability of making 1/3 of a trick.

You have a hand with a void in a side suit; you add a king in the opposite hand in this suit, and the combined value of the hand rises by 2 points, equivalent to the probability of making 2/3 of a trick.

This is strange. I can't prove it wrong, but I find it strange.

Believe me, I feel exactly the same. Some strange fluctuation that I cannot explain. I even highlighted this irregularity in the document.

I kept it there because the other parts of the model seem reasonable. I also varied the input parameters and features some 500 times and approved for the final version only those that displayed some persistence, to count them as proven effects and not noise. So I tend to believe this is some sort of new rule that got discovered during experimentation. Spectacular, as you said. :)

 

This is true for independent random events of equal weight on a single variable. Is it correct that you kept the other coefficients constant in this part of the test and only determined the value duplication coefficients? In that case I would accept the test. If you did it within a multi-factor analysis, working with several variables within one test, I wonder if you could have additional noise from other sources.

 

5 points sounds OK but not too much, considering that a contract can make an overtrick or go 2-3 tricks down depending on the play. And an error of 0.5 still means a 32% chance of being wrong (the probability of landing outside one standard deviation). Well okay, those are the borderline figures.

I didn't do an extensive probabilistic analysis of the errors. I just decided that 200 events should be enough to include a feature in the system; anything less than that I excluded at my discretion. My explanation is just an illustration of a careful approach to interpreting the results; I didn't mean to prove that it is mathematically rigorous.

 

But I started to understand why you need 400k boards :) .

I didn't need them; that is just how much my computer can chew. Fortunately, I was able to feel the threshold where a further increase in the number of features doesn't significantly add to the accuracy. So I can say the current version strikes a more or less optimal balance between the number of features and the accuracy.

I couldn't completely analyze the slam missing-card conditions, though. That is the only thing that requires more data.


@tnevoliln:

 

Thank you for your explanations. Here is one, maybe final, point from my side. We were talking about the possibility of setting the point requirement for a full game to 25/26, and I understand your reasons for not wanting to do so. But there is something we have not discussed so far: the question of which hands are weak, which are strong and which are invitational. For example, opposite a 1M opening, 11-12 point hands are typically considered invitational. There are many such situations. If you set your full-game requirement to ~26, all the point ranges can remain and your point-counting system can easily be integrated into almost all standard bidding systems. If you put the full-game requirement at 28/29, everything changes. I wonder if people are willing to undergo the trouble of changing all the bidding ranges to test your system. Indeed, I did this once to try Zar points, but apart from my partner at the time, I don't know anyone in my environment who found it worth trying. And Zar points are relatively easy in that respect because all the ranges just multiply by 2.


@tnevoliln:

 

Thank you for your explanations. Here is one, maybe final, point from my side. We were talking about the possibility of setting the point requirement for a full game to 25/26, and I understand your reasons for not wanting to do so. But there is something we have not discussed so far: the question of which hands are weak, which are strong and which are invitational. For example, opposite a 1M opening, 11-12 point hands are typically considered invitational. There are many such situations. If you set your full-game requirement to ~26, all the point ranges can remain and your point-counting system can easily be integrated into almost all standard bidding systems. If you put the full-game requirement at 28/29, everything changes. I wonder if people are willing to undergo the trouble of changing all the bidding ranges to test your system. Indeed, I did this once to try Zar points, but apart from my partner at the time, I don't know anyone in my environment who found it worth trying. And Zar points are relatively easy in that respect because all the ranges just multiply by 2.

I agree, and this is exactly the point someone made earlier. I can shift some value, say trump length, up or down to bring the game requirement close to 26. That is doable and makes no difference to me; the arithmetic is the same. I leave it for people to decide which way they prefer.


Look at the chart on page 11. If you trade stock markets, you will be familiar with Bollinger bands. This chart should show both the mean estimates of tricks and one standard deviation of those estimates. If the std dev is much greater than 1 trick per board, those mean estimates should be taken with a grain of salt.

Look at the chart on page 11. If you trade stock markets, you will be familiar with Bollinger bands. This chart should show both the mean estimates of tricks and one standard deviation of those estimates. If the std dev is much greater than 1 trick per board, those mean estimates should be taken with a grain of salt.

The residual quadratic mean for each method and contract type is in the table on page 9. Initially I planned to chart it on the same graph as on page 11, then decided that it wouldn't give much more insight than a single average number anyway. The quadratic mean is about the same as the standard deviation, except that it measures deviation from the predicted value, not from the population mean of the experimental points. It essentially shows how far the experimental points are, on average, from the predicted ones (which is what I wanted to measure), versus how far the experimental points are from their own mean.

In the table you can see the deviation is about 0.7-0.8.


Are those charts for NT contracts only? By my count, a partnership has 32+ HCP about one out of every 150 boards. I can't find any bridge website that publishes the combined HCP of a partnership. Since most successful slams are in a suit (mostly majors), I'm much more interested in tricks for suit strains.

If one knows both the combined HCP and the combined trumps, one can get a better estimate of tricks. But the std dev of those estimates is often greater than 1.25 tricks/board. It often depends on whether there is duplication of values in the short suits.


Are those charts for NT contracts only? By my count, a partnership has 32+ HCP about one out of every 150 boards. I can't find any bridge website that publishes the combined HCP of a partnership. Since most successful slams are in a suit (mostly majors), I'm much more interested in tricks for suit strains.

If one knows both the combined HCP and the combined trumps, one can get a better estimate of tricks. But the std dev of those estimates is often greater than 1.25 tricks/board. It often depends on whether there is duplication of values in the short suits.

The charts are for both NT (N) and trump (T) contracts. See the legend.


The residual quadratic mean for each method and contract type is in the table on page 9. Initially I planned to chart it on the same graph as on page 11, then decided that it wouldn't give much more insight than a single average number anyway. The quadratic mean is about the same as the standard deviation, except that it measures deviation from the predicted value, not from the population mean of the experimental points. It essentially shows how far the experimental points are, on average, from the predicted ones (which is what I wanted to measure), versus how far the experimental points are from their own mean.

In the table you can see the deviation is about 0.7-0.8.

It's not clear what you did on page 9. Is this residual quadratic mean calculated for each observation separately? Is it the difference between the observed value and the expected value for each observation? I hope it is not the difference between the observed value and the mean value of the sample.


It's not clear what you did on page 9. Is this residual quadratic mean calculated for each observation separately? Is it the difference between the observed value and the expected value for each observation? I hope it is not the difference between the observed value and the mean value of the sample.

There are two graphs, on pages 9 and 10, depicting the experimental average as a function of the theoretical prediction, for SAYC and Evolin points respectively. In simpler words, I estimated the number of tricks for each hand in the data set - that is the "estimated tricks" (horizontal) scale value. Then for each observation I took the real tricks - that is the "actual tricks" (vertical) scale value. Then I averaged the results by whole tricks. So the graph point corresponding to 10 estimated tricks represents all hands that were estimated in the range 9.5-10.5 tricks, and the vertical value of this point is the real-trick average over all of those hands.

Now back to the residual quadratic mean. A residual is the difference between the experimental and theoretical values. The standard measure of fit quality is the sum of squared residuals: the smaller the value, the better the fit. This number is good when you run the optimization, but it is not good for understanding how much the experimental values deviate from the theoretical ones on average. For that, the <residual quadratic mean> = SQRT(<sum of squared residuals> / N) is used.

What I meant in the previous post is that I calculated the residual quadratic mean across all observations as a single number, instead of splitting it into buckets as I did on the prediction-accuracy graphs. The graphs are for visual perception only; it is nice to see how two lines run close to each other. For the residual quadratic mean you need just one number, which shows you the average error. So, taking the residual quadratic mean as 0.8, one can draw two supporting lines on the prediction-quality graph, one 0.8 tricks above and one 0.8 tricks below, and say that about 70% of the experimental dots fall into this band. I just didn't want to actually draw them on the graph, to avoid overloading it with heavy math.
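For the curious, here is a minimal Java sketch of both computations - the bucket averages behind the graphs and the residual quadratic mean - with made-up sample data:

public final class PredictionQuality {

    public static void main(String[] args) {
        // Hypothetical parallel arrays: model estimate vs. actual tricks taken.
        double[] estimated = {9.2, 9.8, 10.4, 8.6, 10.1, 9.5};
        int[] actual      = {9,   10,  10,   9,   11,   9};

        // Residual quadratic mean: SQRT(sum of squared residuals / N).
        double sumSq = 0;
        for (int i = 0; i < actual.length; i++) {
            double residual = actual[i] - estimated[i];
            sumSq += residual * residual;
        }
        System.out.printf("residual quadratic mean = %.2f%n",
                Math.sqrt(sumSq / actual.length));

        // Bucket by rounded estimate: bucket 10 holds estimates in 9.5-10.5.
        double[] bucketSum = new double[14];
        int[] bucketCount = new int[14];
        for (int i = 0; i < actual.length; i++) {
            int bucket = (int) Math.round(estimated[i]);
            bucketSum[bucket] += actual[i];
            bucketCount[bucket]++;
        }
        for (int b = 0; b < 14; b++) {
            if (bucketCount[b] > 0) {
                System.out.printf("estimated %d: actual average %.2f (n=%d)%n",
                        b, bucketSum[b] / bucketCount[b], bucketCount[b]);
            }
        }
    }
}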


Our side has 32+ HCP about once every 150 boards. We have a biddable and makeable slam about 3% of the time. HCP is just one of many parameters for generating tricks. Trumps - mainly the quantity of trumps and sometimes the quality - is the second parameter. With your large database it is possible to learn which other parameters affect tricks and to measure that effect. Lawrence's short suit total is another parameter; his contribution has been dismissed by most experts. Then there is source of tricks from a second and sometimes a third suit. Controls play a major role in slams. Even the opponents' patterns and their ability to defend play a role, but we have no control over those parameters.
