MickyB Posted June 5, 2004

*** Zar wrote:
"You are correct - I missed this one, so here is the new calc:

if ( TSN                                              // HCP + CTRL
     + 2*( max( 0, L[0][fitCol] + L[1][fitCol] -8) )  // FIT points
     + max( 0, getAbcd("N", "a") -4)                  // Karpin Points a N
     + max( 0, getAbcd("N", "b") -4)                  // Karpin Points b N
     + max( 0, getAbcd("S", "a") -4)                  // Karpin Points a S
     + max( 0, getAbcd("S", "b") -4)                  // Karpin Points b S
     + dN123 + cN123 + dS123 + cS123                  // 1-3-5 for N and S
     > 53 ) TSNfit++;                                 // check for Grand

---------------------------------------------------

How come we cross-post "magic" results with no explanation showing:

HCP, HCP+321, HCP+531, Zar, BUMRAP+321, BUMRAP+531, TSP, Binky

What kind of "calculation" was made to "suddenly" put the combo-method WAY above when it manifests 3600 against 5700 on the Standard GIB boards? And the "score" is 0.21 vs. 0.08, almost 3 TIMES better when it is almost 2 times worse? What's the "magic"?

ZAR"<

Could you explain that calculation, please? Does 'dN123 + cN123' really equate to 1-3-5 evaluation? Have you included TSP's addition of one point for having two honours in a suit?

Tysen's results did have an explanation. He said:

"ERROR is the average difference, in tricks, between how many tricks we think we can take and how many we actually take. SCORE is an estimation of the IMPs/board we expect to gain against a team that uses a simple HCP+321 evaluation method. It's a measure of how much payoff there is for using a better evaluation system."

What has this got to do with magic? What was sudden about it? It is his extension of BUMRAP+531, which he has always claimed to be superior to Zar. You are comparing Zar and TSP using different methods, and his method is more sound.

Please check that your calculation of TSP is correct, then rerun your simulation on all of GIB's boards, seeing how many games are correctly bid, how many are missed, how many games are correctly stayed out of, and how many part-score hands are overbid.
hrothgar Posted June 5, 2004

*** mikestar wrote: "and 1 for each card over 4 in any suit ..."<

*** Zar wrote:
"You are correct - I missed this one, so here is the new calc:

if ( TSN                                              // HCP + CTRL
     + 2*( max( 0, L[0][fitCol] + L[1][fitCol] -8) )  // FIT points
     + max( 0, getAbcd("N", "a") -4)                  // Karpin Points a N
     + max( 0, getAbcd("N", "b") -4)                  // Karpin Points b N
     + max( 0, getAbcd("S", "a") -4)                  // Karpin Points a S
     + max( 0, getAbcd("S", "b") -4)                  // Karpin Points b S
     + dN123 + cN123 + dS123 + cS123                  // 1-3-5 for N and S
     > 53 ) TSNfit++;                                 // check for Grand

and TSN indeed went above Goren 5-3-1, as you predicted, due to the HCP + CTRL.

================ Overall Results ============================
GOREN 3-2-1       ( HCP+3-2-1 > 36 )        got 1427 contracts
The WTC           ( number of tricks > 12 ) got 1543 contracts
GOREN 5-3-1       ( HCP+5-3-1 > 36 )        got 2913 contracts
Fit TSN Points    ( fit points > 53 )       got 3616 contracts
Basic Zar Points  ( no fit points > 66 )    got 3753 contracts
Fit +3 Zar Points ( +3 extra trump > 66 )   got 5729 contracts

So still this combination of HCP + CTRL + FIT + Karpin + 1-3-5 is worse than BOTH the basic Zar Points and the Fit Zar Points.

How come we cross-post "magic" results with no explanation showing:

HCP, HCP+321, HCP+531, Zar, BUMRAP+321, BUMRAP+531, TSP, Binky

What kind of "calculation" was made to "suddenly" put the combo-method WAY above when it manifests 3600 against 5700 on the Standard GIB boards? And the "score" is 0.21 vs. 0.08, almost 3 TIMES better when it is almost 2 times worse? What's the "magic"?

ZAR"<

Zar, the "magic" is nothing more than basic statistics.

I very much admire your enthusiasm for your point count method and all of the effort that you are making to promote it. However, to be perfectly blunt, I have enormous difficulty taking your work seriously because of your repeated failure to apply, or apparently even understand, basic statistical analysis.

I strongly suggest that you spend some time learning how Tysen is measuring the accuracy of hand evaluation methods. It's useless to argue about the relative accuracy of hand evaluation systems until we are able to agree on how this should be measured.
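For readers trying to follow the calculation being argued about, here is a minimal, self-contained sketch of the same TSN test in C++. The Hand structure and helper names are illustrative rather than Zar's actual getAbcd/dN123 variables, and the 1-3-5 term is simply counted over all four suits, which may differ slightly from the dN123/cN123 terms quoted above.

#include <algorithm>
#include <array>

// Illustrative two-hand representation: suit lengths plus HCP and controls.
struct Hand {
    std::array<int, 4> len;   // spades, hearts, diamonds, clubs
    int hcp;                  // A=4, K=3, Q=2, J=1
    int controls;             // A=2, K=1
};

// Karpin length points: 1 for each card over 4 in any suit.
int karpin(const Hand& h) {
    int pts = 0;
    for (int s = 0; s < 4; ++s) pts += std::max(0, h.len[s] - 4);
    return pts;
}

// 1-3-5 shortness points: doubleton 1, singleton 3, void 5.
int short135(const Hand& h) {
    int pts = 0;
    for (int s = 0; s < 4; ++s) {
        if (h.len[s] == 2) pts += 1;
        else if (h.len[s] == 1) pts += 3;
        else if (h.len[s] == 0) pts += 5;
    }
    return pts;
}

// The combined test from the simulation: HCP + controls + fit points
// (2 per combined trump over 8 in the fit suit) + Karpin + 1-3-5, tested against 53.
bool tsnGrand(const Hand& n, const Hand& s, int fitSuit) {
    int fit = 2 * std::max(0, n.len[fitSuit] + s.len[fitSuit] - 8);
    int total = n.hcp + s.hcp + n.controls + s.controls + fit
              + karpin(n) + karpin(s) + short135(n) + short135(s);
    return total > 53;
}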
inquiry Posted June 5, 2004

To Mike,

Neither Zar nor Tysen corrects their data for being off two quick tricks. Having looked at Zar's hands, he is "point-and-shoot": if the total is so many Zar points, then he assumes the contract is at that level. Same for Tysen. This is easy enough for them to program, I guess. In the real world, however, we will BID THE HANDS, and we will evaluate whether slam is a good idea based not only upon "point count" (whatever point count system we use), but also quick losers. Since I am not a computer programmer, I have to look at the hands, and one thing I usually check is whether I am off two cashable aces; if so, I assume none of the systems would end up in a grand slam or small slam (subtract the number of aces the opponents hold from 8, and don't bid beyond that level... :-) )

As far as discounting singleton honors, let me paraphrase Zar: which of the following hands would you rather have?

A) ♠xxxxx ♥xxxx ♦K ♣AQJ, or
B) ♠Kxxxx ♥AQJx ♦x ♣xxx

You would probably answer B, but there are plenty of hands your partner could hold where hand B would be worthless and hand A would be golden. For instance, which hand above would you like to have if your partner's hand was:

Partner: ♠A ♥void ♦AQxxxx ♣Kxxxxx

To Richard,

The "magic" here is that Tysen presents a lot of statistics without presenting the hands. He did publish a small subset of hands on one web page. While I am more than willing to believe the "ERROR" data, I find the "SCORE" component, to be frank, totally unbelievable. This is based upon both my own, albeit limited, research looking manually at a number of hands that is probably in the few hundreds (including, for instance, this year's Cavendish hands - I will need to post more of those), and Zar's own data published on his website.

The difference between Zar's data and Tysen's is that Zar's is publicly available: anyone can look at it, anyone can confirm Zar's conclusions (well, anyone who can program computers, or with a lot of time on their hands to wade through them manually). If one stops and thinks about the GIB database for even a second and compares Goren to Zar, you will also see that Zar is much better than Goren at IMPs. In the data Zar just posted, his method bids 36% more games and 415% more grand slams than Goren. The slam level was also much better, but I don't have the numbers in front of me.

But here is what Zar means by "magic". His database of hands is publicly available, and his evaluation criteria are shown. I will admit that Tysen has published a fraction of his hands, and the data he did publish shows he is able to analyze the hands in the manner that he says he is for points and tricks. But the "SCORE" part is not there. I think this is what Zar means by magic: no set of hands. No doubt Zar would be "happy" to run his metrics on Tysen's database if it were available. One knows that Tysen could run his on Zar's if he wanted to, because Zar's is available.

But the latest evaluation method of Tysen now approaches Zar's in many ways (he counts the same for controls and HCP, he is counting both long and short suits, etc.), so there is getting to be less difference between them.

Ben
mikestar Posted June 5, 2004

A suggested methodology for evaluating counting methods:

1. Use an agreed-upon database - GIB seems like a good choice.
2. For each hand, determine the number of tricks double dummy in the optimum denomination.
3. For each hand, determine the point count for the hand and which contract would be bid based on the method's target counts.
4. The hands are divided into classes: partscore, game, small slam, grand slam. These should be subdivided into suit and NT classes for separate statistics.
5. Count a hand as a success for the method if it predicts the proper level; count it as a failure if it does not. Thus a hand on which 12 tricks are taken must be predicted as a small slam - a prediction of partscore, game or grand slam is a failure. (This is Zar's method, I believe.)
6. Also translate the point count into a predicted number of tricks for the hand and compute the error - the difference between this prediction and the actual number of tricks for the hand. (This is Tysen's method, I'm quite certain.)

Now we have a comparison that can be given a fair degree of trust.

A question for Zar: You display how many making grand slams (for example) are bid by each method, but where are the figures for each method where a grand slam would be bid but goes down? This is critically important data. If (hypothetically - I don't have any reason to believe it is true or false) TSP bids fewer making grands than Zar but also stays out of more grands that go down, this would be quite important in evaluating the relative merits of the methods. It may well be that Zar is superior in staying out, as you have asserted, but please show me the numbers.
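A compact sketch of steps 5 and 6 above, assuming each deal already carries a double-dummy trick count and a predicted trick count derived from the method's point total; the Deal layout and function names are illustrative.

#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>

struct Deal {
    int ddTricks;         // tricks available double dummy in the optimum denomination
    int predictedTricks;  // tricks implied by the method's point count
};

// Level class for step 5: 0 = partscore, 1 = game, 2 = small slam, 3 = grand slam.
// gameTricks is 9 for NT, 10 for a major, 11 for a minor.
int levelClass(int tricks, int gameTricks) {
    if (tricks >= 13) return 3;
    if (tricks == 12) return 2;
    if (tricks >= gameTricks) return 1;
    return 0;
}

// Step 5 (success/failure of the predicted level) and step 6 (mean error).
void evaluate(const std::vector<Deal>& deals, int gameTricks) {
    int successes = 0;
    double totalError = 0.0;
    for (const Deal& d : deals) {
        if (levelClass(d.predictedTricks, gameTricks) == levelClass(d.ddTricks, gameTricks))
            ++successes;
        totalError += std::abs(d.predictedTricks - d.ddTricks);
    }
    std::printf("success rate %.3f, mean error %.2f tricks\n",
                successes / (double)deals.size(),
                totalError / deals.size());
}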
Zar Posted June 5, 2004

*** hrothgar wrote: "I very much admire your enthusiasm for your point count method and all of the effort that you are making to promote it."<

I am not promoting anything - I just reply to questions. I have started 0 threads out of the 15 or so discussing different aspects of Zar Points here on the BBO forum. Neither have I started any thread on any of the other forums where Zar Points are discussed.

*** hrothgar wrote: "Zar, the "magic" is nothing more than basic statistics."<

So you are the one that is going to explain to us (since there are no other volunteers) the "statistics" by which Zar Points "score" almost 3 times worse - 0.08 vs. 0.21. MickyB started and I thought we would finally have something, but ...

So go ahead - you have my undivided attention:

ZAR
hrothgar Posted June 5, 2004

*** Zar wrote: "So go ahead - you have my undivided attention: ZAR"<

I will make the same suggestion that I have made several other times.

Start with your database of hands. Using Zar Points ONLY, making no manual adjustments for hands where you are missing two aces or whatnot, sort the hands into buckets based on the predicted number of tricks that should be taken.

Bucket 1 = hands where you predict that you will take 13 tricks
Bucket 2 = hands where you predict that you will take 12 tricks
Bucket 3 = hands where you predict that you will take 11 tricks
...

Next, use the double dummy solver to determine the number of tricks that can actually be taken and provide summary statistics. For each bucket, provide the mean and the standard error. [If you prefer, provide the mean and the standard deviation.]

Now, replicate this same procedure for each of the hand evaluation systems that you are measuring.

----------

If you prefer, you could invert this technique. Sort the hands based on the number of tricks that the double dummy engine is able to take, and then calculate the Zar point total for each hand. Once again, report the mean and the standard error.

-----------

There are arguments in favor of either method. Tysen used the first method. If you duplicate it using your own database, you should be able to replicate his results. If not, then we know that there is some difference in implementation.

On the other hand, it's unclear that people can produce tables that state: "If you hold a combined BUM-RAP count of X, then we expect you to take Y tricks." Indeed, technique 2 is a mechanism to produce just such a table.
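A sketch of the bucketing procedure described above, assuming parallel arrays of predicted and double-dummy trick counts (the names are illustrative); it reports the mean and standard error of the double-dummy result within each prediction bucket.

#include <cmath>
#include <cstdio>
#include <map>
#include <vector>

// Bucket deals by the number of tricks the evaluation method predicts,
// then summarise the double-dummy result inside each bucket.
void bucketSummary(const std::vector<int>& predicted,
                   const std::vector<int>& ddTricks) {
    std::map<int, std::vector<int>> buckets;
    for (size_t i = 0; i < predicted.size(); ++i)
        buckets[predicted[i]].push_back(ddTricks[i]);

    for (const auto& [pred, actual] : buckets) {
        double mean = 0.0;
        for (int t : actual) mean += t;
        mean /= actual.size();

        double var = 0.0;
        for (int t : actual) var += (t - mean) * (t - mean);
        var /= (actual.size() > 1 ? actual.size() - 1 : 1);
        double stderror = std::sqrt(var / actual.size());

        std::printf("predicted %2d tricks: n=%zu  mean dd tricks=%.2f  std err=%.3f\n",
                    pred, actual.size(), mean, stderror);
    }
}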
Zar Posted June 5, 2004

*** hrothgar wrote: "I will make the same suggestion that I have made several other times. Start with your database of hands."<

It is sound advice, but I am afraid you are answering a question that was not asked. The kind request was to explain the "STATISTICAL METHOD" that was used to determine the claim which basically says: "Here is a method that is 3 times better than anything known to man", showing that "indeed" it has a "score of 0.21" against a "score of 0.08", whatever that means.

So, can we have SOME KIND of explanation about the way this "achievement" was "scored"? That was the question that anyone in "the Statistical Camp" :-) tends to avoid.

Or is it enough for you for someone to start a thread saying "Here is a method which is STATISTICALLY 3 times better than anything known to man" and you jump in head-first just because you are a "statistical man" too? :-)

ZAR
hrothgar Posted June 5, 2004

*** Zar wrote:
"It is sound advice, but I am afraid you are answering a question that was not asked. The kind request was to explain the "STATISTICAL METHOD" that was used to determine the claim which basically says: "Here is a method that is 3 times better than anything known to man", showing that "indeed" it has a "score of 0.21" against a "score of 0.08", whatever that means. So, can we have SOME KIND of explanation about the way this "achievement" was "scored"? That was the question that anyone in "the Statistical Camp" :-) tends to avoid. Or is it enough for you for someone to start a thread saying "Here is a method which is STATISTICALLY 3 times better than anything known to man" and you jump in head-first just because you are a "statistical man" too? :-) ZAR"<

Zar, I'm not responsible for posting that set of statistics. I didn't do the analysis that produced that set of statistics. I am not going to defend that set of statistics.

The issue that I am raising is one of methodology. The metrics that you are using to evaluate Zar Points aren't valid. As I have stated before, I don't care how "aggressive" a hand evaluation system is. However, I am very interested in how accurate Zar Points are. In particular, I want to understand how accurate Zar Points are in comparison to BUM-RAP, "Work" HCP, etc.

Tysen identified a statistically valid method to evaluate how accurate different hand evaluation methods are and has reported his results. Furthermore, he has correctly identified some shortcomings in the analytical techniques that you are using to evaluate different systems.

From my perspective, the most useful thing that you could do to promote your evaluation system would be to switch over and start using a more accurate set of metrics. Replicate Tysen's methods with your own database and see whether your results match his own. Until you start using statistically valid techniques to measure relative performance you are wasting enormous amounts of time and effort.
MickyB Posted June 5, 2004

*** Zar wrote: "So, can we have SOME KIND of explanation about the way this "achievement" was "scored"? ..."<

Zar, Tysen's method was, for each evaluation method: compare the predicted number of tricks with the actual number of tricks on each hand. The difference between these two is the error. Take the mean of all these errors.

This number worked out at 1.07 for HCP+321, 1.05 for Zar, and 1.02 for TSP. In other words, on average, Zar is 0.02 tricks more accurate than HCP+321, and TSP is 0.05 tricks more accurate. Hence the improvement gained from switching from HCP+321 to TSP is 2.5 times as much as the improvement gained from switching from HCP+321 to Zar. The 0.08 and 0.21 are irrelevant really; they were calculated from the 0.02 and 0.05.

While just claiming that one method is better than another doesn't make it true, which are we more likely to believe - a claim based on sound methods or one based on flawed methods? It is quite worrying that you do not consider yourself a "Statistical Man", as creating and comparing evaluation systems is totally based on statistics!
hrothgar Posted June 5, 2004

*** Zar wrote: "... showing that "indeed" it has a "score of 0.21" against a "score of 0.08", whatever that means. So, can we have SOME KIND of explanation about the way this "achievement" was "scored"? ..."<

As I noted earlier, I have no way to evaluate whether or not the statistics that Tysen produced are accurate. With this said and done, I found it relatively easy to read Tysen and understand what "SCORE" measures. Read Tysen's original post and note the following quote:

> SCORE is an estimation of the IMPs/board we expect to gain against a
> team that uses a simple HCP+321 evaluation method. It's a measure
> of how much payoff there is for using a better evaluation system.

Please note that I have never talked to Tysen about any of this, so I might get this wrong; however, I suspect that Tysen did something like the following:

Take a set of X hands.
Use the total HCP to assign an appropriate contract.
Use a double dummy engine to calculate the number of tricks that can be taken.
Score the hand.

Next, perform the same analysis using a second metric. Once again, use this metric to assign a contract. Compare this contract to the number of tricks taken by the double dummy engine and score the hand.

NOW, compare the two scores and calculate the number of IMPs won/lost. Repeat for X hands and then calculate the average. The "SCORE" metric is the expected gain/loss per board.

I'll note in passing that HCP scores a 0.0 against HCP, which is exactly what this methodology would require.
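Under the same caveat (this is a guess at the procedure, not Tysen's actual code), here is a sketch of that comparison restricted to undoubled major-suit contracts. The simplified scoring and the per-board contract levels chosen by each method are inputs the reader would have to supply; only the IMP scale is standard.

#include <cstdlib>
#include <vector>

// Standard IMP scale: raw score difference -> IMPs.
int imps(int diff) {
    static const int bound[24] = { 10, 40, 80, 120, 160, 210, 260, 310, 360, 420,
                                   490, 590, 740, 890, 1090, 1290, 1490, 1740,
                                   1990, 2240, 2490, 2990, 3490, 3990 };
    int d = std::abs(diff), i = 0;
    while (i < 24 && d > bound[i]) ++i;
    return diff < 0 ? -i : i;
}

// Simplified duplicate score for an undoubled major-suit contract.
int scoreMajor(int level, int ddTricks, bool vul) {
    int need = 6 + level;
    if (ddTricks < need) return -(vul ? 100 : 50) * (need - ddTricks); // undertricks
    int trickScore = 30 * level;
    int bonus = (trickScore >= 100) ? (vul ? 500 : 300) : 50;          // game or partscore
    if (level == 6) bonus += vul ? 750 : 500;                          // small slam
    if (level == 7) bonus += vul ? 1500 : 1000;                        // grand slam
    return trickScore + 30 * (ddTricks - need) + bonus;
}

// Expected IMPs/board of method B over method A, given the contract level each
// method would choose on each board and the double-dummy tricks available.
double scorePerBoard(const std::vector<int>& levelA,
                     const std::vector<int>& levelB,
                     const std::vector<int>& ddTricks, bool vul) {
    double total = 0.0;
    for (size_t i = 0; i < ddTricks.size(); ++i)
        total += imps(scoreMajor(levelB[i], ddTricks[i], vul) -
                      scoreMajor(levelA[i], ddTricks[i], vul));
    return total / ddTricks.size();
}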
Zar Posted June 6, 2004

*** MickyB wrote: "It is quite worrying that you do not consider yourself a "Statistical Man", as creating and comparing evaluation systems is totally based on Statistics!"<

Thanx for the lesson :-) People learn every day :-)

*** hrothgar wrote: "As I noted earlier, I have no way to evaluate whether or not the statistics that Tysen produced are accurate."<

You are not alone here, that's the point. NOBODY knows anything, yet "that's the thing!" ... It's "statistics" we are talking about here, not blah-blah-blah ... Real science ... Don't you dare to think - it's whatever I say :-) I say it's 0.21 vs. 0.08 - almost three times better, period. No more discussions :-)

And what is really amazing, nobody cares - the important thing is the claim. BTW, I just finished the statistical analysis - it showed that Goren has 0.23, so we are back to square one.

"4-3-2-1, let's play bridge for fun" :-)

ZAR
hrothgar Posted June 6, 2004

*** Zar wrote: "... BTW, I just finished the statistical analysis - it showed that Goren has 0.23, so we are back to square one."<

What do you mean by "Goren has 0.23"??? Are you talking about the error term, the score, or what?

Regarding the accuracy of Tysen's statistics: unless people demonstrate otherwise, I tend to trust them. In this case, I trust that Tysen calculated the statistics properly. If I had doubts regarding the accuracy of his statistics, I would perform the same set of calculations using my own data and seek to verify his numbers. If I were unable to reconcile his figures, I would then attempt to clarify methodology.

I don't understand why this notion is so complicated.
inquiry Posted June 7, 2004

*** hrothgar wrote: "What do you mean by "Goren has 0.23"??? Are you talking about the error term, the score, or what? ... I don't understand why this notion is so complicated."<

He means he tested it and Goren came out 0.23 IMPs better per board than the other systems. He used statistics to prove it, and he now wants to throw out Zar Points and other systems as being inaccurate. You accepted Tysen's 0.28, etc., so why are you now questioning Zar's 0.23? Do you find Tysen's 0.00 for Goren and Zar's 0.23 at odds? Maybe one of them is wrong? Maybe both of them? Why do you accept them, when you are a programmer and could test this in a short period of time by yourself?

I don't accept either of these. Clearly Zar is just making a point, so the Goren 0.23 is a joke. Tysen is more serious, and I have no doubt he thinks his evaluation is correct. Experience, however, clearly shows me that Zar Points is much better than Goren. I have looked at a lot of hands, and this is easy to confirm. So I seriously doubt the small difference Tysen shows between them.

Second, I have begun evaluating Tysen's TSP points, and find them very similar to Zar Points in many ways. The differences are rather mild, but they are there. I find it equally unlikely that there will be as huge a difference between Zar and TSP as shown, and furthermore, I think that if there is a difference, Zar will score better (but here I have only a few hands to compare, as I do this the old-fashioned way, which will be unacceptable to all sides).

Richard, you are a computer programmer and a bridge player. You have the expertise to test this stuff yourself. Why not give it a go and report back?

Ben
tysen2k Posted June 7, 2004

A flurry of activity over the weekend that I wasn't able to participate in. :blink: Let me just highlight and comment on a few things in the last few posts.

Zar keeps pointing out "0.24 vs. 0.08" and saying that I'm claiming TSP is 3x better than Zar. I've never said such a thing in my life. As I explained many times before, this was the predicted number of IMPs of improvement over HCP+321. All I'm saying is that TSP scores 0.16 IMPs per board better than Zar when compared to a team playing HCP+321. This is a far cry from a 3x better system. These evaluators are very, very similar. Since the two methods bid the same thing over 90% of the time, who could claim such a vast difference? This is one reason why I suspect Zar's tests, since they produce such different results with practically identical evaluators.

And again, I'm echoing the fact (as others are pointing out too) that Zar's tests are really just picking up aggressiveness, not accuracy. I bet this is why Zar sees such a difference between our evaluators. I've said many times that if my system said to bid a grand on 0+ points, I'd score perfectly on Zar's tests. Zar has never had a reply to this. The point is that I could be wrong about the number of TSP points needed to bid a small slam or grand. One of the strengths of my tests is that they only look at the accuracy of the system, not the accuracy of the "steps."

About the fact that TSP doesn't add as much for a fit: TSP was designed to require the minimum amount of "post adjustment" possible. Since sometimes the bidding won't let you know everything about partner's hand, it's an attempt to adjust before the bidding starts. I try to be more accurate initially so that you won't have to change as much later.

As those who have read my rgb posts know, my main interest is not really in finding the perfect evaluator, but in studying how valuation changes during the bidding. How does our evaluation change when partner opens 1♠? How does it change again when RHO overcalls 2♦? These points are actually very complicated and not easy to put into rules. Let me give you an example:

In my original TSP article at the top of this thread, I hinted at the fact that adding 2 points for each trump over 8 was very simplified, since the real answer is complicated. I've been finding in my studies that the values for honors change a lot depending on how distributional partner (and the opponents) are. For example, if partner shows a 5+ suit, he is much more likely to be unbalanced than an "unknown" hand. Our shape becomes more important and our high cards lose importance. Everyone "knows" this, but we don't really have a quantitative feel for how much of an adjustment to make.

If I wanted a more accurate evaluator after partner opens 1♠, I would actually subtract 1/3 of all TSP points outside of spades and then add in a constant of 4 points. Weak hands become stronger and strong hands weaker. The value of those high cards outside of trumps becomes less. However, if I'm going to do this, I'll likely have to lower the requirements for my slams by a few points, since it's going to be harder to have two strong hands together. I could do this now at the table (thirds are easy to round off), but there's more. The amount that our high cards change depends on how distributional the other three hands are. If partner has a balanced hand, our high cards are now worth more, not less.
Let's say partner is balanced with 4 spades: our valuation with 5 spades is going to be different than if partner is unbalanced with 4 spades. So the value of the extra trump not only depends on our shape, but on partner's shape as well (and the opponents' too!). No system takes this into consideration yet. I'm working on it. So you can see that the 2 points for an extra trump is just a placeholder for now.

Tysen
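To make the "subtract a third outside trumps, add a constant" idea concrete, here is a tiny sketch; the per-suit point split is an assumed input, and the 4-point constant is the one Tysen quotes for the 1♠-opening case only.

// Illustrative per-suit split of a hand's TSP points.
struct TspBySuit { double spades, hearts, diamonds, clubs; };

// Re-valuation after partner opens 1 Spade, as described above: points in
// spades keep full weight, points outside spades are scaled by 2/3, and a
// constant 4 is added back in.
double adjustAfterOneSpade(const TspBySuit& h) {
    double outside = h.hearts + h.diamonds + h.clubs;
    return h.spades + (2.0 / 3.0) * outside + 4.0;
}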
inquiry Posted June 7, 2004

If you make it worth 0 points with no shortness, 1 point with a doubleton, 2 points with a singleton and 3 points with a void, TSP will almost (not quite), but almost, be Zar Points. Maybe you will discover this relationship in a few days.

Zar, I found two of the critical flaws, but for the life of me I can't find the other one. Help me out. A private message is OK, if you want to leave it as a puzzle for everyone else... hehehehe

Ben
tysen2k Posted June 7, 2004

*** inquiry wrote: "If you make it worth 0 points with no shortness, 1 point with a doubleton, 2 points with a singleton and 3 points with a void, TSP will almost (not quite), but almost, be Zar Points. Maybe you will discover this relationship in a few days."<

Ben, the point I was trying to make was that it's not just based on extra trumps and your own shape (shortness), but on partner's shape as well. Your extra trump (with a void) is worth a different amount if partner is shapely rather than balanced.

Way back in Part 1 of my Hand Evaluation series, I talk about how much the extra trump is worth, with and without shortness. If you convert the tricks I talk about there into a TSP or Zar scale, you see that you should make the following adjustments:

Shape      Adjust
3=4-3-3      1
3=4-4-2      1
3=5-3-2      1
3=5-4-1      0
3=5-5-0      0
3=6-2-2      0
3=6-3-1      0
3=6-4-0      1
3=7-2-1      0
3=7-3-0      1
4=3-3-3      2
4=4-3-2      2
4=4-4-1      3
4=5-2-2      2
4=5-3-1      3
4=5-4-0      3
4=6-2-1      2
4=6-3-0      3
5=3-3-2      3
5=4-2-2      3
5=4-3-1      3
5=4-4-0      5
5=5-2-1      3
5=5-3-0      4

It's not as simple as just saying that an extra trump is worth x when you have a singleton, and y when you have a void.
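For anyone folding the table above into a simulation, here is a minimal encoding as a lookup keyed by trump length plus sorted side-suit lengths; the key format and the function name are just one way to do it, not anything from Tysen's code.

#include <algorithm>
#include <cstdio>
#include <functional>
#include <map>
#include <string>

// Tysen's adjustments above, keyed as "trumps=side suits sorted high to low".
static const std::map<std::string, int> kExtraTrumpAdjust = {
    {"3=4-3-3",1},{"3=4-4-2",1},{"3=5-3-2",1},{"3=5-4-1",0},{"3=5-5-0",0},
    {"3=6-2-2",0},{"3=6-3-1",0},{"3=6-4-0",1},{"3=7-2-1",0},{"3=7-3-0",1},
    {"4=3-3-3",2},{"4=4-3-2",2},{"4=4-4-1",3},{"4=5-2-2",2},{"4=5-3-1",3},
    {"4=5-4-0",3},{"4=6-2-1",2},{"4=6-3-0",3},
    {"5=3-3-2",3},{"5=4-2-2",3},{"5=4-3-1",3},{"5=4-4-0",5},{"5=5-2-1",3},
    {"5=5-3-0",4},
};

// Look up the adjustment for a hand with 'trumps' cards in partner's suit
// and the given three side-suit lengths (in any order).
int extraTrumpAdjust(int trumps, int s1, int s2, int s3) {
    int side[3] = {s1, s2, s3};
    std::sort(side, side + 3, std::greater<int>());
    char key[16];
    std::snprintf(key, sizeof key, "%d=%d-%d-%d", trumps, side[0], side[1], side[2]);
    auto it = kExtraTrumpAdjust.find(key);
    return it == kExtraTrumpAdjust.end() ? 0 : it->second;
}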
hrothgar Posted June 7, 2004

> Richard, you are a computer programmer and a bridge player. You have the
> expertise to test this stuff yourself. Why not give it a go and report back?

1. I no longer have a double dummy solver on my production machine.
2. When I do have spare cycles, I am trying to focus on my MOSCITO notes.
3. I don't have a dog in this fight. Believe it or not, I don't find hand evaluation methods to be particularly interesting.

The only reason that I have bothered to post in this thread is that I am trying to get Zar to appreciate that his efforts to promote Zar Points may be hindered by his failure to adopt statistically valid techniques to measure efficiency.
tysen2k Posted June 7, 2004

*** inquiry wrote:
"As far as discounting singleton honors, let me paraphrase Zar: which of the following hands would you rather have?

A) ♠xxxxx ♥xxxx ♦K ♣AQJ, or
B) ♠Kxxxx ♥AQJx ♦x ♣xxx

You would probably answer B, but there are plenty of hands your partner could hold where hand B would be worthless and hand A would be golden. For instance, which hand above would you like to have if your partner's hand was:

Partner: ♠A ♥void ♦AQxxxx ♣Kxxxxx"<

I'm not sure what this proves. Sure, there will be hands where hand B would be worthless and hand A could be golden, but there are many more hands where the opposite is true. What you have to do is take the improvement/loss you get across all of partner's possible hands and weight it by the probability that they actually have that hand. If you do this, you get the values calculated by Binky, and thus TSP.

If you actually believe that high cards are better in short suits, then why aren't you adding points for these stiff honors?
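The weighting Tysen describes is just an expected value over partner's possible hands; a trivial sketch, with illustrative containers and names (in practice the probabilities would come from dealing out partner's hands):

#include <cstddef>
#include <vector>

// Expected trick gain/loss of a holding: weight the gain against each
// candidate partner hand by the probability of partner holding that hand.
double expectedGain(const std::vector<double>& probability,
                    const std::vector<double>& trickDelta) {
    double e = 0.0;
    for (std::size_t i = 0; i < probability.size(); ++i)
        e += probability[i] * trickDelta[i];
    return e;
}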
inquiry Posted June 7, 2004

*** tysen2k wrote: "If you actually believe that high cards are better in short suits, then why aren't you adding points for these stiff honors?"<

I will add or subtract values as I figure out whether they are working or not... seems about right to me. To count a singleton ace as only 4 is not such a good idea initially, I think.

Ben
tysen2k Posted June 7, 2004

*** inquiry wrote: "To count a singleton ace as only 4 is not such a good idea initially, I think."<

Read Mike Lawrence's "Complete Book on Hand Evaluation." He has a whole section of the book dedicated to just singleton aces.
MickyB Posted June 7, 2004

*** tysen2k wrote: "Read Mike Lawrence's "Complete Book on Hand Evaluation." He has a whole section of the book dedicated to just singleton aces."<

Yes, a very good book (like most of his).

To say that singleton honours should not be devalued because they might be more useful than if they were not singleton is like saying AKQ♦ x♣ is no better than xxx♦ K♣ because partner might have ♠A ♥A ♦void ♣AQJT9876543. Yes, a singleton honour may be very useful; but it is less likely to be useful than an honour that is not singleton. If partner's bidding tells you that your honour is actually worth its weight in gold, adjust then.
Zar Posted June 8, 2004

*** tysen2k wrote: "I've said many times that if my system said to bid a grand on 0+ points, I'd score perfectly on Zar's tests. Zar has never had a reply to this."<

Well, let's have a look at the VERY first sentence of the Zar Points article (I'll allow myself to quote, because you'll obviously never read anything about "those other points, the bad ones" :-). Please do not treat this as a promotion of Zar Points :-)

"Never Miss a Game Again? That's easy - just bid a game on every board! :-)"

I hope you'll read at least this sentence from the Zar Points stuff :-)

ZAR
tysen2k Posted June 8, 2004

*** Zar wrote: "Well, let's have a look at the VERY first sentence of the Zar Points article ... "Never Miss a Game Again? That's easy - just bid a game on every board! :-)""<

Zar, I did read your article. I don't know how this quote is an answer to the point I raised, which is that the test that you wrote pages and pages about is essentially worthless.
inquiry Posted June 8, 2004

*** tysen2k wrote: "Zar, I did read your article. I don't know how this quote is an answer to the point I raised, which is that the test that you wrote pages and pages about is essentially worthless."<

Let's keep this civil. :-)

Zar's article specifically addresses the point you raised, Tysen, in saying that to never miss a game again, simply bid game on every hand. You twisted it a bit earlier to "never miss a grand slam, simply bid one on every hand." So, in fact, he raised "your issue" before you ever did.

And in fact, Zar did a lot more than just pay lip service to the overbidding problem. If you read his research as well as his article, you will see that he looked at 70,000 hands where a partscore (3♠/3♥) was the double-dummy correct contract, and compared Zar to Zar + Fit to Goren. Goren overbid on 22,000 of these, Zar + Fit on 11,000, and Zar on 2,000 (my rounded numbers). If you prefer, you can check the number of contracts where 3♥/3♠ would have been reached (48K Goren, 59K Zar + Fit, and 67K basic Zar). So, in fact, he has addressed your basic concern. It doesn't seem he checked all possible contracts at all possible levels, but this adequately illustrates the point of stopping short of game.

As for the singleton honor issue: I too deduct for singleton honors. I like to have my high cards in my long suits, like we all do. I just think a two-point deduction for a singleton ACE is way, way too much. And in Tysen's math, I didn't see a re-addition for short-suit honors if they seem to fit well with partner. So the points taken off appear to stay off. This has to be wrong, at least on some auctions.

Ben