BBO Discussion Forums: System performance metrics - BBO Discussion Forums


System performance metrics

#1 User is online   helene_t 

  • The Abbess
  • Group: Advanced Members
  • Posts: 17,245
  • Joined: 2004-April-22
  • Gender:Female
  • Location:Copenhagen, Denmark
  • Interests:History, languages

Posted 2025-February-28, 05:29

I have determined opening bids for randomly dealt hands using various bidding systems - so far SA, English Acol, Scottish Acol, KS, Wei, Tarzan, IMPrecision, and Moscito. Will add some more.

And then I plan to derive various system performance metrics such as
- how often can responder immediately decide on something useful such as forcing to 2NT, ruling out a major suit fit etc.
- how much bidding space do we, on average, have left (taking enemy interference into account) below the safety level on those hands where responder can't immediately decide anything
- how often can either partner decide if it's right (not) to throw in the towel after opps' WJO
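As a rough illustration of the kind of data set described above, here is a minimal Python sketch that deals random hands and tabulates how often each opening call comes up. The opening rule `toy_opening` is a hypothetical, heavily simplified stand-in and does not reproduce any of the systems named in this post; all function names are invented for the example.

```python
import random
from collections import Counter

RANKS = "AKQJT98765432"
SUITS = "SHDC"
HCP = {"A": 4, "K": 3, "Q": 2, "J": 1}


def deal_hand(rng):
    """Deal one random 13-card hand as a list of (suit, rank) tuples."""
    deck = [(s, r) for s in SUITS for r in RANKS]
    return rng.sample(deck, 13)


def hcp(hand):
    """High-card points on the usual 4-3-2-1 scale."""
    return sum(HCP.get(rank, 0) for _, rank in hand)


def shape(hand):
    """Suit lengths in spades, hearts, diamonds, clubs order."""
    return tuple(sum(1 for s, _ in hand if s == suit) for suit in SUITS)


def toy_opening(hand):
    """Hypothetical toy rule (assumption): 12+ to open, 15-17 balanced 1NT,
    five-card majors, longer minor otherwise. Not a real system."""
    points = hcp(hand)
    s, h, d, c = shape(hand)
    if points < 12:
        return "Pass"
    balanced = sorted((s, h, d, c)) in ([2, 3, 4, 4], [3, 3, 3, 4], [2, 3, 3, 5])
    if balanced and 15 <= points <= 17:
        return "1NT"
    if s >= 5 and s >= h:
        return "1S"
    if h >= 5:
        return "1H"
    return "1D" if d >= c else "1C"


def opening_frequencies(n_deals, seed=0):
    """Relative frequency of each opening call over n_deals random hands."""
    rng = random.Random(seed)
    counts = Counter(toy_opening(deal_hand(rng)) for _ in range(n_deals))
    return {bid: k / n_deals for bid, k in counts.items()}
```

Calling `opening_frequencies(50000)` gives a frequency table per opening call; swapping in a different opening function is how one would compare systems under this sketch.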

You might wonder what use such a data set is ....

Jan Eric Larsson wrote a book on comparison of bidding systems, using simulated tournaments. Now it is quite a lot of work to extend on this, as it requires implementing whole bidding systems. But if one could identify metrics, based on the opening bids alone, that predict system performance, one could easily scale this to many systems, maybe even search a large space of systems for the optimal one.

Or at least, when AWM or Kungsgeten design their own crazy system they could quickly see which weaknesses they have to work on.

Any thoughts?
The world would be such a happy place, if only everyone played Acol :) --- TramTicket
2

#2 User is online   P_Marlowe 

  • Group: Advanced Members
  • Posts: 10,357
  • Joined: 2005-March-18
  • Gender:Male

Posted 2025-February-28, 06:47

There was a post on a bridge newsgroup doing something similar. As far as I understood, he tried to determine the information content of the first bid.
It was based on information theory (?), and he assumed that the system quality after the opening bid was the same for all systems.

The user had a name like tycsn?
With kind regards
Uwe Gebhardt (P_Marlowe)
0

#3 User is online   helene_t 

  • The Abbess
  • Group: Advanced Members
  • Posts: 17,245
  • Joined: 2004-April-22
  • Gender:Female
  • Location:Copenhagen, Denmark
  • Interests:History, languages

Posted 2025-February-28, 07:22

View PostP_Marlowe, on 2025-February-28, 06:47, said:

There was a post on a bridge newsgroup doing something similar. As far as I understood, he tried to determine the information content of the first bid.
It was based on information theory (?), and he assumed that the system quality after the opening bid was the same for all systems.

The user had a name like tycsn?

This must be our Tysen, he wrote about that idea here also:

https://www3.dal13.s...post__p__473663

Basically he optimized his bidding system using the entropy of the distribution of the optimal contract, as seen by the partner of the player whose bid he wanted to optimize. And then he added an aggressiveness penalty to prevent the system from using up too much bidding space initially.
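To make the entropy idea concrete, here is a small Python sketch of the conditional entropy of the optimal contract given the bid. It is only an illustration under my own assumptions: Tysen's actual objective and his aggressiveness penalty are not spelled out in this thread, and the function names and the `(bid, optimal_contract)` input format are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy in bits of a Counter of outcomes."""
    total = sum(counts.values())
    return -sum((k / total) * math.log2(k / total) for k in counts.values() if k)


def conditional_entropy(pairs):
    """H(contract | bid) for an iterable of (bid, optimal_contract) pairs.

    Lower values mean that, on average, partner is left with less
    uncertainty about the optimal contract after hearing the bid.
    """
    by_bid = defaultdict(Counter)
    for bid, contract in pairs:
        by_bid[bid][contract] += 1
    n = sum(sum(c.values()) for c in by_bid.values())
    return sum(sum(c.values()) / n * entropy(c) for c in by_bid.values())
```

A bid that pins down the contract contributes zero bits; a bid after which two contracts are equally likely contributes one bit, weighted by how often that bid occurs.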

The "optimal" system turned out to involve a pass in first or second seat promising spade length.
The world would be such a happy place, if only everyone played Acol :) --- TramTicket
0

#4 User is online   P_Marlowe 

  • Group: Advanced Members
  • Posts: 10,357
  • Joined: 2005-March-18
  • Gender:Male

Posted 2025-February-28, 07:30

View Posthelene_t, on 2025-February-28, 07:22, said:

This must be our Tysen, he wrote about that idea here also:

https://www3.dal13.s...post__p__473663

Basically he optimized his bidding system using the entropy of the distribution of the optimal contract, as seen by the partner of the player whose bid he wanted to optimize. And then he added an aggressiveness penalty to prevent the system from using up too much bidding space initially.

The "optimal" system turned out to involve a pass in first or second seat promising spade length.

Yes, Tysen, ... my spelling was hitting 3 out of 5, and a bonus for a letter in the wrong position,
not a bad showing of my long term memory, given that I did not think about this the last ? years.

Anyway, the result is basically a HUM: pass promises a hand stronger than the hands that open.
And HUM systems are prohibited; you can't play those in reg. environments.
The question is, can one do the optimization he was doing, using only allowed systems?
With kind regards
Uwe Gebhardt (P_Marlowe)
1

#5 User is offline   DavidKok 

  • Group: Advanced Members
  • Posts: 2,693
  • Joined: 2020-March-30
  • Gender:Male
  • Location:Netherlands

Posted 2025-February-28, 09:39

I think this is an interesting question, and I'd like to know more about what sort of results pop out.

At the same time I think it's terrifyingly easy to use too simple a model. Bridge is a pretty complicated game, and reducing bidding systems or even opening agreements to a small set of criteria is dangerous. Personally I would approach this by listing as many criteria as possible, in an attempt to reduce the risk of missing something important. Here's a few suggestions:

  • What is our combined HCP distribution given responder's hand and opener's first bid? If the chance of it being 25+ is low enough (say, under 5%) or it being 24- is low enough (ditto) we have an answer to 'can we make game'. If not, we need more space to figure it out.
  • What is the probability of us having an 8(+)-card major fit?
  • What is the probability of the opponents having an 8(+)-card major fit?
  • Can we establish a safety level? Important break points are 1NT, 2M, 3m, 3NT and 4M.
  • Conditional on partner opening and RHO overcalling at the 1- or 2-level (with an appropriate hand), how often do we have a safety level beyond that?
Maybe double dummy simulations of game odds can take the place of my crude HCP suggestion. Though already some flaws snuck in: in general we should keep bidding until we think bidding on is negative expected value, which is not the same as bidding on until these questions have been answered to a satisfactory (but arbitrary) threshold. Instead, such thresholds should depend on the amount of bidding space left to the safety level.
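The first criterion in the list above can be sketched as a small Monte Carlo estimate in Python. This is a simplification under stated assumptions: opener's bid is modelled only as an HCP range (shape requirements are ignored), the 25-point threshold and the 5% cutoff are the figures suggested in this post, and every function name is hypothetical.

```python
import random

HCP = {"A": 4, "K": 3, "Q": 2, "J": 1}
RANKS = "AKQJT98765432"
SUITS = "SHDC"


def game_values_probability(responder, opener_range, n=20000, seed=0, threshold=25):
    """Estimate P(combined HCP >= threshold) by rejection sampling.

    responder: 13 cards such as "SA" or "H7"; opener_range: (lo, hi) HCP.
    Assumption: opener's bid constrains HCP only, not shape.
    """
    rng = random.Random(seed)
    held = set(responder)
    # HCP values of the 39 cards responder cannot see.
    rest = [HCP.get(r, 0) for s in SUITS for r in RANKS if s + r not in held]
    own = sum(HCP.get(card[1], 0) for card in responder)
    lo, hi = opener_range
    accepted = hits = 0
    for _ in range(n):
        opener = sum(rng.sample(rest, 13))
        if lo <= opener <= hi:
            accepted += 1
            hits += own + opener >= threshold
    return hits / accepted if accepted else float("nan")


def verdict(p, cutoff=0.05):
    """'game' or 'no game' when one side is under the cutoff, else 'undecided'."""
    if p < cutoff:
        return "no game"
    if p > 1 - cutoff:
        return "game"
    return "undecided"
```

Hands that come back "undecided" are the ones that still need bidding space, which is where the remaining criteria come in.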
1

#6 User is offline   jdiana 

  • Group: Full Members
  • Posts: 300
  • Joined: 2021-November-17

Posted 2025-February-28, 10:51

Please don't take this the wrong way - the fact that you can even do such a thing is amazing to me - but, just from a non-technical, pragmatic, point of view it would seem that there might be better uses for the time and effort that this would require.

Deriving system performance metrics based simply on the opening bid seems questionable to me. There are so many decisions that go into any bidding system. No one plays "2/1" exactly the same way. Also, what is the end goal? Is it to conclude that system x is best? I think what I'm hearing is that you might conclude that system x is relatively weaker than system y with respect to a particular metric, which would guide system x players toward trying to shore up that part of their system. That could be useful if it can be done accurately.

When I read about ideas like this - and there are many of them, given how many smart and computer-savvy people play bridge - I always think "How would this help me be better at the game?" (I understand that I'm probably not the target audience - experts will be more interested.)

As an example of something that I think would be really useful, Matthew Kidd has talked about adding a feature to BBO Helper as follows:

"A bigger idea is to provide a heads-up display of your opponent's style based on their playing history, classifying actions along both aggressiveness and wildness dimensions. So for takeout doubles, where a player averages on an (HCP + distribution points) histogram would measure their aggressive while their proclivity to make off-shape doubles would measure their wildness, e.g. a takeout double with 5-3 in the majors or a stiff in the clubs when 1♦ was opened, would drive up your wildness score, and a doubleton in an unbid major even more so. Similarly one can examine two-level overcalls, perhaps with a special statistic for how often they overcall on 5-3-3-2 shape—you see a lot of this even in open ACBL events; sometimes it can't be punished but sometimes it merely goes unpunished."

https://bridgewinner...2-2-se8mmlwks3/

Maybe it would be interesting to poll experts about what unanswered questions they think are most worth solving. It might also be useful to think about what BBO could do to help further this type of research. For example, if everyone was forced to click on a radio button to describe their basic system, would that make this kind of research easier? What else could BBO do to facilitate research?

Again, feel free to ignore these comments from a Luddite. They're just my two cents. :)
1

#7 User is offline   DavidKok 

  • Group: Advanced Members
  • Posts: 2,693
  • Joined: 2020-March-30
  • Gender:Male
  • Location:Netherlands

Posted 2025-February-28, 11:02

View Postjdiana, on 2025-February-28, 10:51, said:

Deriving system performance metrics based simply on the opening bid seems questionable to me. There are so many decisions that go into any bidding system. No one plays "2/1" exactly the same way. Also, what is the end goal? Is it to conclude that system x is best? I think what I'm hearing is that you might conclude that system x is relatively weaker than system y with respect to a particular metric, which would guide system x players toward trying to shore up that part of their system. That could be useful if it can be done accurately.
My personal motivation for focusing on opening bids in particular is twofold. Firstly many modern expert systems do really well on constructed auctions, reaching a good contract 90+% of the time or so. In a sense, there are serious diminishing marginal returns to trying to optimise your constructive system over 'expert standard' - though in practice that is a pretty elusive target to hit in the first place. Secondly the bidding has become more competitive (what, really?!), with around 60-70% of modern auctions being contested. It is very reasonable to assume that you might have to deal with interference after your opening bid, and that your partner only has one shot to tell you what to do before you have to make a decision at a (possibly uncomfortably) high level. Therefore optimising the relevant information content conditional on expecting interference is of great value in modern bidding. This flies in the face of classical thinking, where people tried to squeeze their Fibonacci and exponential sequences in their frequency distributions.
1

#8 User is online   P_Marlowe 

  • Group: Advanced Members
  • Posts: 10,357
  • Joined: 2005-March-18
  • Gender:Male

Posted 2025-February-28, 12:42

View Postjdiana, on 2025-February-28, 10:51, said:

<snip>
As an example of something that I think would be really useful, Matthew Kidd has talked about adding a feature to BBO Helper as follows:

"A bigger idea is to provide a heads-up display of your opponent's style based on their playing history, classifying actions along both aggressiveness and wildness dimensions. So for takeout doubles, where a player averages on an (HCP + distribution points) histogram would measure their aggressive while their proclivity to make off-shape doubles would measure their wildness, e.g. a takeout double with 5-3 in the majors or a stiff in the clubs when 1♦ was opened, would drive up your wildness score, and a doubleton in an unbid major even more so. Similarly one can examine two-level overcalls, perhaps with a special statistic for how often they overcall on 5-3-3-2 shape—you see a lot of this even in open ACBL events; sometimes it can't be punished but sometimes it merely goes unpunished."
<snip>


This is a full disclosure / monitoring problem / question. If you have a database like in chess, you could do it.
At the moment the majority of games are not centrally accessible in an electronic way.
The hand records of most big tournaments are available online, but scattered across many sites.
With kind regards
Uwe Gebhardt (P_Marlowe)
0

#9 User is online   awm 

  • Group: Advanced Members
  • Posts: 8,448
  • Joined: 2005-February-09
  • Gender:Male
  • Location:Zurich, Switzerland

Posted 2025-March-01, 04:29

While this obviously doesn't hurt anything, I think there's altogether too much focus on opening bids. I don't really think we're going to convince many people on this one way or another, nor do I think the openings are usually what makes the biggest difference in terms of bidding in a match (competitive methods and style seem far more important to me).
Adam W. Meyerson
a.k.a. Appeal Without Merit
1

#10 User is online   helene_t 

  • The Abbess
  • Group: Advanced Members
  • Posts: 17,245
  • Joined: 2004-April-22
  • Gender:Female
  • Location:Copenhagen, Denmark
  • Interests:History, languages

Posted 2025-March-01, 12:37

I have calculated the frequency of the "best conclusion" which responder can make, where I decided that forcing to game is the most useful thing responder can do. Establishing a major suit fit is the second best thing (but you don't get extra credit for a major suit fit if you can also force to game).
Diagram: https://drive.google...iew?usp=sharing
We see that EHAA performs well on these metrics. The system often allows responder to draw some conclusion when opener has some 8-10(11) points and would have passed with other systems.

Scoring these "best conclusions" somewhat arbitrarily as 4 for game force, 3 for major fit, .... , 0 for no conclusion, we can plot this score against the aggressiveness (0 for pass, 1 for 1 etc). The more aggressive systems score well on the above metrics, because the information transmitted is highest when the opening bids (including pass) are all fairly equally frequent. But this comes at the price of less bidding space left for further information exchange (and of course, an EHAA 2 opening will sometimes be above the safety level):

https://drive.google...iew?usp=sharing
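The trade-off between information transmitted and aggressiveness can be illustrated with a short Python sketch. Two assumptions are mine and should be read as such: the aggressiveness score is taken to be simply the level of the opening call (the exact scale used for the plot is not recoverable from the text), and both frequency tables are invented numbers, not results from the simulation.

```python
import math


def aggressiveness(freqs):
    """Mean opening level: 0 for Pass, n for an n-level bid (assumed scale)."""
    return sum(p * (0 if bid == "Pass" else int(bid[0])) for bid, p in freqs.items())


def bid_entropy(freqs):
    """Shannon entropy in bits of the opening-call distribution."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


# Hypothetical frequency tables, invented for illustration only.
sound = {"Pass": 0.60, "1C": 0.10, "1D": 0.10, "1H": 0.08, "1S": 0.08, "1NT": 0.04}
light = {"Pass": 0.40, "1C": 0.12, "1D": 0.12, "1H": 0.10, "1S": 0.10,
         "1NT": 0.08, "2H": 0.04, "2S": 0.04}

for name, table in (("sound", sound), ("light", light)):
    print(name, round(aggressiveness(table), 2), round(bid_entropy(table), 2))
```

In this toy comparison the lighter table spreads its probability more evenly over the calls, so it has the higher entropy, and it also has the higher mean level, which is the cost in bidding space mentioned above.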

We see that IMPrecision scores well in this diagram with a high score despite moderate aggressiveness. I think there are three things that all work in IMPrecision's favour:
- strong club systems generally perform well as they make it easier for responder to decide if we have game
- The natural 2 opening is apparently more useful than the Precision 2 opening in Wei and Tarzan (those two systems are very similar)
- It seems that generally strong NT systems perform a bit better than weak NT systems on these metrics, although this is obviously confounded by all kinds of things. This will become more pronounced once I factor in that when opening in 2nd seat, strong NT becomes more frequent. This analysis is only for first seat.

English and Scottish Acol perform very similarly. As for strong NT systems with 4-card majors, Norwegian is clearly better than Swedish which should not be surprising as Swedish standard is 4-card majors with weak hands and 5-card majors with strong hands. While this makes sense it doesn't get rewarded by these metrics. As for strong NT 5cM, there is almost no difference between short club (DD) and best minor (SA).

It surprises me that IMPrecision scores so much better than other systems, and that EHAA is so much "better" than Fantunes. I may need to do some double-checking.

Obviously, more sophisticated metrics are called for. For example, aggressiveness per se is probably not so interesting; it is more about being aggressive on those hands where it hurts opps without hurting ourselves too much.

I have implemented natural weak 2s for all systems (even Vienna!) except when e.g. 2 is needed for specific constructive hands. I thought preempt system is a separate issue which I don't want to confound this analysis.

There can be some confounding with exact shape requirements for 1NT opening. I have tried to make them consistent with what in my experience is mainstream for users of the different systems, but in many cases I didn't really know this so well.

PS: I don't know how to share images here, I have previously been able to with imgur but they give URLs without a png extension and apparently BBO does not allow that.
The world would be such a happy place, if only everyone played Acol :) --- TramTicket
0

#11 User is online   awm 

  • Group: Advanced Members
  • Posts: 8,448
  • Joined: 2005-February-09
  • Gender:Male
  • Location:Zurich, Switzerland

Posted 2025-March-02, 02:25

It looks from your chart like the big advantage of IMPrecision is in determining that there is no game after the opening bid. I'm not actually sure how you determine this, but it is an advantage of light-opening systems that "pass" becomes more informative in a negative sense. The thing that looks strange is that I'd expect Moscito (which is slightly more aggressive about opening as best I can remember) to have a similar advantage here and that doesn't look to be the case.

It might be interesting to see some of these measurements broken down into the different opening bids (might also help with debugging if there's some mistake in the computation).
Adam W. Meyerson
a.k.a. Appeal Without Merit
1

#12 User is online   helene_t 

  • The Abbess
  • Group: Advanced Members
  • Posts: 17,245
  • Joined: 2004-April-22
  • Gender:Female
  • Location:Copenhagen, Denmark
  • Interests:History, languages

Posted 2025-March-02, 03:15

Thanks, Adam. There are indeed a lot of hands where IMPrecision can conclude that there is no game while Tarzan and Wei can't conclude anything. Most of them are 0-10 opener hands where both systems pass but responder has some 12-14 points and can rule out game opposite an IMPrecision pass but not opposite a Tarzan or Wei pass, because those systems would also pass with some balanced 11-12 counts.
Moscito opens balanced 12 counts but no balanced 11 counts, whereas Wei and Tarzan at least open some of them. So my implementation of Moscito is not particularly aggressive.
It is a bit of a shame that those issues dominate the result so much. Obviously, balanced 9-12 counts are very frequent, so by these metrics it is mostly about how the systems handle those hands.

I will modify the strong club systems so that they all open all 12 counts but not all 11 counts, just like the natural systems.
The world would be such a happy place, if only everyone played Acol :) --- TramTicket
0

#13 User is offline   hrothgar 

  • Group: Advanced Members
  • Posts: 15,508
  • Joined: 2003-February-13
  • Gender:Male
  • Location:Natick, MA
  • Interests:Travel
    Cooking
    Brewing
    Hiking

Posted 2025-March-02, 10:06

View Posthelene_t, on 2025-March-02, 03:15, said:

Thanks, Adam. There are indeed a lot of hands where IMPrecision can conclude that there is no game while Tarzan and Wei can't conclude anything. Most of them are 0-10 opener hands where both systems pass but responder has some 12-14 points and can rule out game opposite an IMPrecision pass but not opposite a Tarzan or Wei pass, because those systems would also pass with some balanced 11-12 counts.
Moscito opens balanced 12 counts but no balanced 11 counts, whereas Wei and Tarzan at least open some of them. So my implementation of Moscito is not particularly aggressive.
It is a bit of a shame that those issues dominate the result so much. Obviously, balanced 9-12 counts are very frequent, so by these metrics it is mostly about how the systems handle those hands.

I will modify the strong club systems so that they all open all 12 counts but not all 11 counts, just like the natural systems.


MOSCITO uses an 11+ - 14 HCP 1NT opening in first and second

In the version that I play, a bunch of the weaker balanced hands will open an assumed fit preempt

https://www.chrisrya...wo/frelling.htm
Alderaan delenda est
1

#14 User is online   awm 

  • Group: Advanced Members
  • Posts: 8,448
  • Joined: 2005-February-09
  • Gender:Male
  • Location:Zurich, Switzerland

Posted 2025-March-03, 03:58

The metrics you're using here seem rather arbitrary (and created this huge dependence on opening light on balanced hands). I'd go for something a bit more concrete and better linked to actual bridge. Here's one proposal:

Suppose that after opener's initial call, opponents preempt in some significant way. Now responder has to decide whether to pass or take some other action. To simplify matters, let's assume that if responder acts we can get to the best spot still available (but we cannot defend their preempt undoubled or play any cheaper contract) and that if responder passes we will always defend the opponents' preempt undoubled. How often can responder get this right?
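The "how often can responder get this right" measure could be tallied along the following lines in Python. This is only a sketch: the `Deal` record and its fields are hypothetical, the two scores per deal would have to come from some outside evaluation (for example a double-dummy solver, which is not included here), and counting ties as right is my own choice.

```python
from typing import NamedTuple


class Deal(NamedTuple):
    acted: bool          # did responder act over the preempt?
    score_if_act: int    # our score in the best spot still available
    score_if_pass: int   # our score defending the preempt undoubled


def is_right(deal):
    """Responder is right if the chosen action scores at least as well."""
    if deal.score_if_act == deal.score_if_pass:
        return True  # assumption: a tie counts as a correct decision
    return deal.acted == (deal.score_if_act > deal.score_if_pass)


def decision_accuracy(deals):
    """Fraction of deals on which responder's pass-or-act choice was right."""
    deals = list(deals)
    return sum(is_right(d) for d in deals) / len(deals)
```

Comparing this fraction across systems, with the same preempts and the same decision rule for responder, would give one number per system.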

Comparing to real bridge, there are probably some hands where opener will reopen after a pass (but these are likely relatively rare after a big preempt and may not impact the overall numbers very much). More problematic is that there are hands where it's hard to reach the best spot, but which these are can depend very much on the methods (for example there might be a hand where after a 3 preempt, responder really needs to bid three non-forcing spades but in the system 3 is forcing and double is unlikely to elicit a three spade bid) but getting too deep into the follow-up methods might be too hard.

Still, I'd expect to see some things like opening a five-card major faring better than opening a four-card major (but opening a four-card major might fare better than opening a nebulous minor) and openings like IMPrecision 2m doing better than a standard 1m.
Adam W. Meyerson
a.k.a. Appeal Without Merit
1

#15 User is online   helene_t 

  • The Abbess
  • Group: Advanced Members
  • Posts: 17,245
  • Joined: 2004-April-22
  • Gender:Female
  • Location:Copenhagen, Denmark
  • Interests:History, languages

Posted 2025-March-03, 14:45

View Postawm, on 2025-March-03, 03:58, said:

Here's one proposal:

Suppose that after opener's initial call, opponents preempt in some significant way. Now responder has to decide whether to pass or take some other action. To simplify matters, let's assume that if responder acts we can get to the best spot still available (but we cannot defend their preempt undoubled or play any cheaper contract) and that if responder passes we will always defend the opponents' preempt undoubled. How often can responder get this right?

This is a good idea but a bit of work (and computer power) as I strictly speaking need to evaluate many hands which opener could have from responder's point of view. Maybe I could just simulate three opener hands for each responder hand and then choose to get into the auction if it is right for at least two of the three?
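The three-sample shortcut amounts to a majority vote, which might look like this in Python. The two callbacks are placeholders I made up: `sample_opener` would draw an opener hand consistent with the opening call and responder's cards, and `acting_is_right` would judge whether entering the auction beats passing on that layout.

```python
import random


def should_act(responder, sample_opener, acting_is_right, k=3, rng=None):
    """Act if acting is right on a strict majority of k sampled opener hands.

    sample_opener(responder, rng) and acting_is_right(responder, opener)
    are hypothetical callbacks supplied by the caller.
    """
    rng = rng or random.Random(0)
    votes = sum(
        bool(acting_is_right(responder, sample_opener(responder, rng)))
        for _ in range(k)
    )
    # With k = 3 this is "right for at least two of the three".
    return 2 * votes > k
```

With only three samples the vote is noisy on close hands, so raising `k` trades computer time for a steadier decision.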

But it relates somewhat to a discussion I had with David about safety level, defined as the level which responder knows is either LAWful or we have the values for it. For hands where opener has less than 20 HCPs (otherwise they may reopen anyway so responder doesn't need to act) and we actually have game (otherwise it's often not a disaster to sell out to the preempt), I calculated how often responder has a safety level above the preempt. In 50000 deals I got 2812 that satisfied this filter, assuming a hyperaggressive preempt style by opps.
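One possible reading of the safety-level test, sketched in Python. The assumptions are mine: the LAWful level is taken as guaranteed combined trumps minus six (the usual Law of Total Tricks rule of thumb), the values-based level is passed in by the caller rather than computed, and all names are hypothetical. For scale, 2812 of 50000 deals is about 5.6%.

```python
STRAINS = ["C", "D", "H", "S", "NT"]


def law_level(min_opener_length, responder_length):
    """Level supported by the Law: guaranteed combined trumps minus 6."""
    return max(0, min_opener_length + responder_length - 6)


def cheapest_level_over(preempt_level, preempt_strain, our_strain):
    """Lowest level at which our strain outranks the preempt."""
    higher = STRAINS.index(our_strain) > STRAINS.index(preempt_strain)
    return preempt_level if higher else preempt_level + 1


def has_safety_above(preempt, our_strain, min_opener_length, responder_length,
                     values_level=0):
    """True if the safety level reaches a legal bid over the preempt.

    preempt is (level, strain); values_level is the level responder knows
    the partnership has the high-card values for (supplied by the caller).
    """
    level, strain = preempt
    safe = max(law_level(min_opener_length, responder_length), values_level)
    return safe >= cheapest_level_over(level, strain, our_strain)
```

For example, opposite an opening that promises five spades, three-card support is safe at the two level over a 2H preempt but not at the three level over 3H, unless the values-based level says otherwise.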


I have added 12-point-minimum versions of IMPrecision, Wei and Tarzan, as discussed earlier, and also added Berkowitz and Bulgarian Precision. So now all systems open all balanced 12 but not all balanced 11 (except for EHAA).

We see that it isn't too different from what you might expect based on the other metrics, except that Fantunes and Moscito are quite OK by this metric. IMPrecision is not better than other strong club systems when not allowed to open balanced 11-counts.

I realize it is not quite the same as you suggest - sometimes it is reasonable for responder to guess to get into the auction even if there is no safety guarantee. On the other hand, there can be hands that have safety if we play NFB which we might not play. I also want to say something about how much bidding space we have below the safety level.

Another limitation is that I haven't factored in that e.g., our own 1 opening may thwart their 2 (or 2!) WJO; I just let opps bid what they planned to bid without worrying about our bids.

I will add more systems also.
The world would be such a happy place, if only everyone played Acol :) --- TramTicket
0

#16 User is online   awm 

  • Group: Advanced Members
  • Posts: 8,448
  • Joined: 2005-February-09
  • Gender:Male
  • Location:Zurich, Switzerland

Posted 2025-March-03, 16:13

I feel like there's something weird about the way you're evaluating this, such that opening lighter is virtually always better (to the degree that a one point difference in 1NT opening massively moves the needle and EHAA looks like by far the best of the systems). While there certainly are advantages to opening light, I don't think it's anywhere near this clear-cut. I suspect agreements something like 1=any 8-10, 1=any 11-13, 1=any 14-16, 1=any 17+ will look fairly decent here because you "always know when you have game" even though in reality this is a terrible system. You're basically maximising the "weak notrump" advantage while completely discarding any sort of fit-finding ability.

One point here is that knowing "you have game" isn't always a cure-all when you have no idea what your best strain is. Another is that light openings may already be past the level of safety in some cases (which doesn't seem to be a negative at all in the evaluation). Of course, the fact that light opening methods can help good opponents in the play would be quite hard to consider in a scheme like this one (but it also happens in real bridge).

Another somewhat interesting point is that responder's judgments are often based on the "average hand" and not the minimum hand. An extreme example might be SCUM, where the 1M opening technically only shows 4+ but it's almost always 5+; responder's going to treat this as five in competition, not be terrified to bid because of the very rare possibility of the 4144 hand. You can perhaps chalk up some of the hands where opener does have four and you get too high as system losses, but this is nowhere near as bad as responder always assuming four and under-competing the vast majority of cases.
Adam W. Meyerson
a.k.a. Appeal Without Merit
0
