System performance metrics
#1
Posted 2025-February-28, 05:29
And then I plan to derive various system performance metrics such as
- how often can responder immediately decide on something useful, such as forcing to 2NT or ruling out a major-suit fit
- how much bidding space do we have left, on average, below the safety level (taking enemy interference into account) on those hands where responder can't immediately decide anything
- how often can either partner decide whether it's right (or not) to throw in the towel after opps' WJO
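A minimal sketch of how the first two metrics could be tallied, assuming each simulated deal has already been reduced to a hypothetical record: the highest call so far (the opening, or the interference if there was any), whether responder can immediately conclude something, and the safety level as a bid on the ladder 1♣ to 7NT. The record type, the ladder encoding and the rule that the safety bid itself does not count as usable space are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

STRAINS = ["C", "D", "H", "S", "NT"]

def bid_index(bid: str) -> int:
    """Ladder position: Pass = -1, 1C = 0, 1D = 1, ..., 7NT = 34 (hypothetical encoding)."""
    if bid == "Pass":
        return -1
    return (int(bid[0]) - 1) * len(STRAINS) + STRAINS.index(bid[1:])

@dataclass
class DealRecord:
    last_bid: str                # highest call so far: the opening, or the interference if any
    responder_decides: bool      # can responder immediately conclude something useful?
    safety_level: Optional[str]  # highest bid responder knows is safe, or None if there is none

def space_below_safety(r: DealRecord) -> int:
    """Number of bids strictly between the last bid and the safety level (never negative)."""
    if r.safety_level is None:
        return 0
    return max(0, bid_index(r.safety_level) - bid_index(r.last_bid) - 1)

def summarise(records: List[DealRecord]) -> Tuple[float, float]:
    """Return (decision rate, average space below safety on the undecided hands)."""
    decided = sum(r.responder_decides for r in records)
    undecided = [r for r in records if not r.responder_decides]
    spaces = [space_below_safety(r) for r in undecided]
    avg_space = sum(spaces) / len(spaces) if spaces else 0.0
    return decided / len(records), avg_space
```

Whether the safety bid itself should count as available space is a design choice; counting it would simply add one to every non-zero entry.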
You might wonder what use such a data set is ....
Jan Eric Larsson wrote a book comparing bidding systems using simulated tournaments. Extending this is quite a lot of work, as it requires implementing whole bidding systems. But if one could identify metrics, based on the opening bids alone, that predict system performance, one could easily scale this to many systems, maybe even search a large space of systems for the optimal one.
Or at least, when AWM or Kungsgeten design their own crazy system they could quickly see which weaknesses they have to work on.
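If one had, for a handful of systems, both a cheap opening-bid metric and a result from full simulated tournaments in the style of Larsson's, a rank correlation between the two would be one simple way to check whether the metric predicts performance. A minimal sketch with hypothetical function names and no real data; Spearman correlation is my choice here, not something from the thread:

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)

def spearman(metric, tournament_score):
    """Rank correlation between a cheap metric and simulated-tournament results, one entry per system."""
    return pearson(ranks(metric), ranks(tournament_score))
```

A rank correlation only asks whether the metric orders the systems correctly, which seems closer to the goal of screening many candidate systems than predicting exact IMP differences.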
Any thoughts?
#2
Posted 2025-February-28, 06:47
information content of the first bid.
It was based on information theory (?); he assumed that the system quality after the opening bid was the same for all systems.
The user had a name like tycsn?
Uwe Gebhardt (P_Marlowe)
#3
Posted 2025-February-28, 07:22
P_Marlowe, on 2025-February-28, 06:47, said:
information content of the first bid.
It was based on information theory (?); he assumed that the system quality after the opening bid was the same for all systems.
The user had a name like tycsn?
This must be our Tysen; he also wrote about that idea here:
https://www3.dal13.s...post__p__473663
Basically, he optimized his bidding system using the entropy of the distribution of the optimal contract, as seen by the partner of the player whose bid he wanted to optimize. Then he added an aggressiveness penalty to prevent the system from using up too much bidding space initially.
The "optimal" system turned out to involve a pass in first or second seat promising spade length.
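A minimal sketch of an objective of this kind: for each call (including pass), take the distribution of optimal contracts over the deals where that call is made, compute its Shannon entropy, weight by how often the call occurs, and add a penalty for aggressive calls. For simplicity it conditions only on the call, not on partner's own cards as in the description above; the penalty form (a weight times a per-call cost) and all names are hypothetical, and Tysen's actual formula may well differ.

```python
import math
from collections import Counter, defaultdict

def entropy(counter):
    """Shannon entropy in bits of a Counter of outcomes."""
    total = sum(counter.values())
    return -sum((c / total) * math.log2(c / total) for c in counter.values() if c)

def objective(deals, aggressiveness, weight=0.1):
    """Lower is better.

    deals: iterable of (call, optimal_contract) pairs.
    aggressiveness: dict mapping each call to a hypothetical cost, e.g. its ladder position.
    """
    by_call = defaultdict(Counter)
    for call, contract in deals:
        by_call[call][contract] += 1
    n = sum(sum(c.values()) for c in by_call.values())
    # Conditional entropy H(contract | call): partner's remaining uncertainty after the call
    cond_entropy = sum(sum(c.values()) / n * entropy(c) for c in by_call.values())
    # Frequency-weighted aggressiveness cost of the calls actually made
    penalty = sum(sum(c.values()) / n * aggressiveness[call] for call, c in by_call.items())
    return cond_entropy + weight * penalty
```

Without the penalty term, the minimum is reached by calls that split the deals as finely as possible, however much bidding space they consume, which is presumably why a penalty was needed.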
#4
Posted 2025-February-28, 07:30
helene_t, on 2025-February-28, 07:22, said:
https://www3.dal13.s...post__p__473663
Basically, he optimized his bidding system using the entropy of the distribution of the optimal contract, as seen by the partner of the player whose bid he wanted to optimize. Then he added an aggressiveness penalty to prevent the system from using up too much bidding space initially.
The "optimal" system turned out to involve a pass in first or second seat promising spade length.
Yes, Tysen... my spelling hit 3 out of 5, plus a bonus for a letter in the wrong position.
Not a bad showing for my long-term memory, given that I have not thought about this for the last ? years.
Anyway, the result is basically a HUM: pass promises a hand stronger than the hands that open.
And HUM systems are prohibited; you can't play them in regulated environments.
The question is, can one do the optimization he was doing, using only allowed systems?
Uwe Gebhardt (P_Marlowe)
#5
Posted 2025-February-28, 09:39
At the same time, I think it's terrifyingly easy to use too simple a model. Bridge is a pretty complicated game, and reducing bidding systems or even opening agreements to a small set of criteria is dangerous. Personally, I would approach this by listing as many criteria as possible, to reduce the risk of missing something important. Here are a few suggestions:
- What is our combined HCP distribution, given responder's hand and opener's first bid? If the chance of it being 25+ is low enough (say, under 5%), or the chance of it being 24 or less is low enough (ditto), we have an answer to 'can we make game?'. If not, we need more space to figure it out.
- What is the probability of us having an 8(+)-card major fit?
- What is the probability of the opponents having an 8(+)-card major fit?
- Can we establish a safety level? Important break points are 1NT, 2M, 3m, 3NT and 4M.
- Conditional on partner opening and RHO overcalling at the 1- or 2-level (with an appropriate hand), how often do we have a safety level beyond that?
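The first criterion can be estimated by simulation: deal opener hands from the 39 unseen cards, keep those that fit the opening, and count how often the combined HCP reach 25. A minimal sketch, assuming a simple card encoding and using a 15-17 balanced 1NT as the illustrative opening; the function names and the encoding are hypothetical.

```python
import random

# Card c: suit = c // 13, rank = c % 13 (0 = deuce, ..., 9 = J, 10 = Q, 11 = K, 12 = A)
HCP_BY_RANK = {9: 1, 10: 2, 11: 3, 12: 4}
BALANCED = ([4, 3, 3, 3], [4, 4, 3, 2], [5, 3, 3, 2])

def hcp(cards):
    return sum(HCP_BY_RANK.get(c % 13, 0) for c in cards)

def is_balanced(cards):
    lengths = sorted((sum(1 for c in cards if c // 13 == s) for s in range(4)), reverse=True)
    return lengths in BALANCED

def opens_strong_nt(hand):
    """Illustrative opening condition: 15-17 HCP, balanced."""
    return 15 <= hcp(hand) <= 17 and is_balanced(hand)

def game_verdict(responder, opening_ok, samples=2000, threshold=0.05, seed=1):
    """Estimate P(combined HCP >= 25) over opener hands consistent with the opening."""
    rng = random.Random(seed)
    taken = set(responder)
    rest = [c for c in range(52) if c not in taken]
    resp_hcp = hcp(responder)
    hits = accepted = 0
    for _ in range(samples * 1000):  # cap the attempts for rare openings
        opener = rng.sample(rest, 13)
        if not opening_ok(opener):
            continue
        accepted += 1
        hits += resp_hcp + hcp(opener) >= 25
        if accepted == samples:
            break
    if accepted == 0:
        raise ValueError("no opener hand matched the opening condition")
    p_game = hits / accepted
    if p_game < threshold:
        return "no game", p_game
    if 1 - p_game < threshold:
        return "game", p_game
    return "need more space", p_game
```

With `threshold=0.05` this mirrors the 5% rule in the first bullet; the opening condition can be any predicate on opener's 13 cards, so the same function covers each system's openings.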
#6
Posted 2025-February-28, 10:51
Deriving system performance metrics based simply on the opening bid seems questionable to me. There are so many decisions that go into any bidding system. No one plays "2/1" exactly the same way. Also, what is the end goal? Is it to conclude that system x is best? I think what I'm hearing is that you might conclude that system x is relatively weaker than system y with respect to a particular metric, which would guide system x players toward trying to shore up that part of their system. That could be useful if it can be done accurately.
When I read about ideas like this - and there are many of them, given how many smart and computer-savvy people play bridge - I always think "How would this help me be better at the game?" (I understand that I'm probably not the target audience - experts will be more interested.)
As an example of something that I think would be really useful, Matthew Kidd has talked about adding a feature to BBO Helper as follows:
"A bigger idea is to provide a heads-up display of your opponent's style based on their playing history, classifying actions along both aggressiveness and wildness dimensions. So for takeout doubles, where a player averages on an (HCP + distribution points) histogram would measure their aggressiveness while their proclivity to make off-shape doubles would measure their wildness, e.g. a takeout double with 5-3 in the majors or a stiff in the clubs when 1♦ was opened, would drive up your wildness score, and a doubleton in an unbid major even more so. Similarly one can examine two-level overcalls, perhaps with a special statistic for how often they overcall on 5-3-3-2 shape - you see a lot of this even in open ACBL events; sometimes it can't be punished but sometimes it merely goes unpunished."
https://bridgewinner...2-2-se8mmlwks3/
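As an illustration of the two dimensions in that quote, here is a minimal sketch for takeout doubles. The shortness count for distribution points (void 3, singleton 2, doubleton 1) and the wildness weights are my own hypothetical choices that only follow the examples given (5-3 in the majors, a stiff in the unbid minor, and a doubleton in an unbid major counting double); nothing here reflects how BBO Helper actually does it.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple

@dataclass
class TakeoutDouble:
    hcp: int
    lengths: Dict[str, int]  # suit -> length, keys "S", "H", "D", "C"
    opened: str              # suit of the opening that was doubled

def distribution_points(lengths: Dict[str, int]) -> int:
    # Hypothetical shortness count: void 3, singleton 2, doubleton 1
    return sum({0: 3, 1: 2, 2: 1}.get(n, 0) for n in lengths.values())

def wildness(d: TakeoutDouble) -> int:
    """Hypothetical off-shape score following the examples in the quote."""
    unbid = [s for s in "SHDC" if s != d.opened]
    score = 0
    if sorted((d.lengths["S"], d.lengths["H"]), reverse=True) == [5, 3]:
        score += 1  # 5-3 in the majors
    score += sum(1 for s in unbid if s in "DC" and d.lengths[s] <= 1)  # stiff in an unbid minor
    score += sum(2 for s in unbid if s in "SH" and d.lengths[s] <= 2)  # short in an unbid major
    return score

def profile(doubles: List[TakeoutDouble]) -> Tuple[float, float]:
    """(average HCP + distribution points, average wildness); a lower first value means more aggressive."""
    return (mean(d.hcp + distribution_points(d.lengths) for d in doubles),
            mean(wildness(d) for d in doubles))
```

Averages are the simplest summary; the quote talks about where a player sits on a histogram, so comparing against the population's distribution (e.g. a percentile) would be a natural refinement.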
Maybe it would be interesting to poll experts about what unanswered questions they think are most worth solving. It might also be useful to think about what BBO could do to help further this type of research. For example, if everyone was forced to click on a radio button to describe their basic system, would that make this kind of research easier? What else could BBO do to facilitate research?
Again, feel free to ignore these comments from a Luddite. They're just my two cents.

#7
Posted 2025-February-28, 11:02
jdiana, on 2025-February-28, 10:51, said:
#8
Posted 2025-February-28, 12:42
jdiana, on 2025-February-28, 10:51, said:
As an example of something that I think would be really useful, Matthew Kidd has talked about adding a feature to BBO Helper as follows:
"A bigger idea is to provide a heads-up display of your opponent's style based on their playing history, classifying actions along both aggressiveness and wildness dimensions. So for takeout doubles, where a player averages on an (HCP + distribution points) histogram would measure their aggressiveness while their proclivity to make off-shape doubles would measure their wildness, e.g. a takeout double with 5-3 in the majors or a stiff in the clubs when 1♦ was opened, would drive up your wildness score, and a doubleton in an unbid major even more so. Similarly one can examine two-level overcalls, perhaps with a special statistic for how often they overcall on 5-3-3-2 shape - you see a lot of this even in open ACBL events; sometimes it can't be punished but sometimes it merely goes unpunished."
<snip>
This is a full disclosure / monitoring problem / question. If you had a database like in chess, you could do it.
At the moment, the majority of games are not centrally accessible in electronic form.
The hand records of most big tournaments are available online, but they are scattered.
Uwe Gebhardt (P_Marlowe)
#9
Posted 2025-March-01, 04:29
a.k.a. Appeal Without Merit
#10
Posted 2025-March-01, 12:37
Diagram: https://drive.google...iew?usp=sharing

We see that EHAA performs well on these metrics. The system often allows responder to draw some conclusion when opener has 8-10 (11) points and would have passed in other systems.
Scoring this "best conclusion" somewhat arbitrarily as 4 for game force, 3 for major fit, .... , 0 for no conclusion, we can plot this score against aggressiveness (0 for pass, 1 for 1♣, etc.). The more aggressive systems score well on the above metrics, because the information transmitted is highest when the opening bids (including pass) are all roughly equally frequent. But this comes at the price of less bidding space for further information exchange (and of course, an EHAA 2♠ opening will sometimes be above the safety level):
https://drive.google...iew?usp=sharing

We see that IMPrecision scores well in this diagram, with a high score despite moderate aggressiveness. I think there are three things that all work in IMPrecision's favour:
- Strong club systems generally perform well, as they make it easier for responder to decide if we have game.
- The natural 2♦ opening is apparently more useful than the Precision 2♦ opening in Wei and Tarzan (those two systems are very similar).
- It seems that strong NT systems generally perform a bit better than weak NT systems on these metrics, although this is obviously confounded by all kinds of things. This will become more pronounced once I factor in that strong NT becomes more frequent when opening in 2nd seat. This analysis is only for first seat.
English and Scottish Acol perform very similarly. Among strong NT systems with 4-card majors, Norwegian is clearly better than Swedish. This should not be surprising, as the Swedish standard is 4-card majors with weak hands and 5-card majors with strong hands; while this makes sense, it doesn't get rewarded by these metrics. Among strong NT 5cM systems, there is almost no difference between short club (DD) and best minor (SA).
It surprises me that IMPrecision scores so much better than other systems, and that EHAA is so much "better" than Fantunes. I may need to do some double-checking.
Obviously, more sophisticated metrics are called for. For example, aggressiveness per se is probably not so interesting; it is more about being aggressive on those hands where it hurts opps without hurting ourselves too much.
I have implemented natural weak 2s for all systems (even Vienna!) except when, e.g., 2♦ is needed for specific constructive hands. I consider the preempt system a separate issue, which I don't want to confound this analysis.
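The two coordinates of the score-versus-aggressiveness plot described above could be computed roughly like this, as a sketch with hypothetical names. Only the conclusion scores named in the post (4, 3 and 0) are filled in; the intermediate categories are elided there and are left out here too. The entropy of the opening-call frequencies is included because, as noted, the information transmitted by the opening is highest when all calls are about equally frequent: for k equally frequent calls it reaches its maximum of log2 k bits.

```python
import math
from collections import Counter
from statistics import mean

STRAINS = ["C", "D", "H", "S", "NT"]

def aggressiveness(call):
    """0 for pass, 1 for 1C, 2 for 1D, ... (ladder position, as in the post)."""
    if call == "Pass":
        return 0
    return (int(call[0]) - 1) * len(STRAINS) + STRAINS.index(call[1:]) + 1

# Scores named in the post; the intermediate categories are elided there.
CONCLUSION_SCORE = {"game force": 4, "major fit": 3, "none": 0}

def opening_entropy(calls):
    """Shannon entropy in bits of the opening-call frequencies."""
    counts = Counter(calls)
    n = len(calls)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def system_point(deals):
    """deals: list of (opening_call, best_conclusion) pairs for one system, first seat."""
    calls = [call for call, _ in deals]
    x = mean(aggressiveness(c) for c in calls)
    y = mean(CONCLUSION_SCORE[concl] for _, concl in deals)
    return x, y, opening_entropy(calls)
```

Plotting y against x per system gives the diagram; the entropy column makes it easy to see how much of a system's score comes simply from spreading its openings more evenly.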
There can be some confounding with exact shape requirements for 1NT opening. I have tried to make them consistent with what in my experience is mainstream for users of the different systems, but in many cases I didn't really know this so well.
PS: I don't know how to share images here. I have previously been able to with imgur, but they give URLs without a png extension, and apparently BBO does not allow that.
#11
Posted Yesterday, 02:25
It might be interesting to see some of these measurements broken down into the different opening bids (might also help with debugging if there's some mistake in the computation).
a.k.a. Appeal Without Merit
#12
Posted Yesterday, 03:15
Moscito opens balanced 12-counts but no balanced 11-counts, whereas at least Wei and Tarzan open some of them. So my implementation of Moscito is not particularly aggressive.
It is a bit of a shame that those issues dominate the result so much. Obviously, balanced 9-12 counts are very frequent, so by these metrics it is mostly about how the systems handle those hands.
I will modify the strong club systems so that they all open all 12-counts but not all 11-counts, just like the natural systems.
#13
Posted Yesterday, 10:06
helene_t, on 2025-March-02, 03:15, said:
Moscito opens balanced 12-counts but no balanced 11-counts, whereas at least Wei and Tarzan open some of them. So my implementation of Moscito is not particularly aggressive.
It is a bit of a shame that those issues dominate the result so much. Obviously, balanced 9-12 counts are very frequent, so by these metrics it is mostly about how the systems handle those hands.
I will modify the strong club systems so that they all open all 12-counts but not all 11-counts, just like the natural systems.
MOSCITO uses an 11+ to 14 HCP 1NT opening in first and second seat.
In the version that I play, a bunch of the weaker balanced hands open with an assumed-fit preempt.
https://www.chrisrya...wo/frelling.htm
#14
Posted Today, 03:58
Suppose that after opener's initial call, opponents preempt in some significant way. Now responder has to decide whether to pass or take some other action. To simplify matters, let's assume that if responder acts we can get to the best spot still available (but we cannot defend their preempt undoubled or play any cheaper contract) and that if responder passes we will always defend the opponents' preempt undoubled. How often can responder get this right?
Compared to real bridge, there are probably some hands where opener will reopen after a pass (but these are likely relatively rare after a big preempt and may not impact the overall numbers very much). More problematic, there are hands where it's hard to reach the best spot, and which hands these are can depend very much on the methods (for example, there might be a hand where after a 3♥ preempt, responder really needs to bid three non-forcing spades, but in the system 3♠ is forcing and double is unlikely to elicit a three spade bid). But getting too deep into the follow-up methods might be too hard.
Still, I'd expect to see some things like opening a five-card major faring better than opening a four-card major (but opening a four-card major might fare better than opening a nebulous minor) and openings like IMPrecision 2m doing better than a standard 1m.
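Under the simplifying assumptions above (acting always reaches the best spot still available, passing always means defending the preempt undoubled), the measure reduces to comparing two scores per deal. A minimal sketch, assuming both scores have already been computed from our side's point of view (for example double dummy); the record type and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PreemptDeal:
    responder_acts: bool  # what the system's rule says responder does over the preempt
    score_if_act: int     # our score in the best spot still available (assumed reachable)
    score_if_pass: int    # our score defending their preempt undoubled

def right_decision_rate(deals: List[PreemptDeal]) -> float:
    """Fraction of deals where responder's act/pass choice is at least as good as the alternative."""
    right = 0
    for d in deals:
        if d.responder_acts:
            right += d.score_if_act >= d.score_if_pass
        else:
            right += d.score_if_pass >= d.score_if_act
    return right / len(deals)
```

Ties count as right here; weighting each wrong decision by its IMP cost instead of counting it once would be a natural refinement.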
a.k.a. Appeal Without Merit
#15
Posted Today, 14:45
awm, on 2025-March-03, 03:58, said:
Suppose that after opener's initial call, opponents preempt in some significant way. Now responder has to decide whether to pass or take some other action. To simplify matters, let's assume that if responder acts we can get to the best spot still available (but we cannot defend their preempt undoubled or play any cheaper contract) and that if responder passes we will always defend the opponents' preempt undoubled. How often can responder get this right?
This is a good idea but a bit of work (and computer power), as strictly speaking I need to evaluate many hands that opener could have from responder's point of view. Maybe I could just simulate three opener hands for each responder hand and then choose to get into the auction if it is right for at least two of the three?
But it relates somewhat to a discussion I had with David about safety level, defined as the level at which responder knows that the contract is either LAWful or that we have the values for it. For hands where opener has fewer than 20 HCP (otherwise they may reopen anyway, so responder doesn't need to act) and we actually have game (otherwise it's often not a disaster to sell out to the preempt), I calculated how often responder has a safety level above the preempt. In 50000 deals I got 2812 that satisfied this filter, assuming a hyperaggressive preempt style by opps.
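The three-sample shortcut is just a majority vote over simulated opener hands. A minimal sketch; `deal_opener` and `acting_is_right` are hypothetical callbacks standing in for the dealer (constrained by the opening) and for whatever evaluation decides that entering the auction is right on a given layout.

```python
import random

def should_act(responder_hand, deal_opener, acting_is_right, k=3, seed=None):
    """Act if acting is right on a strict majority of k simulated opener hands.

    deal_opener(responder_hand, rng) -> an opener hand consistent with the opening (hypothetical)
    acting_is_right(responder_hand, opener_hand) -> bool, e.g. from double-dummy results (hypothetical)
    """
    rng = random.Random(seed)
    votes = sum(acting_is_right(responder_hand, deal_opener(responder_hand, rng)) for _ in range(k))
    return votes * 2 > k
```

With k=3 this is exactly "at least two of the three"; an odd k avoids ties, and raising it trades computer power for a less noisy decision.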
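The LAW part of this definition can be expressed directly: responder's guaranteed fit in a suit is their own length plus the minimum length opener promises there, and by the Law of Total Tricks the safe level is that fit minus 6 (an 8-card fit is safe at the 2-level, a 9-card fit at the 3-level). A minimal sketch applying the filter described above (opener under 20 HCP, game for us); the "we have the values" branch is omitted, and the tuple layout is a hypothetical choice.

```python
STRAINS = ["C", "D", "H", "S", "NT"]

def bid_index(level, strain):
    """Ladder position: (1, "C") = 0, ..., (7, "NT") = 34."""
    return (level - 1) * len(STRAINS) + STRAINS.index(strain)

def law_safety_bids(responder_lengths, opener_min_lengths):
    """Highest LAW-safe bid per suit: guaranteed trumps minus 6 (omits the 'values' branch)."""
    bids = []
    for suit in "CDHS":
        fit = responder_lengths[suit] + opener_min_lengths.get(suit, 0)
        level = min(fit - 6, 7)
        if level >= 1:
            bids.append((level, suit))
    return bids

def safe_above_preempt(responder_lengths, opener_min_lengths, preempt):
    """True if some LAW-safe bid ranks above the preempt, given as e.g. (3, "H")."""
    target = bid_index(*preempt)
    return any(bid_index(lvl, s) > target
               for lvl, s in law_safety_bids(responder_lengths, opener_min_lengths))

def safety_rate(deals):
    """deals: (opener_hcp, we_have_game, responder_lengths, opener_min_lengths, preempt) tuples.

    Returns (hits, eligible) for deals with opener under 20 HCP and game for us.
    """
    eligible = [d for d in deals if d[0] < 20 and d[1]]
    hits = sum(safe_above_preempt(d[2], d[3], d[4]) for d in eligible)
    return hits, len(eligible)
```

For example, after a 1♠ opening promising five, responder with four spades has a 9-card fit and so a safety level of 3♠, which is above a 3♥ preempt but not above 4♣.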

I have added 12-point-minimum versions of IMPrecision, Wei and Tarzan, as discussed earlier, and also added Berkowitz and Bulgarian Precision. So now all systems open all balanced 12-counts but not all balanced 11-counts (except for EHAA).
We see that it isn't too different from what you might expect based on the other metrics, except that Fantunes and Moscito are quite OK by this metric. IMPrecision is not better than other strong club systems when not allowed to open balanced 11-counts.
I realize it is not quite the same as what you suggest: sometimes it is reasonable for responder to guess to get into the auction even if there is no safety guarantee. On the other hand, there can be hands that have a safety level if we play NFB, which we might not play. I also want to say something about how much bidding space we have below the safety level.
Another limitation is that I haven't factored in that, e.g., our own 1♠ opening may thwart their 2♥ (or 2♠!) WJO; I just let opps bid what they planned to bid without worrying about our bids.
I will add more systems also.
#16
Posted Today, 16:13
One point here is that knowing "you have game" isn't always a cure-all when you have no idea what your best strain is. Another is that light openings may already be past the level of safety in some cases (which doesn't seem to be a negative at all in the evaluation). Of course, the fact that light opening methods can help good opponents in the play would be quite hard to consider in a scheme like this one (but it also happens in real bridge).
Another somewhat interesting point is that responder's judgments are often based on the "average hand" and not the minimum hand. An extreme example might be SCUM, where the 1M opening technically only shows 4+ but it's almost always 5+; responder's going to treat this as five in competition, not be terrified to bid because of the very rare possibility of the 4144 hand. You can perhaps chalk up some of the hands where opener does have four and you get too high as system losses, but this is nowhere near as bad as responder always assuming four and under-competing the vast majority of cases.
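A small illustration of the point about average versus minimum holdings: given a distribution for opener's length in the opened major, responder's chance of an 8-card fit with three-card support is simply the probability that opener has five or more. The function names are hypothetical, and any distribution passed in is an input you would have to supply from simulation; none is measured here.

```python
def fit_probability(opener_length_dist, responder_length, fit=8):
    """P(combined length >= fit), given {opener_length: probability}."""
    return sum(p for length, p in opener_length_dist.items() if length + responder_length >= fit)

def expected_length(opener_length_dist):
    """Average length opener holds in the suit."""
    return sum(length * p for length, p in opener_length_dist.items())
```

If the 4-card case carries only a small probability, the fit probability with three-card support stays close to 1, which is why treating 1M as five-plus in competition costs little compared with always assuming four.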
a.k.a. Appeal Without Merit