
Improving Swiss Teams events


hrothgar

Recommended Posts

The problem though is with the "leaderboard" reflecting something different than what's actually happening. Sure, if there's a list of teams with point totals somewhere and my team has the most points, but didn't win, I can see not liking what's going on.

 

But it's easy enough to do something like adding points for playing teams in the top part of the standings so the scoreboard reflects who's actually winning.

 

In fact it does seem strange and vaguely frustrating when my team's doing pretty well and draws the Meckstroth-Rodwell-Soloway-client team, holds our own against them but loses by a small margin, and then finds that a dozen teams have passed us by beating up on weak opponents.

 

It's always seemed strange to me that in swiss scoring, if the teams are roughly ranked "correctly" then pretty much every team expects a draw. So subsequent rounds wouldn't seem to widen the gaps between the better teams and the worse teams, since the better teams have better opponents. I suppose in some ways it's nice that it's very hard for a team to separate itself from the pack, but this leaves open the possibility of a team getting a "lucky draw" in a late round where they have lots of swingy boards against bad opponents, and passing everyone without really beating the good teams.



The big reason that SoS works in the BCS ranking system (to the extent that it does work, of course - what a *stupid* way to run a National Championship, it rivals that of several NCBOs!) is that the teams set their schedules. Sure, this might be the year that Auburn flames out, and there's always the cross-state rival game for Homecoming that we have to put in there no matter how much it hurts our schedule because we make half our money on that one game, but generally, the teams have a good guess what their SoS is going to be going in, and whether they need to push for 11-0 or if 10-1 is going to get them to a BCS spot (or, if they're weaker, whether their SoS and a 6-5 record will make it to the Joe's Garage Durian Bowl, sponsored by Mom's Pizza Joint (still on ABC, for some reason, however)).

 

Give me a Bridge tournament, where I know most of the teams, and I am allowed to choose my opponents (with their permission, of course), then SoS is reasonable. Yeah, there are going to be several teams going 8-0, because they've picked the fish who want to play up; but their SoS is going to be horrible. If I can go 7-1 playing mostly good teams, I'll do better (as I should).

 

But Swiss is a lottery, and all SoS does is reverse the lottery dynamics.

Round 1: paired with the worst team in the room. Win 20-0.

Round 2: paired with a big winner of rabbit vs ferret. Win 20-0.

Round 3: caught a decent team, but they blew away their first set of fish and got lucky on the second round in a swingy match. Win 16-4.

 

I'm never going to win this, am I? I'm already so far in the hole on SoS that I'll never recover, and my VP total is going to mean that I get paired with all the other contenders all night, making it very hard to win *and* pick up the SoS.

 

If you're looking to pick 8 from 128, you're always going to have an issue, unless you can play for a week - 2 from 32 could probably be done RR over 3-4 days. Frankly, that's the problem that the BCS has - they have to find the best 2 pairs of 2 teams from a class of 100+, based on 11 one-on-one matches. *Nothing* is going to get that unarguably right. At least in the NCAA, they have to find 64 (yeah, I know - but the play-in game is stupid and doesn't count) out of the 200 or so, based on 30 or so games.

 

Michael.


I don't understand the desire to achieve this goal of accuracy. If all we want is for the best team to win as often as possible, let's not even play; let's just take a vote on who is the best team and give them some masterpoints. And then why would anyone but the best team even show up?

If you never play, then how would anyone know who the best team is? Who would you vote for? It's not a beauty pageant :)

 

Furthermore, the "best" changes over time. New players come onto the scene, old players retire, partnerships and teams change, etc. Richard's simulation simply assigned rankings to all its teams, and from this it presumably derived probabilities about the results of matches between any pair of teams. But in the real world we don't know the current rankings until AFTER everyone plays. As they say about the lottery, "You have to be in it to win it."

 

So the point of the exercise is that since we're going to hold tournaments, we'd like the results to be meaningful -- it should be more likely that the better teams win. So the scoring system and tournament format should reflect this expectation.

 

But as others have pointed out, there may be conflicting goals, such as simplicity of scoring and finishing the tournament in a reasonable amount of time. 9 rounds of 20 boards would require 3 days to play out -- not your typical Sunday Swiss.


So the point of the exercise is that since we're going to hold tournaments, we'd like the results to be meaningful -- it should be more likely that the better teams win.

Do you mean more likely that the better team wins than that the better team loses? We already have that.

 

Or do you mean more likely that the better team wins under a new system than that the better team wins under the current system? I strongly disagree with this; it is at best a debatable assertion, on which I am still firmly on the other side.

 

This seems like an exercise to make it as close to a certainty as possible that the better team always wins any time they play. Why!!! It is the chance of upsets by weaker teams (while still losing more often than winning) that makes things fun and interesting. The results are still meaningful when taken over any sort of long term. When taken for an individual day or event, I can't fathom why someone would want the best team to always win, or anything close to it.

 

Adam made an earlier post sort of implying that he thought I felt this way for less meaningful events, but for really "important" stuff something more accurate would be better. No no no, you have to give all the teams something meaningful to play for. If the only goal was to pick the best team every time, a vote among the players would be far more accurate than actually playing. Or do you just quit when you sit down against Meckwell?


So Josh, why do the USBF trials use seeding, and take this long? They could as well play a 3-day round-robin.

 

I think for events such as trials, pretty much everyone would disagree with your point. And in bridge there is always enough luck that the better team won't always win.


So Josh, why do the USBF trials use seeding, and take this long? They could as well play a 3-day round-robin.

So why do they play at all? They could just send the top seed.

I think for events such as trials, pretty much everyone would disagree with your point. And in bridge there is always enough luck that the better team won't always win.

It seems that would not be true if lots of people here had their way.


I agree.

 

Why waste all this time and money of playing...cannot we have the Old wiseman just pick the best players...We do want to win number one...yes?...

 

Imp pairs/Butler pairs..whatever.......round robin pairs..or this expert seeding ..just seems a waste of time.....why let lucky players go forward?


Or do you mean more likely that the better team wins under a new system than that the better team wins under the current system? I strongly disagree with this; it is at best a debatable assertion, on which I am still firmly on the other side.

One of the advantages of the methodology that we are using is that it provides an objective mechanism to test these types of assertions. Simply put, the numbers don't agree with you.

 

I don't have any problem with discussing this topic (it's part of the reason that I posted these results). However, it would help if people critical of the results could provide a more substantive critique. There are three (broad) areas in which these results can be criticized:

 

1. The model that Alex chose does not accurately describe results at a bridge table. (For example, one could claim that there is autocorrelation between board results, or what have you.)

 

2. The model may be correct, however, Alex chose the wrong set of operating parameters. Hypothetically, one might claim that the strength of bridge players is better described by a uniform distribution than a normal distribution.

 

3. Implementation problems: Even if you find a better way to skin the cat, no one would ever use it


I agree.

 

Why waste all this time and money of playing...cannot we have the Old wiseman just pick the best players...We do want to win number one...yes?...

 

Imp pairs/Butler pairs..whatever.......round robin pairs..or this expert seeding ..just seems a waste of time.....why let lucky players go forward?

There are a number of countries that use this type of selection process for their National teams. For example, I believe that this is used in Italy. I think that at least one of the Scandinavian countries does as well.

 

Obviously, the selectors need some kind of data that they can use to make their decisions, which pretty much needs to come from board results. As I understand matters, the primary critiques of selector-type systems boil down to:

 

1. It requires a good selector (Or at the very least a very good process that the selectors can apply)

 

2. It destroys any chance that sponsors make the team


Richard, I was wondering if it would be better to multiply results with the number of VP your opponents got, instead of just adding them.

 

For example (short one):

round 1 25-1 against losers

round 2 17-13 against good guys

round 3 20-10 against reasonable players

 

Total of 3 rounds is 62

Suppose in the end our good guys have 60VP, reasonable players have 40VP, the losers have 15VP.

Our total would become 25*15 + 17*60 + 20*40

 

Dunno if this would be an even better representation of your performance in that tournament, but intuitively it feels better... :o
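The arithmetic in that example, as a quick sketch (team names dropped; the round scores and opponents' final VP totals are the ones given above):

```python
# A minimal sketch of the weighting idea: multiply each match result by
# that opponent's final VP total, instead of just adding the results up.
results = [25, 17, 20]   # our VPs in rounds 1-3
opp_vps = [15, 60, 40]   # final VP totals of the losers, good guys, reasonable players

plain_total = sum(results)
weighted_total = sum(r * o for r, o in zip(results, opp_vps))

print(plain_total)     # 62
print(weighted_total)  # 2195
```

Note that the weighted total rewards the 17-13 against the eventual 60-VP team far more heavily than the 25-1 against the 15-VP team.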


Richard, I was wondering if it would be better to multiply results with the number of VP your opponents got, instead of just adding them.

I proposed a very similar adjustment based on the fraction of the available VPs that you scored against a given team. (For example, if you scored a 10-10 tie against team i, you'd get 50% of team i's total VPs). I agree that intuitively this makes a lot of sense. (At the same time, I'm not sure how good my intuition is on these problems) Unfortunately, we haven't had the chance to test this yet and see whether it improves the accuracy of the SoS adjustment.
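A sketch of that fractional-VP variant, assuming a 20-VP scale per match (that scale is my assumption, not stated in the post):

```python
# Fractional-VP adjustment: your credit for a match is the fraction of
# the available VPs you took, times that opponent's final VP total.
VP_PER_MATCH = 20  # assumed scale

def sos_credit(my_vps, opp_final_vps):
    return (my_vps / VP_PER_MATCH) * opp_final_vps

# A 10-10 tie against a team that finished on 60 VPs earns half of 60:
print(sos_credit(10, 60))  # 30.0
```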

 

Right now, we think that the "naive" SoS adjustment has a significant enough impact on the accuracy of the tournament that it's worth presenting the basic results and soliciting comments.

 

Over time, we hope that we can do some further work designed to improve the accuracy of the SoS metric. In theory, we could adopt something significantly more complex like some of the structures that Gerben was using to evaluate incomplete tournaments.


Or do you mean more likely that the better team wins under a new system than that the better team wins under the current system? I strongly disagree with this; it is at best a debatable assertion, on which I am still firmly on the other side.

One of the advantages of the methodology that we are using is that it provides an objective mechanism to test these types of assertions. Simply put, the numbers don't agree with you.

 

I don't have any problem with discussing this topic (it's part of the reason that I posted these results). However, it would help if people critical of the results could provide a more substantive critique. There are three (broad) areas in which these results can be criticized:

 

1. The model that Alex chose does not accurately describe results at a bridge table. (For example, one could claim that there is autocorrelation between board results, or what have you.)

 

2. The model may be correct, however, Alex chose the wrong set of operating parameters. Hypothetically, one might claim that the strength of bridge players is better described by a uniform distribution than a normal distribution.

 

3. Implementation problems: Even if you find a better way to skin the cat, no one would ever use it

How can numbers disagree with a subjective opinion? I'll try again.

 

I do not think making the results of swiss team events more "accurate" is a good thing. Most people seem to be taking for granted that it is. I am not criticizing the effectiveness or accuracy of the model itself, I'm saying I don't ever want to see it implemented (which thankfully, I doubt it will be.)


I proposed a very similar adjustment  based on the fraction of the available VPs that you scored against a given team.  (For example, if you scored a 10-10 tie against team i, you'd get 50% of team i's total VPs).  I agree that intuitively this makes a lot of sense.  (At the same time, I'm not sure how good my intuition is on these problems)

I don't trust my intuition either, but I feel that this method would make it very hard for teams to climb and fall in the rankings in the latter stages of events, assuming 11-9s, 10-8s, etc. are the most common scores.


Not the best team, the team who played best in the tournament.

No, exactly what I said. Read the original post.

 

"We consider event formats in which the sample statistic [swiss team results] closely mirrors the population statistic [skill level or ability of the teams] superior to formats in which {this is not the case}."

That's a false issue that just has to do with applying the model to real life. Consider the population statistic to be the team's raw "skill level" plus a modifier based on conditions of the day (i.e., current form). So a team that in the OP's model had been rated as a +2 team could be a +1.5 team that is "playing well" (well rested, good frame of mind, etc.), a +2 team playing normally, or a +2.5 team playing below their average skill level. Thus even if the "best team" according to the model won, it doesn't necessarily translate into the same best 4 or 6 people winning.

 

With respect to the more general point, there is a procedure for this from the chess world, where many tournaments are Swiss tournaments (and in fact where Swiss tournaments began) with limited rounds, a 0-1 VP scale (with 0.5-0.5 for draws), and further constraints (pairings have to handle White/Black color allocation). Obviously in chess tiebreakers are pretty important, since with a small range of scores many players can end up tied. See the Swiss Perfect tiebreaking site for all the methods.

 

The simplest method that sort of works to adjust the SoS that is even simpler than the OP is to simply take your cumulative score. This works as a proxy for your strength of schedule but is very simple to calculate since you only have to look at your own team's score. For instance if you went 20-0, 16-4, 11-10, 2-18 you would have 49 VP out of 80 but your cumulative score would be 20+36+47+49 = 152. If some other team scored 0-20, 11-9, 20-0, 18-2 they also have 49 VP but they were taking the easy side by getting blitzed in their first round and have a cumulative score of 0+11+31+49 = 91. Thus you win the tiebreak.
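The cumulative-score calculation above can be sketched in a few lines (the round scores are the two examples from the post):

```python
# Cumulative-score tiebreak: sum your running VP totals after each round.
# VPs won early appear in every later running total, so a strong start
# (which earns tougher Swiss pairings) weighs more than a late surge
# against weak opposition.
from itertools import accumulate

def cumulative_score(round_vps):
    return sum(accumulate(round_vps))

print(cumulative_score([20, 16, 11, 2]))  # 152 (running totals 20, 36, 47, 49)
print(cumulative_score([0, 11, 20, 18]))  # 91  (running totals 0, 11, 31, 49)
```

Both teams finish on 49 VP, but the first team's cumulative score of 152 beats the second team's 91.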

 

Of course in chess this only works as a tiebreaker, not as something that gets added to your score. And the problem in bridge Swiss events is that victory points lead to an expanded range: what if that second team was 19-1 in their 4th-round match and had 50 VP? Was that a more impressive finish than the first team, who played up but only ended up with 49 VP? Probably not. But if you are going to make any adjustment at the end, I think you clearly also want the same kind of bonus/adjustment to be used after each round to set up the next round's pairings.

 

To the OP, what exactly was your adjustment formula? You said it depended on the number of rounds, but how did it work?


So why do they play at all? They could just send the top seed.

 

This doesn't change much; now the seeding mechanism is the actual qualification. (I don't know if this is such a bad idea!)

 

*********************

 

The main problem with the combination Swiss + VP is the closeness in scores. If you can only win or lose, a win in the last match will not allow anyone to overtake you, and a loss will only allow others to tie against you. My main problem is that Swissing keeps the field closer and allows teams to catch up with the leaders way too easily.

 

Example from Estoril transnationals. Actual final ranking after 15 rounds:

 

1st. 276

2nd. 270

3rd. 268

4th. 266

5th. 265

6th. 264

7th. 262

8th. 261

9th. 260

10th. 259

 

Average: 220

 

If you look however what the top 10 would've gotten against AVERAGE opponents:

 

1st. 308

2nd. 304

3rd. 300

4th. 296

5th. 292

6th. 291

7th. 288

8th. 286

9th. 280

10th. 278

 

Average: 220

 

Let's say you play a 16th round in each case. The chance that a lucky team reaches the top 8 (or pick 4, 16, any number) for the KOs suddenly is much greater in the 1st scenario.
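To put numbers on that, here are the two top-10 lists side by side; the cushion that 8th place (the last KO spot) holds over an average team is much thinner in the Swissed standings:

```python
# Top 10 from the Estoril example: actual Swiss VPs, and what the same
# teams would have scored against average opposition throughout.
actual = [276, 270, 268, 266, 265, 264, 262, 261, 260, 259]
vs_avg = [308, 304, 300, 296, 292, 291, 288, 286, 280, 278]
FIELD_AVG = 220

# Cushion of 8th place over an average team in each scenario:
print(actual[7] - FIELD_AVG)  # 41 VPs under Swiss pairing
print(vs_avg[7] - FIELD_AVG)  # 66 VPs against average opposition
```

A lucky 25-0 blitz in round 16 closes most of a 41-VP gap; it makes barely a dent in a 66-VP one.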

 

I do not think making the results of swiss team events more "accurate" is a good thing. Most people seem to be taking for granted that it is.

 

Why do you think giving more points to teams who played well compared to teams who didn't is a bad thing? Just because we've always used Swiss?


I do not think making the results of swiss team events more "accurate" is a good thing. Most people seem to be taking for granted that it is. I am not criticizing the effectiveness or accuracy of the model itself, I'm saying I don't ever want to see it implemented (which thankfully, I doubt it will be.)

OK, then -- how about making the results of swiss team events LESS "accurate"? There are lots of adjustments we could make to format/pairing/scoring that would achieve that. Would that be a good thing? Or are you claiming that the existing system is perfect?


I do not think making the results of swiss team events more "accurate" is a good thing. Most people seem to be taking for granted that it is. I am not criticizing the effectiveness or accuracy of the model itself, I'm saying I don't ever want to see it implemented (which thankfully, I doubt it will be.)

OK, then -- how about making the results of swiss team events LESS "accurate"? There are lots of adjustments we could make to format/pairing/scoring that would achieve that. Would that be a good thing? Or are you claiming that the existing system is perfect?

It's an impossible claim to make since it's just a matter of opinion. But I do like the current system very much. Sure, people complain, but people would complain about any system in use. I can't think of a single sport or game (perhaps other than those decided by voting or judging) in which either the best team always wins or strength of schedule is taken into account in the final ranking of a single event, though it's possible there is something I don't know about that uses SoS (maybe in chess as a tiebreaker?). Simply put, if it's not broke, don't fix it.

 

If I did have to change, I would prefer it to be less accurate compared to more accurate. A very good team (in context) tends to win swiss team events pretty consistently, albeit not always the best team. In fact I would say the odds the winning team played best that day is enormously high, purely as a gut feeling. There are very few fans of anything who would want to all but eliminate upsets (ok, maybe NY Yankee fans.) Look how wildly inaccurate a poker event is in terms of who played best that day compared to who won; the correlation is very small indeed. Is that game lacking in popularity?

 

Richard is right in any case that it can never hurt to understand what you are working with as best you can.


If I did have to change, I would prefer it to be less accurate compared to more accurate. A very good team (in context) tends to win swiss team events pretty consistently, albeit not always the best team. In fact I would say the odds the winning team played best that day is enormously high, purely as a gut feeling. There are very few fans of anything who would want to all but eliminate upsets (ok, maybe NY Yankee fans.)

In a 2-day event with 128 teams? This sounds pretty much like a lottery to me. (And I understand that's where the Australians are planning to use it.) Even in a one-day K.O. match it doesn't always seem to be the better-team-on-that-day that ends up winning.

 

Anyway, they are only trying to make up for the unfairness of Swiss scoring, not for the 5% slams (needing two 3-3 breaks and the wrong lead) that make sure there will be enough element of chance left...


I agree with a lot of things around here.

 

Bluejak, in another forum, has made this point, which I believe is very strong: Swiss events are popular not because they are a reliable test of current form, but because you're going to get something (chance of being Monsterpoint blitzed is very low) and because swisses are, in fact, a highly luck-based event.

 

If that is the case, "improving" the swiss by removing some of the luck will, in fact, decrease the popularity of the event. As others have said, how many point-a-board (board-a-match in NA) events are there any more? Why? Because if you're not the best, you can't win.

 

If you are trying to come up with a pre-trials method, as it seems Richard et al. are, then you want an event where the best 8 on the day out of 128 are as likely as possible to be in the top 8 of the results. In other words, exactly what the Swiss won't do.

 

Swiss is good for finding the top, the top 2; you need, supposedly, another round for each lower team you wish to rank, which leads into overswissing issues - you could avoid that by having the best and worst leave after N rounds, but the problem is that they paid (especially the worst) to play.

 

I would strongly suggest some sort of qualifying/consolation event, if it's over two days (if it's over 4, then my previous suggestion of 2/32, RR, applies. People may not like that as much though, if they're not expecting to make the top. There's a joy in the randomness, especially for the card-push set). From 128, qualify 64, seeded into two flights of 32, whatever carryover seems appropriate - frankly, if you want to get "the best at beating good opposition", no carryover might be right. Swiss the three events (1Q, 2Q, C); play your standard set in C (most of whom want consistency over accuracy anyway), and 6 rounds of however many boards in the two Qs. That should let xQ1 and xQ2 rise to the top, without too much overswiss. 3 and 4 in each bracket will be somewhat more of a crapshoot, but you'll have reasonable confidence that the top four teams, at least, have made it in. In the Qs, use whatever methods you like to "improve" the results of the top four; many more of them will "get it".

 

Note that this also helps the "get our 3/4 green" people, as the second day has weeded out most of the big fish from the consolation, and those few that get there that came to qualify may prefer the beach to playing.

 

Michael.


If I did have to change, I would prefer it to be less accurate compared to more accurate. A very good team (in context) tends to win swiss team events pretty consistently, albeit not always the best team. In fact I would say the odds the winning team played best that day is enormously high, purely as a gut feeling. There are very few fans of anything who would want to all but eliminate upsets (ok, maybe NY Yankee fans.) Look how wildly inaccurate a poker event is in terms of who played best that day compared to who won; the correlation is very small indeed. Is that game lacking in popularity?

I'm not sure what makes you think the proposal would make upsets less likely. Nothing in the proposal gives any bias to teams based on their past performance. The scoring is still based just on how well they play during the event in question. I think you're confusing the simulation, which assigns rankings to the simulated teams, with how it would be used in practice -- no rankings are involved there. The rankings represent a prediction of how well the teams will play during the simulated event.

 

If scoring is less accurate, it means that just about anyone can win. Should bridge really be a lottery, rather than a game of skill?

 

What I wonder is how relevant this proposal would be to the Swiss Team tournaments that are most commonly played. How accurate is a two-session Swiss (typically 8 rounds of 7 boards in most ACBL tournaments), and how much of an improvement would there be if the SoS adjustment were made?

 

Regarding the comparison with poker ... I haven't watched too much poker (mostly just Celebrity Poker Showdown -- the table chatter is fun), but the impression I get is that the winner is often determined by just a few critical hands. While there's certainly skill in reading your opponents and knowing when to bluff, a lucky card when someone decides to go all in can make all the difference.

In contrast, while a slam swing can often determine the winner of a 7-board match, it's not likely to determine the winner of the event.


Wouldn't it be great to replace Swiss pairing by a method that gave good estimates for the relative strength of all the top-half teams? Gerben Dirksen (Gerben42) has made a start in this direction by estimating the outcomes of unplayed matches by something like maximum likelihood.

 

Suppose that we could choose the pairings on each round, and the number of boards for that round's matches, based on maximizing the accuracy of our after-the-round estimates. I wonder what such a method would look like.

 

While it might not be popular as a replacement for the Sunday Swiss, I'll bet such a method would be embraced by national bridge organizations seeking to select their best representatives for international play. It might even give rise to unbiased, worldwide strength estimates for individuals, pairs, and teams!

 

No, I must be dreaming. :)
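For what it's worth, the estimation idea isn't pure dreaming. Here is a toy sketch (my own illustration, not Gerben's actual method) of recovering relative strengths from an incomplete set of matches by least squares:

```python
# Toy strength estimation: model each match margin as
# strength[i] - strength[j], then solve the overdetermined system.
import numpy as np

def estimate_strengths(n_teams, matches):
    """matches: list of (i, j, margin) with margin = team i's VPs minus team j's."""
    rows, rhs = [], []
    for i, j, margin in matches:
        row = np.zeros(n_teams)
        row[i], row[j] = 1.0, -1.0
        rows.append(row)
        rhs.append(margin)
    # Pin the average strength to zero so the solution is unique.
    rows.append(np.ones(n_teams))
    rhs.append(0.0)
    strengths, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return strengths

# Three teams, two played matches: 0 beat 1 by 10, 1 beat 2 by 10.
# The unplayed 0-vs-2 match is implicitly estimated at a 20-VP margin.
print(estimate_strengths(3, [(0, 1, 10.0), (1, 2, 10.0)]))
```

Something along these lines could rank all the top-half teams after each round, not just the match winners.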

