Overwatch's Ranking Point System

July 24, 2016

Overwatch team: great job and all. If you want to listen to an hour of me saying great job, here's a podcast about that.

You should probably re-think the current system of gaining/losing rank points though. Specifically, adjusting the ranking based on individual performance rather than just win/loss is pretty dangerous.

Elo for Team Games

Elo is a standard ranking system. You gain points for winning and lose points for losing. Furthermore, you gain more points for beating someone ranked higher and you lose more points for losing to someone ranked lower. Elo is designed for 1v1 games though, not team games.
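Here is a minimal sketch of the standard 1v1 Elo update just described. It assumes the common logistic curve with a 400-point scale and a K-factor of 32. Both are conventional choices for illustration, not the scale or parameters of Overwatch's Skill Rating.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the logistic Elo curve (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Return new (A, B) ratings after one game; points gained by one side are lost by the other."""
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


print(elo_update(1500, 1500, a_won=True))   # even match: winner gains exactly 16
print(elo_update(1400, 1600, a_won=True))   # upset: winner gains about 24.3
print(elo_update(1600, 1400, a_won=False))  # losing to a lower-rated player costs about 24.3
```

The "more points for beating someone ranked higher" rule comes entirely from `expected_score`. The less likely your win was, the bigger the gap between the actual result (1) and the prediction.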

To generalize Elo to team games, there are two factors you'd use. First, if your team's AVERAGE ranking was lower than the opposing team's average ranking, then you should get more points for winning. Second, if your PERSONAL ranking is lower than your team's average ranking, you should get more points for winning than your higher-ranked teammates who also won. As far as I know, all of this is true in Overwatch and makes sense.
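One hypothetical way to combine both factors is sketched below. This is not Overwatch's formula. The team average is compared against the opposing average (factor 1). Each player's "effective" rating is then pulled from the team average toward their own rating by a made-up knob, `personal_weight` (factor 2). It reuses the 400-point, K = 32 Elo conventions as assumptions.

```python
from statistics import mean


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the logistic Elo curve (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def team_elo_deltas(team: list[float], opponents: list[float], team_won: bool,
                    k: float = 32.0, personal_weight: float = 0.5) -> list[float]:
    """Rating change for each player on `team` after one team match.

    Factor 1: the team's average is compared with the opponents' average.
    Factor 2: each player's effective rating moves from the team average toward
    their own rating by `personal_weight` (hypothetical knob: 0 = everyone gets
    the same change, 1 = own rating vs. the opposing average).
    """
    team_avg = mean(team)
    opp_avg = mean(opponents)
    actual = 1.0 if team_won else 0.0
    deltas = []
    for rating in team:
        effective = team_avg + personal_weight * (rating - team_avg)
        deltas.append(k * (actual - expected_score(effective, opp_avg)))
    return deltas


team = [1450.0, 1500.0, 1550.0, 1500.0, 1500.0, 1500.0]
opponents = [1550.0] * 6
# The 1450 player gains the most from this upset win; the 1550 player gains the least.
print([round(d, 1) for d in team_elo_deltas(team, opponents, team_won=True)])
```

Setting `personal_weight=0` gives every teammate the same change, so the knob is what implements the second factor.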

But what about your individual PERFORMANCE during a game? For example, you lost but you played really well and your stupid teammates caused you to lose. Should you lose FEWER points for this loss because you personally played well? This is dangerous territory. If your instinct is to say yes, then at least consider that this requires you to gain fewer points for a win if you happened to not play well. That's really the least of our worries though.

Before we get into adjusting ranking based on individual performance, does Overwatch currently do that at all? The answer appears to be yes. Here's an excerpt from this article:

Cloud9 carry Lane “SureFour” Roberts was the first player to hit 80 Skill Rating when Competitive Play launched, queuing almost exclusively with his professional player teammates. When Roberts finished his 10 placement matches, he received a 77 Skill Rating, commensurate to his talent as one of the best players in Overwatch. Cloud9’s support Adam Eckel, who played the exact same 10 placement matches as part of a Cloud9-stacked queue, only received a 67 Skill Rating. His tank teammates hit 71. Derrick "reaver" Nowicki, the team’s other carry player, hit 74. All of those numbers rank in the top one percent of Overwatch players, according to MasterOverwatch, but that’s a pretty big discrepancy for players who contributed to winning the exact same games against the exact same opponents.

So what happened here is a controlled test: a team of 6 players only ever played with each other and necessarily had the same win/loss record against the same opponents. Because they ended up with different ranks, it looks like individual performance really does matter in this equation. There could be some other explanation, but it's highly likely that their individual performance metrics are what explain the difference in ranks.
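The reasoning can be checked with a toy model. This sketch makes several assumptions: a six-stack starting at equal hypothetical ratings, a made-up 8-2 placement record, and the team-average Elo update from above. The parameter `performance_bonus` is a hypothetical stand-in for whatever individual metric might be involved. Under win/loss only, identical histories must produce identical ratings, so any spread has to come from something else.

```python
from statistics import mean
from typing import Optional


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the logistic Elo curve (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def play_placements(ratings: list[float], results: list[bool], opponent_avg: float,
                    performance_bonus: Optional[list[float]] = None,
                    k: float = 32.0) -> list[float]:
    """Apply one shared series of team results to every player on a six-stack.

    performance_bonus is a hypothetical per-player, per-game adjustment standing
    in for any individual-performance metric; None means win/loss only.
    """
    bonus = performance_bonus if performance_bonus is not None else [0.0] * len(ratings)
    current = list(ratings)
    for won in results:
        expected = expected_score(mean(current), opponent_avg)
        team_delta = k * ((1.0 if won else 0.0) - expected)
        current = [r + team_delta + b for r, b in zip(current, bonus)]
    return current


start = [1500.0] * 6
record = [True, True, False, True, True, True, False, True, True, True]  # made-up 8-2 run
print(play_placements(start, record, opponent_avg=1500.0))  # six identical ratings
print(play_placements(start, record, 1500.0, [2.0, 1.0, 0.5, 0.0, -0.5, -1.0]))  # six different ratings
```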

Microsoft TrueSkill

Generalizing Elo into a system that handles team games isn't new. That was exactly the purpose of Microsoft's TrueSkill ranking system over a decade ago. TrueSkill intentionally and explicitly does NOT use any individual performance metrics. Their argument is that no matter what game you're talking about and no matter what metrics you measure to determine how well a given player did, the result is necessarily imperfect compared to using only win/loss. The point of trying to guess whether a player did well is to estimate how much they contributed to the win or loss, but the win/loss result itself is the most accurate measure of that, they say. You'd be introducing error by adding ANY other metric.

In addition to introducing error, you're warping incentives. For example, if you measure "damage done" as one metric, then players will attempt to maximize "winning AND damage done" rather than just "winning," which is not great. It's also very easy to accidentally do a lot worse: you might give players an incentive not to play support heroes in a game where you really need support heroes on your team. (It seems this is already true in Overwatch.)

In many cases, it's almost hopeless to even devise a metric. If a character's role has to do with healing, you can't really use how much they healed as a measure of much. If you did, it would penalize a healer on a team that played so well they didn't need as much healing. Even worse is a character like Mei. Her ice walls can do a lot, and her slow and freeze effects can do a lot too. But to actually quantify that into a metric correlated with win rate? That's a huge source of error waiting to happen. My friend suggested the best metric of how effective you were with her is to monitor the opponent's chat to detect how much they are cursing about Mei.

Yet another issue is that it's easy to accidentally create competition within a team for no real reason. For example, if number of kills is a metric that affects your rating, then your teammate killing an enemy that you could have killed essentially "stole" ranking points from you. That's clearly a bad dynamic.
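To make the "stolen kill" dynamic concrete, here is a sketch that assumes a hypothetical rating bonus of a fixed number of points per kill. The metric and the 0.5 value are made up for illustration. The team's total bonus doesn't change when a teammate takes the kill. It only moves from you to them.

```python
def kill_bonus(kills_by_player: dict[str, int], points_per_kill: float = 0.5) -> dict[str, float]:
    """Hypothetical rating bonus: a fixed number of points per kill."""
    return {name: kills * points_per_kill for name, kills in kills_by_player.items()}


# Same fight, same team result: the team gets 6 kills either way.
you_take_it = kill_bonus({"you": 3, "teammate": 3})
teammate_takes_it = kill_bonus({"you": 2, "teammate": 4})
print(you_take_it, teammate_takes_it)
```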

I think Microsoft TrueSkill's reasoning makes sense here. It's a good case against ever using any individual performance metric when adjusting ranking points after a win or a loss.

Tangent: Another Thing about TrueSkill

You can skip this section if you're just here for Overwatch stuff. I just wanted to note that I'm not fully behind the REST of what TrueSkill does. The main idea behind TrueSkill is that rather than assigning a specific ranking number to a player, behind the scenes it's assigning a bell curve probability distribution of what it thinks your ranking is. So two players might both be ranked in the 54th percentile (about tied), but the system strongly believes that's correct for player1, while for player2 it has a wide bell curve showing very low confidence in that ranking.

In theory, I see how this would allow it to converge more quickly to a good value. And in empirical tests done by Microsoft, it did converge faster than a more Elo-like system that didn't use the probability stuff. But...it just seems wrong anyway.

Specifically, if I beat a player way better than me (according to our ranks), I expect to go up a LOT of points. If I go up very few points "because the system is very sure of my current rank," that feels like total bullshit. And I have had this exact experience before. It's confusing and frustrating. As a player, I actually resent the system claiming it's so sure about me and dampening my rank gains when I go against its expectations. I think that feels debilitating and doesn't work well in cases where players really do get a lot better.
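The dampening described above falls out of TrueSkill's update rule. Below is a minimal sketch of the published two-player, no-draw update from the TrueSkill paper. It leaves out the draw margin and the per-game dynamics term (tau), assumes the usual default beta = 25/6, and represents each player as (mu, sigma). The same upset win moves an uncertain player (sigma 8) up by several points. A "confident" player (sigma 1) moves up by only a fraction of a point.

```python
import math


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def trueskill_win_update(winner, loser, beta=25.0 / 6.0):
    """Update (mu, sigma) pairs for a 1v1 game with no draws and no dynamics term."""
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    c = math.sqrt(2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)  # mean correction: large when the winner was expected to lose
    w = v * (v + t)        # variance correction, always between 0 and 1
    new_winner = (mu_w + sigma_w ** 2 / c * v,
                  sigma_w * math.sqrt(1.0 - sigma_w ** 2 / c ** 2 * w))
    new_loser = (mu_l - sigma_l ** 2 / c * v,
                 sigma_l * math.sqrt(1.0 - sigma_l ** 2 / c ** 2 * w))
    return new_winner, new_loser


favorite = (30.0, 2.0)
for my_sigma in (8.0, 1.0):
    (new_mu, _), _ = trueskill_win_update((20.0, my_sigma), favorite)
    print(f"sigma={my_sigma}: mu rises by {new_mu - 20.0:.2f}")
```

The mean shift for the winner is scaled by sigma squared, so a small sigma ("the system is very sure of you") shrinks even an upset win to almost nothing.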

Anyway, I don't think Overwatch is doing this.

In Favor of Individual Performance Metrics Affecting Rank

Even though it sounds like a bad idea to count individual performance metrics when adjusting ranking points, is there some reason to do it anyway? Yes, I think there's something in the plus column here. The two main plusses I know of are "good outweighs the bad" and "assistance to escape Elo hell".

Good Outweighs the Bad (??)

Yeah, it's imperfect to add any metric at all that gives you a bonus for kill:death ratio or whatever rather than just win/loss, but maybe it helps more than it hurts. For a character like Reaper, kill:death ratio is a relevant metric. It's not a perfect one for sure, but if you did amazingly on this stat, chances are fairly good that you played well. There are times when this indicator is wrong, but we might beat the baseline of "never count this stat" more often than we'd be steered wrong if we do count this metric for Reaper. I don't know that this is actually true, but it's an argument someone could make.

I think the trouble here is that it's playing with fire. It's very easy to mess this up, so the downside is clear. If you mess it up, you get situations like the Cloud9 example above, where support players appear to be punished accidentally due to the workings of these behind-the-scenes algorithms. Is that risk really worth it? The upside is helping your rating converge to a good value more quickly, but maybe that's less important than avoiding these potentially very bad downsides.

Elo Hell

A related point here is about the urban myth of "Elo hell." This is the phenomenon where players with bad ranks in a team game can't rank up, even though they are actually much better than their current rank says. Their bad teammates make them lose so much that they can't rank up.

Is this a real phenomenon or just a myth? If it's real, then don't we actually place a lot of value on having individual performance metrics boost these decent players out of their unfairly low ranks? After all, they are playing great so they deserve some ranking boost.

I think Elo hell actually is real...sort of. Let's start by looking at the part of it that just can't be real though. If you are actually much better than your rank, then in a 6v6 team game you'd expect on average to have 5 "bozos" on your team, which is one less than the 6 bozos on the opposing team. So...just play enough games and your ranking will climb. Surely you are providing an advantage to your team in getting wins, because that's the very definition of what you being good means. If you find you can't get over 50% win rate, it sounds like you are actually as bad as the other bozos?
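The "just play enough games" argument can be checked with a toy team-Elo model. This sketch makes several assumptions: a hypothetical bozo rating of 1500, a true skill 400 points above that, the team-average expected-score curve from earlier, and that each game moves your rating by exactly its expected change (K = 32). Carrying five bozos against six, you win about 59% of games while the system initially predicts 50%. Your rating therefore drifts upward toward your true skill. In this model the climb is slow, because you are only one sixth of your team's average.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the logistic Elo curve (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def expected_climb(true_skill: float, start: float, bozo: float = 1500.0,
                   team_size: int = 6, games: int = 200, k: float = 32.0) -> list[float]:
    """Displayed rating after each game, moving by the *expected* change per game."""
    history = [start]
    rating = start
    for _ in range(games):
        # Your team is you plus (team_size - 1) bozos; the enemy team is all bozos.
        true_team_avg = (true_skill + bozo * (team_size - 1)) / team_size
        shown_team_avg = (rating + bozo * (team_size - 1)) / team_size
        win_rate = expected_score(true_team_avg, bozo)    # how often you really win
        predicted = expected_score(shown_team_avg, bozo)  # what the system expects
        rating += k * (win_rate - predicted)
        history.append(rating)
    return history


climb = expected_climb(true_skill=1900.0, start=1500.0)
print(f"first game: +{climb[1] - climb[0]:.2f}, after {len(climb) - 1} games: {climb[-1]:.0f}")
```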

Mathematically, that makes sense. But let's look at this in the form of a story to truly understand it. You are playing as Reaper on a payload map. You decide to teleport behind enemy lines, then sneak up on various players. You're trying to flank them, catch them unaware, and get in kills. The more successful you are at this, the easier you're making it for the rest of your team. You aren't at the objective here because it's not your job. Your job is to make it so your teammates at the objective have a really easy job.

You get 3 kills in quick succession and you don't die. How are you doing? I think you're doing incredibly well. Your plan makes sense and your contribution is very large. If you had killed just one player, you might have pulled your weight at least, but by killing 3 there are now only 3 players left on the enemy team. Surely because of this, your team now has control of the payload.

You look at the payload indicator UI, expecting to see three arrows from your team pushing it forward. Instead, you see one arrow of the opposing team pushing it back. You wonder how this is even possible, so you go to the payload. What you see is a single enemy Reinhardt standing on the payload, totally unopposed, with no other players on either team in sight. Welcome to Elo hell.

You end up losing this game. The situation described is pretty unreasonable though. Your stupid teammates should have capitalized on the advantage you gave them and taken the payload. Instead they chased down butterflies or whatever and failed to get any real value out of your contribution. It's actually quite easy to imagine situations like this happening over and over, such that even though you do amazing stuff, you still only win around 50% of the time. So in this sense, yeah, Elo hell is real.

I think there's more to the story though. If we try to address this by rewarding you for your good individual performance and to get you to your "rightful" rank, we run into a couple problems. As stated earlier, if we reward you for number of kills, or K:D ratio, or damage done, we also introduce warped incentives. Now your incentive is something OTHER than just winning. Now you're fighting with teammates for kills, etc. So even if we wanted to help you out here, it's dangerous to do so.

But even beyond that, SHOULD we help you out? If we do, the result is that you are going to gain rank for doing things that...didn't help your team win? Yeah it SHOULD have helped your team win, but it didn't. It's a bit weird that you'll then keep playing the same, keep not actually making your team win (even though it's their fault), and we reward you.

Here's the real truth about this Elo hell stuff, I think. The example Reaper situation above really is good play, and it really is something that should help the team win...if you were a higher rank. The higher ranked all the players involved are, the more easily your teammates can convert the advantage you provide into a win. If your teammates are so bad that they can't convert the advantage you gave into a win, then you should do some completely different things. Yeah, it sucks that the thing you did SHOULD help, but in truth, it didn't. Work with what you have. Work with your generally uncoordinated or lower-skilled teammates and provide them whatever they actually DO need to win.

In Overwatch, I think what players generally need in these situations is "babysitting." What I mean is, it's probably more important to have few deaths and to generally be on the payload than it is to achieve impressive stats that "in theory" allow your teammates to be on the payload. You have to carry them, so you'll have to refrain from strategies that, at higher rank, are very good, so that you can provide for the most basic needs of your team. You don't have to do that in the exact way I said, but the point is if you play in the (sometimes pathetic) way that your team needs, you can contribute more to your team's win rate than if you play in an incredibly impressive way that they are unable to capitalize on, because they suck. Yeah that's frustrating, but THAT is the way out of Elo hell. Having the system give you a ranking boost for strategies that aren't resulting in a positive win rate isn't a good solution.

I don't know the reason Blizzard chose to have individual stats count towards rank (or even 100% sure that they do, but they sure seem to). I'd advise against it though.

Sirlin on Game Design, Ep 16: Overwatch

July 22, 2016

We discuss Overwatch, Blizzard's first-person shooter. It far exceeded our expectations and we attempt to explain how it manages to do that. This isn't a "review," but rather us analyzing the design decisions involved, trying to define the secret sauce of Overwatch's success.

Hosts: David Sirlin and Matt "Aphotix" DeMasi

Overwatch's Competitive Mode

July 3, 2016

Dear Jeff Kaplan and the Overwatch team,

I think your game is great and it's not lightly that I rate it a 10 out of 10. Respectfully though, I think you're a bit lost in the woods on how the competitive mode should work. That's understandable because it's a complicated problem that's nearly unsolvable given your constraints.

While this post by Kaplan is excellent about sharing thoughts from the dev perspective, I'm concerned about the specific thoughts laid out there.

Draws

"right now we’re exploring ways to allow for matches that would otherwise result in Sudden Death to instead resolve in a draw where neither team wins or loses." —Kaplan

You should not consider adding draws to the game. In a tournament setting, it's simply not acceptable. For single and double elimination tournaments, a single winner must advance for the tournament to work. In a swiss tournament (especially if there is a cut to top 8 for a single elim portion), every ounce of draw that exists causes problems. I think explaining what those problems are is beyond the scope of my post, though I'm happy to do it if needed. For my own tabletop game tournaments, the rules make match draws impossible (though game-draws within a match are still technically possible, since it's match-draws that are the real issue).
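Here is a minimal sketch of why a draw breaks single elimination. It assumes a hypothetical `advance` helper that records each match as (player_a, player_b, winner), with a draw recorded as `None`. Every match must hand exactly one name to the next round, and a draw has no answer to give.

```python
from typing import Optional


def advance(results: list[tuple[str, str, Optional[str]]]) -> list[str]:
    """Return the players who move on from one single-elimination round.

    Each result is (player_a, player_b, winner); winner is None for a draw.
    """
    winners = []
    for a, b, winner in results:
        if winner is None:
            # The bracket has one slot for this match; a draw fills it with nobody.
            raise ValueError(f"{a} vs {b} was drawn; the bracket cannot advance")
        winners.append(winner)
    return winners


print(advance([("A", "B", "A"), ("C", "D", "D")]))  # ['A', 'D'] meet in the next round
```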

Anyway, draws range from bad news in swiss to literally infeasible for single and double elimination tournaments. If you implement them, then tournaments will use some other system, and probably one you won't like. You should feel GOOD about the format used in tournaments though, and it's best if they use the same format you come up with for the in-game competitive mode.

The Fundamental Issue

The reason why this is all so hard, as you know very well, is that a competitive format wants an ODD number of rounds, such as 2 out of 3 or 3 out of 5, but the asymmetric modes you've created want there to be an EVEN number of rounds so each team gets the same number of chances on offense and defense.
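The odd/even tension can be made concrete by counting. This sketch assumes every round produces exactly one winner and treats every sequence of round winners as equally likely, which is a simplification since real rounds aren't coin flips. An even number of rounds can end level; an odd number never can.

```python
from itertools import product


def tie_fraction(num_rounds: int) -> float:
    """Share of equally likely round-by-round outcomes that leave the match level."""
    outcomes = list(product("AB", repeat=num_rounds))
    ties = sum(1 for seq in outcomes if seq.count("A") == seq.count("B"))
    return ties / len(outcomes)


for n in (2, 3, 4, 5):
    print(n, tie_fraction(n))
```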

The other issue, as you well know, is that there is a Venn diagram of "what actually works" and "what people will accept." We have to find the intersection and unfortunately reject things that "actually work" if people won't accept them. Which brings us to the first try you had at a competitive mode during the beta.

The King of the Hill Tie-Breaker

People kind of didn't like this, but your dev team thought it was good because it worked. Actually, you might have drawn the wrong conclusions from this? People had various different objections as you laid out in your post, but consider these two issues:

1) It shouldn't be "sudden death"
2) People don't like that it's "a different map"

The King of the Hill tie-breaker was framed as "a tie-breaker" and "sudden death" but it really shouldn't be. Remember that the fundamental problem is that a competitive format wants an ODD number of rounds, so it's probably best if our solution has that. That means the king of the hill thing is "the third round", NOT sudden death. It should be as long as a full round and given equal weight.

One complaint I heard was that some players thought it was stupid that they might barely lose round 1, then win round 2 by a landslide, but then lose the match by barely failing at the tie-breaker. My initial reaction to that train of thought is that the complaint itself is stupid. After all, in that example, the complainer did lose 2 out of 3.

There's an important principle that "a win is a win." That means it's generally bad to count "different kind of wins" (such as win by a lot or win by a little) because it's too game-warping and generally leads to crappy dynamics. Strategies that barely win should be just as viable as those that happen to win by a lot sometimes. That alone is reason enough to make sure that there aren't extra stats attached to a win, but even beyond that, if you do actually win, it feels really crappy to be penalized that it "wasn't by enough."

So at first glance, the complaint here is stupid, because it's actually advocating the opposite of "a win counts as a win." But looking more closely, the real source of that complaint isn't that. It's that round 3 was not a real round. It was way too short and felt unfair to lose based on its outcome. I think this particular complaint evaporates if you frame it as a real, full round 3 (and design it to actually be that, too).

Another complaint is that it feels bad to be transported to a different map for round 3. Yeah ok, that's valid. Have you noticed that no one complains that you're transported to different maps over the course of King of the Hill rounds though? Each one of those takes place on a totally different map, but the trick there is that the graphics and theming make it FEEL like it's "all the same map." It's all "Nepal" or "Ilios" or whatever. Maybe if you made king of the hill maps that looked like they were part of each other map, it would have gone over better (for example, the king of the hill version of Dorado, etc.)

If you made round 3 a full round and you themed it so it felt like you were on "the same map," there are still other problems that remain. First, it's a different game mode still. And second, some people think king of the hill is too chaotic and wish that it wasn't what round 3 was about.

Yeah ok to both of those, but our constraints are so big here that maybe that's the best course anyway. Perhaps you could devise some OTHER symmetric round 3 that isn't king of the hill if you wanted to address those.

Current Competitive Mode

Moving on to the mode that's in the game now. This one uses an asymmetric mode for the "tie breaker". As you have seen, people don't really like that. I'm totally with you that it can work and be balanced on the razor's edge of 50-50 fairness. I have no doubt about that. But we're dealing with people's perceptions, and it might be they will just never accept this thing about a coin flip that then puts one team on offense for the final round, even if it were fair. So for this reason alone, you might have to abandon the current system.

But the point I'd like to make is that there are two OTHER problems with the current system.

1) The tie break system is really inefficient.
2) It's game-warping

Remember that the entire thing we're hoping for here is a "round 3" so that we can determine a winner if rounds 1 and 2 fail to do so. But this current system adds a round 3 AND round 4, basically. Then it can go to sudden death after that? That's a long way to go, having 5 different starts, when the thing we actually want is 3. Also, it's just pretty confusing. I watched a streamer who had been playing *8 hours* straight of competitive mode, and he failed to understand if winning or losing a certain round was going to determine the match win or loss, or if there'd be more gameplay after. I confess I couldn't follow it at first either.

About the game-warping thing, consider the stopwatch method. That's the thing where team A completes a round in 4 minutes 20 seconds or something, then team B has to beat that time. You don't like that. I don't like it either. It's violating the "win is a win" concept and it's favoring certain strategies in a boring way. Strategies that are meant to waste time without killing are already somewhat powerful, but their power level increases quite a bit if the clock is very short. In other words, wasting time without even killing the opponents is naturally sort of good, but usually it can't go on forever. Opponents can eventually overcome it. But if there's only 2 minutes on the clock to begin with, it's a huge boost to those lame tactics. You don't really want your game to be all about Tracer harass time waste + double Winston time waste.

The current in-game competitive mode HAS that same crappy property of stopwatch though. That's what the "time bank" system is. If you complete an objective very fast, but your opponents complete it slowly, then round 3 and round 4, one team will have to play at a huge time disadvantage. So "a win was not a win" there, and also we get game-warping surrounding time-wasting techniques.

Basically, the current system is falling pretty far short of just having a round 3. If it did that, it would be shorter, it would be less confusing, it would preserve "a win is a win", and it would not be game-warping.

As an aside, I'd like to say that the overtime system used by all modes (an extra bit of time if anyone on your team is on the objective, even when the clock ran out) is fantastic. It's exciting, it feels good, it allows comebacks, it prevents "lame duck" situations where one side can't possibly win in the time remaining, and allows more possible strategies by not overly favoring those that try to win fast. Thumbs up on all that.

Minimizing Tie Breakers

Another concept here is minimizing how often you need a tie breaker in the first place. I saved this for last because I think it's the LEAST important part. Unless tie-breaks are minimized to literally 0%, it means we still needed the entire discussion we just had anyway. So let's consider these two cases:

1) We actually do minimize tie breaks to happen 0%
2) We don't go down to 0%, but we still want to minimize them as much as we can.

To have tie breaks happen 0% of the time, that means after round 1 and round 2 are played, we'd always know the winner. The stopwatch system is the obvious way to do that, but it's just not a good solution. Anti-climactic, usually not great to watch, and game-warping. But...there kind of isn't another way? If both teams complete the objective, and you can't decide a winner based on time, and you can't call a draw, then you'd have to break the tie based on some other "score" which is always going to feel wrong.

Moving on to minimizing tie breakers (but not forcing them to happen 0% of the time), you're actually kind of stuck here too. There'd be less need for a round 3 if payload maps were 2 or 3 times as long (or the payload traveled 2 or 3 times as slowly). That way it's far more likely that a tie can be broken based on how far you get. That's kind of immediately bad though, because your maps already have around a 50-50 win rate on offense vs. defense across your player population. So making the competitive mode into something like 10-90 (favoring defense) intentionally, just to minimize the chance that both teams will complete the objective, is really screwy.
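A back-of-the-envelope sketch of that tradeoff follows. It assumes, hypothetically, that each team's attack completes the payload independently with the same probability p, and that the offense win rate is roughly p. Both teams completing then happens with probability p squared. At 50-50 that's 25% of matches. At the 10-90 extreme it drops to 1%, but only by making offense fail 90% of the time.

```python
def both_complete_probability(p_complete: float) -> float:
    """Chance that both teams finish the payload, if each attack succeeds independently."""
    return p_complete ** 2


for p in (0.5, 0.3, 0.1):
    print(f"offense completes {p:.0%} of the time -> both complete {both_complete_probability(p):.0%}")
```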

In other words, you're kind of destined to forever have at least as many ties as you do now from the specific case of both teams completing all objectives. The only real way to make headway here is to minimize the ties in the cases where NEITHER team completes the entire objective, and you've already done that. Incidentally, this violates the "a loss is a loss" principle, but somehow that seems more acceptable than violating "a win is a win," so I have no objection here. While getting a win (even by a narrow margin) feels like it should "fully count", if you get "half a win" (aka, a loss) against someone that got even less than that, it feels legit to beat them.
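The rule described here can be sketched in a few lines. It assumes a hypothetical `progress` value per team, meaning the fraction of the route pushed on offense, with 1.0 meaning everything was completed. This is a simplification, not Blizzard's actual implementation. Partial progress decides most matches where neither team finishes, which is where "a loss is a loss" gets bent. Two full completions still leave a tie.

```python
from typing import Optional


def resolve_by_progress(progress_a: float, progress_b: float) -> Optional[str]:
    """Decide a two-round payload match by how far each team pushed on offense.

    progress is the fraction of the route completed (1.0 = all objectives).
    Returns "A" or "B", or None when a tie-breaking round is still needed.
    """
    if progress_a == progress_b:
        # Includes the unavoidable case where both teams completed everything.
        return None
    return "A" if progress_a > progress_b else "B"


print(resolve_by_progress(0.6, 0.4), resolve_by_progress(1.0, 1.0))
```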

Closing Thoughts

Here's the TL;DR

Draws are bad and should not be considered.
You're probably forced to have a symmetric way of resolving ties, like it or not. It could be king of the hill, or a new thing you create.
That symmetric thing should be a full round 3, not a "tie breaker" or sudden death thing that's really fast. It should have the same duration and weight as other rounds.
It should use the same map graphics as the map you played in round 1 and 2, even if it works differently.
The current system has an asymmetric resolution (that people don't like, even if it works), is inefficient at producing a winner quickly, confusing, and also game-warping.

I'm sure you guys will figure out something good since the rest of the game is so superb. It's a difficult problem.