5 min read

Creating a matchmaker for your multiplayer game

Don't test in production. Test in simulation!
Creating a matchmaker for your multiplayer game
Photo by JESHOOTS.COM / Unsplash

I'm Glenn Fiedler and welcome to Más Bandwidth, my new blog at the intersection of game network programming and scalable backend engineering.

A good matchmaker is vitally important to your multiplayer game experience.

But it's difficult to test your matchmaker properly ahead of launch. Testing often misses problems that arise with real-world distributions of players, and you really need to be 100% confident that your matchmaker will do a good job from day 1.

In this article I present a matchmaking simulator built on player data captured from a real game. This data set reconstructs one virtual day's worth of player joins across the world with the correct distribution of (latitude, longitude) according to the time of day – accurately reproducing the timing and pattern of player joins you'll experience at launch.

Goal

Create a matchmaker that quickly finds low latency matches. Ideally, players find a good quality match almost instantly.

Context

The game plays identically from 0 - 50ms RTT by design, so there is no point preferring a 10ms server over 50ms. From 50 - 100ms the game still plays well, but 50ms or below is preferable. Above 100ms performance starts to degrade.

The server fleet for the game is distributed across datacenters around the world. If there are multiple server hosting companies in the same city, each are considered to be logically distinct because they can have different network performance.

There are 4 players per-match.

Intent

  • First match players to servers near them with <= 50ms latency
  • After 10 seconds, if no match is found, expand to servers <= 100ms latency
  • After another 10 seconds, give up and donate the player to any match that needs a player. There's no good match for this player and the priority is to just let them play. Alternatively, you could let them play against bots. This should be very rare.

Algorithm in Detail

Each datacenter has its own matchmaking queue. Players can be in multiple datacenter queues at the same time. Every second the set of datacenters are walked in random order.

For each datacenter, the matchmaker shuffles the queue to avoid the same players being matched together repeatedly (more should probably be done to avoid this), then walks the queue and assembles players into groups of 4 players. After all datacenters are processed, any left over players feed into the next iteration.

For each new player entering the matchmaking pool:

  • Set the player state to NEW

For players in state NEW:

  • Find all datacenters less than ideal threshold (50ms). If one or more datacenters are found, go to the IDEAL state.
  • If no datacenters are found under ideal threshold, find all datacenters less than expand threshold (100ms). If one or more datacenters are found, go to the EXPAND state.
  • If no datacenters are found under the expand threshold, go to the WARMBODY state.

Players in IDEAL state should quickly find a game and go to PLAYING state.

  • Any players that don't find a game within 10 seconds, go to EXPAND state.

Players in EXPAND state should quickly find a game and go to PLAYING state.

  • Any players that don't find a game within 10 seconds, go to WARMBODY state.

Players in WARMBODY state donate themselves to all matchmaking queues. They just need to play somewhere.

  • Look across all datacenters for games that need extra players to start, regardless of latency, and volunteer the warm body.
  • Any players that don't find a game within 10 seconds, go to FAILED state. Alternatively you could set these players to play with BOTS.

Players in PLAYING or BOTS state go to BETWEEN MATCH state at the end of the match.

Players in BETWEEN MATCH go to NEW state once the time between matches has elapsed (30 seconds).

IMPORTANT: Break up players at the end of each match and send them back to matchmaking. Otherwise, little islands of players are formed and when you are in the down sloped portion of the CCU curve, new players won't find matches.

Estimating Latency

The key input into the algorithm is the estimated round trip latency between a player and each datacenter. This is what guides players to low latency datacenters, so it's important that it's accurate. If the input is not accurate, the matchmaker will send players to the wrong place.

One option is to use the haversine formula to calculate the distance between the player and datacenter, then use the speed of light in fiber optic cables (2/3rds speed of light in a vacuum) plus some fudge factor to estimate the round trip time in milliseconds.

The problem with this approach is that in many cases the best datacenter isn't actually the closest. A great example are players in Peru, for whom the closest datacenter by distance is often São Paulo, but since there are no fiber optic cables directly through the Amazon rainforest, it's not actually the lowest latency datacenter for them to play on.

One way to get around this is to put ping servers in each datacenter you host game server in, and send low frequency pings to all ping servers on game startup. This way the round trip time can be determined to each datacenter in just one second.

But even this approach has problems. The route packets take over the internet often depends on the hash of the source and destination IP addresses and port numbers, and the game server the player will ultimately connect to has a different IP address and port from the ping server in the datacenter.

This can lead to the best datacenter for that player being excluded because it has false positive high RTT due to a bad route between the client and the ping server that doesn't occur when the client connects to the game server. Conversely, the ping server RTT might look good, but when the player connects to the game server, the route has a much higher RTT. See this article for more details on why this happens.

Because of this, my preferred approach is to calculate latency maps as greyscale image files per-datacenter in a batch process run daily over the last 30 days. This lets you look up the RTT to a datacenter in constant time, and because it's an average, the RTT value isn't subject to internet fluctuations.

This way the player is always sent to the datacenters with the best chance of having low latency. I always pair this with a network accelerator like Network Next to fix any bad routes between the game client and server.

A combined latency map. Grey scale color [0,255] is latency in ms to the nearest datacenter

Implementation

You can see the full source code and dataset for the matchmaking simulator here.

Results

It takes an average of two seconds to find a match, with 30-40ms average RTT depending on the time of day. It seems that the matchmaker is working well.

To confirm correct operation, let's visualize the set of active players in matches, so we can verify the distribution of players is correct. It looks good!

Click on the image to watch a timelapse across several days

Closing thoughts

While developing the simulator I found that for the same set of players joining, the peak CCU is incredibly sensitive to the % chance that a player will play another match.

What can you do to increase this % for your game?