Reality check the Metaverse

By Marek Rubasinski

Minecraft, PUBG, Roblox, Fortnite, Apex, COD Warzone – the user numbers and ramps behind the growth of these games prove a lot of things, not least that Polystream’s New Year’s predictions are coming true. They demonstrate that we are moving into the era of Generation G. A world dominated by the expectation of being entertained not just by sit-back shared experiences like Game of Thrones, but by mass-scale social interactive experiences set in your favourite universe. This week it is the COD’verse(™) that tens of millions of Generation G want to jump into and play. If they’re going to lean back, it’s probably to watch someone else play. Then maybe jump in and play with them.

If you’ve not been reading up on the Metaverse concept, Epic CEO Tim Sweeney has been exploring it for a while, as have Scott Broock and Mike Seymour, who recently published two great articles: ‘Game Engines are the New Reality Engines’ and ‘Ready Performer One’. These games are all clear leading indicators that the predictions they make about the coming Metaverse are absolutely on the money. This is happening. We are at the edge of a real-time 3D revolution: people are ready, the ideas are there, the tools like Unreal Engine and Unity are there, and you can feel the pent-up creativity. Just yesterday Manticore went into public alpha. A tool specifically created to democratise collaboratively building real-time 3D worlds and sharing them. Not just: ‘build it and they will come’; more like: ‘give them the tools and they will come and build it, then share it with millions’.

HOWEVER, I’m going to make another prediction: it’s all going to hit a massive wall and stall horribly. Disappointingly, just when a completely new paradigm and a huge amount of innovation is about to be unleashed, something is already going wrong. This stall is going to happen not only in games, but also in education, training, social engagement, retail, and productivity. Anywhere where real-time 3D is increasingly at the heart of the experience.

Reality Check

Right now, 3D interactive experiences are delivered in a way that has not changed since floppy disks were glued to the front of magazines (some of you might need to look that up). We still rely heavily on locally installed games and applications. But as these experiences get larger and more complex, we’re increasingly seeing those locally installed games and apps rely on the cloud to do the fun stuff. In this case, playing together in a big group in a virtual world. As a delivery model, I agree with Xbox’s Phil Spencer and others that it still has legs – probably about ten years. Things don’t change that fast – you can still just about buy CDs.

But it has severe limitations when it comes to creating the mass-scale interactive experiences that will engage Generation G for the decades to come. Going back to COD Warzone, Activision’s Infinity Ward has worked a netcode engineering marvel to deliver an experience of 150 players in the same virtual space (although Jason West, Infinity Ward, I would love to see an experiment where everyone agrees not to shoot each other and tries to all congregate in the same space on the map ;-)). Companies like Improbable, Hadean and the big cloud players are working on tech that will unlock much bigger numbers that can co-exist in real-time in a virtual space – but there’s a catch. Those things will then have to run completely in the cloud, and by that I mean there can be no locally installed client version like we are used to, because doing that would limit what you could do in the cloud. That means once you go beyond the limits of what is achievable with netcode, you need some way of getting just the results (i.e. the visuals of what you are doing) down to you, the user. And because it’s an interactive experience, it is a unique stream for each player. This isn’t about sending one view to lots of people like Twitch. It’s basically a 1:1 model – your individual experience streamed just to you.

For years we’ve had a tech that can do this at a small scale – taking traditional video streaming approaches and using them for 3D content. Sometimes called ‘pixel streaming’ or ‘frame streaming’, it uses a server in the cloud that has a GPU to turn everything a game or app does in terms of drawing graphics into a video. That GPU renders the graphics, an encoder captures this, compresses it, and sends it to you as a video. To all intents and purposes, you are renting a complete games machine or remote desktop in a data centre – and depending on the use case you either see the whole desktop or just one window of the game or app you are running.

It is the basis of almost every cloud gaming and 3D application streaming service and platform you will have seen or heard of, and it totally works. Stadia, xCloud, GeForce Now, AppStream, and Citrix HDX are all built like this. But it also has a massive problem – scale. You can’t effectively scale this video-based approach to 1:1 experiences for real-time 3D content. The limitations driven by available hardware, energy, and budget make it utterly unviable as a way of scaling real-time interactive 3D from the cloud. I’ll explain below with some numbers…

Let’s talk numbers

Let’s stick with just games for now (the numbers get even scarier when you talk about productivity, education, and training). I’m going to stick my neck out here and take a stab at how many GPU server ‘instances’ (the parlance for what in a data centre can serve one user a stream at a time) each of these services has:

So, outside of using AWS AppStream or Azure remote desktop GPU instances meant for remoting CAD, we have about 160-250k available. Let’s remember each one is highly proprietary and supports a closed ecosystem. Stadia servers are Linux only and can’t run xCloud software, PlayStation servers can’t run PC/Steam games, etc.

If we go with the mean, that’s 200,000-ish across mostly four regions (you have to split by physical region because of latency: Silicon Valley servers can’t stream to New York, etc.). So each region could, if every server were lit up, support about 50,000 players at once. Across five largely non-interoperable platforms.

Let’s go back to the COD’verse.

  • Warzone gained 15,000,000 players in four days.
  • Split that by five regions based on where it’s mostly launched. That’s 3,000,000 per region.
  • Let’s assume a traditional gamer peak of 30% of DAU in the evening, which equals a peak concurrency of 900,000 players per region.

For one game, per region, the demand is almost TWENTY TIMES more than the total installed base of GPU instances that could support it in a cloud gaming model.

Let me say that again. 50,000 reality vs. 900,000 players demand. For just one game.

Even if I’m off by 100-200% – that’s still a pathetic number compared to demand.
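
For anyone who wants to poke at the arithmetic, here is a minimal back-of-envelope sketch in Python. It only replays the figures quoted above (200,000-ish instances over four regions, 15,000,000 players over five regions, a 30% evening peak). The function names and the `capacity_multiplier` parameter are illustrative, and reading "off by 100-200%" as real capacity being two to three times my estimate is an assumption made for the sketch, not a measured figure.

```python
# Back-of-envelope replay of the capacity-versus-demand figures in the text.
# All constants are the article's rough estimates, not measured data.
TOTAL_GPU_INSTANCES = 200_000   # the "200,000-ish" midpoint estimate
CAPACITY_REGIONS = 4            # regions the instances are spread across
NEW_PLAYERS = 15_000_000        # Warzone players gained in four days
LAUNCH_REGIONS = 5              # regions the player base is split across
PEAK_SHARE = 0.30               # assumed evening peak as a share of players


def capacity_per_region(total_instances: int, regions: int) -> int:
    """GPU instances (one stream each) available in a single region."""
    return total_instances // regions


def peak_concurrency_per_region(players: int, regions: int, peak_share: float) -> int:
    """Players expected online at the same time in a single region."""
    return round(players / regions * peak_share)


def demand_ratio(capacity_multiplier: float = 1.0) -> float:
    """Peak demand divided by capacity, for one game in one region.

    capacity_multiplier is a hypothetical knob: 2.0 or 3.0 models the
    capacity estimate being too low by 100% or 200% (an assumed reading).
    """
    capacity = capacity_per_region(TOTAL_GPU_INSTANCES, CAPACITY_REGIONS)
    demand = peak_concurrency_per_region(NEW_PLAYERS, LAUNCH_REGIONS, PEAK_SHARE)
    return demand / (capacity * capacity_multiplier)


if __name__ == "__main__":
    for multiplier in (1.0, 2.0, 3.0):
        print(f"capacity x{multiplier:.0f}: demand is "
              f"{demand_ratio(multiplier):.0f}x the available instances")
```

Under these assumptions the gap is 18x as estimated, and still 9x or 6x if the real installed base is double or triple my guess.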

The way cloud gaming is currently delivered, using technology designed for streaming video, makes this channel a total and utter irrelevance for COD Warzone. And therefore for every other Metaverse-like experience that will come along, as each one will be bigger, and grow faster, than the last.

The indicators could not be more explicit: these types of experiences are the direction of travel that people and creators want. And that is entirely out of kilter with what technology and platform providers are offering.

GPUs in devices deliver increasing quality performance

But it’s not all doom and gloom. Whilst Google, Microsoft, Nvidia, and others have been dabbling (and let’s be honest, at those numbers, relatively speaking it is dabbling) with GPUs in the cloud, a quiet revolution has taken place in plain sight.

Consumers have paid to level up the GPUs in their devices. Year after year we have all been buying laptops, phones, and tablets that are increasingly packed with graphics rendering performance. This has been driven by multiple supply and demand upward spirals. On the supply side, an arms race between Intel, AMD, Qualcomm, and Apple. On the demand side, an insatiable desire for better and better screens from users and developers.

If you have bought a new laptop, phone or tablet in the last couple of years chances are it already has a GPU in it capable of drawing some pretty amazing graphics. Last year many of Intel’s line of standard Core integrated chipsets – which 6-7 years ago were considered a laughing stock in terms of graphics performance – shipped with about the same rendering grunt as a launch Xbox One did. That’s right. When you bought that normal laptop, you essentially got a free basic Xbox.

This trend is not slowing down – I don’t see any scenario in which device manufacturers start releasing things next year that are worse than this year. So what? Well – because consumers have spent, and continue to spend, billions of dollars every year on better and better GPUs – the industry doesn’t have to. It just needs to build things in a way that uses them.

How do we scale?

Our massive scale problem has already been answered by a massive scale-up in consumer GPUs. The answer is to stop using old technology that requires a dedicated GPU in the cloud. The answer is more consumers having good > better > best GPUs in their devices, and the cloud being used together with this trend to release a perfect storm of scale.

Run the compute that powers the experience in the cloud. Draw the pretty pictures you need to interact with it locally on the device in your hand, on your desk or under your TV. Make the infrastructure and networks that connect the two elastic and scalable on-demand so you don’t waste anything when it’s not being used. Grow and make use of the edge to squash latency.

A perfect model of distributed compute and composable infrastructure that can support millions today, and in the future billions.

Simple when you put it like that, right? How hard can it be? 😉