Future of wireless VR: A back of the napkin look ahead
Post by Jeff Clark, Senior Product Marketing Manager, AT&T Developer Program
Virtual Reality (VR) is here in a big way. What was just a promise a few years ago has grown dramatically: there are more devices available, amazing content is being created to take advantage of the platform, and new use cases for VR are popping up every day. After we all take a moment to appreciate how far VR has come in a relatively short period of time, it's important to look at how far it still has to go. Let's consider what needs to happen to content in order to make VR a truly wireless experience.
The Difference Between Prerecorded and Dynamically Generated Content
For the purposes of this blog, let's separate VR content into two broad categories: prerecorded and dynamically generated. These are of course generalizations, and we can all think of hybrid situations and things that don't fall into either camp. But for our purposes, these two categories highlight how content differences can lead to different strategies for wireless transmission.
Prerecorded content consists of 360 still shots, videos, and other content of a predetermined experience for the viewer. While the viewer can look around, their ability to move around is at best limited to moving forward and back through the video, or jumping between still shots. This doesn’t mean prerecorded VR can’t be both powerful and moving. A helicopter ride over Manhattan or a walk through a rainforest can be an amazing way to use VR to see a part of our world that would otherwise be out of reach.
With prerecorded content, we have the data in hand before the viewer steps into VR. This gives us a lot of options when it comes to a wireless VR connection. While you often have more raw data to contend with when dealing with prerecorded content, knowing what that data is creates more options. Take a short 360 3D clip, for example: even though the viewer is only looking in one direction at any one time, you still have to handle data for the entire 360 viewable area. Since you know all the data ahead of time, you can work with it before the viewer even decides to view the clip. By having the entire content up front you can more easily compress the video without causing artifacts, or simply load the whole thing onto the VR appliance ahead of time.
Dynamically generated VR encompasses many simulations and games. It covers situations where what the viewer sees is created by a graphics system as the viewer views it. This allows the viewer a lot more freedom: with dynamically generated content the viewer has the freedom of movement to duck behind a boulder or walk around a 3D blueprint. The VR environment can react to what the viewer does, whether it's a zombie shambling around a boulder or an engine coming to life on a blueprint.
Dynamically generated content in some ways presents a much more interesting problem. While the amount of raw data may be smaller, since you only need to worry about where the viewer is looking from moment to moment, the viewer's ability to change what happens next creates challenges. Does the viewer dodge the zombie or climb on top of the boulder? Rotate the 3D blueprint or zoom in? Because you don't know what will happen next, you can't simply preload the content or use long-duration data compression techniques.
VR By the Numbers
Let's look at some numbers and see what this means for wireless connectivity.
If a high-end VR rendered image today is 1500×1500 per eye and we display 60 frames per second with 16-bit color*, you are looking at something under 5 Gbps (gigabits per second) of raw data. With basic 10X compression you can get that 5 Gbps down to about ½ Gbps. While ½ Gbps (or 500 Mbps) is not a trivial amount to transmit, it is well within the realm of what is possible with current-generation Wi-Fi or next-generation 5G cellular.
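The arithmetic above is easy to check for yourself. Here's a small sketch in Python using the same assumed values (1500×1500 per eye, 60 fps, 16-bit color, 10X compression):

```python
def raw_bitrate_gbps(width, height, eyes, fps, bits_per_pixel):
    """Raw (uncompressed) video bitrate in gigabits per second."""
    return width * height * eyes * fps * bits_per_pixel / 1e9

# Today's high-end headset numbers from the text above.
raw = raw_bitrate_gbps(1500, 1500, eyes=2, fps=60, bits_per_pixel=16)
print(f"raw:            {raw:.2f} Gbps")       # 4.32 Gbps, "something under 5"
print(f"10x compressed: {raw / 10:.2f} Gbps")  # 0.43 Gbps, about half a Gbps
```

Note this counts decimal gigabits (10^9 bits), the usual convention for link rates.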
But as anyone who has spent some time with today's VR options can attest, we still have a ways to go. VR resolution could use an increase: a virtual object with text that should be readable at three feet away is pixelated to the point of being unrecognizable as text. A lot of peripheral vision is lost, and the 60 (or in some cases 90) frames per second is still low enough to cause some dizziness when viewing VR material.
So let's envision what's next. To pull some numbers, let's imagine that we have an 8000×8000 image per eye to give us invisible pixels and a wide field of view, reasonably accurate 24-bit color, and 120 frames per second to keep it smooth.
If we run the numbers again, the raw stream is now roughly 370 Gbps, or something like 37 Gbps even after the same 10X compression. Here is where the difference between prerecorded and dynamic becomes more pronounced. With prerecorded content you have the ability to compress not just individual frames but the entire stream, seeing much more benefit from compression, and the viewer can simply load the entire experience onto the VR device ahead of time. Dynamic content presents more challenges. If preloading isn't a possibility, and the dynamic nature of the content limits compression, then there is a lot of data to move in front of the viewer quickly. And with best-case wireless transfer rates for Wi-Fi and 5G being in the single-digit Gbps, the ability to simply transmit the entire dynamic image stream is limited.
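Running the same back-of-the-napkin math for the hypothetical next-generation display makes the gap concrete (all inputs are the assumed values above, not measurements):

```python
def raw_bitrate_gbps(width, height, eyes, fps, bits_per_pixel):
    """Raw (uncompressed) video bitrate in gigabits per second."""
    return width * height * eyes * fps * bits_per_pixel / 1e9

# Hypothetical next-gen display: 8000x8000 per eye, 24-bit color, 120 fps.
raw = raw_bitrate_gbps(8000, 8000, eyes=2, fps=120, bits_per_pixel=24)
print(f"raw:            {raw:.0f} Gbps")       # 369 Gbps uncompressed
print(f"10x compressed: {raw / 10:.0f} Gbps")  # ~37 Gbps, still far above the
                                               # single-digit Gbps of wireless links
```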
How Do We Tackle Making VR Available with a Wireless Connection?
One way is to render the image on the VR device, so that instead of sending the full image the viewer sees, we send the wireframe updates that are used to render the view. The challenge here is that even with today's best consumer graphics options, rendering two 8000×8000 images (one per eye) at 120 fps with any kind of complex scene is daunting. The prospect of moving that kind of capability to a lightweight device is years away. Backpack PCs and swivel chairs with built-in PCs have tackled the problem of needing a physical line to a stationary point by either putting the (somewhat heavy) PC on your back or running the wire through the chair to eliminate tangles, but both seem limited to the small segment of users willing to deal with the additional complexity.
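To get a feel for why sending scene updates instead of pixels is so attractive, here's a toy comparison. The object count and update size are made-up illustrative assumptions, not figures from any real engine:

```python
# Toy comparison: per-frame payload of full rendered pixels vs. scene-state
# updates. All scene numbers below are illustrative assumptions.

FRAME_BITS = 8000 * 8000 * 2 * 24   # two 8000x8000 eyes at 24-bit color

# Suppose only object transforms change each frame: say 1,000 moving objects,
# each sending position + orientation as 7 floats of 32 bits.
UPDATE_BITS = 1_000 * 7 * 32

print(f"full frames:   {FRAME_BITS / 1e6:,.0f} Mbit/frame")   # 3,072 Mbit
print(f"scene updates: {UPDATE_BITS / 1e6:.3f} Mbit/frame")   # 0.224 Mbit
print(f"ratio:         ~{FRAME_BITS // UPDATE_BITS:,}x")
```

The ratio is enormous, which is exactly why the bottleneck shifts from the radio link to the rendering horsepower on the headset itself.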
Another approach is smart eye tracking and rendering. With this approach the eye is tracked, and full-resolution imagery is rendered only for the small area (2-3 degrees) of our vision that sees the sharpest detail, with gradually diminishing detail further out. Experimentation would be needed to see how much resolution can actually be saved without visual sacrifices, but the savings could be significant, perhaps even in the 10x range. But the ability to translate eye movement into viewable results nearly instantly will be both a processing and a cost challenge.
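A rough sketch of where a 10x-ish figure could come from. Every region size and density below is an illustrative assumption (the full-density window is kept much wider than the 2-3 degree fovea, to allow slack for eye-tracker latency):

```python
# Rough foveated-rendering savings estimate. Assume a 100-degree square field
# of view rendered at 8000x8000, i.e. 80 pixels per degree. All region sizes
# and density fractions below are assumptions for illustration only.

PX_PER_DEG = 8000 / 100

def region_pixels(side_deg, density):
    """Pixels for a square region side_deg degrees across, rendered at a
    fraction of full linear resolution (density scales area quadratically)."""
    side_px = side_deg * PX_PER_DEG
    return side_px * side_px * density * density

full    = region_pixels(100, 1.0)                             # uniform full res
foveal  = region_pixels(20, 1.0)                              # window around gaze
mid     = region_pixels(40, 0.5) - region_pixels(20, 0.5)     # ring at half res
outer   = region_pixels(100, 0.25) - region_pixels(40, 0.25)  # rest at quarter res

foveated = foveal + mid + outer
print(f"savings: ~{full / foveated:.1f}x")   # ~8.2x with these assumed regions
```

With these particular (assumed) regions the savings land around 8x, in the same ballpark as the 10x figure above; more aggressive peripheral downsampling would push it higher.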
Finally, better compression is an obvious answer. Compression often works well 90 percent of the time, with the remaining frames compressing far less successfully. The problem with VR is that those less successful moments can produce the kind of nausea-inducing slips that no one wants.
So what will the future hold? I suspect we will see various versions of the above. Smart eye tracking seems like it may end up as a high-end solution that trades complexity, and perhaps price, for a quality wireless experience. Compression could be pursued at the lower end, where some glitches are more acceptable. In the longer term, however, if processing power continues to rise at historic rates, rendering on the device itself could be the ultimate answer. Of course, it could be something totally different. I look forward to seeing what you think in the comments below.
* Generic VR resolution based on general purpose recommended values from “Introduction to Mobile VR Design”
For more articles on AR, VR, and all things video, see our new AT&T Video and VR site.