Finding the Right Augmented Reality SDK
- Augmented Reality
- Game Development
Guest blog from Patrick Martin with @gosphero:
I’m a software engineer for Orbotix, the makers of Sphero, the robotic ball you control with your smartphone. I’ve spent about a year working on augmented reality games, including Sharky the Beaver and our newest, The Rolling Dead. As a game developer, I hope to reveal a bit about the benefits and drawbacks of current AR technology from the perspective of a user of the various available SDKs.
Augmented reality is an exploding new field in which normal interactions are enhanced with digital data. As mobile devices get more powerful, we are able to process more real-time data, including video feeds, and use it to improve the experience of reality. The overarching goal of augmented reality technology is to point a camera at the real world and place computer-generated metadata into it as seamlessly as possible. The end product should feel like a video game has come to life around you.
To achieve this goal, a number of researchers are looking into algorithms that generate clouds of points, typically using a technique called simultaneous localization and mapping (SLAM), to maintain a continuous approximation of a viewer’s position and movement in the 3D world. You can find several off-the-shelf SDKs that will even work with mobile devices, although they pose some drawbacks. The first is the training phase. A still image has almost no 3D geometry to pull out, and your dataset will grow dramatically more detailed in the first few moments of simulation (assuming the user knows what kind of motion to make and has a suitably detailed video feed to process). Once data is generated, it can be hard to figure out a useful reference frame. On top of that, the reference frame tends to be anchored barycentrically to several points in the cloud, and may drift as those points pop into and out of existence or as they are refined. Complicating the matter of drift, most cellphone cameras do not deal well with fast movement, so you may lose all your reference data if the user moves too fast or shakes the camera (if you’ve seen fast-moving cellphone camera footage, it is hard for even a human to track). Finally, the algorithms tend to be very intensive. You have to examine each video frame as it streams in, typically at 24 fps. This means that you must examine (ideally, at 720p) 1280×720 pixels at 24 bpp (3 bytes per pixel) every 42 ms – that’s about 66 MB of data per second. The first thing most implementations will do is knock down the resolution and eliminate color data, but you’re still looking at a lot of information in a very small amount of time.
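The data-rate claim is easy to sanity-check. Here is the back-of-the-envelope arithmetic in Python, using the 720p / 24 fps figures from above:

```python
# Back-of-the-envelope data rate for a raw 720p camera feed.
width, height = 1280, 720       # 720p resolution
bytes_per_pixel = 3             # 24 bits per pixel
fps = 24                        # typical video frame rate

frame_bytes = width * height * bytes_per_pixel   # bytes in one raw frame
frame_interval_ms = 1000 / fps                   # time budget per frame
bytes_per_second = frame_bytes * fps             # raw throughput

print(f"{frame_bytes:,} bytes/frame")            # 2,764,800 bytes (~2.6 MB)
print(f"{frame_interval_ms:.0f} ms/frame")       # ~42 ms
print(f"{bytes_per_second / 1e6:.0f} MB/s")      # ~66 MB/s
```

In other words, a SLAM pipeline gets roughly 42 milliseconds to digest each 2.6 MB frame, which is why downsampling and dropping color are the first things every implementation does.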
This is where implementations split into two camps. The first (and easiest to develop) is to fall back to auxiliary sensor data, using the camera feed as a fancy backdrop. You may have played with a GPS-based app in which developers overlay names of places or directions on a video feed. For games, it’s popular to use just the gyroscope and pretend that the user is twisting around and shooting flying baddies as they swoop in from the sky. The obvious drawbacks are that the sensor data often lags behind the video (or vice versa) and typically needs to be smoothed out. This is one reason why enemies tend to fly in – they aren’t anchored to anything, so we accept that they’re swooping and bobbing. The benefits are that the user can swing the camera around wildly and the simulation will remain stable (you won’t lose tracking data because the video gets too blurry), and you’re not necessarily stuck in place. The accelerometer data won’t track your movement very well, and GPS isn’t fine-grained enough for running around in a game, but it won’t look too broken if the enemies are dodging around you.
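Smoothing that jittery sensor data usually comes down to a simple low-pass filter. Here is a minimal sketch in Python – the class name and the alpha value are illustrative, not from any particular SDK:

```python
class LowPassFilter:
    """Exponential smoothing for jittery sensor samples (gyro, compass, GPS)."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha   # 0 < alpha <= 1; smaller = smoother but laggier
        self.value = None

    def update(self, sample):
        # The first sample initializes the filter; later samples are blended in.
        if self.value is None:
            self.value = sample
        else:
            self.value = self.alpha * sample + (1 - self.alpha) * self.value
        return self.value

# Feeding a noisy heading produces a steadier (but lagging) estimate.
heading = LowPassFilter(alpha=0.5)
smoothed = [heading.update(s) for s in [0.0, 10.0, 10.0, 10.0]]
```

Note the trade-off baked into alpha: more smoothing means more lag, which is exactly why anchored enemies look wrong and swooping ones look fine. (A real heading filter would also have to handle the wraparound at 360 degrees, which this toy version ignores.)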
The second camp tends to rely on something we call a “fiducial.” This is essentially an interesting piece of data that’s synthesized in a way that makes it easy to track. Various SDKs let you generate and follow these fiducials. They’re very useful because you don’t need anything but the camera to follow them, as they typically lie on the ground or some other known plane. This allows you to determine where down is and derive your orientation to them by constructing an inverse projection transform. Generally, this is cheaper than a point cloud, as there is typically just a small subimage for the “interesting” area along with its known geometry. Fiducials are also typically asymmetrical, so you can figure out where you are standing relative to them, and since they’re part of the video frame, it’s easy to synchronize your 3D data to them. Although they can be synthesized to be easier to track (so you can quickly re-acquire them), fast camera movement can render a game temporarily unplayable. You are also stuck looking at the same spot, or the simulation fails.
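To make the inverse projection idea concrete, here is a sketch in Python/NumPy of the standard planar-pose technique: estimate the homography mapping a flat marker to its detected image corners with the direct linear transform, then decompose it using the camera intrinsics. The function names are my own, and this is a textbook method rather than the internals of any specific SDK:

```python
import numpy as np

def homography_from_points(src, dst):
    """Direct linear transform: 3x3 homography mapping src -> dst (>= 4 pairs)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the null vector of this system: the smallest singular vector.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def pose_from_homography(H, K):
    """Rotation and translation of a planar marker (at Z=0) from H and intrinsics K."""
    B = np.linalg.inv(K) @ H                   # B ~ [r1 r2 t] up to scale
    lam = np.linalg.norm(B[:, 0])              # rotation columns have unit length
    r1, r2, t = B[:, 0] / lam, B[:, 1] / lam, B[:, 2] / lam
    r3 = np.cross(r1, r2)                      # third axis completes the rotation
    return np.column_stack([r1, r2, r3]), t
```

Feed in the four detected corner pixels of a marker of known physical size and you get back the camera’s orientation and position relative to the marker – exactly what you need to anchor 3D content to it.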
One nice synthesis of these two techniques is Sphero. We provide an augmented reality SDK with Unity support alongside our normal developer kits. Because Sphero is a glowing, round object, it is really easy to track through the camera on your device. Circles actually don’t have orientation data built into them, so we have to fall back to using the device’s gyroscope. The two drawbacks to this are that 1) the simulation will drift as your gyro drifts and 2) when the Sphero starts moving, you have a training phase similar to a point cloud where the reference frame must be aligned between what the camera observes and what the ball observes for movement. This does lead to the biggest benefit of this fiducial – it’s able to track its own movement with surprising accuracy. It’s a fiducial you can drive around and follow, and the simulation will move along with you. This is remarkable because even if you put a QR code on an RC car, you’d have no way to figure out how the car is moving, so the entire world would travel with the car, not just a character on the code. The simulation does become less stable the further you get from Sphero (this is where the flying baddies come in, or in the case of our app Sharky the Beaver, coins with wings on them).
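For intuition on why a glowing ball makes such a convenient fiducial, here is a toy pure-Python tracker that thresholds the bright pixels in a grayscale frame and takes their centroid. This is my own illustrative sketch, not Orbotix’s algorithm – a real tracker works in color, at full frame rate, and with far more robustness:

```python
def track_bright_blob(frame, threshold=200):
    """Find a glowing ball in a grayscale frame by centroiding bright pixels.

    frame: 2D list of 0-255 intensities. Returns (cx, cy, radius) or None.
    """
    xs, ys = [], []
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if value >= threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None   # ball not in view, or too dim to stand out
    cx = sum(xs) / len(xs)
    cy = sum(ys) / len(ys)
    # Estimate radius from blob area, assuming the blob is roughly circular.
    radius = (len(xs) / 3.141592653589793) ** 0.5
    return cx, cy, radius

# A dark 20x20 frame with a bright disk of radius 3 centered at (10, 10).
frame = [[255 if (x - 10) ** 2 + (y - 10) ** 2 <= 9 else 20
          for x in range(20)] for y in range(20)]
```

The apparent radius also gives you a rough depth estimate for free, since the ball’s true size is known – but as noted above, a circle alone carries no orientation, which is why the gyroscope has to fill that gap.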
One final piece of augmented tech is Cast AR. Developed by former Valve engineers, it also relies on a fiducial (I’m sure you’ve noticed developers favor this line of thinking). In this case, the fiducial is a retroreflective film with a series of LEDs positioned around it for tracking data (fixing bright dots in known patterns makes it really easy to track a fiducial). Unfortunately, I can’t speak specifically about how it works, but the twist here appears to be a set of 3D glasses that project images onto this surface that are only visible to the person wearing the glasses.
Your choice of AR technology will depend on your needs. Software companies such as Nintendo ship trading cards to anchor their simulations, whereas Oboto and Minecraft Reality get away with point cloud data. I’m a fan of Sphero not only because it provides me with a paycheck, but also because it’s physically in the same space that we are and is limited by the same things that limit us. It is truly a mixed reality gaming experience. Sphero can’t pass through a wall and will go slower uphill than downhill while maintaining autonomy (it’s mobile and can roll and act on its own). Creating a simulation with a single mobile focus point is made easiest with this robotic ball. If you try to pick up Oboto, it will drift and sometimes even anchor to something that isn’t your hand, whereas Sharky the Beaver will reliably stay where you expect him to – right on top of Sphero.