The Evolution of Multimodal UI
Technology moves at a rapid pace; you need look no further than how differently people interact with their mobile devices today compared with just a few years ago. Touch has become the primary method people use to control their devices, but you can already see that evolving, particularly as wearables grow in usage.
Touch itself has moved from simple tap control to various forms of discriminative touch; that is, devices now incorporate movement, pressure, and vibration to give the user more flexibility and feedback. Across the board, mobile touch interfaces work well and users seem satisfied, which only makes the emergence of wearables more of a challenge.
One emerging modality is movement. Most people are familiar with this from the Microsoft Kinect or the Nintendo Wii, but it has already moved past games. The Samsung Galaxy S 4 supports motion gestures, and Leap Motion makes a controller that connects to a PC or Mac and interprets hand motions. Additionally, at our recent Vegas hackathon, one finalist used the motion controller in the Samsung Galaxy Gear to create an application that teaches sign language. The developer had a sign language expert perform the motions for a word and recorded them into the device. To instruct the student, the application presents a word or phrase; the student must then mimic the recorded motions exactly to demonstrate knowledge before moving on to the next word or phrase.
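The matching step in an application like that has to tolerate students who sign at a different speed than the expert. A minimal sketch of one common approach, dynamic time warping over motion-sensor traces, follows; this is an illustrative assumption about how such an app could work, not the hackathon team's actual code, and the sample format and threshold are hypothetical.

```python
import math

def dtw_distance(ref, attempt):
    """Dynamic time warping distance between two motion traces.

    Each trace is a list of (x, y, z) sensor samples. DTW aligns the
    two sequences in time, so a student who signs the same word more
    slowly than the expert can still score close to the reference.
    """
    n, m = len(ref), len(attempt)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], attempt[j - 1])  # Euclidean step cost
            cost[i][j] = d + min(cost[i - 1][j],      # skip a reference sample
                                 cost[i][j - 1],      # skip an attempt sample
                                 cost[i - 1][j - 1])  # match both
    return cost[n][m]

def matches(ref, attempt, threshold=1.0):
    """Accept the attempt if its length-normalized DTW cost is small.

    The threshold is a tuning assumption; a real app would calibrate it
    per word against recordings from the expert.
    """
    return dtw_distance(ref, attempt) / max(len(ref), len(attempt)) <= threshold
```

The normalization by sequence length keeps longer words from being penalized simply for having more samples.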
I think you will see more movement incorporated into UI, particularly with wearables, but I am not sure people want to make exaggerated movements when out and about. Related to this, AT&T introduced an alpha visual-recognition API that we featured in our Developer Summit demo area. The demo did celebrity matching: we took a photo and matched it to the person's celebrity doppelganger (I was Al Pacino, thankfully a younger photo of him and not the HBO Phil Spector version). This has clear usefulness for identity, but you will also see the capability evolve to the point where the device can follow your eyes and perhaps, over time, recognize blink codes or something to that effect. There are PC products that do this already, but it seems even more useful with wearables (enabled by better cameras), where the user can choose to control the device without exaggerated gestures.
A popular form of input is speech. Most people know it from the virtual assistant use case with Siri or Google Now, but usage is growing rapidly across a wide range of applications, and it seems a very natural fit for wearables. One concern with speech input is background noise, so it will probably work best if the wearable sits close to the mouth or pairs with some sort of Bluetooth headset. Custom speech contexts, which bias recognition toward the vocabulary an application expects, can make this work really well.
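To make the idea of a custom speech context concrete, here is a minimal sketch of rescoring a recognizer's candidate transcripts by boosting phrases the application expects. The function names, scoring scheme, and boost weight are illustrative assumptions, not any particular vendor's API.

```python
def rescore(hypotheses, context_phrases, boost=0.2):
    """Pick the best transcript after boosting in-context phrases.

    hypotheses: list of (transcript, score) pairs from a recognizer,
                where a higher score means a more likely transcript.
    context_phrases: words or phrases the app expects, such as the
                     names in a contact list (a hypothetical example).
    boost: flat score bonus per matched context phrase (an assumption).
    """
    def boosted(transcript, score):
        text = transcript.lower()
        hits = sum(1 for phrase in context_phrases if phrase.lower() in text)
        return score + boost * hits

    # Return only the winning transcript text.
    return max(hypotheses, key=lambda h: boosted(*h))[0]
```

For example, a dialer app that knows the user's contacts can recover "call bob" even when the recognizer's raw acoustic score slightly favors "call rob":

```python
rescore([("call bob", 0.90), ("call rob", 0.95)], ["bob"])  # → "call bob"
```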
Finally, one area of particular interest to me is brain waves. At the 2013 AT&T Developer Summit hackathon, the winning team used Necomimi brainwave cat ears to shut down the device if the wearer was too agitated (to prevent the user from picking it up and yelling at a spouse or boss). Since then, the field has seen some fascinating advances. Researchers at the University of Washington achieved the first remote human brain-to-brain interface, in which a signal from one person's brain directed another person's body. Additionally, a company called Thinker Thing has created an object by thought alone using a 3-D printer. We will have to see what becomes possible for controlling wearables and other devices in the future.