3 Essential UX considerations when initiating or joining a call via WebRTC
In the previous post in this series, we looked at some UX considerations when adding WebRTC-enabled features to an application. It’s possible and important to make these real-time features feel seamless and consistent with the rest of your user experience.
Next, we’ll go over the initial major steps the user will take with WebRTC: initiating or joining a call. What questions should you ask yourself when designing this part of the flow? What decisions should be made?
Starting a Session
Just as it’s important to consider which users can begin a voice or video call and where within your app they can do so, it’s crucial to think about how that interaction happens.
Voice and video calls are usually initiated and joined with a button click. What should that button say, and what information can it convey to the user? The button’s text can indicate each user’s role in the call. “Begin call” tells me I’m the caller, initiating the session. “Join call” tells me I’m the callee or one of many participants.
Let’s say a user can create a video call “room” that others can join at their discretion. Their button text might read “Create room.” Or, if a video call is by invitation only, the button text might specifically read “Call Sharon” (replacing “Sharon” with the recipient’s name, of course). The choice of words sets the user’s expectations for what they’re doing and what happens next. Be clear and specific.
Think about other cues that may affect a user’s expectations when they click to start a call. Does the specific color or style of a button mean something within your app? Is the functionality they expect consistent here? What existing patterns can you use to make the interaction smoother and more clear?
Beyond the consistency of experience within your app, how does this interaction compare with a similar one within the user’s device? A device that supports voice and video calls may have established icon or gesture conventions you’d want to mimic. This is more complicated because of the many devices on the market, but definitely consider the device UI and UX during your design process.
Speaking of device behavior, this is also the time to consider the browser-default prompt that asks the user’s permission to access their microphone or camera (as mentioned in the first post in this series). Preparing the user for this interaction is important, and so is understanding each browser’s quirks. Some will persist the user’s permission by default, and some will only persist it if the user selects that option (and it’s not always clear that you can). Ask yourself how your interface and instructional copy will guide them through this crucial step. Also see Room or Google Hangouts for some great examples (the illustration below reflects Room’s approach).
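As a rough sketch, you might pair the `getUserMedia()` request with instructional copy keyed to the failure mode. The messages and function names below are illustrative placeholders, not from any particular library; tailor the copy to your app’s voice:

```javascript
// Map getUserMedia failures to instructional copy for the user.
// The error names are standard DOMException names; the messages are placeholders.
function permissionErrorCopy(errorName) {
  switch (errorName) {
    case "NotAllowedError":  // the user (or browser policy) denied the prompt
      return "We need your permission to use the camera and microphone. " +
             "Check the icon in your browser's address bar to allow access.";
    case "NotFoundError":    // no matching capture device present
      return "We couldn't find a camera or microphone on this device.";
    case "NotReadableError": // hardware is busy or failed
      return "Another app may be using your camera or microphone.";
    default:
      return "Something went wrong while accessing your camera or microphone.";
  }
}

// Request media only after the user clicks your call button, so the
// browser's permission prompt arrives in a context they expect.
async function requestCallMedia(onError) {
  try {
    return await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
  } catch (err) {
    onError(permissionErrorCopy(err.name));
    return null;
  }
}
```

Triggering the prompt from an explicit click, rather than on page load, also makes it far more likely the user understands why the browser is asking.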
Connecting and Waiting
Even though web-enabled calling is a more recent endeavor, people have been making voice calls for over a century. What patterns and behaviors do we expect because of our generation-spanning relationship with the telephone?
The dial tone, for example, is so familiar we don’t even think about why it happens. When we pick up the phone receiver, we hear the tone. It tells us things are ready and we are able to start dialing. Some calls may feature two different dial tones, each meaningful. Making a call from an office phone is a good example of this. The first tone indicates a call can be made to other phones within the office. If a specific sequence is dialed, a second tone will indicate a call can be made to phones outside the office.
When making a call via a web browser, there isn’t really much of a receiver to “pick up,” so how do we indicate the system is ready? When you think about traditional dial tones, the most useful indicator is when the dial tone itself is missing. Picking up a phone receiver to hear silence immediately tells us it’s not connected and we can’t make a call. What copy or UI elements might you show or not show depending on whether a user can make a call? As we questioned in the previous post in this series, what conditions need to be met for a “Begin call” button to show up for a user? What indicator can you give the user if their ability to make a call is blocked somehow (e.g. their Internet connection gets interrupted)?
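One way to model the “missing dial tone” is a small gate function over your app’s connection state. The field names below are assumptions about how that state might be shaped; wire them to whatever your app actually tracks:

```javascript
// Decide whether the "Begin call" button should be enabled -- the web
// equivalent of hearing (or not hearing) a dial tone.
function canBeginCall(state) {
  return state.online &&         // the browser reports a network connection
         state.signalingReady && // our signaling channel is connected
         !state.inCall;          // not already in another call
}

// Keep the UI in sync as connectivity changes.
function watchDialTone(state, render) {
  const update = () => {
    state.online = navigator.onLine;
    render(canBeginCall(state));
  };
  window.addEventListener("online", update);
  window.addEventListener("offline", update);
  update();
}
```

When `canBeginCall()` returns false, that’s your cue to hide or disable the button and explain why, rather than letting the user click into a call that can’t connect.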
A partner to the dial tone is the “ringing” indicator. There are two ringing indicators to consider: one for the caller and one for the callee. Commonly experienced as tones, they tell the caller the call was placed successfully, the other person is being notified, and to wait. They tell the callee someone is trying to contact them and to take some action to accept the invitation, like tapping the green button on a smartphone and saying “Hello?”
This feedback for the user is crucial. How can you communicate that something is happening? How do you convey to the user when it’s time to wait and when it’s time to take action? Write proper instructions, include reliable loading animations or tones, and make your calls-to-action specific to help clarify these expectations for your users.
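One way to keep the “wait” and “act” moments distinct is to model the caller’s side of the flow as an explicit state machine. The states and events below are an illustrative sketch, not a standard:

```javascript
// Caller-side feedback states. Each waiting state ("dialing", "ringing")
// should show a tone or animation; the others show actionable controls.
const CALL_STATES = {
  idle:      { begin: "dialing" },                       // user can act
  dialing:   { placed: "ringing", failed: "idle" },      // wait: placing the call
  ringing:   { answered: "connected", timeout: "idle" }, // wait: callee notified
  connected: { hangup: "idle" },                         // in the call
};

function nextCallState(current, event) {
  const transitions = CALL_STATES[current] || {};
  return transitions[event] || current; // ignore events that don't apply
}
```

Driving your loading animations, tones, and copy from a single state value like this makes it much harder for the UI to tell the user two contradictory things at once.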
Another aspect of traditional voice calls and ring indicators is duration. Over time we’ve established the expectation that the longer the ringing continues, the less likely it is that someone will answer, and the more likely it is that the call will be intercepted by a voicemail system of some kind. So for your WebRTC-enabled calls on the web, how long can a person sit on a call waiting for another person to join? Do unanswered voice and video call invitations expire after some period of time? Is there some way for a person to automatically respond to a call when they’re away?
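A minimal sketch of how those duration questions might translate into code, assuming a hypothetical 30-second cutoff and illustrative copy thresholds:

```javascript
// Expire an unanswered invitation after a fixed ring period, roughly the
// way a phone call rolls to voicemail. The 30-second default is an
// assumption; tune it (and the fallback behavior) for your product.
function startRingTimeout(onExpired, ringMs = 30000) {
  const timer = setTimeout(onExpired, ringMs);
  return () => clearTimeout(timer); // call this when the callee answers
}

// Copy that acknowledges a lengthening wait. Thresholds are illustrative.
function waitingCopy(elapsedMs) {
  if (elapsedMs < 10000) return "Calling...";
  if (elapsedMs < 30000) return "Still ringing -- hang tight.";
  return "No answer yet. Leave a message or try again later?";
}
```

Escalating the copy as the wait stretches on mirrors the caller’s own dwindling expectations, and gives you a natural moment to offer a fallback action.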
It’s good to set clear expectations for a caller while they’re waiting for another person to join. This is also an opportunity to delight the user with something unexpected: Maybe a riff on traditional “hold” music or a fun, time-killing game (like Lander, playable while waiting in a Talky.io room).
Preparing for What Comes Next
Just after a user starts a new call or joins an existing one is a perfect moment to prepare them for what’s coming next. Introduce layout and navigational controls early so the user can become familiar with the interface; don’t wait until others have joined the call. Provide clear descriptions and indicators of what should be happening or what actions they need to take. Consider providing a “hair check” step before joining the call. A “hair check” prompt is what you might expect: an opportunity to check the messiness of your ‘do’ before joining a video call, saving you from a possibly embarrassing situation. Plus, there’s even more value you can get from this interaction.
The hair check step shows the user that their camera is working properly, and it’s a good time to confirm the microphone is working too. It’s also an opportunity to let the user preview and change the information that will be shown about them during the call. For example, they may want to adjust their camera to better frame their face, or they may want to hide their video or mute their microphone before joining. If they have multiple cameras or microphones, give them the option of choosing which ones to use. If a user’s name and avatar are tied to their account, show a preview of how those will appear. If these aren’t available, give them the opportunity to type in their name or choose an avatar photo.
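A hair check screen might be wired up roughly like this, using `getUserMedia()` and `enumerateDevices()`. The setup function and element handling are an illustrative sketch:

```javascript
// Group available capture devices for the hair check UI so the user can
// pick a camera and microphone. Device objects mirror the shape returned
// by navigator.mediaDevices.enumerateDevices().
function groupDevices(devices) {
  return {
    cameras: devices.filter((d) => d.kind === "videoinput"),
    microphones: devices.filter((d) => d.kind === "audioinput"),
  };
}

// Populate the hair check screen: a live camera preview plus device pickers.
async function setUpHairCheck(videoEl) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
  videoEl.srcObject = stream; // show the user their own camera
  videoEl.muted = true;       // never echo the user's own mic back at them
  // Device labels are only populated after permission has been granted,
  // so enumerate after getUserMedia succeeds.
  const devices = await navigator.mediaDevices.enumerateDevices();
  return { stream, ...groupDevices(devices) };
}
```

Note the ordering: enumerating devices after the permission grant means the pickers can show human-readable labels like “Built-in Microphone” instead of blank entries.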
We’ve looked at some scenarios and UX considerations for users initiating or joining a voice or video call. The clarity and ease of this workflow is crucial to setting up a positive call experience. In the next couple of posts we’ll discuss UX patterns and best practices for the rest of the call workflow, troubleshooting, and handling connectivity issues.