How to add voice to your app with Nuance Mix at the Shape Hackathon
Guest post by Kenn Harper, VP Devices, Nuance Mobile
First and foremost, we’re really excited to be here in San Francisco at the AT&T Shape Hackathon this year. You’ll find us hanging out at our booth, so definitely swing by to say hello, ask us any questions, and pick up some swag.
We’re in town to give all Shape hackers special access to our Nuance Mix web tools to build custom voice and natural language interfaces for apps and IoT devices. What you choose to develop is totally up to you, so we invite you to get creative and start experimenting with our cloud voice and NLU technologies. If you need some hardware to go along with it, we’ve got you covered there too, with gear you can use during the hackathon. Just swing by and see us at our booth.
To help you get started, here are some helpful tips and tricks to keep in mind. Don’t forget to create your free Nuance Developers account on the AT&T Shape landing page here.
What is Nuance Mix?
Nuance Mix is our new platform that lets developers build a fully customized speech and natural language interface for any hardware device or application. Available as part of the Nuance Developers program, Mix creates more opportunity than ever before for you to build highly customized voice and natural language interfaces. Mix.nlu is the web tool that makes this development possible, providing you with a GUI to build out a custom set of speech-enabled use cases.
At each step along the way, training guides, tips, and reference material available within the web tool will help guide you through the process. If you run into any challenges or questions, come find us at the event and we’ll be happy to help.
What is NLU and why should I use it?
Put simply, Natural Language Understanding (NLU) is the ability to correlate what a user says with the action the user intends. Your users will have countless ways of expressing their requests, and it wouldn’t be practical to try to code against every permutation. That’s exactly where NLU becomes so valuable. With Mix.nlu, you have full control to define these actions and decide what information or words the system interprets to make that decision. As your NLU model gets more robust, our statistical algorithms will begin to recognize language patterns and constructs in your sample sentences, implicitly adding to the flexibility of your interface.
We’ve designed Mix to make this as simple as possible for you to get up and running, even if you have no prior background in speech recognition or NLU.
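To see why hand-coding every phrasing doesn’t scale, consider a naive keyword matcher. This is our own illustration, not anything generated by Mix; the intent names and rules are hypothetical:

```python
def naive_intent(utterance: str) -> str:
    """A brittle keyword approach -- exactly what NLU saves you from writing."""
    text = utterance.lower()
    if "how much" in text or "cost" in text:
        return "GET_PRICE"
    if "i'd like" in text or "can i get" in text:
        return "ORDER_DRINK"
    return "UNKNOWN"

# Works for the phrasings you anticipated...
print(naive_intent("How much is a large cappuccino?"))  # GET_PRICE
# ...but fails on an equally natural request you didn't hard-code:
print(naive_intent("One tall mocha, please"))           # UNKNOWN
```

A statistical NLU model trained on annotated samples generalizes past the exact wording, so you don’t end up maintaining an ever-growing pile of keyword rules like these.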
What type of things should I be thinking about when designing my NLU model?
Before we dig a bit deeper, think broadly about the following questions and steps for creating an app with speech functionality:
• What can your app do? – Decide on the set of actions you want your app to support.
• What types of things will users say to your app? – Add sample sentences that users might say to instruct the app to do those things.
• What does the user mean? – Annotate the sample sentences with intents (what the user wants to do) and concepts (parts of speech that are meaningful to those intents).
• Train a simple model – Use your annotated sentences to train an initial NLU model.
• How well does it work? – Test the simple system with new sentences; continue adding annotated sentences to the training data.
• Test it with real users – Build and publish a version for users to try out, collect usage data, and use that data to continue refining the model.
What will users say…and what does that mean?
First, you’re going to need to map the words that your users say to specific actions in your application. That may sound daunting, but don’t worry: the Mix tool makes it pretty easy. To start, you’ll need to enter some sample sentences that you think people will use when speaking to your app and indicate what actions (or intents) those sentences should correspond to. For example, let’s say you’ve developed an app for ordering espresso drinks. Sample sentences for such an app might be “I’d like a tall vanilla latte, with low-fat milk,” or “How much is a large cappuccino?”
Along with your sample sentences, you’ll need to signify specific concepts and intents that are relevant by creating them in the tool and tagging them. This process is called annotating. Going back to the sample sentences above, the first example might make you a drink while the second might just provide a price. So you might be thinking, how exactly do I indicate what should happen? Well, here’s an example of what this would look like for the sentence “I’d like an iced vanilla latte.”
I’d like an [HOT_COLD_TYPE]iced[/] [FLAVOR]vanilla[/] [DRINK_TYPE]latte[/]
In the example, the intent of the whole sentence could be something we decide to call “ORDER_DRINK,” and the concepts within the sentence are [HOT_COLD_TYPE], [FLAVOR], and [DRINK_TYPE]. By convention, concept names are in all caps, enclosed in brackets before the tagged words, and closed with [/]. The actual annotation process is completed through the web tool’s interface, so rest assured that there won’t be too much manual text editing. You can continue this process until your samples have been tagged with intents and concepts so that Mix can make sense of what people say to your app.
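To make the notation concrete, here’s a small Python sketch that pulls the tagged concepts out of an annotated sample sentence. This parser is our own illustration of the bracket convention shown above, not part of the Mix tooling:

```python
import re

# Match spans of the form [CONCEPT_NAME]tagged words[/], per the
# annotation convention shown above.
ANNOTATION = re.compile(r"\[([A-Z_]+)\](.*?)\[/\]")

def parse_annotation(sentence: str):
    """Return the (concept, value) pairs tagged in an annotated sample."""
    return ANNOTATION.findall(sentence)

sample = "I'd like an [HOT_COLD_TYPE]iced[/] [FLAVOR]vanilla[/] [DRINK_TYPE]latte[/]"
print(parse_annotation(sample))
# [('HOT_COLD_TYPE', 'iced'), ('FLAVOR', 'vanilla'), ('DRINK_TYPE', 'latte')]
```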
How do I train my model?
Now that you have some sample sentences annotated with your concepts and intents, you can train your NLU model. The beauty of NLU is that even with a few sample sentences, a trained model can understand new sentences and phrasings that aren’t explicitly accounted for, which is extremely valuable given how many different ways there are of asking for the same thing.
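Once the trained model interprets an utterance, your app receives the recognized intent plus any concept values, and dispatches on them. The dict shape below is a simplified assumption for illustration, not Mix’s actual response format:

```python
# Hypothetical, simplified NLU result for "I'd like an iced vanilla latte"
# -- the real response format depends on the Mix SDK you integrate with.
nlu_result = {
    "intent": "ORDER_DRINK",
    "concepts": {"HOT_COLD_TYPE": "iced", "FLAVOR": "vanilla", "DRINK_TYPE": "latte"},
}

def handle(result: dict) -> str:
    """Route each recognized intent to the matching app action."""
    if result["intent"] == "ORDER_DRINK":
        c = result["concepts"]
        return f"Making one {c['HOT_COLD_TYPE']} {c['FLAVOR']} {c['DRINK_TYPE']}"
    if result["intent"] == "GET_PRICE":
        return "Looking up the price..."
    return "Sorry, I didn't catch that."

print(handle(nlu_result))  # Making one iced vanilla latte
```

The key point is that your app only ever handles a small, fixed set of intents and concepts, no matter how many ways users phrase their requests.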
Now you’re ready to go build it!
So there you have it, our quick and dirty overview of how to use Mix.nlu to create an app that can understand speech and take action on it. Now that you know the basics, start tinkering and create something cool!
To see another example, you might also want to check out our short video tutorial on YouTube.