Summit Teaser — Boosting the Vocabulary of a Speech Recognizer
In the AT&T Developer Summit track session How to Train a Speech API, you’ll hear from David Thomson, Principal Architect at AT&T Research Labs, about how to build more flexible voice-powered services using AT&T’s new user-configurable speech recognizer.
The AT&T Speech API is a service developers can use to add automatic speech recognition to mobile apps and other software so that users can communicate with the software using their voice. The Speech API accepts an audio file or stream, transcribes it using the AT&T WATSON(TM) speech recognizer, and returns a transcription of the spoken words to the app.
Along with the audio file, the app must send a parameter indicating which context the speech recognizer should use. The context indicator tells the recognizer what type of speech to listen for (voicemail, SMS messages, etc.). If the recognizer knows the context, it is able to more accurately transcribe the audio into text. If the context is unknown or is not currently supported, the developer can use a generic context.
An interesting case arises when the developer wants to use a generic context, but wants to include a number of additional words that may not be known to the speech recognizer. These extra words may be product names, contact names, locations, company lingo or acronyms, or other words not in general use. For example, a grocery store may create an app that tells shoppers the aisle number for items they wish to purchase. Shoppers speak the product name (potatoes, bread, soup, etc.) and the generic recognizer transcribes it. The recognizer might not, however, know about new or obscure product names such as “sparksall” or “kroppkakkor.”
A simple solution for the grocery locator is to provide hints to the recognizer, such as a list of words to be added to the generic vocabulary. These hints are provided as part of the API request and are posted along with the audio. The new words may be built into a standard grammar format, SRGS (the W3C Speech Recognition Grammar Specification), and placed in parallel with the generic grammar. If the grammar file is groceryitems.srgs, for example, the hints may be provided with a curl argument such as:
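A sketch of such a request might look roughly like the following. The endpoint URL, token variable, and the audio part name are illustrative placeholders, not documented values; only the “x-grammar-altgram” content-disposition name comes from the discussion of parallel hints below:

```shell
# Hypothetical request sketch -- endpoint, token, and audio part name
# are placeholders; x-grammar-altgram names the parallel hint grammar.
curl -X POST "https://api.att.com/speech/v3/speechToTextCustom" \
  --header "Authorization: Bearer $OAUTH_TOKEN" \
  --form "x-grammar-altgram=@groceryitems.srgs;type=application/srgs+xml" \
  --form "x-voice=@shopping-query.wav;type=audio/wav"
```

Each `--form` argument becomes one part of a multipart/form-data body, with the part’s Content-Disposition name taken from the left-hand side of the `=`.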
In the above example, hints are placed in parallel, so that the added words may be spoken instead of generic terms; this is done by setting the content-disposition name to “x-grammar-altgram.” Hints may instead be placed in series, so that the grammar words are spoken before the general vocabulary, by using the name “x-grammar-prefix.” An example of the series use case is a virtual assistant that understands the command, “Send a text message to David Thomson. I’ll meet you at Taco Bell for lunch.” The developer-defined grammar handles the first portion, “Send a text message to David Thomson,” and the generic context transcribes the message itself.
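For reference, a grammar like groceryitems.srgs might look roughly like this minimal SRGS XML sketch (the rule name and items are illustrative; only the product names come from the example above):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal SRGS sketch: a single public rule listing the extra
     product names to be added to the generic vocabulary. -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" root="groceryitem">
  <rule id="groceryitem" scope="public">
    <one-of>
      <item>sparksall</item>
      <item>kroppkakkor</item>
    </one-of>
  </rule>
</grammar>
```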
[Figure: Hints may be placed in series (top) or in parallel (bottom).]
Using generic terms with hints combines the convenience of a generic speech recognizer context with the flexibility of custom grammar. Developers can now quickly build applications that respond to voice commands and can tune the performance to support custom vocabularies.
If you’re attending the AT&T Developer Summit, join David’s session on January 6, 10:45 AM – 11:30 AM, to learn more, or visit the custom ASR page for more info.
Photo credit: msailing