Recently, I called AT&T to order a MicroCell to improve the cellular reception in my home. The reception outside my house is just fine, but inside it is quite poor. Fortunately, I was able to talk AT&T into providing me with a free one, probably because we have 8 devices between all of us and AT&T enjoys our monthly bill. When the call was first answered, I was greeted by a fairly human-sounding automated voice prompt that informed me I could speak in complete sentences to navigate the menus. So, being a computer scientist with a background in natural language processing, I decided to play around with it and see how well it would do.
ME: “Yes, the reception in my house is bad. Can you help me fix it?”
AT&T AUTOMATED VOICE PROMPT: <Typing sounds> “It sounds like you need help with a MicroCell. I’ll pass along your information to a technician that can help you.”
ME: <Quite surprised> “Alright!”
We’ve all had a chance to see how far “Ok, Google” and Siri have come in helping us on our mobile devices. They are both quite good at well-defined tasks, such as checking the weather, finding sports scores, navigating around the web, and setting up reminders. But when you ask a less routine factual question, like “What was the capital of Texas in 1838?”, you tend to get sent to a basic web search, as if you had merely typed the question into Google.
Of course, this technology is only going to get better, but just about everything we’ve experienced so far has lacked context on who the speaker is and what is motivating them. As humans, we typically do much better at that. When someone I’ve never met comes up to me and asks, “How far is it to Austin?”, I understand that they likely want to know how long it takes to travel there. Or, in a business context, if a colleague asks, “How are we doing in Austin?”, I understand that they are asking about sales numbers or operational issues.
The level well beyond all of that is reached when automated assistants can pass what is famously known as the “Turing Test”, first proposed by Alan Turing in 1950. The test essentially defines true computer intelligence as a machine’s ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. Turing’s test does not require the machine to understand or communicate via spoken word, since the conversation is limited to a text-only exchange. But given the current advances in voice-driven assistants, I see no reason why the first machine to pass the Turing Test couldn’t do it via voice.
This technology isn’t just for science fiction writers or big companies like AT&T; it should be an essential part of any technology strategy roadmap. Users in all organizations are going to get used to making spoken requests everywhere, phrased the way they normally talk. The days of Google for searching are numbered, in my opinion, but that’s for another blog post. Companies should embrace this as a huge opportunity to improve data queries and other computing tasks.