Companies Are Investing in Voice, But They’re Ignoring the Real Problem

In 2019, an analysis of the rise of voice technology noted that the sector’s success hinged on its ability to perform in noisy environments. Two years later, voice technology has seen incremental improvements, but it remains clumsy. For all its promise, voice still falls short of its lab-tested potential once deployed in the real world.

Why? The real world is dynamic and unpredictable, and building solutions that work in such environments is hard. Instead of solutions complex enough to match the complexity of the real world, we have simplified ones tested in simplified ways. We need to design voice solutions for the real world.

There is massive investment happening in conversational artificial intelligence. Verint recently acquired Conversocial for $50 million, and 67 percent of businesses are expected to increase their conversational AI budgets this year. However, if companies continue to test their products with synthetic sound environment models, these investments will fail.

With businesses pouring significant investment into conversational AI and voice assistants, designing voice solutions for the real world is a fiscal imperative if these companies want to survive.

As companies increasingly look to conversational AI to front their customer-facing operations and 24-hour support becomes the norm, voice must rise to the occasion and meet the human standard.

It’s time to deploy voice that’s ready for the real world.

When we talk about humanizing voice technology, we mean that interacting with a voice assistant or voice interface should feel as close as possible to interacting with another human. If we expect a person not only to hear but also to understand what we mean and want in a given situation, our voice assistants should likewise hear and understand us, with all our nuances and quirks, and perform the requested function. This is the human baseline.

When companies test their voice technology with synthetic sound environments, they develop a set of environmental sound profiles in the hope of matching users in their actual real-world environments. The issue, however, is that real-world situations are dynamic: even if a profile matches at the start of an interaction, conditions inevitably shift or new variables enter the equation. The technology can’t perform in the myriad edge cases that occur in everyday life.