Talking, conversing, exchanging words: for more than 10 million people, this seemingly simple act cannot be imagined without assistive technologies such as voice-generating devices, touch screens or text-to-speech apps. What does the digital future hold for them? How could innovations transform the translation industry or medical administration? Here’s a glimpse into the future of voice and speaking.

Speaking, identity, voice stereotypes

“Give me the key!” This simple sentence carries much more information when it is spoken aloud. A weary Filipino mother could be instructing her little child because she cannot open the door to their home otherwise. A friend could be reassuring a troubled twenty-something that everything is going to be all right with her dog: she will take care of little Suzy while the owner is away on holiday. An angry and jealous Dutch woman could be demanding the key from her boyfriend, who doesn’t want to unlatch the door because another woman is hiding behind it.

Voice carries much more information about the speaker than just content. Tonality, speed, volume, pitch and the articulation of vowels all matter. They make every single one of us unique and become part of our identities. Like other characteristics of human appearance, voice also creates expectations and stereotypes. A strong, muscular man is expected to have a low, resonant voice, while a small girl is expected to have a thin, high-pitched one. That’s why jokes that play with voice stereotypes work, as in the animated film Up, where the alpha dog’s voice machine gets messed up and, instead of a strong alpha-male bark, he gives instructions in a high-pitched squeak.

In the light of the above, the question emerges: what happens to those who lose their ability to speak due to injury or illness? Could modern technology give them back the functionality of speaking alongside the lost part of their identity?

Stephen Hawking and the birth of speech generating devices

The late Stephen Hawking, the most famous cosmologist on Earth, also involuntarily popularized speech generating devices, and thus gave an eloquent response to the first question. The physicist suffered from amyotrophic lateral sclerosis (ALS), which results in the gradual deterioration of muscles and leads to difficulty swallowing, speaking and eventually breathing. Most people can picture him giving lectures and communicating with audiences through his computer.

Digital augmentative and alternative communication (AAC) methods, including speech generating devices (SGDs) and voice output communication aids (VOCAs), help more than 2 million people in the US and about 1 percent of the British population. People with ALS, cerebral palsy, locked-in syndrome, multiple sclerosis, Parkinson’s disease, brainstem stroke, traumatic brain injury or various developmental communication impairments or disabilities might need such devices. The forerunners of these aids have been around since the early 1960s, when pioneers in the speech pathology field began using communication boards with individuals who were cognitively or physically impaired. For example, in 1963 Maling and Clarkson developed POSSUM, the first piece of technology designed exclusively as a communication tool for individuals with severe physical disabilities. POSSUM incorporated a typewriter with a switch-controlled scanning device.

Robotic voices and the development of AAC methods

Since then, AAC devices have advanced rapidly thanks to the development of speech synthesis algorithms and the improved power, speed, and memory of microprocessors. We have come from a time when portable communication aids weighed nearly seventeen pounds to present-day devices that can weigh as little as a pound. Still, the first AAC devices marketed in the 1990s had limited functionality and could come with inherent flaws.

As The Guardian points out, the 2014 biopic of Hawking’s life, The Theory of Everything, contains a stark reminder of the loss that this technology could bring with it. When Hawking and his first wife, Jane, first hear what will be Hawking’s new voice, they are stunned. After a moment of speechlessness, Jane offers a timid objection: “It’s American.” The moment is played for laughs, but it marks a trauma. From that time on, the British Hawking spoke to his audience with an American accent and a rather robotic voice. Moreover, in contrast with unique human voices, this same mechanical sound was the voice available to little girls and old men alike worldwide, as the selection of voices was not very wide.

In the last couple of years, the rapid advancement of technology has started to change that profoundly. Alongside the exponential growth in the capabilities of the devices, their ability to adapt to existing identities has also grown, and it will be even better in the future. The innovative power and the positive response of users also show in the market figures: in 2017, the speech generating devices market was valued at US$190 million, and it is projected to reach US$330 million by 2025, at an 8.4% growth rate. Now, let’s see which devices could make communication better for those who are unable to speak or who can pronounce syllables only with great difficulty.

Alternative communication solutions for people having trouble speaking

1) Assistive apps

The appearance of smartphones, tablets, and digital touch screens in general allows a more straightforward way of communicating without speaking. The simplest AAC device is a picture board or touch screen that uses pictures or symbols of common items and activities that make up a person’s daily life. For example, autistic children or adults might have great difficulty expressing their needs, but it’s possible to teach them to touch the image of bread if they want to ask for a slice of bread. Many such picture boards can be customized and expanded based on a person’s age, education, occupation, and interests.

The iOS app SayIt! is a simple aid for people who have trouble speaking: they can open the app, type something and tap the speak button. The program also has a word prediction element similar to autocomplete or autocorrect. The Predictable app offers something similar, using intelligent word prediction to suggest words on a keyboard and so limit the amount of typing it takes to write a sentence or phrase to be spoken by the app. On the other end of the spectrum, the Proloquo2Go app offers symbol-based communication and is used by over 200,000 people in need.
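To give a feel for how word prediction can cut down on typing, here is a minimal sketch in Python. It is not how SayIt! or Predictable actually work, as their internals are not described here. It simply assumes a tiny sample of a user’s past phrases and suggests the most frequent known words that start with the letters typed so far. The names `build_model` and `predict` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_model(corpus: str) -> Counter:
    """Count how often each lowercase word appears in the sample text."""
    return Counter(corpus.lower().split())


def predict(model: Counter, prefix: str, k: int = 3) -> list[str]:
    """Return up to k known words starting with prefix, most frequent first.

    Ties in frequency are broken alphabetically so the order is stable.
    """
    prefix = prefix.lower()
    matches = [(word, count) for word, count in model.items()
               if word.startswith(prefix)]
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in matches[:k]]


# Assumed sample of phrases the user has typed before (illustrative only).
model = build_model("i want water i want bread i would like bread please")
suggestions = predict(model, "w")
print(suggestions)
```

Typing a single “w” is enough for this toy model to offer “want” first, because that word appeared most often in the sample. Real apps presumably use far richer language models, but the principle of ranking candidates by likelihood is the same.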

The MyTalkTools Mobile app was created by a dad whose son was born with Nager syndrome, a rare condition affecting hearing, speech and other abilities. As the family was never satisfied with the existing solutions, they started to develop MyTalkTools on the iPhone. With over 100,000 users, MyTalkTools Mobile is one of the most popular apps offering a complete solution for overcoming communication difficulties. One particularly neat feature is that content is stored locally, so it remains accessible without Wi-Fi or mobile data.

As for assistive communication apps, the future undoubtedly holds artificial intelligence, through the development of natural language processing and text prediction. Experts believe that personalized synthetic speech applications could give users the opportunity to customize or personalize their voice in the future. One example is WaveNet, developed by Google’s DeepMind, which models audio waveforms based on real human speech and learns from them to create its own sounds in a variety of voices. Or consider GazeSpeak, an open-source platform that uses artificial intelligence to convert eye movements into speech and makes it easier for users with ALS and other disabilities to converse in real time.

2) Speech-generating devices

Speech-generating devices go one step further by translating words or pictures into speech. Some models allow users to choose from several different voices, such as male or female, child or adult, and even some regional accents. Others enable the person to use their eyes or tilt their head to activate switches, and have personalized features for specific conditions and use cases. Some devices employ a vocabulary of prerecorded words, while others have an unlimited vocabulary, synthesizing speech as words are typed in.

For example, the Pocket Go Talk is a wearable, lightweight and portable AAC device, also built for tabletop use, with five adjustable scanning speeds. MegaBee is a simple-to-use assisted writing tablet that was developed in conjunction with patients and, as a consequence, is closely tailored to their needs. For a more specific use case, Enabling Devices’ Tactile Symbol Communicator uses tactile symbols for concrete or abstract representations that visually impaired users can recognize by touch.
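Switch-controlled scanning, mentioned above for POSSUM and for devices with adjustable scanning speeds, can be illustrated with a small Python sketch. It assumes the simplest possible scheme: the device highlights one item at a time in a repeating loop, and a single switch press selects whatever is highlighted at that moment. The function name `scanned_item` and the sample vocabulary are hypothetical, and real devices may scan in rows and columns or use other strategies.

```python
def scanned_item(items: list[str], step_seconds: float, press_time: float) -> str:
    """Return the item highlighted when the switch is pressed.

    The highlight starts on the first item at time 0, moves to the next
    item every step_seconds, and wraps around after the last item.
    """
    if not items:
        raise ValueError("items must not be empty")
    if step_seconds <= 0:
        raise ValueError("step_seconds must be positive")
    if press_time < 0:
        raise ValueError("press_time must not be negative")
    index = int(press_time // step_seconds) % len(items)
    return items[index]


# Assumed four-item vocabulary, for illustration only.
board = ["yes", "no", "drink", "help"]

# A slower scanning speed gives the user more time to hit the switch.
choice = scanned_item(board, step_seconds=1.5, press_time=4.0)
print(choice)
```

An adjustable scanning speed corresponds to changing `step_seconds`: a longer step suits users who need more time to react, at the cost of slower communication.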

In addition, several companies aim to combine existing digital technology, including social companion robots, to further improve assistive speech devices. LuxAI, a spin-off of the University of Luxembourg, has developed the QTrobot, a social companion robot for children with autism, which teaches the little ones new skills for expressing their emotions and participating better in social interactions.

On the other hand, the Boston-based VocalID promises to bring life to robotic machine voices by the power of crowdsourcing and voice blending technology. The company creates a unique vocal persona for any text-to-speech tool. So far, over 14,000 speakers from over 110 countries have contributed over 6 million sentences to their growing spoken repository, The Human Voicebank.

3) The future: communication through brain-computer interfaces

The idea that humanity will one day be able to give commands purely by thinking about them seems as far away in the terrain of science fiction as Marty’s hoverboard from Back to the Future (no, single-wheeled hoverboards don’t count).

However, some researchers have taken on the challenge in large-scale research projects. University of Reading researcher Dr. Kevin Warwick managed to control machines and communicate with others using only his thoughts with a cutting-edge neural implant. In 1998, Warwick, who earned the nickname “Captain Cyborg” from his colleagues, implanted a transmitter in his arm to control doors and other devices; then in 2002 he decided to implant electrodes directly into his nervous system in order to control a wheelchair with his thoughts and allow a remote robot arm to mimic the actions of his own arm.

As a next, and very brave, step toward helping voiceless patients communicate, Warwick implanted a chip into his wife’s arm to link their nervous systems through the Internet, creating what was described as the world’s first electronic brain-to-brain communication. When she moved her hand three times, he felt three pulses and recognized that his wife was communicating. He is optimistic that mind-to-mind communication will become a commercial reality in the next one or two decades. When Cathy Hutchinson, paralyzed years earlier by a brainstem stroke, managed to take a drink from a bottle in 2012 by manipulating a robot arm with only her brain and a neural implant, the path became clear for future research.

Speaking 2.0: Voice interface technology and real-time translations

Innovations could lift communication to another level in the future, not only by empowering people with speech impairments but also by changing how people exchange information with individuals from other cultures or with machines.

In the coming years, real-time translation will bring the dream of a commonly understood language within reach. In October 2017, Google launched a set of Bluetooth earbuds called the Pixel Buds with one standout feature: instant translation between 40 different languages using a Pixel smartphone. Meanwhile, Microsoft’s Skype Translator can do real-time translation between eight languages over a voice or video call, or between 50 languages for text chat, making conference calls across nations and languages possible. The portable and wearable ili device helps travelers translate from English to Japanese, Mandarin or Spanish within seconds without any Internet connection. In the future, you might have a small earbud and switch from Chinese to Icelandic and then back to English within five minutes of conversation without any problems.

Not even sign language is left out of technology’s sight. Researchers at Texas A&M University have developed a wearable device that “translates” sign language into English by sensing the user’s movements. In October 2015, the instrument was already able to recognize some 40 ASL signs with 96 percent accuracy. However, as The Atlantic points out, projects converting sign language, especially ASL gloves, have strong limitations at the moment. Perhaps the future will bring a breakthrough in how machines interpret human movements for sign language purposes.

Machines will not only act as interpreters, however, but also as communication partners. Gartner estimates that 30% of our interactions with technology will be through conversations with smart machines by the end of this year. Now, one in six U.S. adults owns a voice-activated intelligent speaker or device, and Forbes believes that number will continue to rise. In the future, speaking to Siri or Alexa will not only result in turning the lights on and off, but also in making restaurant reservations or dentist appointments. In a couple of years, you might not be sure whether you are talking to an A.I. or a real person.

Medical implications: Is the doctor speaking?

Speech technologies will impact the business world, the private sphere and the medical community. Chatbots will become the first line in primary care. That means patients will turn to chatbots for medical advice in simple cases, and they will be sent to a doctor only when the algorithm decides that the individual needs further assistance. This will take some of the burden off the shoulders of busy doctors and nurses while patients are still being taken care of.

Artificial intelligence-powered and speech-based solutions will support diagnostics in many other ways, too. As vocal biomarkers are just as unique as our voice itself, researchers believe that they could help in diagnosing certain illnesses. The tech giant IBM is teaming its Watson AI supercomputer with academic researchers to try to predict from speech patterns whether patients are likely to develop a psychotic disorder. The Israeli company Beyond Verbal works on emotion analytics and provides voice analysis software. It has announced that its algorithms were successful in helping to detect the presence of coronary artery disease (CAD) in a group of patients.

In addition, voice assistance technologies could also become a valuable asset in medical administration. Imagine that artificial intelligence-based voice tools could record patient visits and make medical notes for doctors without the need to type them into any health record system. How much more time could a medical professional spend with their patients without the need for constant administration? A company called Augmedix uses Google Glass to enable physicians to examine their patients while remote medical scribes fill out the electronic medical records based on what they hear and see from the visit. Technology could bring some truly mind-boggling solutions to medical practice, don’t you think?