Staring deeply into a person’s eyes may reveal the faintest glimpses of what’s written on their soul, but most of us would be content just to be able to read their lips. And that’s especially true for the hearing impaired.

Siri may be great at finding a good Thai restaurant, but we’re a long way from creating an artificial intelligence that’s a lip reader on par with HAL 9000 in "2001: A Space Odyssey." A slice of new research presented this week at the International Conference on Acoustics, Speech, and Signal Processing, however, may be a significant nudge toward that goal.

The research team, led by Dr. Helen L. Bear and Professor Richard Harvey of the University of East Anglia’s School of Computing Sciences, crafted a new variant of visual speech recognition technology, one that spliced together two existing methods for teaching a computer to decipher spoken words.

“We are still learning the science of visual speech and what it is people need to know to create a foolproof recognition model for lip reading, but this classification system improves upon previous lip-reading methods by using a novel training method,” Bear said in a statement.

Traditionally, the basic unit of spoken language is the phoneme, the smallest unit of sound that distinguishes one word from another, such as vowels and consonants; English has around 40. The field of visual speech recognition, while incorporating phonemes, mostly deals in something called the viseme: the visual snapshot of the face, and especially the lips, as we utter a phoneme.

The crux of the problem is that there are fewer visemes than there are phonemes: several distinct sounds can look identical on the lips. Just as differently spelled words that sound alike are known to the grammatically inclined as homophones, words that look alike on the lips are, in effect, visual homophones. For that reason, machines that rely strictly on visemes to read lips tend to fare noticeably worse than those trained by listening to the utterances of real people. But machines trained via spoken phonemes still don't pass muster for truly functional lip reading, and they take much more time and effort to train, so the researchers opted for a third way forward.
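To see why that many-to-one mapping hurts, consider a toy sketch in Python. The phoneme-to-viseme table below is hypothetical and heavily simplified (real mappings, including those the paper works with, are larger and differ in detail), but it shows how distinct spoken words can collapse into the same viseme sequence:

```python
# Toy illustration of why viseme-only lip reading is ambiguous.
# This phoneme-to-viseme table is a simplified, hypothetical one.

PHONEME_TO_VISEME = {
    # The bilabials /p/, /b/, /m/ all close the lips the same way.
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    # /t/, /d/, /n/ share a similar tongue-behind-teeth appearance.
    "t": "V_alveolar", "d": "V_alveolar", "n": "V_alveolar",
    # The vowel in "pat", "bat", "mat".
    "ae": "V_open",
}

def to_visemes(phonemes):
    """Collapse a phoneme sequence into the viseme sequence a camera sees."""
    return tuple(PHONEME_TO_VISEME[p] for p in phonemes)

words = {
    "pat": ("p", "ae", "t"),
    "bat": ("b", "ae", "t"),
    "mat": ("m", "ae", "t"),
}

# All three words collapse to the identical viseme string,
# so a viseme-only classifier cannot tell them apart.
for word, phones in words.items():
    print(word, "->", to_visemes(phones))
```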

First, they trained their classifiers using viseme data from 12 volunteers who each spoke 200 sentences. Then they zeroed in on the separate phoneme variations hiding within each viseme, in hopes of boosting the machine's ability to distinguish the minute visual contours that make similar words different from one another.
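The paper's actual classifiers and visual features are more elaborate than anything that fits here, but the two-stage idea can be sketched with stand-ins: a generic scikit-learn classifier, random "lip-shape" features, and the same made-up phoneme-to-viseme table as above. Everything below is illustrative, not the authors' pipeline:

```python
# A minimal sketch of two-stage training: coarse viseme classes first,
# then phoneme variants within each viseme. All names and data are
# stand-ins, not the paper's models or features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend lip-shape features: 600 frames x 20 visual features each.
X = rng.normal(size=(600, 20))
phoneme_labels = rng.choice(["p", "b", "m", "t", "d", "n"], size=600)

# Stage 1: collapse phonemes into coarser viseme classes and train on
# those, which is easier because the classes look visually distinct.
PHONEME_TO_VISEME = {"p": "bilabial", "b": "bilabial", "m": "bilabial",
                     "t": "alveolar", "d": "alveolar", "n": "alveolar"}
viseme_labels = np.array([PHONEME_TO_VISEME[p] for p in phoneme_labels])
stage1 = LogisticRegression(max_iter=1000).fit(X, viseme_labels)

# Stage 2: within each viseme class, train a second classifier to
# separate the phoneme variants that the viseme lumps together.
stage2 = {}
for viseme in set(viseme_labels):
    mask = viseme_labels == viseme
    stage2[viseme] = LogisticRegression(max_iter=1000).fit(
        X[mask], phoneme_labels[mask])

def predict_phoneme(x):
    """At test time: predict the viseme first, then refine to a phoneme."""
    viseme = stage1.predict(x.reshape(1, -1))[0]
    return stage2[viseme].predict(x.reshape(1, -1))[0]

print(predict_phoneme(X[0]))
```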

Sure enough, their classifiers performed noticeably better than earlier viseme-only trial runs, particularly when it came to correctly deciphering whole words. The program even improved when it came to speakers who were previously deemed difficult to read, though it still had trouble with people who overly “co-articulate” — more commonly known as mumblers.

"Lip reading is one of the most challenging problems in artificial intelligence so it's great to make progress on one of the trickier aspects, which is how to train machines to recognise the appearance and shape of human lips," Bear said.

Bear added that the advent of reliable lip-reading technology could someday open up a new world for people and professions of all stripes.

"Potentially, a robust lip-reading system could be applied in a number of situations, from criminal investigations to entertainment,” she said. "Crucially, whilst there are still improvements to be made, such a system could be adapted for use for a range of purposes — for example, for people with hearing or speech impairments. Alternatively, a good lip-reading machine could be part of an audio-visual recognition system."

Bear told Medical Daily that she hopes her team’s machines can further improve by studying data from more speakers or eventually incorporating research from other fields of artificial intelligence, particularly that of deep neural networks. The latter would allow the machines to engage in a process known as deep learning, which many computers already use to detect speech and translate it to text, among many other applications.
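Nothing in the paper specifies such a network, but as a rough illustration of the deep-learning direction Bear describes, here is a minimal recurrent model (in PyTorch, with made-up dimensions) that maps a sequence of lip-shape feature vectors to per-frame phoneme scores:

```python
# A minimal sketch, assuming PyTorch and invented sizes; this is not
# from the paper, just the general shape of a deep lip-reading model.
import torch
import torch.nn as nn

class LipReader(nn.Module):
    def __init__(self, n_features=20, n_phonemes=40):
        super().__init__()
        # A recurrent layer lets each frame's prediction draw on context
        # from neighbouring frames, which helps with co-articulation.
        self.rnn = nn.LSTM(n_features, 64, batch_first=True)
        self.out = nn.Linear(64, n_phonemes)

    def forward(self, frames):          # frames: (batch, time, features)
        hidden, _ = self.rnn(frames)
        return self.out(hidden)         # (batch, time, phoneme scores)

model = LipReader()
scores = model(torch.randn(1, 75, 20))  # one 75-frame clip
print(scores.shape)                      # torch.Size([1, 75, 40])
```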

“It might seem corny, but I think it’s really, really cool what we’re doing here,” Bear said. “At some point there will be functional lip-reading technology, and we’re contributing to that goal.”

Encouraging as the paper is, though, it’s no game changer just yet, according to Dr. Catherine Palmer, director of audiology and hearing aids at the University of Pittsburgh Medical Center.

“Most individuals lip read to a certain extent and many studies have been done trying to improve our ability to train individuals to lip read better with only minor success for the very reasons stated in the paper — many sounds look the same,” Palmer told Medical Daily. “We are still not at a place where lip reading will produce a lot of information because of talker variability.”

Source: Bear HL, Harvey R. Decoding visemes: Improving machine lip-reading. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2016.