Speech is an essential part of human communication: the ability to express thoughts and feelings through spoken language. Producing it requires the brain to coordinate many different muscles and structures in the body, including the vocal cords, lips, tongue, and teeth. The motor cortex plays a central role in this process, sending signals to the muscles that control these structures and telling them when to contract and relax in order to produce sound.

Brain–machine interfaces (BMIs) are devices that allow humans to control machines with their thoughts. They can be used to drive robotic arms or legs, prosthetic devices, or even computer cursors, and in some cases they have been used to restore lost functions such as movement or speech. BMIs work by reading electrical activity in the brain, picked up by sensors placed on the scalp or surgically implanted into the brain itself. The sensors detect the tiny changes in voltage that occur when neurons fire, and these changes are then converted into commands that a machine can understand. BMIs hold great promise for people with speech impairments caused by neurological conditions such as ALS (amyotrophic lateral sclerosis) or stroke.
An AI Algorithm Decodes Speech From Brain Activity With Surprisingly High Accuracy
An AI algorithm that decodes speech from brain activity has achieved remarkable precision. The approach is, however, still a long way from being deployed to assist individuals who cannot speak, largely because of the exorbitant cost of the hardware required.
An artificial intelligence can decode words and phrases from brain activity with surprising precision, though not yet precisely enough to support hardware and software for mass-market use. Using only a few seconds of brain activity data, the AI deduces what a person has heard; according to a preliminary study, the correct answer is among its top 10 guesses 73% of the time.
The AI’s performance is clearly beyond what many believed achievable at this stage of the research.
In a preprint posted to arXiv.org on August 25, 2022, researchers reported that an AI developed by Facebook’s parent company, Meta, might someday help thousands of people worldwide who are unable to communicate through speech, typing, or gestures. That includes many who are barely conscious, locked in, or in “vegetative states” – a condition now often referred to as unresponsive wakefulness syndrome.
Most current communication aids for such people require invasive brain surgery and end up leaving implanted electrodes in place as the brain–machine interface. This approach is championed by, for instance, Neuralink, the venture founded by Elon Musk, Max Hodak, and Paul Merolla, which has recently been overtaken by several less ceremonious, less well-known startups that achieved noteworthy results before it did. Synchron, Neurable, Bitbrain, NextMind, Kernel, and Emotiv are all examples of such startups.
According to neurologists, these new technologies offer plausible avenues for aiding patients with communication difficulties without resorting to invasive procedures.
Neuroscientist Jean-Rémi King, a Meta AI researcher currently based at the École Normale Supérieure in Paris, and his colleagues created a computer algorithm to recognise words and phrases from 56,000 hours of audio recordings spanning 53 languages. The tool, a kind of language model, learnt to detect elements of language at both a fine-grained level, such as letters or syllables, and a coarser one, such as words or phrases.
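The article does not name the model, but the description (56,000 hours of speech across 53 languages) matches Meta’s publicly released wav2vec 2.0 XLSR-53 checkpoint. Below is a minimal sketch, assuming that model and the Hugging Face transformers library, of how such fine- and coarse-grained speech representations can be extracted; it is illustrative only, not the study’s actual pipeline.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Assumption: the multilingual speech model is wav2vec 2.0 XLSR-53.
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-large-xlsr-53")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-large-xlsr-53")
model.eval()

# Three seconds of 16 kHz audio; a silent placeholder stands in for a real recording.
waveform = torch.zeros(3 * 16_000)

inputs = extractor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Early layers tend to capture fine-grained acoustic detail (phoneme/syllable scale),
# while deeper layers capture coarser, word- and phrase-level structure.
fine = out.hidden_states[1]     # (1, n_frames, 1024)
coarse = out.hidden_states[-1]  # (1, n_frames, 1024)
print(fine.shape, coarse.shape)
```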
The scientists applied an AI equipped with this language model to databases from four universities containing the brain activity of 169 individuals. The participants in those studies had listened to stories and sentences from works such as The Old Man and the Sea and Alice’s Adventures in Wonderland while their brains were scanned using magnetoencephalography (MEG) or electroencephalography (EEG). These techniques record, respectively, the magnetic and electrical signatures of brain activity.
Using just three seconds of brain activity data from each participant, together with a computational technique that helps account for physical differences between individual brains, the researchers attempted to decipher what the individuals had heard. The AI was trained to match the speech sounds from the story recordings with the patterns of brain activity recorded while participants listened to them. It then ranked more than a thousand candidate segments by how likely each was to be what the person had heard during that brief interval.
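To make that matching step concrete, here is a minimal, hypothetical Python sketch of a contrastive set-up of the kind described: a small network embeds three-second windows of brain recordings into the same space as precomputed speech representations and is trained so that each window is most similar to its own speech segment. The sensor count, window length, layer sizes, and loss are illustrative assumptions, not the authors’ exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BrainEncoder(nn.Module):
    """Embeds a (sensors x time) window of MEG/EEG into a shared space."""
    def __init__(self, n_sensors=273, n_times=360, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_sensors * n_times, 512),
            nn.ReLU(),
            nn.Linear(512, dim),
        )

    def forward(self, x):                       # x: (batch, sensors, time)
        return F.normalize(self.net(x), dim=-1)

def contrastive_loss(brain_z, speech_z, temperature=0.1):
    # Similarity of every brain window to every speech segment in the batch;
    # the true (matching) pair sits on the diagonal.
    logits = brain_z @ speech_z.T / temperature
    targets = torch.arange(len(brain_z))
    return F.cross_entropy(logits, targets)

# Toy batch: 8 simulated MEG windows (273 sensors, ~3 s at 120 Hz) paired with
# 256-dimensional speech embeddings for the same 3-second segments.
brain = torch.randn(8, 273, 360)
speech_z = F.normalize(torch.randn(8, 256), dim=-1)

encoder = BrainEncoder()
loss = contrastive_loss(encoder(brain), speech_z)
loss.backward()                                 # gradients for one training step
print(f"contrastive loss: {loss.item():.3f}")
```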
Using magnetoencephalography, the researchers found that the correct answer was among the AI’s top 10 guesses 73% of the time. With electroencephalography, this figure fell to 30% or less. While the 73% top-10 accuracy can be expected to improve with larger datasets, the real stumbling block for turning this research into a mass-market product is the hardware: MEG currently requires cumbersome and exorbitantly priced equipment, putting it well out of reach of ordinary users. Bringing the technology to clinics and to the people who need it will require advances that make the equipment significantly cheaper and more user-friendly.
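For clarity, the 73% figure is a top-10 retrieval score: for each window of brain activity, the decoder ranks every candidate segment by similarity and counts a success if the true segment falls among the ten highest-ranked. A small illustrative sketch of that computation, with random embeddings standing in for real ones:

```python
import torch
import torch.nn.functional as F

def top_k_accuracy(brain_z, candidate_z, true_idx, k=10):
    sims = brain_z @ candidate_z.T                      # (n_trials, n_candidates)
    topk = sims.topk(k, dim=-1).indices                 # k best candidates per trial
    hits = (topk == true_idx.unsqueeze(-1)).any(dim=-1)
    return hits.float().mean().item()

# Random stand-ins: 100 trials scored against ~1,000 candidate segments.
n_trials, n_candidates, dim = 100, 1000, 256
brain_z = F.normalize(torch.randn(n_trials, dim), dim=-1)
candidate_z = F.normalize(torch.randn(n_candidates, dim), dim=-1)
true_idx = torch.randint(0, n_candidates, (n_trials,))

print(f"top-10 accuracy: {top_k_accuracy(brain_z, candidate_z, true_idx):.1%}")
# With random embeddings this hovers near chance (k / n_candidates = 1%);
# the reported MEG decoder reaches about 73%.
```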
It is also crucial to point out that “decoding” in this study does not mean what the term usually means. Typically, decoding refers to recovering information directly from its source – in this case, speech from brain activity. Here, however, that was only achievable because the AI was given a finite set of candidate answers from which to choose. For the technology to scale to practical use with open-ended language, this will not suffice: language, while not infinite, admits an enormous number of possible word combinations, which would require far larger training datasets. Additionally, what the AI deciphered came from people passively listening to audio, which does not transfer straightforwardly to nonverbal patients – possibly not even in principle.
Thus, for this to become a useful communication tool, scientists will need to work out how to decipher what such patients wish to communicate, be it hunger, discomfort, or a simple “yes” or “no”, and to find ways of bringing down the cost of the hardware. This research decodes speech perception, not speech production; the latter is the ultimate objective, and although we will likely get there eventually, we are still some way off.