Microsoft recently unveiled a prototype of Project Denmark, an advanced AI-driven transcription tool and the first to be able to distinguish between multiple speakers. Multiple speakers, or ‘interlocked speech’, is often cited as one of the biggest drawbacks facing AI transcription services, which struggle to tell different voices apart because of pre-programmed vocabularies and non-native accents.
Project Denmark is an AI-driven solution and part of Microsoft’s Azure cognitive speech-to-text service. Upon completion, Project Denmark will be able to separate interlocked speech in real time thanks to a Speech DDK - a type of microphone used in a number of smartphones and tablets, which can pick up audio from a group of people.
Project Denmark will certainly address one of the biggest problems that AI transcription faces: interlocked speech. However, there are three major issues with Microsoft’s prototype.
Microsoft has not yet revealed the launch date for Project Denmark, meaning it could be a matter of years until it enters the market. Furthermore, there has been no announcement regarding whether it will be able to understand non-native accents and slang. There is also the question of whether Project Denmark will be able to close the narrow gap in accuracy between the human ear and AI.
Humans currently hold a slim advantage over AI solutions thanks to an accuracy rate of between 99% and 100%. Meanwhile, a robot’s rate of correctness stands at 94.9%. This is because software engineers pre-program robots with a limited vocabulary, which is not conditioned to pick up on certain cultural contexts such as slang, jokes, and regional accents.
Regional accents and dialects are another big issue facing AI transcription services. A study from 2018 found that robots are just 59% accurate at comprehending non-native speakers of their pre-programmed target languages. Other ASR (automated speech recognition) services, including Amazon Alexa and Google Assistant, have shown similar problems with comprehension: the accuracy of both systems dropped by 2.6% for English speakers with a Chinese accent and by as much as 4.2% for those with a Spanish accent.
Peter Trebek, the CEO of GoTranscript, believes that the human ear’s ability to distinguish between different accents and colloquial language is what puts it at an advantage over AI.
“Over the years, GoTranscript has shown itself to be a popular solution for broadcasting and entertainment providers such as the BBC and Netflix,” he said. “This is due to our transcribers’ ability to distinguish between multiple speakers and different accents, as well as accuracy rates of up to 99%.”
There is no official launch date for Project Denmark, but it is set to move AI closer to parity with the human ear thanks to its ability to understand interlocked speech. However, it remains to be seen whether Microsoft will be able to successfully address the issues of accents and colloquialisms.