Many services offer automatic transcription. We have tested some of them to see how good they really are at Swedish.
We have tested the following services:
Usually, models are evaluated on relatively short audio files, often just a few seconds long. But in reality, you often need to transcribe audio that is up to one or several hours long.
We have selected over 10 hours of audio material in interview format, where each interview lasts up to an hour. The interviews have different speakers and vary in topic. We then compare the results from the different services with transcriptions made by professional transcribers.
To measure how similar two texts are to each other, we use what is called Word Error Rate (WER). WER is a measure of how many words differ between two texts and is a common measure for evaluating transcription services.
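As a rough illustration of the measure, WER is usually computed as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcription. The sketch below assumes simple lowercasing and whitespace tokenization. The function name and the example sentences are invented for illustration, and this is not necessarily the exact normalization used in our evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercase + whitespace split is enough normalization here.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word is missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is missing from the hypothesis.
print(word_error_rate("jag bor i stockholm", "jag bor stockholm"))
```

Note that WER can exceed 100% when the hypothesis contains many extra words, since the denominator is the length of the reference only.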
Klang.ai clearly performs best, with an error rate of 7.8%! The next-best alternative, from Microsoft, had an error rate of 12.4%, which is roughly 1.6 times as many errors.
At Klang.ai, we have put a large part of our focus on Swedish specifically. This has made it possible for us to train the model on both more and better Swedish data than the other models. Swedish is, after all, a relatively small language, so it is not as high a priority for the large American companies.
This focus, together with our expertise in AI and machine learning, is what enables us to offer the world's best model for Swedish!
There is a lot of talk about OpenAI's Whisper model, which has been released freely to the public. The coolest thing about Whisper is that the model can transcribe in 100 different languages, most with relatively good quality. To support so many languages, OpenAI collected data from many different sources of varying quality.
The downside is that OpenAI cannot review the results in each language and filter out poor transcriptions from the training data.
This leads to several different problems with OpenAI's Whisper:
A large part of the reason why OpenAI's Whisper is so much worse than the other models above is actually due to the errors above. If you exclude all hallucinations and missed sentences, Whisper becomes almost as good as Google's and Microsoft's best models.
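One way to see how much of an error rate comes from this kind of problem is to split the word errors into substitutions, deletions, and insertions. As a rough approximation, hallucinated text shows up as insertions and missed sentences show up as deletions. The sketch below is an assumption about how such a breakdown could be done, not a description of our exact analysis. The function name and example sentences are invented for illustration.

```python
def error_breakdown(reference: str, hypothesis: str) -> dict:
    """Count substitutions, deletions and insertions in one optimal alignment."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    n, m = len(ref), len(hyp)

    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,
                             dist[i][j - 1] + 1,
                             dist[i - 1][j - 1] + cost)

    # Walk back from the end of both texts and classify each step.
    counts = {"substitutions": 0, "deletions": 0, "insertions": 0}
    i, j = n, m
    while i > 0 or j > 0:
        cost = 1 if i > 0 and j > 0 and ref[i - 1] != hyp[j - 1] else 0
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + cost:
            counts["substitutions"] += cost  # cost is 0 for a match
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts["deletions"] += 1         # reference word was missed
            i -= 1
        else:
            counts["insertions"] += 1        # extra word in the hypothesis
            j -= 1
    return counts


# The hypothesis ends with five words that were never spoken.
print(error_breakdown(
    "hej och välkommen till intervjun",
    "hej och välkommen till intervjun tack för att ni tittade",
))
```

Several alignments can have the same total distance, so the split between the three categories is not always unique. The total number of errors, and therefore the WER, is the same regardless.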
We at Klang.ai are proud to have the best transcription for interviews in Swedish, and you can try it for free.