topicnews · September 22, 2024

Mozilla releases Common Voice Corpus 19.0

With Common Voice, Mozilla provides the world’s largest public dataset of human voices, free and available to everyone. Mozilla has now released version 19.0 of the dataset.

The speech recognition market is dominated by the big commercial providers: Amazon, Apple, Google, and Microsoft. That is why Mozilla launched the Common Voice project in 2017. With Common Voice, Mozilla offers a free alternative that anyone can contribute to and that is available to everyone. Mozilla's aim is to promote innovation and competition in machine-learning-based speech technology.

With the newly released Common Voice Corpus 19.0, the German-language dataset has grown from 1,431 to 1,436 hours. Anyone who already has Common Voice Corpus 18.0 can, as always, download just a so-called delta segment containing the differences from the previous version. For German, this reduces the download from 33.6 GB to 108 MB.
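To put the German figures in proportion, here is a minimal Python sketch of the arithmetic. All numbers are the ones quoted above; the helper name `delta_share` is made up for illustration, and treating 1 GB as 1,000 MB is an assumption, since the article does not say which unit convention applies.

```python
# Figures quoted in the article for the German dataset.
FULL_DOWNLOAD_GB = 33.6   # full Corpus 19.0 download
DELTA_DOWNLOAD_MB = 108   # delta segment relative to Corpus 18.0
HOURS_V18 = 1431          # recorded hours in Corpus 18.0
HOURS_V19 = 1436          # recorded hours in Corpus 19.0


def delta_share(full_gb: float, delta_mb: float, mb_per_gb: int = 1000) -> float:
    """Return the delta download size as a fraction of the full download.

    mb_per_gb=1000 is an assumption (decimal units); the article does not
    state which convention is used.
    """
    return delta_mb / (full_gb * mb_per_gb)


share = delta_share(FULL_DOWNLOAD_GB, DELTA_DOWNLOAD_MB)
added_hours = HOURS_V19 - HOURS_V18

print(f"Delta is {share:.2%} of the full download")
print(f"Hours added for German: {added_hours}")
```

Under these assumptions the delta segment comes to roughly a third of a percent of the full download, for five additional hours of German recordings.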

In total, the new version of Mozilla Common Voice covers 131 languages with 32,584 recorded hours, making it the most diverse multilingual speech corpus in the world.
