It’s frustrating when you’re not understood — especially when you’re trying to speak to Siri, Alexa, or another internet-connected device.

Voice datasets that power voice recognition services are owned by a handful of major companies, and they can wildly underrepresent speakers with non-dominant accents; Black, Indigenous, and other people of color; disabled people; and gender-marginalized people. In fact, for speakers of many of the world's other languages, there may be no datasets at all.

That’s why Mozilla launched Common Voice — the world’s largest public voice database, powered by the voices of volunteer contributors. Our goal is to teach machines how real people speak.

Today, we’re asking you to contribute to Common Voice, but we want you to choose how you’ll do it. Will you donate your voice to one of our Common Voice language datasets? Or will you make a $34 donation to Mozilla to support projects like this one and help reclaim the internet? (Or both!)

I’d be curious about the privacy implications, but this could help a lot with underrepresented voice data. It may come down to whether someone wants more datasets for their particular voice/language badly enough to outweigh the other concerns.

If your language/accent is already well documented, it might not help as much?

  • @gedaliyah
    1 year ago

    But when I donate my voice, it’s not going to some vault at Mozilla. It becomes part of an open resource that anyone can use to build models, libraries, etc.

    Even if it is organized by a company that may or may not have nefarious goals, isn’t it still a good thing that it exists?

    • @Deckweiss
      1 year ago

      Let me completely exaggerate to illustrate the concept:

      If Osama bin Laden, Hitler, Mao, a terrorist organization, etc. started a charity to plant more trees, you would feel uncomfortable planting trees for their charity.

      If I don’t fully trust a company, it discourages me from participating in anything they do, no matter the intention.