Good free TTS (text-to-speech) options?

@[email protected] · 1 year ago

Good free TTS (text-to-speech) options?

@[email protected] · edit-2 1 year ago

Festival – not cutting edge – will definitely be better than your Amiga, and can handle long text. Last time I set it up, IIRC I wanted some voices generated by Tokyo University or something, which took some setting up. It’ll probably be packaged in your Linux distro.

You can listen to a demo here.

https://www.cstr.ed.ac.uk/projects/festival/onlinedemo.html

It’s not LLM-based.

For short snippets, offline, one can use Tortoise TTS – which is LLM-based. But it’s slow and can only generate clips of a limited length. Whether it’s reasonable for you will depend a lot on your application. It can also clone a voice, or at least produce one that sounds more or less similar, from a few sound samples of the person speaking.

https://github.com/neonbjb/tortoise-tts

Examples at:

https://nonint.com/static/tortoise_v2_examples.html

I haven’t used Google’s, but I’d assume, given that Google is paying people to work on it full time, that whatever they’ve done probably sounds nicer. But then, it’s not open source, so… shrugs

@[email protected] · 1 year ago

Ah, I looked at Tortoise, but I do not have an nVidia GPU, so I couldn’t try it. Festival I tried and the results were bad. Not so much for the voice, but for intonation and pronunciation.

@[email protected] · edit-2 1 year ago

“Ah, I looked at Tortoise, but I do not have an nVidia GPU, so I couldn’t try it.”

I use it on an AMD GPU.

EDIT: Wait, let me make sure. I was using an Nvidia GPU for a while and switched to AMD.

EDIT2: Oh, yeah, it uses transformers, and that doesn’t work on ROCm presently, IIRC.

@[email protected] · 1 year ago

Have you tried Piper?

@[email protected] · 1 year ago

Yes, but if you compare it to https://cloud.google.com/text-to-speech?hl=en (scroll down a bit and you can try it) and the Neural2 model, it sounds like shit. I mean, it’s great to see that there are efforts, but it just pales in comparison.

@[email protected] · 1 year ago

Well, it’s about as good as you’re going to get right now.

observantTrapezium · 1 year ago

Piper is my choice. It’s very easy to use from the command line and has fairly good-sounding voices. Prior to that, for years (decades?) I used espeak-ng, which had a very robotic voice but articulated almost everything very clearly, and I got used to it, so I didn’t actually mind.
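
For anyone who wants to script either tool, here is a rough Python sketch of the command-line usage. It assumes the `piper` and `espeak-ng` binaries are on your PATH, that Piper takes `--model` and `--output_file` and reads the text from standard input (as shown in its README), and that espeak-ng writes a WAV file with `-w`. The helper names and any file names are made up for illustration.

```python
import shutil
import subprocess


def piper_command(model_path, output_path):
    """Build the argument list for a Piper run that writes a WAV file."""
    return ["piper", "--model", model_path, "--output_file", output_path]


def espeak_command(text, output_path):
    """Build the argument list for espeak-ng; -w writes a WAV file instead of playing audio."""
    return ["espeak-ng", "-w", output_path, text]


def speak_with_piper(text, model_path, output_path):
    """Run Piper if it is installed; return False when the binary is not on PATH."""
    if shutil.which("piper") is None:
        return False
    # Piper reads the text to synthesize from standard input.
    subprocess.run(
        piper_command(model_path, output_path),
        input=text.encode("utf-8"),
        check=True,
    )
    return True
```

Calling `speak_with_piper("Hello there", "voice.onnx", "hello.wav")` should leave a `hello.wav` next to you, where `voice.onnx` stands in for whichever Piper voice model you downloaded.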

@[email protected] · 1 year ago

Came here to recommend Piper. It’s an excellent TTS engine.

@[email protected] · 1 year ago

Espeak doesn’t get better, but nor does it get worse.

@[email protected] · 1 year ago

Wow.

@[email protected] · edit-2 1 year ago

https://github.com/rsxdalv/tts-generation-webui and https://github.com/gitmylo/audio-webui. I use them all the time. With a 10-second voice sample, I get amazing results.

@[email protected] · 1 year ago

Cool, I’ll give those a try!

@NarrativeBear · 1 year ago

Balabolka was/is my go-to for TTS. It can also save audio files for later if you need them. I’ve used it to make plenty of audiobooks in the past.

@filister · 1 year ago

I would say ElevenLabs is the best, but unfortunately it’s not free.

If you need it for a short while it might be worth it.

I tried Piper with different models, and a couple of FOSS alternatives, but the output quality was definitely subpar.

I would say soon we will have good FOSS models, but for the time being that’s not the case.