I just tried a few, and nothing in the open-source space seems complete: none of them has an easy, freely available checkpoint setup and good documentation. Do they all require proprietary weights, or worse?

  • @j4k3 (OP)
    1 year ago

    I tried Bark, SpeechT5, and MMS, and looked at ElevenLabs and Silero: the last two because they are options that can be enabled in Oobabooga, the first three because they are on Hugging Face.

    • @Blaed (M)
      1 year ago

      I have used all of the above. In my experience, ElevenLabs is the most natural-sounding and the easiest to use, with the open-source alternatives (kind of) close behind it.

      Unfortunately, ElevenLabs’ code is proprietary, so there’s a bit of a compromise there (unless you want to use one of the open-source alternatives you mentioned). To your point though, the open-source options aren’t the most user-friendly.

      TTS has definitely been a neglected field compared with the other new tech in this wave of AI development, but I think it’s only a matter of time before new options emerge as startups and other projects take flight this year and next. It will be a crucial area to nail for immersive video game dialogue, so I’m sure someone will come up with a new platform or approach. Fingers crossed they make it open-source.

      For now, my suggestion is sticking to whatever TTS workflow works best with your current tech stack until something new comes out.

      If you end up finding something worth sharing, let us know! I’m very curious to see how audio and speech synthesis develops alongside all of this other fosai tech we’ve been seeing.

      • @j4k3 (OP)
        1 year ago

        Well, I tried Tortoise TTS today and got a bit further than with the others, but it still doesn’t work for me. I almost have it working, but figuring out the API and playing the audio from a conda environment inside a distrobox container, just to shield my system from the outdated stuff used in the project, may prove to be too much for my skills. The documentation for offline execution is crap.
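One way to sidestep audio playback inside nested containers is to write the generated samples to a WAV file and play that file from the host. The following is only a rough sketch of that idea using Python's standard library. It does not use the Tortoise API: `write_wav` and `sine_tone` are hypothetical helper names, the sine tone stands in for real TTS output, and the 24000 Hz default is an assumption that should be replaced by whatever rate the TTS model actually reports.

```python
import math
import struct
import wave


def write_wav(path, samples, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    sample_rate is an assumed default; use the rate your TTS model outputs.
    """
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip so the int16 conversion cannot overflow
        frames += struct.pack("<h", int(s * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))


def sine_tone(freq_hz=440.0, seconds=0.5, sample_rate=24000):
    """Stand-in for TTS output: a short sine tone as a list of floats."""
    n = int(seconds * sample_rate)
    return [0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    # Writes to the current directory; pick any path the host can also see.
    write_wav("tts_test.wav", sine_tone())
```

If the output path is in a directory shared between the container and the host, any ordinary player on the host can open the file, so nothing inside the container needs access to an audio device.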

        I’m actually getting further into these configurations by keeping a WizardLM 30B GGML running in instruct mode the whole time and asking it questions. It is quite capable of taking in most output errors from a terminal and giving almost useful advice in many cases. That 30B GGML setup with 10 CPU threads and 20 layers on a 3080Ti-16GB is very close to the speed of a Llama2 7B running on just the GPU. It only crashes if I feed it something larger than what might fit on a single page of a PDF. My machine has 32GB of system memory. I think I need to get the max 64GB. As far as I have seen, a 7B model lies half the time, a 13B lies 20% of the time, and my 30B lies around 10% at 4 bit. With a ton of extra RAM I want to see how much better a 30B is at 8 bit, or if a 70B is feasible and maybe closes the gap.
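The RAM question above comes down to simple arithmetic: weight storage is roughly parameter count times bits per weight divided by 8. Here is a back-of-the-envelope sketch under loud assumptions: it uses nominal parameter counts (30B, 70B), ignores quantization metadata, the KV cache, and runtime overhead (so real usage is higher), and the 60-layer figure for the 30B model is an assumption used only to illustrate a CPU/GPU split. The function names are hypothetical.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return params_billions * bits_per_weight / 8


def split_by_layers(total_gb, gpu_layers, total_layers):
    """Rough (gpu_gb, cpu_gb) split, assuming all layers are the same size."""
    gpu_gb = total_gb * gpu_layers / total_layers
    return gpu_gb, total_gb - gpu_gb


if __name__ == "__main__":
    for params, bits in [(30, 4), (30, 8), (70, 4)]:
        print(f"{params}B at {bits} bit: ~{weight_memory_gb(params, bits):.0f} GB of weights")

    # 20 of an assumed 60 layers offloaded to the GPU, as in the setup described above.
    gpu_gb, cpu_gb = split_by_layers(weight_memory_gb(30, 4), 20, 60)
    print(f"30B at 4 bit, 20/60 layers on GPU: ~{gpu_gb:.0f} GB GPU, ~{cpu_gb:.0f} GB CPU")
```

By this estimate a 30B model at 8 bit needs about 30 GB for weights alone and a 70B at 4 bit about 35 GB, which is why 32GB of system memory looks too tight for either and 64GB looks like the more comfortable target.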

        • @Blaed (M)
          1 year ago

          Really appreciate the info and insights. Helps me adjust and test my benchmarks a ton. It’s remarkable what we’re able to do with consumer hardware now. It’s exciting to imagine where we’ll be at even a year from now!

          Let us know if you find a better setup and workflow in the future. Sounds pretty effective though. Curious to see how it powers up for you throughout the rest of the year.

          Thanks again. All this info is very helpful for others looking to get something similar running.