I have several tapes (yes, actual cassette tapes) of my grandfather reading a novel.

Unfortunately, a few of the tapes have degraded to the point that I cannot play them back.

I would love to recreate his voice, to “rerecord” the missing bits.

The recordings are in Danish.

Is this possible?

If it is, how can I go about it?

  • @Grimy
    23 days ago

    ElevenLabs is currently the best, but you can get some very good results by running XTTS first and then RVC as a second pass. That route involves fine-tuning models and running things with Python and notebooks, so it requires some know-how.

    You can explore more models on the Hugging Face models page: https://huggingface.co/models?pipeline_tag=text-to-speech&sort=trending

    Most have a Hugging Face Space dedicated to them where you can try them out. Here is the XTTS Space, for example: https://huggingface.co/spaces/coqui/xtts

    The language adds another layer of difficulty. I would try their demo first to see if it gives anything workable, but Danish isn't a language that current TTS software caters to, and sadly it doesn't seem to be an available option on XTTS.
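
    To make the XTTS step concrete, here is a minimal sketch, assuming the Coqui `TTS` Python package and its `xtts_v2` model. The language set is copied from the XTTS-v2 model card as I remember it and may change between releases; `xtts_supports` and `clone_voice` are hypothetical helper names, and the RVC second pass is not covered.

    ```python
    # Language codes listed on the XTTS-v2 model card (assumption: this list
    # may be out of date; check the model card before relying on it).
    XTTS_V2_LANGUAGES = {
        "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
        "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
    }


    def xtts_supports(language_code: str) -> bool:
        """Return True if the language code is in the assumed XTTS-v2 list."""
        return language_code.lower() in XTTS_V2_LANGUAGES


    def clone_voice(text: str, reference_wav: str, language_code: str,
                    out_path: str = "cloned.wav") -> str:
        """Synthesize `text` in the voice heard in `reference_wav`.

        Fails early when the language is not in the assumed list, before any
        model is downloaded.
        """
        if not xtts_supports(language_code):
            raise ValueError(f"XTTS-v2 does not list '{language_code}' as supported")

        # Imported lazily so the language check works without the package
        # installed (pip install TTS).
        from TTS.api import TTS

        tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
        tts.tts_to_file(
            text=text,
            speaker_wav=reference_wav,  # a short, clean clip of the target voice
            language=language_code,
            file_path=out_path,
        )
        return out_path
    ```

    With this list, `xtts_supports("da")` is false, so `clone_voice(..., language_code="da")` raises before loading anything. That is the Danish problem in a nutshell.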

    • @boojumliussnark (OP)
      22 days ago

      Thank you for the tips. As I see it currently, I expect the language to be the biggest hurdle. It doesn’t look like something I can add myself, even if I had the data for a model. So as far as I can tell, it involves two currently more or less impossible steps: getting the model data and teaching the language to the model.

      • @Grimy
        22 days ago

        If you have material with him speaking in English, you might be able to train an XTTS model on it and then use that to get past the ElevenLabs captcha, but I’m not sure if they give you enough time. That said, GPU rental is cheap these days, so the captcha time limit is less of a factor.

        If anything, the tech is moving quite fast. It will likely be easier in a few years, maybe even months.