Tutorial on voice cloning with Bark TTS with all the instructions and examples

@2dollarsim · edit-2 2 years ago


Now all the components should be installed. Note that I did not need to install numpy or torch, even though the original post says to.

You can test that it works by running

jupyter notebook

A Jupyter interface should then open in your default browser.


Now for a tricky part:

When I tried to run the voice cloning notebook, I got this error:

--> 153 def auto_train(data_path, save_path='model.pth', load_model: str | None = None, save_epochs=1):
    154     data_x, data_y = [], []
    156     if load_model and os.path.isfile(load_model):
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

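The error comes from customtokenizer.py in the bark-with-voice-clone/hubert/ directory. The str | None type-hint syntax only exists in Python 3.10 and newer, so an older Python version fails as soon as it reaches this function definition.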

To solve this, I pasted the error into ChatGPT and made a couple of small changes to the code.

At the top of the file, underneath the other imports, I added the import for Union:

from typing import Union

Then I changed the auto_train definition at line 154 (line 153 before adding the import above) to:

def auto_train(data_path, save_path='model.pth', load_model: Union[str, None] = None, save_epochs=1):

Compare this to the original line:

def auto_train(data_path, save_path='model.pth', load_model: str | None = None, save_epochs=1):

That solved the issue, so we should be ready to go!
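If you want to confirm that this really is a Python-version problem in your own environment, here is a minimal sketch that defines a tiny function with each annotation style and reports whether it works. It is not code from the repo: demo_new, demo_old and try_define are hypothetical names that only mimic the auto_train() signature. On Python 3.9 or older the str | None version should raise the same TypeError, while the Union[str, None] version works on any Python 3 release that has the typing module.

```python
import sys
from typing import Optional, Union

# Hypothetical one-line functions that mimic the load_model parameter of auto_train().
NEW_STYLE = "def demo_new(load_model: str | None = None):\n    return load_model\n"
OLD_STYLE = "def demo_old(load_model: Union[str, None] = None):\n    return load_model\n"


def try_define(source: str) -> str:
    """Run a function definition and report whether its annotations were accepted."""
    namespace = {"Union": Union}
    try:
        # On the Python versions this error affects, annotations are
        # evaluated as soon as the 'def' statement runs.
        exec(source, namespace)
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "ok"


new_result = try_define(NEW_STYLE)
old_result = try_define(OLD_STYLE)

print("Python", ".".join(map(str, sys.version_info[:3])))
print("str | None       ->", new_result)  # fails on Python 3.9 and older
print("Union[str, None] ->", old_result)  # works on older versions too
# Union[str, None] is the same type hint as Optional[str].
print("Union[str, None] == Optional[str]:", Union[str, None] == Optional[str])
```

Optional[str] is an equivalent spelling of the fix if you prefer it; it also needs the matching import from typing.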


Next you need a WAV file with under 10 seconds of speech to train from; reportedly as little as 4 seconds also works. I won’t cover how to record or cut one in this tutorial.
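To check a clip's length before using it, here is a minimal sketch using Python's built-in wave module. It assumes an uncompressed PCM WAV file; the 10 s and 4 s limits are just the figures mentioned above, and check_clip, wav_duration_seconds and check_clip.py are hypothetical names, not part of bark-with-voice-clone.

```python
import sys
import wave

MAX_SECONDS = 10.0  # tutorial: the clip should be under ~10 s
MIN_SECONDS = 4.0   # reportedly the shortest clip that still works


def wav_duration_seconds(path) -> float:
    """Length of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_clip(path) -> str:
    """Return a short verdict on whether the clip length is in the suggested range."""
    seconds = wav_duration_seconds(path)
    if seconds >= MAX_SECONDS:
        return f"too long ({seconds:.1f} s): cut it under {MAX_SECONDS:.0f} s"
    if seconds < MIN_SECONDS:
        return f"short ({seconds:.1f} s): may not clone well"
    return f"ok ({seconds:.1f} s)"


if __name__ == "__main__":
    # usage: python check_clip.py input/audio.wav
    for clip in sys.argv[1:]:
        print(clip, "->", check_clip(clip))
```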

For mine, I cut a clip of a woman speaking with very little background noise. You can use https://www.lalal.ai/ to separate a voice from background noise; I didn’t need to for this clip, but I did for a clip of Megabyte from Reboot talking, which worked… mostly well.
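If your source recording is already an uncompressed WAV, the standard-library wave module can cut a segment out of it without extra tools. This is a sketch, not part of the repo; trim_wav and the file names in the example are hypothetical, and it will not handle MP3 or other compressed formats (use an audio editor for those). It also does nothing about background noise.

```python
import wave


def trim_wav(src_path, dst_path, start_s: float, end_s: float) -> float:
    """Copy the [start_s, end_s) segment of a PCM WAV file to dst_path.

    Returns the length of the written clip in seconds.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        start = max(0, int(start_s * rate))
        end = min(src.getnframes(), int(end_s * rate))  # clamp to the file length
        if end <= start:
            raise ValueError("end_s must be after start_s and inside the file")
        src.setpos(start)
        frames = src.readframes(end - start)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)    # same channels, sample width and sample rate
        dst.writeframes(frames)  # the frame count in the header is fixed up on close
    return (end - start) / rate
```

For example, trim_wav("long_recording.wav", "clip.wav", 12.0, 20.0) would keep 8 seconds starting 12 seconds into the recording.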

I created an input folder to put my training wav file in:

bark-with-voice-clone/input
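Creating the folder and copying the clip into it can also be done from Python. A small sketch: stage_clip is a hypothetical helper, and naming the copy audio.wav is only a choice to match the filepath used in the notebook step later.

```python
import shutil
from pathlib import Path


def stage_clip(clip: str, repo_dir: str = "bark-with-voice-clone") -> Path:
    """Copy a prepared WAV into <repo_dir>/input/audio.wav and return the new path."""
    input_dir = Path(repo_dir) / "input"
    input_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    dest = input_dir / "audio.wav"
    shutil.copyfile(clip, dest)  # overwrites any previous audio.wav
    return dest
```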

Now we can go through this next section of the tutorial:

Run Jupyter Notebook from inside the bark-with-voice-clone folder:

jupyter notebook

This will open a new browser tab with the Jupyter interface. Click on clone_voice.ipynb.

This is very similar to Google Colab, where you run blocks of code one at a time. Click on the first code block and click Run. If a block has “[*]” next to it, it is still running; just give it a minute to finish.

This will take a while and download a bunch of stuff.

If it finishes without errors, run blocks 2 and 3. In block 4, change the filepath line to: filepath = "input/audio.wav"

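Make sure this line points to a valid path and uses your audio file’s actual name. Remove any leading “/” from the path: on Linux a leading “/” makes the path start at the filesystem root instead of the bark folder, which can cause a permissions-related error.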
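Before running block 4, you could paste a quick check like this into a new notebook cell to catch the leading-“/” and wrong-name problems. check_notebook_path is a hypothetical helper, not part of clone_voice.ipynb. Relative paths resolve against the notebook's working directory, which by default is the folder containing clone_voice.ipynb.

```python
from pathlib import Path
from typing import List


def check_notebook_path(filepath: str) -> List[str]:
    """Return a list of likely problems with the clip path (empty list = looks fine)."""
    problems = []
    if filepath.startswith("/"):
        # On Linux/macOS a leading '/' means "start at the filesystem root".
        problems.append("leading '/': path starts at the filesystem root, not the bark folder")
    if not filepath.lower().endswith(".wav"):
        problems.append("expected a .wav file")
    if not Path(filepath).is_file():
        problems.append(f"no file at {Path(filepath).resolve()}")
    return problems


if __name__ == "__main__":
    filepath = "input/audio.wav"  # the value block 4 should use
    print(check_notebook_path(filepath) or "path looks fine")
```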

The outputs (your cloned voice as an .npz file) will be in: bark/assets/prompts
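Because that folder may also hold Bark's stock voices, one easy way to find the file the notebook just wrote is to pick the most recently modified .npz. A sketch, assuming you run it from the bark-with-voice-clone folder so the relative path matches; newest_prompt is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def newest_prompt(prompts_dir: str = "bark/assets/prompts") -> Optional[Path]:
    """Return the most recently modified .npz file under prompts_dir, or None."""
    root = Path(prompts_dir)
    if not root.is_dir():
        return None
    files = [p for p in root.rglob("*.npz") if p.is_file()]  # includes subfolders like v2
    # max(..., default=None) returns None when there are no .npz files yet
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    print(newest_prompt() or "no .npz files found yet")
```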


Now you can move your voice file over to the right bark folder in ooba (oobabooga) to play with it. You can also test the voice in the notebook by continuing through the remaining code blocks; I’m sure you’ll be able to figure that part out by yourself.

To be able to select the voice, I had to overwrite one of the existing English voices, because my own voice didn’t appear in the list.

Overwrite en_speaker_0.npz (back up the original first if you want) in one of these folders:

oobabooga_linux_2/installer_files/env/lib/python3.1/site-packages/bark/assets/prompts/v2

oobabooga_linux_2/installer_files/env/lib/python3.1/site-packages/bark/assets/prompts/

Then select that voice (en_speaker_0, or v2/en_speaker_0 if you used the v2 folder) from the bark voice list in ooba.
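The backup-and-overwrite step above can be scripted so the original voice is never lost. This is a sketch with hypothetical names (install_voice, my_voice.npz); the prompts path is copied from the tutorial and assumes you run it from the folder that contains oobabooga_linux_2.

```python
import shutil
from pathlib import Path


def install_voice(voice_npz: str, prompts_dir: str, slot: str = "en_speaker_0.npz") -> Path:
    """Overwrite a stock Bark voice with a cloned one, keeping a one-time backup."""
    target = Path(prompts_dir) / slot
    backup = target.with_name(target.name + ".bak")  # e.g. en_speaker_0.npz.bak
    if target.exists() and not backup.exists():
        shutil.copy2(target, backup)  # save the original only on the first run
    shutil.copyfile(voice_npz, target)  # replace the stock voice with yours
    return target


if __name__ == "__main__":
    prompts = ("oobabooga_linux_2/installer_files/env/lib/python3.1/"
               "site-packages/bark/assets/prompts/v2")
    # "my_voice.npz" is a placeholder for the file the notebook produced.
    if Path(prompts).is_dir() and Path("my_voice.npz").is_file():
        print("installed:", install_voice("my_voice.npz", prompts))
```

To undo it, copy en_speaker_0.npz.bak back over en_speaker_0.npz.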
