
YourTTS checkpoint: Dutch, French, German, Italian, Portuguese, Polish, Spanish, and English #2735

Closed · wants to merge 5 commits

Conversation

freds0
Contributor

@freds0 freds0 commented Jul 2, 2023

In this pull request, I have added a new checkpoint for the YourTTS model, which was trained in multiple languages, including Dutch, French, German, Italian, Portuguese, Polish, Spanish, and English.
To provide more context, the paper is available at the following link: CML-TTS A Multilingual Dataset for Speech Synthesis in Low-Resource Languages. The model was trained using the CML-TTS dataset and the LibriTTS dataset in English. I would also like to inform you that samples generated using this checkpoint can be verified by accessing the following link: https://freds0.github.io/CML-TTS-Dataset/

@CLAassistant

CLAassistant commented Jul 2, 2023

CLA assistant check
All committers have signed the CLA.

@erogol
Member

erogol commented Jul 2, 2023

Hey @freds0 this is awesome. Would you mind if I move the model somewhere more convenient? It is not very reliable to keep it in gdrive.

@freds0
Contributor Author

freds0 commented Jul 2, 2023

@erogol That sounds like a great idea! It would be great to send it to a more reliable drive. Thanks for the suggestion.

@erogol
Member

erogol commented Jul 4, 2023

To use the training speakers, speakers.pth should contain the speaker embeddings too. Or we can release it with voice cloning only.
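For context, the speaker-embedding file discussed here maps each audio-clip key to the speaker it belongs to plus that clip's embedding vector, so the loader can both list speakers and average per-speaker vectors. A minimal sketch of that shape (the clip keys, speaker names, and three-element vectors are all made up; real d-vectors are much longer):

```python
# Illustrative shape of a speaker-embedding (d-vector) file, as a plain
# dict. Keys are per-clip identifiers; values carry the speaker name and
# that clip's embedding. All values here are placeholders.
embeddings = {
    "speaker_a/clip_0001.wav": {
        "name": "speaker_a",
        "embedding": [0.01, -0.02, 0.03],  # truncated for illustration
    },
    "speaker_a/clip_0002.wav": {
        "name": "speaker_a",
        "embedding": [0.02, -0.01, 0.04],
    },
    "speaker_b/clip_0001.wav": {
        "name": "speaker_b",
        "embedding": [-0.05, 0.00, 0.01],
    },
}

# Collecting the unique speaker names, as the loader does; this only
# works when the values are dicts, not bare ints.
speakers = sorted({x["name"] for x in embeddings.values()})
print(speakers)
```

With values shaped like this, listing speakers and grouping embeddings by name both work; a file that instead maps speaker names to integer ids cannot serve this purpose.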

@freds0
Contributor Author

freds0 commented Jul 4, 2023

I can share it, but I couldn't find it in my backups. I will likely need to generate it again!

@erogol
Member

erogol commented Jul 5, 2023

@freds0 your call, if it is too much work, we can release with voice cloning.

@freds0
Contributor Author

freds0 commented Jul 11, 2023

Hi @erogol, all embeddings were extracted and are available at the following Google Drive link:
https://drive.google.com/drive/folders/1bS_9-7QFmGWeAd6wtqnnjSS_-wV8FBNP

Or on OneDrive:

https://ufmtbr-my.sharepoint.com/:f:/g/personal/fredoliveira_ufmt_br/EnrCG5tSIiBDqfPlTfPjAGsBqjZWNkjBOd7-MCoxdJaeyQ?e=DFewo8

Is this really what you need?

@erogol
Member

erogol commented Jul 14, 2023

I'll give it a try next Monday. Thanks for sharing 👍

@erogol
Member

erogol commented Jul 24, 2023

@freds0 those files are crazy big. So I'll go with only voice cloning.

@itsjamie

I'm relatively new to this field; is there documentation somewhere that describes how I would go about consuming these myself?

What I've tried:

  • renaming the best_model.pth to model.pth.

With the provided file extracted into the user directory where the other models are downloaded, I hit the following:

Traceback (most recent call last):
  File "/Users/jstackhouse/anaconda3/envs/tts/bin/tts", line 8, in <module>
    sys.exit(main())
  File "/Users/jstackhouse/TTS/TTS/bin/synthesize.py", line 385, in main
    synthesizer = Synthesizer(
  File "/Users/jstackhouse/TTS/TTS/utils/synthesizer.py", line 91, in __init__
    self._load_tts(tts_checkpoint, tts_config_path, use_cuda)
  File "/Users/jstackhouse/TTS/TTS/utils/synthesizer.py", line 185, in _load_tts
    self.tts_model = setup_tts_model(config=self.tts_config)
  File "/Users/jstackhouse/TTS/TTS/tts/models/__init__.py", line 13, in setup_model
    model = MyModel.init_from_config(config=config, samples=samples)
  File "/Users/jstackhouse/TTS/TTS/tts/models/vits.py", line 1797, in init_from_config
    speaker_manager = SpeakerManager.init_from_config(config, samples)
  File "/Users/jstackhouse/TTS/TTS/tts/utils/speakers.py", line 113, in init_from_config
    speaker_manager = SpeakerManager(
  File "/Users/jstackhouse/TTS/TTS/tts/utils/speakers.py", line 63, in __init__
    super().__init__(
  File "/Users/jstackhouse/TTS/TTS/tts/utils/managers.py", line 149, in __init__
    self.load_embeddings_from_list_of_files(embedding_file_path)
  File "/Users/jstackhouse/TTS/TTS/tts/utils/managers.py", line 227, in load_embeddings_from_list_of_files
    ids, clip_ids, embeddings, embeddings_by_names = self.read_embeddings_from_file(file_path)
  File "/Users/jstackhouse/TTS/TTS/tts/utils/managers.py", line 194, in read_embeddings_from_file
    speakers = sorted({x["name"] for x in embeddings.values()})
  File "/Users/jstackhouse/TTS/TTS/tts/utils/managers.py", line 194, in <setcomp>
    speakers = sorted({x["name"] for x in embeddings.values()})
TypeError: 'int' object is not subscriptable

I assume this is because of what @erogol said initially: the speakers.pth file doesn't contain the embeddings.

With the provided JSON files, how would I go about recreating a working file containing these embeddings?

I tried removing speakers.pth and instead using a speakers.json in the same format as the original YourTTS model's folder, but with the Spanish model.

But doing that I hit:

Traceback (most recent call last):
  File "/Users/jstackhouse/anaconda3/envs/tts/bin/tts", line 8, in <module>
    sys.exit(main())
  File "/Users/jstackhouse/TTS/TTS/bin/synthesize.py", line 385, in main
    synthesizer = Synthesizer(
  File "/Users/jstackhouse/TTS/TTS/utils/synthesizer.py", line 91, in __init__
    self._load_tts(tts_checkpoint, tts_config_path, use_cuda)
  File "/Users/jstackhouse/TTS/TTS/utils/synthesizer.py", line 190, in _load_tts
    self.tts_model.load_checkpoint(self.tts_config, tts_checkpoint, eval=True)
  File "/Users/jstackhouse/TTS/TTS/tts/models/vits.py", line 1721, in load_checkpoint
    self.load_state_dict(state["model"], strict=strict)
  File "/Users/jstackhouse/anaconda3/envs/tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2150, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Vits:
	size mismatch for emb_l.weight: copying a param with shape torch.Size([8, 4]) from checkpoint, the shape in current model is torch.Size([0, 4]).

I figure this might be because I haven't loaded the embeddings for every language?

What should I go read? Or what am I missing?
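The `TypeError: 'int' object is not subscriptable` above is consistent with a speakers file whose values are plain ints (a name-to-id table) rather than per-clip dicts with `"name"` and `"embedding"` keys. A hypothetical repair helper (the function name and the assumed file layout are illustrative, not part of the library) that normalizes entries into the expected shape and rejects the id-table shape early:

```python
def to_dvector_format(per_clip_mapping):
    """Normalize a {clip: {"name": ..., "embedding": [...]}} mapping and
    fail fast on the {name: int} shape that crashes the embeddings loader
    with "'int' object is not subscriptable"."""
    fixed = {}
    for clip, value in per_clip_mapping.items():
        if not isinstance(value, dict):
            # This is the situation behind the traceback: the loader
            # expects a dict per clip, but finds a bare integer id.
            raise TypeError(
                f"{clip!r} maps to {type(value).__name__}, expected a dict "
                "with 'name' and 'embedding' keys; this looks like a "
                "speaker-id table, not an embedding file"
            )
        fixed[clip] = {
            "name": value["name"],
            "embedding": list(value["embedding"]),
        }
    return fixed
```

Running this over a name-to-id table raises immediately with a readable message, while a correctly shaped mapping passes through unchanged.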

@freds0
Contributor Author

freds0 commented Aug 3, 2023

@itsjamie there are two ways to run this model effectively. The first method involves using these speaker embeddings files. Alternatively, you can opt for the second method, which requires providing a reference audio that will be sent to the model. To get started, simply follow the step-by-step instructions provided in this link:

https://colab.research.google.com/drive/1nZuvfW-gjoKJgm_S5_f9ydi5W1xvesCK?usp=sharing
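For a rough idea of how the two methods differ, here are the two `tts` CLI invocations built as argument lists (the flag names come from Coqui TTS's synthesize script; the file paths, the language id, and the speaker id are placeholders for your local setup):

```python
# Shared checkpoint arguments; filenames are placeholders for wherever
# the checkpoint and config were extracted.
checkpoint = ["--model_path", "best_model.pth",
              "--config_path", "config.json"]

# Method 1: voice cloning, conditioning on a reference recording
# (any clean few-second sample of the target voice).
cloning = (["tts", "--text", "Olá mundo",
            "--language_idx", "pt-br",          # placeholder language id
            "--speaker_wav", "reference.wav"]
           + checkpoint)

# Method 2: training speakers, pointing at the embeddings file and
# selecting one of its speaker ids.
training_speaker = (["tts", "--text", "Olá mundo",
                     "--language_idx", "pt-br",
                     "--speakers_file_path", "speakers.pth",
                     "--speaker_idx", "some_speaker_id"]  # placeholder id
                    + checkpoint)
```

The only difference between the two is the speaker input: a reference wav for cloning, versus the embeddings file plus a speaker id for the training speakers.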

@acul3

acul3 commented Aug 4, 2023

Hey @freds0, thanks for sharing this, cool stuff.

Can you share the TensorBoard logs for this model if possible, or at least say how many steps the model was trained for?

I'm trying to reproduce training on a new language, using guidance from your paper and the original YourTTS.

Thank you

@freds0
Contributor Author

freds0 commented Aug 4, 2023

@acul3 Unfortunately I didn't save the logs. But to fine-tune a new language, you should mainly watch the alignment chart. When it looks close to the image below, training can be stopped.
[alignment chart image]

@freds0
Contributor Author

freds0 commented Aug 14, 2023

@erogol I created a version of the embeddings file with just 10 samples of each speaker (250MB). All speakers from the CML-TTS dataset are included, and also all the speakers from LibriTTS. Here is the download link
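A reduction like that can be sketched as follows (the helper name is hypothetical, and the clip-to-{"name", "embedding"} structure is assumed from the d-vector files discussed in this thread, not verified against the actual download):

```python
from collections import defaultdict

def subsample_embeddings(embeddings, max_clips=10):
    """Keep at most max_clips entries per speaker from a full
    {clip: {"name": ..., "embedding": [...]}} mapping, shrinking the
    file while still covering every speaker."""
    kept = {}
    per_speaker = defaultdict(int)
    # Sorting gives a deterministic choice of which clips survive.
    for clip, entry in sorted(embeddings.items()):
        name = entry["name"]
        if per_speaker[name] < max_clips:
            kept[clip] = entry
            per_speaker[name] += 1
    return kept
```

Every speaker present in the input remains present in the output, which matches the stated goal of keeping all CML-TTS and LibriTTS speakers while cutting the file size.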

@erogol
Member

erogol commented Aug 14, 2023

@freds0 thanks, I'll check. I'll try to finish my backlog before merging this PR.

@Edresson
Contributor

Edresson commented Sep 7, 2023

@erogol @freds0 I have added a training recipe for the YourTTS model from the CML-TTS paper.

@Edresson Edresson requested a review from erogol September 7, 2023 15:46
@erogol
Member

erogol commented Sep 8, 2023

@Edresson you should make a separate PR. I can merge it before we merge this one. (I don't know when I can find time to merge this one. )

@Edresson
Contributor

Edresson commented Sep 9, 2023

> @Edresson you should make a separate PR. I can merge it before we merge this one. (I don't know when I can find time to merge this one.)

I removed the Recipe from this PR and added it on #2934

@stale

stale bot commented Oct 14, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

@stale stale bot added the wontfix This will not be worked on but feel free to help. label Oct 14, 2023
@stale stale bot closed this Oct 22, 2023
@Edresson Edresson reopened this Oct 24, 2023
@stale stale bot closed this Nov 1, 2023
@Edresson Edresson reopened this Nov 7, 2023
@stale stale bot removed the wontfix This will not be worked on but feel free to help. label Nov 7, 2023
Correction in training the Fastspeech/Fastspeech2/FastPitch/SpeedySpeech model using external speaker embedding.
@freds0 freds0 closed this Dec 11, 2023
6 participants