Add Delightful-TTS model #2095

Merged
merged 67 commits into from
Jul 24, 2023

Conversation

loganhart02
Contributor

model implementation from: https://arxiv.org/pdf/2110.12612.pdf

Member

@erogol erogol left a comment

We must test the model at least the way we test vits.py; testing individual layers would be even better.

TTS/tts/layers/delightful_tts/conformer.py (outdated, resolved)
TTS/tts/layers/delightful_tts/conformer.py (resolved)
TTS/tts/layers/delightful_tts/conformer.py (resolved)
encoding: torch.Tensor,
) -> torch.Tensor:
"""
x --- [N, seq_len, encoder_embedding_dim]
Member

These shape-definition docstrings also need to be reformatted like those of the other models, so that they are compatible with our documentation.
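
For reference, a minimal sketch of the docstring layout this comment seems to ask for, assuming the `Shapes:` section with `:math:` entries seen in other models in the repository. The helper `add_positional_encoding` and its body are hypothetical and exist only to carry the docstring; `encoder_embedding_dim` is abbreviated to `C`.

```python
import torch


def add_positional_encoding(x: torch.Tensor, encoding: torch.Tensor) -> torch.Tensor:
    """Add a positional encoding to the encoder input (hypothetical helper).

    Args:
        x (torch.Tensor): Encoder input.
        encoding (torch.Tensor): Positional encoding table, at least as long as `x`.

    Returns:
        torch.Tensor: Input with the positional encoding added.

    Shapes:
        - x: :math:`[B, T, C]`
        - encoding: :math:`[1, T_max, C]`
        - output: :math:`[B, T, C]`
    """
    # Trim the table to the sequence length; broadcasting covers the batch axis.
    return x + encoding[:, : x.shape[1]]
```

The point is only the layout: one bullet per tensor under `Shapes:`, instead of the free-form `x --- [N, seq_len, encoder_embedding_dim]` line.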

TTS/tts/layers/delightful_tts/acoustic_model.py (outdated, resolved)
@loganhart02
Contributor Author

@erogol I know the most recently pushed code works; it is currently training a model. After I confirm it converges, I'll clean up the code and write the docs for the model.

@loganhart02 loganhart02 marked this pull request as ready for review November 30, 2022 13:05
@loganhart02
Contributor Author

@erogol I'm still fixing a bug in the unit tests, but the model code is ready for review.


@dataclass
class DelightfulTTSConfig(BaseTTSConfig):

Member

Consider writing docstrings for the config arguments. It'd help you understand the architecture better.
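
A small sketch of what per-argument config docstrings could look like. `ExampleTTSConfig` and all of its fields are made up for illustration; they are not the fields of `DelightfulTTSConfig`, and the class does not subclass `BaseTTSConfig` so that the snippet stays self-contained.

```python
from dataclasses import dataclass


@dataclass
class ExampleTTSConfig:
    """Configuration for a hypothetical TTS model (illustration only).

    Args:
        model (str):
            Model name used to select the model class. Defaults to `example_tts`.
        hidden_channels (int):
            Width of the encoder and decoder layers. Defaults to 256.
        use_speaker_embedding (bool):
            Enable a learned speaker embedding for multi-speaker training.
            Defaults to False.
    """

    model: str = "example_tts"
    hidden_channels: int = 256
    use_speaker_embedding: bool = False
```

Each field gets its type, a one-line purpose, and its default, so the docstring doubles as a summary of the architecture's knobs.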

TTS/tts/datasets/dataset.py (outdated, resolved)
TTS/tts/layers/delightful_tts/acoustic_model.py (outdated, resolved)
encoder_outputs_res = encoder_outputs

# Pitch predictor
pitch_pred, avg_pitch_target, pitch_emb = self.pitch_adaptor.get_pitch_embedding_train(
Member

Do we normalize the ground truth pitch somewhere?
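
To make the question concrete, here is one common way to normalize a ground-truth pitch contour: z-score the voiced frames and leave unvoiced frames at zero. This is a sketch under assumptions, not what the PR does; `normalize_pitch` is a hypothetical helper, and in practice the mean and standard deviation would more likely be computed over the whole dataset or per speaker than per utterance.

```python
from statistics import fmean, pstdev
from typing import List


def normalize_pitch(f0: List[float]) -> List[float]:
    """Z-score voiced frames; keep unvoiced frames (f0 == 0) at 0."""
    voiced = [v for v in f0 if v > 0.0]
    if not voiced:
        # Nothing to normalize in a fully unvoiced contour.
        return [0.0] * len(f0)
    mean = fmean(voiced)
    std = pstdev(voiced)
    if std == 0.0:
        std = 1.0  # constant pitch: avoid division by zero
    return [(v - mean) / std if v > 0.0 else 0.0 for v in f0]
```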

TTS/tts/layers/delightful_tts/variance_predictor.py (outdated, resolved)
TTS/tts/models/delightful_tts.py (outdated, resolved)
TTS/tts/models/delightful_tts.py (outdated, resolved)
TTS/tts/utils/emotions.py (outdated, resolved)
@@ -0,0 +1,89 @@
import torch
Member

You need to add the gradient pass test, as we discussed before.
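
A gradient pass test of this kind usually runs one optimizer step and then checks that every parameter has moved, which catches layers that are disconnected from the loss. The sketch below shows the idea on a stand-in `nn.Sequential` model; the helper names `check_parameter_changes` and `run_gradient_pass_test` are hypothetical, and a real test would build the Delightful-TTS model and its loss instead.

```python
import copy

import torch
from torch import nn


def check_parameter_changes(model: nn.Module, reference: nn.Module) -> None:
    """Assert that every parameter differs from its pre-step copy."""
    pairs = zip(model.named_parameters(), reference.named_parameters())
    for (name, param), (_, param_ref) in pairs:
        assert (param != param_ref).any(), f"parameter {name} was not updated"


def run_gradient_pass_test() -> bool:
    torch.manual_seed(0)
    # Stand-in model; replace with the model under test.
    model = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 4))
    reference = copy.deepcopy(model)  # snapshot of the initial weights
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    inputs = torch.randn(3, 8)
    targets = torch.randn(3, 4)
    loss = nn.functional.mse_loss(model(inputs), targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    check_parameter_changes(model, reference)
    return True
```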

@stale

stale bot commented Jan 20, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

@stale stale bot added the wontfix This will not be worked on but feel free to help. label Jan 20, 2023
@stale stale bot closed this Jan 27, 2023
@loganhart02 loganhart02 reopened this Jan 27, 2023
@stale stale bot removed the wontfix This will not be worked on but feel free to help. label Jan 27, 2023
@iamkhalidbashir
Contributor

iamkhalidbashir commented Feb 23, 2023

Any idea when this will be merged? And will it have a pre-trained model?

@iamkhalidbashir
Contributor

is this PR for Delightful TTS 1 or Delightful TTS 2 (https://arxiv.org/abs/2207.04646)

@stale

stale bot commented May 12, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

@stale stale bot added the wontfix This will not be worked on but feel free to help. label May 12, 2023
@erogol
Member

erogol commented May 14, 2023

@loganhart420 let's wrap up this PR

@stale stale bot removed the wontfix This will not be worked on but feel free to help. label May 14, 2023
@iamkhalidbashir
Contributor

iamkhalidbashir commented May 14, 2023 via email

@loganhart02
Contributor Author

@loganhart420 let's wrap up this PR

Doing it now. Should I just put the pretrained weights in a draft release?

@loganhart02
Contributor Author

Would we have a trained model?

yea

@loganhart02
Contributor Author

is this PR for Delightful TTS 1 or Delightful TTS 2 (https://arxiv.org/abs/2207.04646)

1

@iamkhalidbashir
Contributor

iamkhalidbashir commented May 14, 2023 via email

@erogol erogol merged commit 6fdb88f into dev Jul 24, 2023
38 of 44 checks passed
@erogol erogol deleted the delightful-tts branch July 24, 2023 11:41
Tindell pushed a commit to pugtech-co/TTS that referenced this pull request Sep 4, 2023
* add configs

* Update config file

* Add model configs

* Add model layers

* Add layer files

* Add layer modules

* change config names

* Add emotion manager

* Fix missing ap bug

* Fix missing ap bug

* Add base TTS e2e class

* Fix wrong variable name in load_tts_samples

* Add training script

* Remove range predictor and gaussian upsampling

* Add helper function

* Add vctk recipe

* Add conformer docs

* Fix linting in conformer.py

* Add Docs

* remove duplicate import

* refactor args

* Fix bugs

* Remove emotion embedding

* remove unused arg

* Remove emotion embedding arg

* Remove emotion embedding arg

* fix style issues

* Fix bugs

* Fix bugs

* Add unittests

* make style

* fix formatter bug

* fix test

* Add pyworld compute pitch func

* Update requirements.txt

* Fix dataset Bug

* Change layer norm to instance norm

* Add missing import

* Remove emotions.py

* remove ssim loss

* Add init layers func to aligner

* refactor model layers

* remove audio_config arg

* Rename loss func

* Rename to delightful-tts

* Rename loss func

* Remove unused modules

* refactor imports

* replace audio config with audio processor

* Add change sample rate option

* remove broken resample func

* update recipe

* fix style, add config docs

* fix tests and multispeaker embd dim

* remove pyworld

* Make style and fix inference

* Split tts tests

* Fixup

* Fixup

* Fixup

* Add argument names

* Set "random" speaker in the model Tortoise/Bark

* Use a diff f0_cache path for delightful tts

* Fix delightful speaker handling

* Fix lint

* Make style

---------

Co-authored-by: loganhart420
Co-authored-by: Eren Gölge
4 participants