Error when trying to use custom trained wavernn model #182

JRMeyer · 2021-03-07T08:32:29Z

JRMeyer
Mar 7, 2021
Maintainer

>>> fatih_kiralioglu
[April 14, 2020, 7:21am]

Hi,
I'm using the latest master branches of MozillaTTS and the related WaveRNN
branch (https://github.com/erogol/WaveRNN).

I have trained a test model for the WaveRNN vocoder and am trying to use the
newly trained model in MozillaTTS with Tacotron1. My WaveRNN configuration
looks like this:

{
'model_name': 'wavernn_4241_finetune_gaussian',
'model_description': 'Gaussian model as in Clarinet',

'audio':{
    'audio_processor': 'audio', // selects the audio processor to use, if several are available.
    // Audio processing parameters
    'num_mels': 80, // size of the mel spec frame.
    'num_freq': 1025, // number of stft frequency levels. Size of the linear spectrogram frame.
    'sample_rate': 16000, // wav sample-rate. If different than the original data, it is resampled.
    'frame_length_ms': 50, // stft window length in ms.
    'frame_shift_ms': 12.5, // stft window hop-length in ms.
    'preemphasis': 0.98, // pre-emphasis to reduce spec noise and make it more structured. If 0.0, no pre-emphasis.
    'min_level_db': -100, // normalization range
    'ref_level_db': 20, // reference level db, theoretically 20db is the sound of air.
    'power': 1.5, // value to sharpen wav signals after GL algorithm.
    'griffin_lim_iters': 60, // number of Griffin-Lim iterations. 30-60 is a good range. The larger the value, the slower the generation.
    // Normalization parameters
    'signal_norm': true, // normalize the spec values in range [0, 1]
    'symmetric_norm': false, // move normalization to range [-1, 1]
    'max_norm': 1, // scale normalization to range [-max_norm, max_norm] or [0, max_norm]
    'clip_norm': true, // clip normalized values into the range.
    'mel_fmin': 0.0, // minimum freq level for mel-spec. ~50 for male and ~95 for female voices. Tune for dataset!!
    'mel_fmax': 8000.0, // maximum freq level for mel-spec. Tune for dataset!!
    'do_trim_silence': false // KEEP ALWAYS FALSE
},

'distributed':{
    'backend': 'nccl',
    'url': 'tcp://localhost:54321'
},

'epochs': 10000,
'grad_clip': 1000,
'lr': 0.0001,
'warmup_steps': 100,
'batch_size': 32,
'checkpoint_step': 1000,
'print_step': 10,
'num_workers': 8,
'mel_len': 8,
'pad': 2,
'mulaw': true,
'use_aux_net': false,
'use_upsample_net': false,
'upsample_factors': [5, 5, 11],
'mode': 'mold', // model with gaussian (gaus), mixture of logistic dist (mold), or raw bit output (# bits).
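
The millisecond values in the 'audio' block translate into sample counts at the configured sample rate. The sketch below works through that arithmetic using only numbers from the config above. The helper name ms_to_samples is hypothetical, and truncating to an integer is an assumption about how the audio processor rounds; the actual code may differ.

```python
import math


def ms_to_samples(duration_ms: float, sample_rate: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples.

    Hypothetical helper; truncation toward zero is an assumption.
    """
    return int(duration_ms * sample_rate / 1000)


# Values copied from the config above.
sample_rate = 16000
frame_length_ms = 50
frame_shift_ms = 12.5
upsample_factors = [5, 5, 11]

win_length = ms_to_samples(frame_length_ms, sample_rate)  # STFT window in samples
hop_length = ms_to_samples(frame_shift_ms, sample_rate)   # STFT hop in samples
upsample_product = math.prod(upsample_factors)            # total upsampling ratio

print(win_length, hop_length, upsample_product)
```

At 16000 Hz this gives a window of 800 samples and a hop of 200 samples, while the product of 'upsample_factors' is 275. Whether the hop and that product need to agree when 'use_upsample_net' is false depends on the vocoder code and is not settled in this thread.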

When I use this WaveRNN model together with the Tacotron1 model, I get
the following error as soon as I run a test synthesis through the server:

'RuntimeError: size mismatch, m1: [1 x 1026], m2: [81 x 512] at
/opt/conda/conda-bld/pytorch_1573049310284/work/aten/src/THC/generic/THCTensorMathBlas.cu:290'

The error stems from line 246 of WaveRNN/models/wavernn.py:

'x = self.I(x)'

Here, the size of 'x' is [1 x 1026] while the size of 'I' is [81 x 512].

The size of 'I' is set where the layer is initialized:

'self.I = nn.Linear(feat_dims + 1, rnn_dims)'

and the size is naturally [81 x 512].
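
The arithmetic behind the mismatch can be sketched without running the model. The snippet below assumes that feat_dims equals 'num_mels' (80) and that rnn_dims is 512, both inferred from the [81 x 512] shape in the error rather than confirmed in the thread. It also notes that 1026 happens to equal 'num_freq' + 1. The helper names are hypothetical.

```python
def linear_in_features(feat_dims: int) -> int:
    """Input width of a layer built as nn.Linear(feat_dims + 1, rnn_dims)."""
    return feat_dims + 1


def can_multiply(input_shape: tuple, weight_shape: tuple) -> bool:
    """A [rows x k] input can only multiply a [k x cols] weight."""
    return input_shape[-1] == weight_shape[0]


num_mels = 80    # 'num_mels' from the config
num_freq = 1025  # 'num_freq' from the config
rnn_dims = 512   # assumption, read off the error message

weight_shape = (linear_in_features(num_mels), rnn_dims)  # m2 in the error
mel_sized_input = (1, num_mels + 1)      # what the layer was built for
linear_sized_input = (1, num_freq + 1)   # same width as m1 in the error

print(weight_shape)
print(can_multiply(mel_sized_input, weight_shape))
print(can_multiply(linear_sized_input, weight_shape))
```

Under these assumptions the layer accepts an 81-wide input and rejects a 1026-wide one, which suggests the vocoder is being fed linear-spectrogram-sized frames instead of mel-sized frames.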

I wonder whether the latest master branch of MozillaTTS is compatible with
WaveRNN, or whether I am making some other mistake.

Thanks in advance.

[This is an archived TTS discussion thread from discourse.mozilla.org/t/error-when-trying-to-use-custom-trained-wavernn-model]

JRMeyer · 2021-03-07T08:32:32Z

JRMeyer
Mar 7, 2021
Maintainer Author

>>> erogol
[April 15, 2020, 9:40am]

I haven't run WaveRNN for a long while, so there might be a mismatch. But
it should be easy to fix. If you fix it, it'd be a nice PR.

[Archived Post]


JRMeyer · 2021-03-07T08:32:34Z

JRMeyer
Mar 7, 2021
Maintainer Author

>>> fatih_kiralioglu
[April 20, 2020, 3:49pm]

Hi Eren,
I think I have found the error.
In the file synthesizer.py at line 190:
wav = self.wavernn.generate(vocoder_input,
batched=self.config.is_wavernn_batched, target=11000, overlap=550)

The definition of vocoder_input is:

vocoder_input = torch.FloatTensor(postnet_output.T).unsqueeze(0)

Here, postnet_output is 1025-dimensional, while decoder_output is
80-dimensional, which is the input size the WaveRNN vocoder expects.

When I passed decoder_output instead of postnet_output to the vocoder, the
error disappeared, but the synthesized waveforms are unintelligible.
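
The dimension check described above can be written as a small guard that runs before the vocoder is called. This is only a sketch: pick_vocoder_features is a hypothetical helper, not part of MozillaTTS or WaveRNN, and the frame count of 200 is made up for illustration. It works on shape tuples so it runs without any model loaded.

```python
def pick_vocoder_features(output_shapes: dict, num_mels: int) -> str:
    """Return the name of the first output whose last dimension is num_mels.

    output_shapes maps an output name to its (frames, feature_dim) shape.
    """
    for name, shape in output_shapes.items():
        if shape[-1] == num_mels:
            return name
    raise ValueError(
        f"no output has {num_mels} features; got "
        f"{ {name: shape[-1] for name, shape in output_shapes.items()} }"
    )


# Feature widths as reported in the post; 200 frames is an arbitrary example.
shapes = {
    "postnet_output": (200, 1025),
    "decoder_output": (200, 80),
}

chosen = pick_vocoder_features(shapes, num_mels=80)
print(chosen)
```

Passing this check only removes the size mismatch. As the post reports, it does not by itself make the synthesized audio intelligible.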

Could you correct me if I'm doing anything wrong?

Thanks

[Archived Post]


JRMeyer · 2021-03-07T08:32:37Z

JRMeyer
Mar 7, 2021
Maintainer Author

>>> erogol
[April 21, 2020, 9:14pm]

I don't think I can look at WaveRNN right now. Sorry about that, but I am
out of bandwidth at the moment.

[Archived Post]

