>>> fatih_kiralioglu
[April 14, 2020, 7:21am]
Hi,
I'm using the latest master branch of MozillaTTS and the related WaveRNN
branch (https://github.com/erogol/WaveRNN).
I have trained a test WaveRNN vocoder model and am trying to use the newly
trained model in MozillaTTS with Tacotron1. My WaveRNN configuration is
as follows:
'{
'model_name': 'wavernn_4241_finetune_gaussian',
'model_description': 'Gaussian model as in Clarinet',
'audio':{
'audio_processor': 'audio', // to use dictate different audio processors, if available.
// Audio processing parameters
'num_mels': 80, // size of the mel spec frame.
'num_freq': 1025, // number of stft frequency levels. Size of the linear spectrogram frame.
'sample_rate': 16000, // wav sample-rate. If different than the original data, it is resampled.
'frame_length_ms': 50, // stft window length in ms.
'frame_shift_ms': 12.5, // stft window hop-length in ms.
'preemphasis': 0.98, // pre-emphasis to reduce spec noise and make it more structured. If 0.0, no pre-emphasis.
'min_level_db': -100, // normalization range
'ref_level_db': 20, // reference level db, theoretically 20db is the sound of air.
'power': 1.5, // value to sharpen wav signals after GL algorithm.
'griffin_lim_iters': 60,// #griffin-lim iterations. 30-60 is a good range. Larger the value, slower the generation.
// Normalization parameters
'signal_norm': true, // normalize the spec values in range [0, 1]
'symmetric_norm': false, // move normalization to range [-1, 1]
'max_norm': 1, // scale normalization to range [-max_norm, max_norm] or [0, max_norm]
'clip_norm': true, // clip normalized values into the range.
'mel_fmin': 0.0, // minimum freq level for mel-spec. ~50 for male and ~95 for female voices. Tune for dataset!!
'mel_fmax': 8000.0, // maximum freq level for mel-spec. Tune for dataset!!
'do_trim_silence': false // KEEP ALWAYS FALSE
},
'distributed':{
'backend': 'nccl',
'url': 'tcp://localhost:54321'
},
'epochs': 10000,
'grad_clip': 1000,
'lr': 0.0001,
'warmup_steps': 100,
'batch_size': 32,
'checkpoint_step': 1000,
'print_step': 10,
'num_workers': 8,
'mel_len': 8,
'pad': 2,
'mulaw': true,
'use_aux_net': false,
'use_upsample_net': false,
'upsample_factors': [5, 5, 11],
'mode': 'mold', // model with gaussian (gaus), mixture of logistic dist (mold), or raw bit output (# bits).'
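As a side check on the audio settings, the millisecond values can be converted to sample counts and compared with the product of 'upsample_factors'. This is only a sketch: the helper name `ms_to_samples` and the rounding rule are assumptions, not taken from the WaveRNN code.

```python
# Values copied from the configuration above.
sample_rate = 16000
frame_length_ms = 50
frame_shift_ms = 12.5
upsample_factors = [5, 5, 11]


def ms_to_samples(ms, rate):
    """Convert milliseconds to a whole number of samples (hypothetical helper)."""
    return int(round(ms * rate / 1000))


win_length = ms_to_samples(frame_length_ms, sample_rate)
hop_length = ms_to_samples(frame_shift_ms, sample_rate)

# Total upsampling implied by the three factors.
total_upsampling = 1
for factor in upsample_factors:
    total_upsampling *= factor

print("win_length:", win_length)
print("hop_length:", hop_length)
print("product of upsample_factors:", total_upsampling)
```

With these values the hop is 200 samples while the factors multiply to 275. Whether that difference matters when 'use_upsample_net' is false depends on the WaveRNN code, which is not shown here.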
When I use this WaveRNN model together with the Tacotron1 model and
run a test synthesis through the server, I get the following error:
'RuntimeError: size mismatch, m1: [1 x 1026], m2: [81 x 512] at
/opt/conda/conda-bld/pytorch_1573049310284/work/aten/src/THC/generic/THCTensorMathBlas.cu:290'
The error comes from line 246 of WaveRNN/models/wavernn.py:
'x = self.I(x)'
Here, the size of 'x' is [1 x 1026], while the size of 'I' is [81 x 512].
The size of 'I' is set by this line:
'self.I = nn.Linear(feat_dims + 1, rnn_dims)'
so, with 80 mel features and 512 RNN units, its size is naturally [81 x 512].
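The numbers in the error can be checked against the configuration by hand. The sketch below is a plain-Python arithmetic check, not a diagnosis: it assumes that 'feat_dims' equals 'num_mels', and the function name `first_layer_in_features` is made up for illustration.

```python
def first_layer_in_features(feat_dims):
    """Input width of nn.Linear(feat_dims + 1, rnn_dims), per the line quoted above."""
    return feat_dims + 1


# Frame sizes from the 'audio' section of the WaveRNN configuration.
config_sizes = {"num_mels": 80, "num_freq": 1025}

# Width of x (m1) reported in the RuntimeError.
observed_width = 1026

# What the layer expects if feat_dims == num_mels (assumption).
expected_width = first_layer_in_features(config_sizes["num_mels"])

# Which configured frame size, plus one, would give the observed width?
matches = [
    name
    for name, size in config_sizes.items()
    if first_layer_in_features(size) == observed_width
]

print("expected input width:", expected_width)
print("observed input width:", observed_width)
print("config entries matching the observed width:", matches)
```

The expected width is 81, while the observed 1026 equals 'num_freq' + 1. One possible reading is that the vocoder was given 1025-bin linear-spectrogram frames instead of 80-bin mel frames, but this thread does not confirm which features Tacotron1 passes to WaveRNN.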
I wonder whether the latest master branch of MozillaTTS is compatible with
this WaveRNN branch, or whether I have made a mistake somewhere.
Thanks in advance.
[This is an archived TTS discussion thread from discourse.mozilla.org/t/error-when-trying-to-use-custom-trained-wavernn-model]