Replies: 1 comment
-
>>> erogol |
-
>>> geneing
[January 4, 2020, 7:34pm]
I looked into the
implementation of Graves attention from the Battenberg paper and I think
it's wrong in the dev branch. It uses softplus for the mean term (as
in the V2 model from the paper) but an exponential for the variance (as
in the V1 model). When I train with the current dev branch I get major
attention artifacts during inference; it basically doesn't work.
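For reference, the parameterization mismatch can be sketched in plain Python. The function names and signatures here are hypothetical, not the repo's actual code; the point is that the paper's V2 model passes both terms through softplus, V1 passes both through exp, and the dev branch described above mixes the two:

```python
import math

def softplus(x):
    # log(1 + exp(x)); a numerically simple form, fine for a sketch
    return math.log1p(math.exp(x))

def v1_params(raw_delta, raw_sigma):
    # V1-style: exponential for both the mean increment and the variance
    return math.exp(raw_delta), math.exp(raw_sigma)

def v2_params(raw_delta, raw_sigma):
    # V2-style: softplus for both terms
    return softplus(raw_delta), softplus(raw_sigma)

def mixed_params(raw_delta, raw_sigma):
    # the inconsistent combination in the dev branch: V2 mean, V1 variance
    return softplus(raw_delta), math.exp(raw_sigma)
```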
The implementation of the V2b model from the paper that was in the dev
branch before Nov 8 is also incorrect. It multiplies by the variance
instead of dividing in the exponential term
(`phi_t = g_t * torch.exp(-0.5 * sig_t * (mu_t - j)**2)`), when it should be
`phi_t = g_t * torch.exp(-0.5 * (mu_t - j)**2 / sig_t)`.
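A minimal, self-contained sketch of the corrected V2b alignment for one decoder step, in plain Python mirroring the `torch` expression above. The function name, the per-mixture list layout, and the monotonic mean update are assumptions for illustration, not the repo's actual code:

```python
import math

def gmm_attention_step(g, delta, sig, mu_prev, seq_len):
    """One decoder step of V2b-style GMM attention (sketch).
    g, delta, sig, mu_prev: per-mixture lists of floats (hypothetical layout).
    Returns the alignment over encoder positions and the updated means.
    """
    mu = [m + d for m, d in zip(mu_prev, delta)]   # monotonic mean update
    alpha = []
    for j in range(seq_len):                        # encoder positions
        # corrected form: divide by sig_t inside the exponential
        phi = sum(g_k * math.exp(-0.5 * (mu_k - j) ** 2 / s_k)
                  for g_k, mu_k, s_k in zip(g, mu, sig))
        alpha.append(phi)
    return alpha, mu
```

With the division in place, each mixture component peaks at its mean `mu_k` and its width is controlled by `sig_k`, which is what makes the alignment move smoothly along the encoder positions.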
With the correct implementation of the V2b model, the attention seems to
work really well. It converges quickly. More importantly, it doesn't
stutter or repeat on long, difficult sentences.
I saw somewhere that you weren't happy with GMM attention performance.
Were you using the correct implementation during your evaluation?
[This is an archived TTS discussion thread from discourse.mozilla.org/t/graves-attention]