Replies: 1 comment
-
>>> erogol |
-
>>> geneing
[January 4, 2020, 7:34pm]
I looked into the
implementation of Graves attention from the Battenberg paper and I think
it's wrong in the dev branch. It uses softplus for the mean term (as
in the V2 model from the paper) but an exponential for the variance (as
in the V1 model). When I train with the current dev branch I get major
attention artifacts during inference; it basically doesn't work.
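For reference, the parameterization mismatch can be sketched in plain Python. The function names and signatures here are hypothetical, not the repo's actual code; the point is that the paper's V2 model passes both terms through softplus, V1 passes both through exp, and the dev branch described above mixes the two:

```python
import math

def softplus(x):
    # log(1 + exp(x)); a numerically simple form, fine for a sketch
    return math.log1p(math.exp(x))

def v1_params(raw_delta, raw_sigma):
    # V1-style: exponential for both the mean increment and the variance
    return math.exp(raw_delta), math.exp(raw_sigma)

def v2_params(raw_delta, raw_sigma):
    # V2-style: softplus for both terms
    return softplus(raw_delta), softplus(raw_sigma)

def mixed_params(raw_delta, raw_sigma):
    # the inconsistent combination in the dev branch: V2 mean, V1 variance
    return softplus(raw_delta), math.exp(raw_sigma)
```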
The implementation of the V2b model from the paper that was in the dev
branch before Nov 8 is also incorrect. It multiplies by the variance
instead of dividing in the exponential term
(`phi_t = g_t * torch.exp(-0.5 * sig_t * (mu_t - j)**2)`), when it should be
`phi_t = g_t * torch.exp(-0.5 * (mu_t - j)**2 / sig_t)`.
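A minimal, self-contained sketch of the corrected V2b alignment for one decoder step, in plain Python mirroring the `torch` expression above. The function name, the per-mixture list layout, and the monotonic mean update are assumptions for illustration, not the repo's actual code:

```python
import math

def gmm_attention_step(g, delta, sig, mu_prev, seq_len):
    """One decoder step of V2b-style GMM attention (sketch).
    g, delta, sig, mu_prev: per-mixture lists of floats (hypothetical layout).
    Returns the alignment over encoder positions and the updated means.
    """
    mu = [m + d for m, d in zip(mu_prev, delta)]   # monotonic mean update
    alpha = []
    for j in range(seq_len):                        # encoder positions
        # corrected form: divide by sig_t inside the exponential
        phi = sum(g_k * math.exp(-0.5 * (mu_k - j) ** 2 / s_k)
                  for g_k, mu_k, s_k in zip(g, mu, sig))
        alpha.append(phi)
    return alpha, mu
```

With the division in place, each mixture component peaks at its mean `mu_k` and its width is controlled by `sig_k`, which is what makes the alignment move smoothly along the encoder positions.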
With the correct implementation of the V2b model, the attention seems to
work really well. It converges quickly. More importantly, it doesn't
stutter or repeat on long, difficult sentences.
I saw somewhere that you weren't happy with GMM attention performance.
Were you using the correct implementation during your evaluation?
[This is an archived TTS discussion thread from discourse.mozilla.org/t/graves-attention]