I've adjusted the code a bit to allow specifying multi-character, space-delimited phonemes that map to pre-specified d-vectors, in the hope that I might be able to incorporate external information about phonemes (mouth positions etc.) and improve performance (without success so far). Does anyone have a use for this? If so, I think the way input ultimately gets to the network might need some reorganization to accommodate a 'straight-in' path that avoids cleaning, potential phoneme generation, etc.
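For illustration, tokenizing space-delimited multi-character phonemes might look like the sketch below (the symbol set and function names are hypothetical, not from the actual patch):

```python
def tokenize_phonemes(text, symbol_to_id):
    """Split space-delimited multi-character phonemes into symbol ids."""
    ids = []
    for token in text.split():
        if token not in symbol_to_id:
            raise ValueError(f"unknown phoneme: {token}")
        ids.append(symbol_to_id[token])
    return ids

symbol_to_id = {"HH": 0, "AH": 1, "L": 2, "OW": 3}  # toy symbol set
print(tokenize_phonemes("HH AH L OW", symbol_to_id))  # [0, 1, 2, 3]
```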
I use a pickle file that holds the symbol embeddings; it becomes all that is needed for 'characters' in the config JSON, and it replaces the nn.Embedding layer in the model with nn.Linear with the appropriate dims.
It seems to work OK, but appropriately bypassing the other config checks etc. makes things a little inelegant. I haven't yet designed the right way to do this, but I'm happy to if anyone else wants the capability.
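A minimal sketch of the substitution described above, assuming the pickle holds a dict mapping each phoneme symbol to its pre-specified vector (the file layout and all names here are hypothetical). `nn.Embedding.from_pretrained` with `freeze=True` is used here as a mathematically equivalent stand-in for the bias-free nn.Linear-over-one-hot approach:

```python
import torch
import torch.nn as nn

# Hypothetical pickle layout: a dict mapping each phoneme symbol to its
# pre-specified d-vector, e.g.
#   symbol_to_vec = pickle.load(open("symbol_embeddings.pkl", "rb"))
def make_fixed_embedding(symbol_to_vec):
    """Build a frozen lookup table from pre-specified symbol vectors."""
    symbols = sorted(symbol_to_vec)
    weight = torch.tensor([symbol_to_vec[s] for s in symbols],
                          dtype=torch.float32)
    # Frozen (num_symbols x dim) table indexed by symbol id; the same as a
    # bias-free nn.Linear applied to one-hot symbol vectors.
    emb = nn.Embedding.from_pretrained(weight, freeze=True)
    symbol_to_id = {s: i for i, s in enumerate(symbols)}
    return emb, symbol_to_id

# toy 3-dim vectors standing in for real d-vectors
table = {"AA": [1.0, 0.0, 0.0], "AE": [0.0, 1.0, 0.0], "AH": [0.0, 0.0, 1.0]}
emb, ids = make_fixed_embedding(table)
out = emb(torch.tensor([ids["AE"], ids["AA"]]))  # shape (2, 3)
```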
[This is an archived TTS discussion thread from discourse.mozilla.org/t/pre-specified-symbol-embeddings]
It would be useful to be able to insert additional steps around this whole area (both in training and in inference). In the past I'd manually make a few code edits to intercept the phoneme inputs and substitute alternatives when the original text was a heteronym. I haven't got the code to hand, but I could point out where it's applied this evening after work, although I expect you're making changes in the same locations.
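The interception described could look roughly like this sketch, where `overrides` maps heteronyms to hand-picked pronunciations (all names and the toy g2p mapping are hypothetical):

```python
def phonemize_with_overrides(words, phonemize_word, overrides):
    """Phonemize word by word, substituting hand-picked pronunciations
    for heteronyms found in `overrides`."""
    return [overrides.get(w.lower()) or phonemize_word(w.lower())
            for w in words]

# toy stand-in for the real grapheme-to-phoneme step
fake_g2p = {"i": "AY", "read": "R IY D", "it": "IH T"}.get
overrides = {"read": "R EH D"}  # force the past-tense pronunciation
print(phonemize_with_overrides(["I", "read", "it"], fake_g2p, overrides))
# ['AY', 'R EH D', 'IH T']
```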
I also experimented with the server (for inference) so that it would check the input text for the presence of phonemes and, if found, skip sending it to be turned into phonemes. This meant you could normally use regular text but also handle edge-case pronunciations (although my version only worked at the whole-sentence level, for simplicity). It's quite useful for testing.
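One way that check might work, as a sketch: treat the input as phonemes when every whitespace-separated token belongs to the known symbol set (the symbol set and function names below are made up for illustration):

```python
KNOWN_PHONEMES = {"HH", "AH", "L", "OW", "R", "EH", "D"}  # toy symbol set

def is_phoneme_input(text):
    """True if every whitespace-separated token is a known phoneme symbol."""
    tokens = text.split()
    return bool(tokens) and all(t in KNOWN_PHONEMES for t in tokens)

def prepare_input(text, phonemize):
    # Skip phonemization entirely when the text is already phonemes.
    return text if is_phoneme_input(text) else phonemize(text)

print(is_phoneme_input("HH AH L OW"))   # True
print(is_phoneme_input("hello world"))  # False
```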
Making the processing around those steps a more configurable pipeline would be ideal, but it comes at the cost of extra complexity.
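A configurable pipeline along those lines could be as simple as an ordered list of callables chosen from the config (a sketch, not anything in the codebase):

```python
from typing import Callable, List

def run_pipeline(text: str, steps: List[Callable[[str], str]]) -> str:
    """Apply configurable preprocessing steps to the input, in order."""
    for step in steps:
        text = step(text)
    return text

# steps could be selected per-config, e.g. cleaning, phonemization, overrides
steps = [str.lower, lambda t: " ".join(t.split())]
print(run_pipeline("Hello  World", steps))  # "hello world"
```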
(Original post by maneeshkyadav, August 11, 2020, 1:30am)