I've adjusted the code a bit to allow specifying multi-character, space-delimited phonemes that map to pre-specified d-vectors, in the hope that I might be able to incorporate external information about phonemes (mouth positions etc.) and improve performance (without success so far). Does anyone have a use for this? If so, I think the way input ultimately gets to the network might need some reorganization to accommodate a 'straight-in' path that avoids cleaning, potential phoneme generation, etc.
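For illustration, tokenizing space-delimited multi-character phonemes might look like the sketch below (the symbol set and function names are hypothetical, not from the actual patch):

```python
def tokenize_phonemes(text, symbol_to_id):
    """Split space-delimited multi-character phonemes into symbol ids."""
    ids = []
    for token in text.split():
        if token not in symbol_to_id:
            raise ValueError(f"unknown phoneme: {token}")
        ids.append(symbol_to_id[token])
    return ids

symbol_to_id = {"HH": 0, "AH": 1, "L": 2, "OW": 3}  # toy symbol set
print(tokenize_phonemes("HH AH L OW", symbol_to_id))  # [0, 1, 2, 3]
```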
I use a pickle file that holds the symbol embeddings; it becomes all that is needed for 'characters' in the config JSON, and it replaces the nn.Embedding layer in the model with nn.Linear with the appropriate dims.
It seems to work OK, but appropriately bypassing the other config checks etc. makes things a little inelegant. I haven't yet designed the right way to do this, but I'm happy to if anyone else wants the capability.
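A minimal sketch of the substitution described above, assuming the pickle holds a dict mapping each phoneme symbol to its pre-specified vector (the file layout and all names here are hypothetical). `nn.Embedding.from_pretrained` with `freeze=True` is used here as a mathematically equivalent stand-in for the bias-free nn.Linear-over-one-hot approach:

```python
import torch
import torch.nn as nn

# Hypothetical pickle layout: a dict mapping each phoneme symbol to its
# pre-specified d-vector, e.g.
#   symbol_to_vec = pickle.load(open("symbol_embeddings.pkl", "rb"))
def make_fixed_embedding(symbol_to_vec):
    """Build a frozen lookup table from pre-specified symbol vectors."""
    symbols = sorted(symbol_to_vec)
    weight = torch.tensor([symbol_to_vec[s] for s in symbols],
                          dtype=torch.float32)
    # Frozen (num_symbols x dim) table indexed by symbol id; the same as a
    # bias-free nn.Linear applied to one-hot symbol vectors.
    emb = nn.Embedding.from_pretrained(weight, freeze=True)
    symbol_to_id = {s: i for i, s in enumerate(symbols)}
    return emb, symbol_to_id

# toy 3-dim vectors standing in for real d-vectors
table = {"AA": [1.0, 0.0, 0.0], "AE": [0.0, 1.0, 0.0], "AH": [0.0, 0.0, 1.0]}
emb, ids = make_fixed_embedding(table)
out = emb(torch.tensor([ids["AE"], ids["AA"]]))  # shape (2, 3)
```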
[This is an archived TTS discussion thread from discourse.mozilla.org/t/pre-specified-symbol-embeddings]
It would be useful to be able to insert additional steps around this whole area (both in training and in inference). In the past I'd manually make a few code edits to intercept the phoneme inputs and substitute alternatives when the original text was a heteronym. I haven't got the code to hand, but I could point out where it's applied this evening after work, although I expect you're making changes in the same locations.
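The interception described could look roughly like this sketch, where `overrides` maps heteronyms to hand-picked pronunciations (all names and the toy g2p mapping are hypothetical):

```python
def phonemize_with_overrides(words, phonemize_word, overrides):
    """Phonemize word by word, substituting hand-picked pronunciations
    for heteronyms found in `overrides`."""
    return [overrides.get(w.lower()) or phonemize_word(w.lower())
            for w in words]

# toy stand-in for the real grapheme-to-phoneme step
fake_g2p = {"i": "AY", "read": "R IY D", "it": "IH T"}.get
overrides = {"read": "R EH D"}  # force the past-tense pronunciation
print(phonemize_with_overrides(["I", "read", "it"], fake_g2p, overrides))
# ['AY', 'R EH D', 'IH T']
```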
I also experimented with the server (for inference) so that it would check the input text for the presence of phonemes and, if found, skip sending it to be turned into phonemes. This meant you could normally use regular text but also handle edge-case pronunciations (although my version only worked at the whole-sentence level, for simplicity). It's quite useful for testing.
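One way that check might work, as a sketch: treat the input as phonemes when every whitespace-separated token belongs to the known symbol set (the symbol set and function names below are made up for illustration):

```python
KNOWN_PHONEMES = {"HH", "AH", "L", "OW", "R", "EH", "D"}  # toy symbol set

def is_phoneme_input(text):
    """True if every whitespace-separated token is a known phoneme symbol."""
    tokens = text.split()
    return bool(tokens) and all(t in KNOWN_PHONEMES for t in tokens)

def prepare_input(text, phonemize):
    # Skip phonemization entirely when the text is already phonemes.
    return text if is_phoneme_input(text) else phonemize(text)

print(is_phoneme_input("HH AH L OW"))   # True
print(is_phoneme_input("hello world"))  # False
```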
Making the processing around those steps a more configurable pipeline would be ideal, but it comes at the cost of extra complexity.
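A configurable pipeline along those lines could be as simple as an ordered list of callables chosen from the config (a sketch, not anything in the codebase):

```python
from typing import Callable, List

def run_pipeline(text: str, steps: List[Callable[[str], str]]) -> str:
    """Apply configurable preprocessing steps to the input, in order."""
    for step in steps:
        text = step(text)
    return text

# steps could be selected per-config, e.g. cleaning, phonemization, overrides
steps = [str.lower, lambda t: " ".join(t.split())]
print(run_pipeline("Hello  World", steps))  # "hello world"
```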
(Original post by maneeshkyadav, August 11, 2020, 1:30am)