I am now at epoch 241 and it has only gotten worse: the model hallucinates even after a 7-word sentence. There must be something wrong with the batched padding or a related detail; I'd appreciate any help.
I get hallucinations and slurred audio at the beginning and end of short phrases. I've trained on a mix of short (fewer than 7 words) and long phrases, but the model just doesn't handle them.
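To rule out a skewed data mix, it helps to actually count short versus long utterances in the training manifest. A minimal sketch, assuming an LJSpeech-style `metadata.csv` (`path|transcript|normalized_transcript`); the delimiter, column index, and the 7-word threshold are assumptions to adjust for your own dataset:

```python
# Quick check of the utterance-length distribution in a training
# manifest. Assumes an LJSpeech-style pipe-delimited CSV; adjust
# `delimiter` and `text_col` for your format.
import csv
from collections import Counter

def length_histogram(manifest_path, delimiter="|", text_col=1):
    """Bucket utterances by word count to verify the short/long mix."""
    buckets = Counter()
    with open(manifest_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter=delimiter):
            n_words = len(row[text_col].split())
            buckets["short (<7 words)" if n_words < 7 else "long"] += 1
    return dict(buckets)
```

If the histogram comes back close to 50/50, the data mix itself is probably not the culprit and the padding/loss handling deserves a closer look.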
Describe the bug
I cannot get short utterances (a couple of words) to work without hallucinations at the end, despite my training mix being 50/50 very short and long utterances. Why won't the GPT predict the EOT token correctly when it has already seen enough examples? (1 h of training data, epoch 46)
Is it due to some batched-training optimization that neglects EOT tokens?
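One plausible mechanism behind that suspicion: each utterance contributes exactly one EOT target among many audio tokens, and if padded positions are not masked out of the loss (or the stop token is not upweighted), the EOT gradient gets diluted. A minimal sketch of a masked, optionally EOT-weighted cross-entropy; this is not Coqui's actual training code, and the token ids are hypothetical:

```python
# Illustration (NOT Coqui's training code) of how padding handling
# affects the EOT signal. PAD/EOT ids are hypothetical.
import math

PAD, EOT = 0, 1  # hypothetical token ids

def masked_ce(log_probs, targets, eot_weight=1.0):
    """Weighted mean cross-entropy over non-pad positions.

    Padding is excluded so it contributes no gradient; the single
    EOT target per utterance can be upweighted so it is not drowned
    out by hundreds of audio-token targets.
    """
    total, denom = 0.0, 0.0
    for lp, t in zip(log_probs, targets):
        if t == PAD:                      # masked: no gradient from padding
            continue
        w = eot_weight if t == EOT else 1.0
        total += -w * lp[t]               # weighted negative log-likelihood
        denom += w
    return total / denom
```

If the real training loop averages over all positions including padding, or labels post-EOT positions with something other than an ignored pad id, short utterances would indeed learn a weak stop signal.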
To Reproduce
Finetune the model on a 50/50 mix of very short and long utterances (the issue sometimes also appears with the pretrained XTTS v2 checkpoint and a custom speaker latent), then prompt with something like "Program complete.".
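To make the reproduction measurable rather than judged by ear, one can flag runaway generations by comparing the synthesized duration against a rough speaking-rate bound. This heuristic is hypothetical and not part of XTTS; `max_wps` and `slack_s` are assumed values:

```python
# Hypothetical heuristic (not part of XTTS) to flag runaway
# generations: audio much longer than the text warrants suggests
# the model missed the EOT token and kept hallucinating.
def looks_like_runaway(text, duration_s, max_wps=4.0, slack_s=1.5):
    """Return True if duration_s far exceeds a speaking-rate estimate.

    max_wps: assumed upper bound on words per second.
    slack_s: allowance for leading/trailing pauses.
    """
    expected_max = len(text.split()) / max_wps + slack_s
    return duration_s > expected_max
```

Running this over a batch of short prompts like "Program complete." gives a hallucination rate that can be tracked across epochs.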
Expected behavior
It should cut off after generating the sentence.
Logs
No response
Environment
Additional context
No response