This modification is due to the use of a pre-allocated KV cache, which optimizes the efficiency of the base model (this part of the code follows Medusa). In the cat operation, the key and value of the current token have already been written into past_key_value, so there is no need to return them for operations outside the model. The modification itself does not affect model quality, but if you do not reset the length attribute of the KV cache after a generation, the next generation will be abnormal.
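To make the mechanism concrete, here is a minimal sketch of a Medusa-style pre-allocated KV cache. This is illustrative only, not EAGLE's actual code: the class name, buffer shape, and method names are assumptions. It shows why the current token's key/value need not be returned (the cat writes them into the shared buffer in place) and why the length attribute must be reset between generations.

```python
import numpy as np


class PreallocatedKVCache:
    """Illustrative sketch of a pre-allocated KV cache (Medusa-style).

    NOT EAGLE's actual implementation; names and shapes are assumed
    for demonstration.
    """

    def __init__(self, max_len, num_heads, head_dim):
        # The buffer is allocated once up front; `current_length`
        # tracks how many positions currently hold valid entries.
        self.data = np.zeros((max_len, num_heads, head_dim), dtype=np.float32)
        self.current_length = 0

    def cat(self, new_kv):
        # Write the new tokens' keys/values into the buffer in place,
        # then return a view over the valid prefix. Because the new
        # entries are already inside the cache, the attention layer
        # has no need to return them separately to the caller.
        n = new_kv.shape[0]
        self.data[self.current_length:self.current_length + n] = new_kv
        self.current_length += n
        return self.data[:self.current_length]

    def reset(self):
        # Must be called between generations. If the stale length is
        # kept, the next generation attends over leftover entries
        # from the previous prompt and produces abnormal output.
        self.current_length = 0


cache = PreallocatedKVCache(max_len=8, num_heads=2, head_dim=4)
step1 = cache.cat(np.ones((3, 2, 4), dtype=np.float32))   # prefill 3 tokens
step2 = cache.cat(np.ones((1, 2, 4), dtype=np.float32))   # decode 1 token
print(step1.shape, step2.shape, cache.current_length)     # (3, 2, 4) (4, 2, 4) 4
cache.reset()  # required before starting the next generation
```

In this scheme `step2` is a view over the first four rows of the same buffer, so no concatenation allocates new memory; the only per-step state change is the length attribute, which is exactly what has to be reset.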
Hello Eagle Team!
I noticed you modified past_key_value in
EAGLE/eagle/model/modeling_llama_kv.py
Line 594 in 667ba93
by setting it to None in the forward function, compared with the upstream source:
https://github.com/huggingface/transformers/blob/e51d7ac70ab8f3e69d3659226aa838308a668238/src/transformers/models/llama/modeling_llama.py#L324
Could you provide some insight into why you made this change? I am trying to generate responses with code-llama-7b using EAGLE's KVLlamaForCausalLM class, but the results are of much lower quality than those I get with the default AutoModelForCausalLM class. I suspect the KV cache is affecting generation.