Describe the bug
For requests to the API endpoints (I have only tested chat/completions), auto_max_new_tokens is somehow always True, even when the request explicitly sets it to False. The check in generate_reply_HF always takes the True branch:

if state['auto_max_new_tokens']:  # always True, even though the request sent False

The API does not support a max_new_tokens parameter, so this greatly affects how the model operates via the API.
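For context on why this matters: when auto_max_new_tokens is treated as True, the server recomputes the token budget from the remaining context instead of honoring the requested value. A tiny self-contained sketch of that behaviour (my own paraphrase, not the project's exact code; all numbers below are hypothetical, chosen only to show the shape of the effect):

def effective_max_new_tokens(requested, auto_max_new_tokens, truncation_length, prompt_tokens):
    # With the flag on, the budget becomes "fill the rest of the context window"
    # and the value the client asked for is ignored.
    if auto_max_new_tokens:
        return truncation_length - prompt_tokens
    return requested

print(effective_max_new_tokens(512, False, 8192, 164))  # -> 512, what the client expects
print(effective_max_new_tokens(512, True, 8192, 164))   # -> 8028, what actually happens

This would be consistent with the max_new_tokens value of 8028 that shows up in the generate_reply_HF parameters below, even though the request sent auto_max_new_tokens: False.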
Parameters received by generate_reply_HF:
generate_reply_HF {'max_new_tokens': 8028, 'temperature': 1, 'temperature_last': False, 'dynamic_temperature': False, 'dynatemp_low': 1, 'dynatemp_high': 1, 'dynatemp_exponent': 1, 'smoothing_factor': 0, 'smoothing_curve': 1, 'top_p': 1, 'min_p': 0.05, 'top_k': 0, 'repetition_penalty': 1, 'presence_penalty': 0, 'frequency_penalty': 0, 'repetition_penalty_range': 1024, 'typical_p': 1, 'tfs': 1, 'top_a': 0, 'guidance_scale': 1, 'penalty_alpha': 0, 'mirostat_mode': 0, 'mirostat_tau': 5, 'mirostat_eta': 0.1, 'do_sample': True, 'encoder_repetition_penalty': 1, 'no_repeat_ngram_size': 0, 'dry_multiplier': 0, 'dry_base': 1.75, 'dry_allowed_length': 2, 'dry_sequence_breakers': '"\\n", ":", "\\"", "*"', 'sampler_priority': ['temperature', 'dynamic_temperature', 'quadratic_sampling', 'top_k', 'top_p', 'typical_p', 'epsilon_cutoff', 'eta_cutoff', 'tfs', 'top_a', 'min_p', 'mirostat'], 'use_cache': True, 'inputs': ..., 'eos_token_id': [1], 'stopping_criteria': [<modules.callbacks._StopEverythingStoppingCriteria object at 0x000001B5B9F4E110>, <modules.callbacks.Stream object at 0x000001B5B9F4E590>], 'logits_processor': []} <bos><start_of_turn>user
Sending params:
{'preset': 'min_p', 'min_p': 0.05, 'dynamic_temperature': False, 'dynatemp_low': 1, 'dynatemp_high': 1, 'dynatemp_exponent': 1, 'smoothing_factor': 0, 'smoothing_curve': 1, 'top_k': 0, 'repetition_penalty': 1, 'repetition_penalty_range': 1024, 'typical_p': 1, 'tfs': 1, 'top_a': 0, 'epsilon_cutoff': 0, 'eta_cutoff': 0, 'guidance_scale': 1, 'negative_prompt': '', 'penalty_alpha': 0, 'mirostat_mode': 0, 'mirostat_tau': 5, 'mirostat_eta': 0.1, 'temperature_last': False, 'do_sample': True, 'seed': -1, 'encoder_repetition_penalty': 1, 'no_repeat_ngram_size': 0, 'dry_multiplier': 0, 'dry_base': 1.75, 'dry_allowed_length': 2, 'dry_sequence_break': '\\n, :, ", *', 'truncation_length': 0, 'max_tokens_second': 0, 'prompt_lookup_num_toknes': 0, 'custom_token_bans': '', 'sampler_priority': ['temperature', 'dynamic_temperature', 'quadratic_sampling', 'top_k', 'top_p', 'typical_p', 'epsilon_cutoff', 'eta_cutofftfs', 'top_a', 'min_p', 'mirostat'], 'auto_max_new_tokens': False, 'ban_eos_token': False, 'add_bos_token': True, 'skip_special_tokens': True, 'grammar_string': '', 'model': '', 'prompt': '', 'best_of': 1, 'echo': False, 'frequency_penalty': 0, 'logit_bias': {}, 'logprobs': None, 'max_tokens': 0, 'n': 1, 'presence_penalty': 0, 'stop': [']'], 'stream': False, 'suffix': ']', 'temperature': 1, 'top_p': 1, 'messages': [{}], 'mode': 'chat-instruct', 'character': 'Alice', 'user_name': 'Bob', 'user_bio': "I'm Bob. 18 years old", 'chat_template_str': "{%- for message in messages %}\n {%- if message['role'] == 'system' -%}\n {%- if message['content'] -%}\n {{- message['content'] + '\n\n' -}}\n {%- endif -%}\n {%- if user_bio -%}\n {{- user_bio + '\n\n' -}}\n {%- endif -%}\n {%- else -%}\n {%- if message['role'] == 'user' -%}\n {{- name1 + ': ' + message['content'] + '\n'-}}\n {%- else -%}\n {{- name2 + ': ' + message['content'] + '\n' -}}\n {%- endif -%}\n {%- endif -%}\n{%- endfor -%}", 'chat_instruct_command': None, 'continue_': False}
Is there an existing issue for this?
Reproduction
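A rough sketch of the kind of request that triggers it (an illustration rather than my exact client code; it assumes a local server with the OpenAI-compatible API enabled on the default port 5000, and the payload is a trimmed-down version of the "Sending params" dump above):

import requests

payload = {
    "mode": "chat-instruct",
    "character": "Alice",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 200,             # hypothetical value for illustration
    "auto_max_new_tokens": False,  # explicitly disabled, but generate_reply_HF still sees True
    "stream": False,
}

response = requests.post(
    "http://127.0.0.1:5000/v1/chat/completions",
    json=payload,
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])

Whatever value auto_max_new_tokens is given here, the generation still runs with the auto-computed max_new_tokens, as shown in the dump above.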
Screenshot
No response
Logs
System Info
Win 11 x64