Meta-Llama-3.1-405B-Instruct-FP8 seems to be misconfigured

#534
by rymiel - opened

With a sufficiently long conversation using Meta-Llama-3.1-405B-Instruct-FP8, this error occurs 100% of the time, making the model unusable after a certain point:

Input validation error: inputs tokens + max_new_tokens must be <= 16384. Given: 14337 inputs tokens and 2048 max_new_tokens

For Meta-Llama-3.1-405B-Instruct-FP8, truncate is set to 14337 and max_new_tokens is set to 2048. Added together, these come to 16385 tokens, which is 2^14 + 1. This looks like an off-by-one: a fully truncated prompt will always fail the <= 16384 check.
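
For illustration, here is a minimal sketch of the validation the error message implies the server performs. The function name and the hard-coded 16384 limit are taken from the error above; this is not TGI's actual code, just the arithmetic it appears to enforce:

```ts
// Hypothetical sketch of the server-side check implied by the error message.
// validateRequest and maxTotalTokens are illustrative names, not TGI's API.
function validateRequest(
  inputTokens: number,
  maxNewTokens: number,
  maxTotalTokens: number
): void {
  if (inputTokens + maxNewTokens > maxTotalTokens) {
    throw new Error(
      `Input validation error: inputs tokens + max_new_tokens must be <= ${maxTotalTokens}. ` +
        `Given: ${inputTokens} inputs tokens and ${maxNewTokens} max_new_tokens`
    );
  }
}

// With the current config, a prompt truncated to 14337 tokens always trips
// the check: 14337 + 2048 = 16385 > 16384.
try {
  validateRequest(14337, 2048, 16384);
} catch (e) {
  console.error((e as Error).message);
}
```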

In comparison, for Meta-Llama-3.1-70B-Instruct, truncate is set to 7167 and max_new_tokens is set to 1024. Added together, these come to 8191, which is 2^13 - 1. That looks like another off-by-one, but in the other direction: one token below the limit rather than one above it.
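
Assuming the intent is to let the prompt fill everything the limit allows, a safer approach would be to derive truncate from the limit instead of setting it by hand. A sketch; the 16384 and 8192 maxTotalTokens values are my assumption, inferred from the powers of two above rather than from the actual configs:

```ts
// Hedged sketch: derive `truncate` from the server limit so the sum can
// never exceed it.
const deriveTruncate = (maxTotalTokens: number, maxNewTokens: number): number =>
  maxTotalTokens - maxNewTokens;

console.log(deriveTruncate(16384, 2048)); // 14336 (config has 14337, one over the limit)
console.log(deriveTruncate(8192, 1024)); // 7168 (config has 7167, one below the maximum)
```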

I'm not sure where max_total_tokens is being set, though.

Similar to #430

Hugging Chat org

Thanks for bringing this up, will take a look

Hugging Chat org

Should be fixed! Let me know if you're still having issues

nsarrazin changed discussion status to closed
