neuralmagic
/

Meta-Llama-3.1-405B-Instruct-FP8

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

Lin-K76 commited on Aug 9

Commit

4ac47b2

•

1 Parent(s): 492ab8b

Update README.md

Files changed (1) hide show

README.md +1 -1

README.md CHANGED Viewed

@@ -24,7 +24,7 @@ language:
 - **License(s):** [llama3.1](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B/blob/main/LICENSE)
 - **Model Developers:** Neural Magic
-Quantized version of [Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct).
 It achieves an average score of 86.39 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 86.57.
 ### Model Optimizations

 - **License(s):** [llama3.1](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B/blob/main/LICENSE)
 - **Model Developers:** Neural Magic
+Quantized version of [Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) with the updated 8 kv-heads.
 It achieves an average score of 86.39 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 86.57.
 ### Model Optimizations