Files changed (1)
  1. README.md +1 -0
README.md CHANGED
@@ -10,6 +10,7 @@ tags:
  This repo contains model files for a 2:4 (N:M) sparse [Meta-Llama-3-8B](meta-llama/Meta-Llama-3-8B) model pruned in one-shot with [SparseGPT](https://arxiv.org/abs/2301.00774), and then additionally retrained with the [SquareHead](https://arxiv.org/abs/2310.06927) knowledge distillation while maintaining the 2:4 sparsity mask.
  It was then quantized using [AutoFP8](https://github.com/neuralmagic/AutoFP8) to FP8 weights and activations with per-tensor scales, calibrated on UltraChat2k.
 
+ **Note:** The unquantized [SparseLlama-3-8B-pruned_50.2of4](https://huggingface.co/nm-testing/SparseLlama-3-8B-pruned_50.2of4) model is still a work in progress and subject to change. This FP8 model will be updated once the unquantized model is updated.
 
  ## Evaluation Benchmark Results