
Llamacpp Quantizations of gemma-7b

Using llama.cpp release b3583 for quantization.

Original model: https://huggingface.co/google/gemma-7b

Download a file (not the whole branch) from below:

| Filename | Quant type | File Size | Perplexity (wikitext-2-raw-v1.test) |
| -------- | ---------- | --------- | ------------------------------------ |
| gemma-7b.BF16.gguf | BF16 | 17.1 GB | 6.9857 +/- 0.04411 |
| gemma-7b-Q8_0.gguf | Q8_0 | 9.08 GB | 7.0373 +/- 0.04456 |
| gemma-7b-Q6_K.gguf | Q6_K | 7.01 GB | 7.3858 +/- 0.04762 |
| gemma-7b-Q5_K_M.gguf | Q5_K_M | 6.14 GB | 7.4227 +/- 0.04781 |
| gemma-7b-Q5_K_S.gguf | Q5_K_S | 5.98 GB | 7.5232 +/- 0.04857 |
| gemma-7b-Q4_K_M.gguf | Q4_K_M | 5.33 GB | 7.5800 +/- 0.04918 |
| gemma-7b-Q4_K_S.gguf | Q4_K_S | 5.05 GB | 7.9673 +/- 0.05225 |
| gemma-7b-Q3_K_L.gguf | Q3_K_L | 4.71 GB | 7.9586 +/- 0.05186 |
| gemma-7b-Q3_K_M.gguf | Q3_K_M | 4.37 GB | 8.4077 +/- 0.05545 |
| gemma-7b-Q3_K_S.gguf | Q3_K_S | 3.98 GB | 102.6126 +/- 1.62310 |
| gemma-7b-Q2_K.gguf | Q2_K | 3.48 GB | 3970.5385 +/- 102.46527 |

Downloading using huggingface-cli

First, make sure you have huggingface-cli installed:

pip install -U "huggingface_hub[cli]"
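If you want to confirm the CLI is on your PATH before downloading, a simple check is:

huggingface-cli --help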

Then, you can target the specific file you want:

huggingface-cli download fedric95/gemma-7b-GGUF --include "gemma-7b-Q4_K_M.gguf" --local-dir ./

If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:

huggingface-cli download fedric95/gemma-7b-GGUF --include "gemma-7b-Q8_0.gguf/*" --local-dir gemma-7b-Q8_0

You can either specify a new local-dir (gemma-7b-Q8_0) or download them all in place (./).
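Once a file is downloaded, it can be loaded directly with llama.cpp. As a minimal sketch (the binary name and flags assume a recent llama.cpp build such as b3583; adjust the path, prompt, and token count to your setup):

# example only: any .gguf file from the table above will work here
./llama-cli -m ./gemma-7b-Q4_K_M.gguf -p "Why is the sky blue?" -n 128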

Reproducibility

Details on how to reproduce the measurements above are discussed in the following llama.cpp thread:

https://github.com/ggerganov/llama.cpp/discussions/9020#discussioncomment-10335638
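As a rough sketch, a wikitext-2 perplexity run with llama.cpp looks like the command below; the exact invocation, context size, and dataset path used for the table above are the ones given in the linked discussion, not necessarily these:

# sketch only: assumes the wikitext-2-raw test file has already been downloaded locally
./llama-perplexity -m ./gemma-7b-Q4_K_M.gguf -f wikitext-2-raw/wiki.test.raw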
