BramVanroy committed
Commit 738aa0e
1 Parent(s): 56e810a

Update README.md

Files changed (1)
  1. README.md +15 -38
README.md CHANGED
@@ -21,53 +21,30 @@ datasets:
  <em>A conversational model for Dutch, aligned through AI feedback.</em>
  </div>
 
- This is a `Q5_K_M` GGUF version of [BramVanroy/GEITje-7B-ultra](https://huggingface.co/BramVanroy/GEITje-7B-ultra), a powerful Dutch chatbot, which is ultimately a Mistral-based model, further pretrained on Dutch and additionally treated with supervised fine-tuning and DPO alignment. For more information on the model, data, licensing, and usage, see the main model's README.
-
- ## Usage
-
- ### LM Studio
-
- You can use this model in [LM Studio](https://lmstudio.ai/), an easy-to-use interface for running optimized models locally. Simply search for `BramVanroy/GEITje-7B-ultra-GGUF` and download the available file.
 
- ### Ollama
-
- The model is available on `ollama` and can be run as follows:
-
- ```shell
- ollama run bramvanroy/geitje-7b-ultra-gguf
  ```
-
- To reproduce this, i.e. to create the ollama files manually instead of downloading them via ollama, follow the steps below.
-
- First download the [GGUF file](https://huggingface.co/BramVanroy/GEITje-7B-ultra-GGUF/resolve/main/GEITje-7B-ultra-Q5_K_M.gguf?download=true) and [Modelfile](https://huggingface.co/BramVanroy/GEITje-7B-ultra-GGUF/resolve/main/Modelfile?download=true) to your computer. You can adapt the Modelfile as you wish.
-
- Then create the ollama model and run it:
-
- ```shell
- ollama create geitje-7b-ultra-gguf -f ./Modelfile
- ollama run geitje-7b-ultra-gguf
  ```
 
- ## Reproduce this GGUF version from the non-quantized model
 
- This assumes you have installed and built llama.cpp, and that the current working directory is the `build` directory in llama.cpp.
 
- Download the initial model (a `huggingface-cli` alternative probably exists, too):
 
- ```python
- from huggingface_hub import snapshot_download
- model_id = "BramVanroy/GEITje-7B-ultra"
- snapshot_download(repo_id=model_id, local_dir="geitje-ultra-hf", local_dir_use_symlinks=False)
- ```
 
- Convert to GGUF format and quantize:
 
- ```shell
- # Convert to GGUF format
- python convert.py build/geitje-ultra-hf/
 
- cd build
 
- # Quantize to Q5_K_M
- bin/quantize geitje-ultra-hf/ggml-model-f32.gguf geitje-ultra-hf/GEITje-7B-ultra-Q5_K_M.gguf Q5_K_M
- ```
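
The removed instructions above reference a Modelfile hosted in the repo. For context, such a file follows ollama's Modelfile format; a minimal sketch of what it can look like is shown below. The `FROM` path matches the GGUF file linked above, but the parameter value and chat template are illustrative assumptions, not the contents of the actual Modelfile:

```
# Sketch of an ollama Modelfile (illustrative; see the linked Modelfile for the real one)
FROM ./GEITje-7B-ultra-Q5_K_M.gguf

# Decoding parameter (assumed value)
PARAMETER temperature 0.7

# Chat template in ollama's Go-template syntax (assumed Zephyr-style format;
# the real template is defined in the repo's Modelfile)
TEMPLATE """<|user|>
{{ .Prompt }}</s>
<|assistant|>
"""
```

`ollama create` reads this file to package the GGUF weights together with the template and parameters into a runnable model.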
 
  <em>A conversational model for Dutch, aligned through AI feedback.</em>
  </div>
 
+ This is a GGUF version of [BramVanroy/GEITje-7B-ultra](https://huggingface.co/BramVanroy/GEITje-7B-ultra), a powerful Dutch chatbot, which is ultimately a Mistral-based model, further pretrained on Dutch and additionally treated with supervised fine-tuning and DPO alignment. For more information on the model, data, licensing, and usage, see the main model's README.
 
+ Available quantization types and their expected performance differences compared to the base `f16` model, where higher perplexity is worse (figures from llama.cpp):
 
  ```
+ Q3_K_M :  3.07G, +0.2496 ppl @ LLaMA-v1-7B
+ Q4_K_M :  3.80G, +0.0532 ppl @ LLaMA-v1-7B
+ Q5_K_M :  4.45G, +0.0122 ppl @ LLaMA-v1-7B
+ Q6_K   :  5.15G, +0.0008 ppl @ LLaMA-v1-7B
+ Q8_0   :  6.70G, +0.0004 ppl @ LLaMA-v1-7B
+ F16    : 13.00G              @ 7B
  ```
 
+ Also available on [ollama](https://ollama.com/bramvanroy/geitje-7b-ultra).
 
+ Quants were made with release [`b2777`](https://github.com/ggerganov/llama.cpp/releases/tag/b2777) of llama.cpp.
 
+ ## Usage
 
+ ### LM Studio
 
+ You can use this model in [LM Studio](https://lmstudio.ai/), an easy-to-use interface for running optimized models locally. Simply search for `BramVanroy/GEITje-7B-ultra-GGUF` and download the available file.
 
+ ### Ollama
 
+ The model is available on [`ollama`](https://ollama.com/bramvanroy/geitje-7b-ultra).
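
Given the quantization table in the updated README, one way to pick a file is to take the lowest-perplexity quant that still fits a memory budget. A minimal sketch of that trade-off (the helper function is hypothetical; the sizes and perplexity deltas are the llama.cpp reference figures quoted above, measured on LLaMA-v1-7B rather than on GEITje itself):

```python
from typing import Optional

# (name, file size in GB, added perplexity vs. f16) -- figures from the
# llama.cpp reference table quoted in the README, not measured on GEITje.
QUANTS = [
    ("Q3_K_M", 3.07, 0.2496),
    ("Q4_K_M", 3.80, 0.0532),
    ("Q5_K_M", 4.45, 0.0122),
    ("Q6_K",   5.15, 0.0008),
    ("Q8_0",   6.70, 0.0004),
]

def best_quant(budget_gb: float) -> Optional[str]:
    """Return the quant with the smallest perplexity hit whose file fits the budget."""
    fitting = [(name, ppl) for name, size, ppl in QUANTS if size <= budget_gb]
    if not fitting:
        return None
    return min(fitting, key=lambda item: item[1])[0]

print(best_quant(5.0))  # with a ~5 GB budget, Q5_K_M is the best fit
```

Note that the file must fit in available (V)RAM with headroom for the KV cache, so in practice the budget should be somewhat below total memory.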