BramVanroy committed
Commit 738aa0e
1 Parent(s): 56e810a

Update README.md

Files changed (1)
  1. README.md +15 -38
README.md CHANGED
@@ -21,53 +21,30 @@ datasets:
  <em>A conversational model for Dutch, aligned through AI feedback.</em>
  </div>
 
- This is a `Q5_K_M` GGUF version of [BramVanroy/GEITje-7B-ultra](https://huggingface.co/BramVanroy/GEITje-7B-ultra), a powerful Dutch chatbot, which is ultimately a Mistral-based model, further pretrained on Dutch and additionally treated with supervised fine-tuning and DPO alignment. For more information on the model, data, licensing, and usage, see the main model's README.
-
- ## Usage
-
- ### LM Studio
-
- You can use this model in [LM Studio](https://lmstudio.ai/), an easy-to-use interface for running optimized models locally. Simply search for `BramVanroy/GEITje-7B-ultra-GGUF` and download the available file.
 
- ### Ollama
-
- The model is available on `ollama` and can be run as follows:
-
- ```shell
- ollama run bramvanroy/geitje-7b-ultra-gguf
  ```
-
- To reproduce this, i.e. to create the ollama files manually instead of downloading them via ollama, follow the steps below.
-
- First download the [GGUF file](https://huggingface.co/BramVanroy/GEITje-7B-ultra-GGUF/resolve/main/GEITje-7B-ultra-Q5_K_M.gguf?download=true) and [Modelfile](https://huggingface.co/BramVanroy/GEITje-7B-ultra-GGUF/resolve/main/Modelfile?download=true) to your computer. You can adapt the Modelfile as you wish.
-
- Then create the ollama model and run it:
-
- ```shell
- ollama create geitje-7b-ultra-gguf -f ./Modelfile
- ollama run geitje-7b-ultra-gguf
  ```
 
- ## Reproduce this GGUF version from the non-quantized model
 
- This assumes you have installed and built llama.cpp, and that the current working directory is the `build` directory in llama.cpp.
 
- Download the initial model (a `huggingface-cli` alternative probably exists, too):
 
- ```python
- from huggingface_hub import snapshot_download
- model_id = "BramVanroy/GEITje-7B-ultra"
- snapshot_download(repo_id=model_id, local_dir="geitje-ultra-hf", local_dir_use_symlinks=False)
- ```
 
- Convert to GGUF format and quantize:
 
- ```shell
- # Convert to GGUF format
- python convert.py build/geitje-ultra-hf/
 
- cd build
 
- # Quantize to Q5_K_M
- bin/quantize geitje-ultra-hf/ggml-model-f32.gguf geitje-ultra-hf/GEITje-7B-ultra-Q5_K_M.gguf Q5_K_M
- ```
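
The removed instructions above reference a Modelfile hosted in the repo. For context, such a file follows ollama's Modelfile format; a minimal sketch of what it can look like is shown below. The `FROM` path matches the GGUF file linked above, but the parameter value and chat template are illustrative assumptions, not the contents of the actual Modelfile:

```
# Sketch of an ollama Modelfile (illustrative; see the linked Modelfile for the real one)
FROM ./GEITje-7B-ultra-Q5_K_M.gguf

# Decoding parameter (assumed value)
PARAMETER temperature 0.7

# Chat template in ollama's Go-template syntax (assumed Zephyr-style format;
# the real template is defined in the repo's Modelfile)
TEMPLATE """<|user|>
{{ .Prompt }}</s>
<|assistant|>
"""
```

`ollama create` reads this file to package the GGUF weights together with the template and parameters into a runnable model.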
 
  <em>A conversational model for Dutch, aligned through AI feedback.</em>
  </div>
 
+ This is a GGUF version of [BramVanroy/GEITje-7B-ultra](https://huggingface.co/BramVanroy/GEITje-7B-ultra), a powerful Dutch chatbot, which is ultimately a Mistral-based model, further pretrained on Dutch and additionally treated with supervised fine-tuning and DPO alignment. For more information on the model, data, licensing, and usage, see the main model's README.
 
+ Available quantization types and their expected performance differences compared to the base `f16` model, where higher perplexity is worse (figures from llama.cpp):
 
  ```
+ Q3_K_M :  3.07G, +0.2496 ppl @ LLaMA-v1-7B
+ Q4_K_M :  3.80G, +0.0532 ppl @ LLaMA-v1-7B
+ Q5_K_M :  4.45G, +0.0122 ppl @ LLaMA-v1-7B
+ Q6_K   :  5.15G, +0.0008 ppl @ LLaMA-v1-7B
+ Q8_0   :  6.70G, +0.0004 ppl @ LLaMA-v1-7B
+ F16    : 13.00G              @ 7B
  ```
 
+ Also available on [ollama](https://ollama.com/bramvanroy/geitje-7b-ultra).
 
+ Quants were made with release [`b2777`](https://github.com/ggerganov/llama.cpp/releases/tag/b2777) of llama.cpp.
 
+ ## Usage
 
+ ### LM Studio
 
+ You can use this model in [LM Studio](https://lmstudio.ai/), an easy-to-use interface for running optimized models locally. Simply search for `BramVanroy/GEITje-7B-ultra-GGUF` and download the available file.
 
+ ### Ollama
 
+ The model is available on [`ollama`](https://ollama.com/bramvanroy/geitje-7b-ultra).
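
Given the quantization table in the updated README, one way to pick a file is to take the lowest-perplexity quant that still fits a memory budget. A minimal sketch of that trade-off (the helper function is hypothetical; the sizes and perplexity deltas are the llama.cpp reference figures quoted above, measured on LLaMA-v1-7B rather than on GEITje itself):

```python
from typing import Optional

# (name, file size in GB, added perplexity vs. f16) -- figures from the
# llama.cpp reference table quoted in the README, not measured on GEITje.
QUANTS = [
    ("Q3_K_M", 3.07, 0.2496),
    ("Q4_K_M", 3.80, 0.0532),
    ("Q5_K_M", 4.45, 0.0122),
    ("Q6_K",   5.15, 0.0008),
    ("Q8_0",   6.70, 0.0004),
]

def best_quant(budget_gb: float) -> Optional[str]:
    """Return the quant with the smallest perplexity hit whose file fits the budget."""
    fitting = [(name, ppl) for name, size, ppl in QUANTS if size <= budget_gb]
    if not fitting:
        return None
    return min(fitting, key=lambda item: item[1])[0]

print(best_quant(5.0))  # with a ~5 GB budget, Q5_K_M is the best fit
```

Note that the file must fit in available (V)RAM with headroom for the KV cache, so in practice the budget should be somewhat below total memory.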