Where's the GGUF version gone?

#1
by nulled - opened

I think there was a GGUF version, but I can't find it anymore. Is something wrong with that version?

Yes, it was broken and unusable. I haven't yet figured out how to make a working version. I'll need to raise it with the llama.cpp team, but I haven't had time yet.

Any luck yet?

Someone on Reddit has posted a quant that works with llama.cpp here: https://huggingface.co/imi2/airoboros-180b-2.2.1-gguf
Just make sure you're running the latest version of llama.cpp, and follow the repo's instructions for merging the files.
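For what it's worth, split GGUF uploads of this size are usually plain byte-level splits, so merging is just concatenating the parts in order. A small sketch of that pattern, using dummy stand-in files (the real part names will be in the model repo's README, so check there first):

```shell
# Dummy stand-ins for the split parts -- the actual file names
# come from the imi2 repo, these are just for illustration.
printf 'part-a' > model.gguf-split-a
printf 'part-b' > model.gguf-split-b

# Byte-level splits are merged by concatenating the parts in order
# into a single .gguf file that llama.cpp can load.
cat model.gguf-split-a model.gguf-split-b > model.gguf
```

If the repo instead uses llama.cpp's newer split format, it will say so and point you at the appropriate merge tool rather than `cat`.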
Here's the command I use to run it:
```
./server --model models/airoboros-180b-2.2.1-Q5_K_M.gguf --n-gpu-layers 128 --ctx-size 4090 --port 5005 --host 0.0.0.0 --parallel 1 --cont-batching --threads 24
```