Can an AQLM of CommandR+ be converted into a GGUF for LlamaCPP?

#34
by SabinStargem - opened

Someone made an AQLM of CR+, but I don't know much about the format. Can it be converted, and if so, would it have enough quality?

https://huggingface.co/ISTA-DASLab/c4ai-command-r-plus-AQLM-2Bit-1x16

EDIT: They also made an AQLM of Llama-3 8b Instruct. It might make for a good test case, on account of being smaller.

https://huggingface.co/ISTA-DASLab/Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16


Theoretically it might be possible to convert it, but why would anybody want to do that? The original model is available, which has much higher quality.

mradermacher changed discussion status to closed

Just to see how AQLM (as a GGUF) compares to traditional quants in real-world use. There aren't many AQLM GGUFs, if any, so it is hard to say whether or not they have value. Going by what the authors have said, AQLM offers a much better tradeoff for larger models. CommandR+ is about 31 gigs with AQLM. That is a bit smaller than the IQ4_XS, which is 56 gigs.
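As a rough sanity check on those file sizes (a sketch, treating "gigs" as decimal GB and assuming CommandR+'s roughly 104B parameter count; both figures are approximations, not from this thread), the effective bits per weight works out to about the nominal 2-bit for AQLM and ~4.3 for IQ4_XS:

```python
# Back-of-envelope bits-per-weight estimate from a quantized model's file size.
# Assumptions: sizes in decimal GB (1e9 bytes), ~104B parameters for CommandR+,
# and ignoring metadata/embedding overhead in the file.

def bits_per_weight(size_gb: float, n_params: float) -> float:
    """Estimate effective bits per weight from a quantized file size."""
    return size_gb * 1e9 * 8 / n_params

N_PARAMS = 104e9  # approximate CommandR+ parameter count

print(round(bits_per_weight(31, N_PARAMS), 2))  # AQLM file, ~31 GB -> ~2.38 bpw
print(round(bits_per_weight(56, N_PARAMS), 2))  # IQ4_XS file, ~56 GB -> ~4.31 bpw
```

So the two files really do sit near the 2-bit and 4-bit marks, which is why comparing AQLM against a ~2-bit GGUF quant is the fairer test.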

Going by what I have heard, the big downside of AQLM is the creation of it in the first place. Apparently it is more a compression method than quantization, requiring lots of computational power to create. What I am hoping for is that an existing AQLM could be turned into a GGUF for easy use in a backend. No quanting, just conversion.

When converting AQLM to GGUF you lose quality compared to quantizing directly from the original model, so there is little value in doing it unless the original model is not available. AQLM is just another form of quantization, which is a form of compression, so it makes no sense to say it is "more" a compression method. What you should compare is an imatrix quant of comparable size (that would probably be IQ2_XS) and see how it performs against the AQLM quant.

To make a (likely lacking) analogy, what you are asking for is like converting a gif to a jpg, when the original png is available - it's always better to start with the original.

Fair enough.
