Can an AQLM of CommandR+ be converted into a GGUF for LlamaCPP?

#34
by SabinStargem - opened

Someone made an AQLM of CR+, but I don't know much about the format. Can it be converted, and if so, would it have enough quality?

https://huggingface.co/ISTA-DASLab/c4ai-command-r-plus-AQLM-2Bit-1x16

EDIT: They also made an AQLM of Llama-3 8b Instruct. It might make for a good test case, on account of being smaller.

https://huggingface.co/ISTA-DASLab/Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16


Theoretically it might be possible to convert it, but why would anybody want to do that? The original model is available, which has much higher quality.

mradermacher changed discussion status to closed

Just to see how AQLM (as a GGUF) compares to traditional quants in real-world use. There aren't many AQLM GGUFs, if any, so it is hard to say whether or not they have value. Going by what the authors have said, AQLM offers a much better tradeoff for larger models. CommandR+ is about 31 gigs with AQLM. That is a bit smaller than the IQ4_XS, which is 56 gigs.
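As a rough sanity check on those file sizes (a sketch, treating "gigs" as decimal GB and assuming CommandR+'s roughly 104B parameter count; both figures are approximations, not from this thread), the effective bits per weight works out to about the nominal 2-bit for AQLM and ~4.3 for IQ4_XS:

```python
# Back-of-envelope bits-per-weight estimate from a quantized model's file size.
# Assumptions: sizes in decimal GB (1e9 bytes), ~104B parameters for CommandR+,
# and ignoring metadata/embedding overhead in the file.

def bits_per_weight(size_gb: float, n_params: float) -> float:
    """Estimate effective bits per weight from a quantized file size."""
    return size_gb * 1e9 * 8 / n_params

N_PARAMS = 104e9  # approximate CommandR+ parameter count

print(round(bits_per_weight(31, N_PARAMS), 2))  # AQLM file, ~31 GB -> ~2.38 bpw
print(round(bits_per_weight(56, N_PARAMS), 2))  # IQ4_XS file, ~56 GB -> ~4.31 bpw
```

So the two files really do sit near the 2-bit and 4-bit marks, which is why comparing AQLM against a ~2-bit GGUF quant is the fairer test.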

Going by what I have heard, the big downside of AQLM is the creation of it in the first place. Apparently it is more a compression method than quantization, requiring lots of computational power to create. What I am hoping for is that an existing AQLM could be turned into a GGUF for easy use in a backend. No quanting, just conversion.

When converting AQLM to GGUF you lose quality compared to quantizing directly from the original model, so there is little value in doing it unless the original model is not available. AQLM is just another form of quantization, which is a form of compression, so it makes no sense to say it is "more" a compression method. What you should compare is an imatrix quant of comparable size (that would probably be IQ2_XS) and see how it performs against the AQLM quant.

To make a (likely lacking) analogy, what you are asking for is like converting a gif to a jpg, when the original png is available - it's always better to start with the original.

Fair enough.
