Is it double quantized?

#6
by Respair - opened

Sorry, but I'm a bit confused. The original model is around 104B parameters, and Cohere released a quantized version using bitsandbytes.
Did you run yet another quantization on top of the already-quantized weights? In other words, wouldn't that completely destroy the model's performance? Or is it just a reformatting?

Did you run yet another quantization on top of the already-quantized weights?

No, all quants are made from the original fp16 model, including this one and Cohere's bitsandbytes quants.

or is it just a reformatting?

This is a new quant made from the original weights.
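As an aside, the reason "quantizing a quant" is worried about can be shown with a toy sketch. This is plain symmetric absmax rounding in numpy, not the actual bitsandbytes NF4 or GGUF block schemes, and the bit widths are just illustrative: it compares quantizing the original weights directly against quantizing weights that were already quantized once.

```python
import numpy as np

def absmax_quant(w, bits):
    """Toy symmetric absmax quantization: round w onto a uniform int grid."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for 4-bit, 3 for 3-bit
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                        # dequantized values

rng = np.random.default_rng(0)
w = rng.normal(size=10_000)                 # stand-in for fp16 weights

direct = absmax_quant(w, bits=3)                        # quant from originals
double = absmax_quant(absmax_quant(w, bits=4), bits=3)  # quant of a quant

rmse = lambda a: np.sqrt(np.mean((w - a) ** 2))
print(f"direct 3-bit RMSE:      {rmse(direct):.5f}")
print(f"4-bit then 3-bit RMSE:  {rmse(double):.5f}")
```

The direct quant rounds each weight to its nearest grid point, so its per-element error is minimal by construction; rounding an already-rounded value can only land on the same grid point or a worse one, so the double quant's error is never lower. That is why quants here are built from the original fp16 weights rather than from another quant.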

That's a relief, thanks a lot.
The parameter count automatically computed by HF on the repo page seemed a bit off, which is what made me ask. I'll close this discussion.

Respair changed discussion status to closed
