Performance Drop due to quantization?

#34
by Teja-Gollapudi - opened

Hi,
Are there any benchmark comparisons between the quantized model and the full-precision model?
I want to gauge the performance drop introduced by quantization.

Thank you!

Teja-Gollapudi changed discussion title from Benchmark comparison to Performance Drop due to quantization?

Did you manage to find any comparison?

Never got around to doing it 😕.

As a rough rule of thumb, a 4-bit quantized model is about 95 percent as accurate as the full-precision model.
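If you want to check that figure on your own task rather than rely on a rule of thumb, here is a minimal sketch of the comparison in plain Python. It assumes you have already run both models on the same labeled examples and collected their predictions. The function names (`accuracy`, `quantization_drop`) and the toy data are hypothetical placeholders, not part of any library and not real benchmark results.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the reference labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def quantization_drop(full_preds, quant_preds, labels):
    """Compare full-precision and quantized predictions on the same examples."""
    full_acc = accuracy(full_preds, labels)
    quant_acc = accuracy(quant_preds, labels)
    # "retained" is the share of full-precision accuracy the quantized model keeps.
    retained = quant_acc / full_acc if full_acc > 0 else float("nan")
    return {
        "full": full_acc,
        "quantized": quant_acc,
        "absolute_drop": full_acc - quant_acc,
        "retained": retained,
    }


if __name__ == "__main__":
    # Toy placeholder data (an assumption for illustration only).
    labels = ["a", "b", "c", "d"]
    full_preds = ["a", "b", "c", "x"]
    quant_preds = ["a", "b", "x", "x"]
    print(quantization_drop(full_preds, quant_preds, labels))
```

A `retained` value near 0.95 would match the estimate above. Exact-match accuracy is only one possible metric, so swap in whatever score your benchmark reports.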
