Quantization Overview
How does quantisation affect model output? - 15 basic tests on different quant levels
EXL2 (ExLlamaV2)
- https://github.com/turboderp/exllamav2
- Based on GPTQ, but iteratively chooses among multiple quantization levels per layer by measuring error against a calibration set, mixing bit depths so the model-wide average hits an arbitrary target bits-per-weight (a conceptual sketch follows this list)
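The budget-versus-error idea can be illustrated with a small, self-contained sketch. This is not the ExLlamaV2 implementation; `Option`, `assign_bit_depths`, and the greedy budget-spending strategy are illustrative assumptions, and the per-option calibration errors are assumed to have been measured already.

```python
# Conceptual sketch of EXL2-style mixed bit depths (NOT the ExLlamaV2 code).
# Assumes each layer has a few candidate quant levels whose error against a
# calibration set has already been measured; we then spend a bit budget
# greedily so the model-wide average hits a target bits-per-weight.
from dataclasses import dataclass

@dataclass
class Option:
    bits: float          # bits per weight for this quant level
    calib_error: float   # measured error vs. FP16 on calibration data

def assign_bit_depths(layers: dict[str, list[Option]],
                      num_weights: dict[str, int],
                      target_bpw: float) -> dict[str, Option]:
    # Start every layer at its smallest option, then upgrade whichever layer
    # buys the largest error reduction per extra bit until the budget is spent.
    choice = {name: min(opts, key=lambda o: o.bits) for name, opts in layers.items()}
    budget = target_bpw * sum(num_weights.values()) - sum(
        choice[n].bits * num_weights[n] for n in layers)

    while budget > 0:
        best = None  # (error reduction per extra bit, layer name, option, cost)
        for name, opts in layers.items():
            cur = choice[name]
            for opt in opts:
                cost = (opt.bits - cur.bits) * num_weights[name]
                if 0 < cost <= budget and opt.calib_error < cur.calib_error:
                    gain = (cur.calib_error - opt.calib_error) / cost
                    if best is None or gain > best[0]:
                        best = (gain, name, opt, cost)
        if best is None:
            break
        _, name, opt, cost = best
        choice[name] = opt
        budget -= cost
    return choice

# Toy usage: two layer groups, three candidate quant levels each, 4.0 bpw target.
layers = {"attn": [Option(2.5, 0.9), Option(4.0, 0.3), Option(6.0, 0.1)],
          "mlp":  [Option(2.5, 0.7), Option(4.0, 0.2), Option(6.0, 0.05)]}
sizes = {"attn": 1_000_000, "mlp": 3_000_000}
print(assign_bit_depths(layers, sizes, target_bpw=4.0))
```

The real converter works at a finer granularity than whole layers and measures error by running calibration data through the quantized weights, but the trade-off of spending a fixed bit budget where it reduces error most is the gist.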
OmniQuant
Here are my docs for how to run it: OmniQuant
- https://github.com/OpenGVLab/OmniQuant
- https://arxiv.org/abs/2308.13137
- Reports better results than GPTQ (OPTQ), AWQ, and SmoothQuant
- MLC compatible
QuIP
- https://github.com/jerry-chee/QuIP
- https://github.com/AlpinDale/QuIP-for-Llama
- https://arxiv.org/abs/2307.13304
- Evaluated not just on perplexity (PPL) but also on benchmark accuracy (a minimal PPL sketch follows this list)
- Reports that 3-bit almost matches FP16
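For context, here is a minimal sketch of how a perplexity number is typically produced with Hugging Face transformers. It is not QuIP's evaluation code; the model ID and text file are placeholders. Benchmark accuracy is usually measured separately with a task harness such as lm-evaluation-harness.

```python
# Minimal perplexity sketch with Hugging Face transformers (not QuIP's own
# evaluation code). Model ID and evaluation text are placeholders.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-quantized-model"   # placeholder
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

text = open("eval.txt").read()      # placeholder evaluation text
ids = tok(text, return_tensors="pt").input_ids

max_len, stride = 2048, 2048
nlls, n_tokens = [], 0
with torch.no_grad():
    for start in range(0, ids.size(1) - 1, stride):
        chunk = ids[:, start : start + max_len]
        # Labels == inputs; the model shifts them internally, and the returned
        # loss is the mean negative log-likelihood per predicted token.
        out = model(chunk, labels=chunk)
        nlls.append(out.loss * (chunk.size(1) - 1))
        n_tokens += chunk.size(1) - 1

ppl = math.exp(torch.stack(nlls).sum().item() / n_tokens)
print(f"perplexity: {ppl:.2f}")
```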
SqueezeLLM
AWQ
- https://github.com/mit-han-lab/llm-awq
- https://arxiv.org/abs/2306.00978
- https://github.com/casper-hansen/AutoAWQ/
GGML k-quants
SmoothQuant
SpQR
GPTQ/OPTQ