SmoothQuant: https://github.com/mit-han-lab/smoothquant
Sparse fine-tuning + DeepSparse (Neural Magic): https://neuralmagic.com/blog/fast-llama-2-on-cpus-with-sparse-fine-tuning-and-deepsparse/
Future Project
- Measure the metrics below (perplexity, KL divergence, benchmarks) for different quants
Perplexity
- https://oobabooga.github.io/blog/posts/perplexities/
- https://oobabooga.github.io/blog/posts/gptq-awq-exl2-llamacpp/
- https://www.reddit.com/r/LocalLLaMA/comments/145tf00/analysis_of_sizeperplexity_tradeoff_for/
- https://www.reddit.com/r/LocalLLaMA/comments/16nmyqq/apples_to_apples_comparison_for_quantizations_of/
- https://www.reddit.com/r/LocalLLaMA/comments/13l0j7m/a_comparative_look_at_ggml_quantization_and/
KL Divergence
- https://www.reddit.com/r/LocalLLaMA/comments/1816h1x/how_much_does_quantization_actually_impact_models/
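Both metrics come out of the same forward passes, so here is one hedged sketch, assuming a Hugging Face transformers setup; the reference checkpoint, quantized checkpoint, eval text file, and 2048-token window are all placeholders, not fixed choices.

```python
# Sketch: perplexity of a quantized model plus token-level KL divergence against
# the full-precision reference. Model names, the eval text file, and the
# 2048-token window are assumptions for illustration only.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

REF_NAME = "meta-llama/Llama-2-7b-hf"        # full-precision reference (assumption)
QUANT_NAME = "TheBloke/Llama-2-7B-GPTQ"      # quantized variant (assumption)

tok = AutoTokenizer.from_pretrained(REF_NAME)
ref = AutoModelForCausalLM.from_pretrained(REF_NAME, torch_dtype=torch.float16, device_map="auto")
qnt = AutoModelForCausalLM.from_pretrained(QUANT_NAME, device_map="auto")

text = open("wiki.test.raw").read()          # any held-out text works
ids = tok(text, return_tensors="pt").input_ids[:, :2048].to(ref.device)

with torch.no_grad():
    ref_logits = ref(ids).logits.float()
    qnt_logits = qnt(ids.to(qnt.device)).logits.float().to(ref_logits.device)

# Perplexity of the quantized model: exp of the mean next-token negative log-likelihood.
nll = F.cross_entropy(qnt_logits[:, :-1, :].reshape(-1, qnt_logits.size(-1)),
                      ids[:, 1:].reshape(-1))
print("quantized perplexity:", torch.exp(nll).item())

# KL(ref || quant) per token position: how far the quantized output distribution
# drifts from the reference, independent of how well either predicts the test text.
ref_logp = F.log_softmax(ref_logits, dim=-1)
qnt_logp = F.log_softmax(qnt_logits, dim=-1)
kl_per_token = (ref_logp.exp() * (ref_logp - qnt_logp)).sum(-1)
print("mean KL(ref || quant):", kl_per_token.mean().item())
```

Perplexity scores the quantized model against the test text; the KL term scores it against the full-precision model's distribution, which is roughly the framing used in the linked KL-divergence thread.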
Performance Benchmarking
- HumanEval
- lm-eval-harness (see the sketch after this list)
- https://www.reddit.com/r/LocalLLaMA/comments/13yehfn/new_quantization_method_awq_outperforms_gptq_in/
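A minimal invocation sketch for the lm-eval-harness item: the checkpoint, task list, and batch size are placeholders, and the `simple_evaluate()` call follows the harness's documented Python entry point (v0.4.x), so check it against the installed version.

```python
# Sketch: scoring a quantized checkpoint with lm-eval-harness.
# Checkpoint, tasks, and batch size are placeholders; simple_evaluate()
# is the documented Python entry point in lm-eval v0.4.x and may differ elsewhere.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                        # Hugging Face backend
    model_args="pretrained=TheBloke/Llama-2-7B-GPTQ",  # quantized checkpoint (placeholder)
    tasks=["hellaswag", "arc_challenge"],
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics)
```

The CLI (`lm_eval --model hf --model_args pretrained=... --tasks ... --batch_size ...`) should give the same numbers; HumanEval needs a code-generation harness with execution enabled rather than this log-likelihood path.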
Quant methods: GPTQ, AWQ, K-quants (llama.cpp), EXL2, SqueezeLLM, OmniQuant, QuIP, SpQR, HQQ
BitNet
Q3_K_M is 3.91 bpw
Q4_K_M is 4.85 bpw
Q5_K_M is 5.69 bpw
Q6_K is 6.59 bpw
Q8_0 is 8.50 bpw
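Those bpw figures translate to file size as bytes ≈ parameters × bpw / 8. A quick sketch, assuming a 7B-parameter model purely for illustration (real GGUF files add a little metadata on top):

```python
# Sketch: estimate quantized file size from parameter count and bits per weight.
# The 7e9 parameter count is an assumption for illustration; actual files carry
# extra metadata, so treat the result as a lower bound.
PARAMS = 7_000_000_000

for name, bpw in [("Q3_K_M", 3.91), ("Q4_K_M", 4.85), ("Q5_K_M", 5.69),
                  ("Q6_K", 6.59), ("Q8_0", 8.50)]:
    gib = PARAMS * bpw / 8 / 2**30
    print(f"{name}: ~{gib:.2f} GiB")
```

For example, 7e9 × 4.85 / 8 ≈ 4.2 GB, which lines up with typical Q4_K_M 7B file sizes.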