
    Comparing Quants

May 14, 2024

• SmoothQuant: https://github.com/mit-han-lab/smoothquant
• Sparse fine-tuning + DeepSparse CPU inference: https://neuralmagic.com/blog/fast-llama-2-on-cpus-with-sparse-fine-tuning-and-deepsparse/

Future Project

• Measure the metrics below (perplexity, KL divergence, benchmark scores) for different quants

Perplexity

• https://oobabooga.github.io/blog/posts/perplexities/
• https://oobabooga.github.io/blog/posts/gptq-awq-exl2-llamacpp/
• https://www.reddit.com/r/LocalLLaMA/comments/145tf00/analysis_of_sizeperplexity_tradeoff_for/
• https://www.reddit.com/r/LocalLLaMA/comments/16nmyqq/apples_to_apples_comparison_for_quantizations_of/
• https://www.reddit.com/r/LocalLLaMA/comments/13l0j7m/a_comparative_look_at_ggml_quantization_and/
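
A minimal sketch of the standard sliding-window perplexity measurement, assuming a Hugging Face causal LM; the model id and eval file here are placeholders — score each quant the same way and compare:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id -- substitute the quantized checkpoint to score.
model_id = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto").eval()

# Assumes a local plain-text eval file (e.g. the wikitext-2 test set).
enc = tok(open("eval.txt").read(), return_tensors="pt")
seq_len = enc.input_ids.size(1)

max_len, stride = 2048, 512
nll_sum, n_tokens, prev_end = 0.0, 0, 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_len, seq_len)
    trg_len = end - prev_end                   # tokens newly scored this window
    ids = enc.input_ids[:, begin:end].to(model.device)
    labels = ids.clone()
    labels[:, :-trg_len] = -100                # mask the overlapping context
    with torch.no_grad():
        loss = model(ids, labels=labels).loss  # mean NLL over scored tokens
    nll_sum += loss.item() * trg_len           # approximate sum of NLLs
    n_tokens += trg_len
    prev_end = end
    if end == seq_len:
        break

print("perplexity:", torch.exp(torch.tensor(nll_sum / n_tokens)).item())
```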

KL Divergence

• https://www.reddit.com/r/LocalLLaMA/comments/1816h1x/how_much_does_quantization_actually_impact_models/
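
The KL-divergence approach runs the same token sequence through a full-precision reference and the quantized model, then compares the two next-token distributions position by position. A minimal sketch of the metric itself (the function name is my own; the logits come from wherever you run the two models):

```python
import torch
import torch.nn.functional as F

def mean_token_kl(ref_logits: torch.Tensor, quant_logits: torch.Tensor) -> float:
    """Mean per-token KL(ref || quant) for logits of shape [tokens, vocab]."""
    log_p = F.log_softmax(ref_logits.float(), dim=-1)    # reference distribution
    log_q = F.log_softmax(quant_logits.float(), dim=-1)  # quantized distribution
    # pointwise p * (log p - log q), summed over the vocab axis
    kl = F.kl_div(log_q, log_p, log_target=True, reduction="none").sum(dim=-1)
    return kl.mean().item()
```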

    Performance Benchmarking

    • HumanEval
    • lm-eval-harness
    • https://www.reddit.com/r/LocalLLaMA/comments/13yehfn/new_quantization_method_awq_outperforms_gptq_in/
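
For the harness runs, lm-eval-harness exposes a Python entry point; a hedged sketch using its v0.4-style API, with placeholder model id and tasks:

```python
import lm_eval

# Evaluate one quantized checkpoint; repeat per quant and diff the scores.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Llama-2-7b-hf,dtype=float16",
    tasks=["hellaswag", "arc_challenge"],
    batch_size=8,
)
print(results["results"])
```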

Quant methods:

• GPTQ
• AWQ
• K-Quant (llama.cpp)
• EXL2
• SqueezeLLM
• OmniQuant
• QuIP
• SpQR
• HQQ

    BitNet

llama.cpp quant sizes:

• Q3_K_M is 3.91 bpw
• Q4_K_M is 4.85 bpw
• Q5_K_M is 5.69 bpw
• Q6_K is 6.59 bpw
• Q8_0 is 8.50 bpw
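
Those bpw figures map straight to on-disk size (bits per weight × parameter count ÷ 8). A quick estimate for a hypothetical 7B-parameter model, ignoring GGUF metadata overhead:

```python
# bits-per-weight -> approximate file size; metadata overhead is ignored.
N_PARAMS = 7e9  # hypothetical 7B model

for name, bpw in [("Q3_K_M", 3.91), ("Q4_K_M", 4.85), ("Q5_K_M", 5.69),
                  ("Q6_K", 6.59), ("Q8_0", 8.50)]:
    gib = N_PARAMS * bpw / 8 / 2**30
    print(f"{name}: ~{gib:.2f} GiB")
```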
