📖 llm-tracker

    Inferencing Engines

Jan 18, 2024

A community-maintained comparison of open LLM inference engines:

    • https://github.com/lapp0/lm-inference-engines/

    See also:

    • MLC-LLM
    • ExLlamaV2
    • gpt-fast
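
Engine APIs differ, but several of the engines covered in the comparison above can serve an OpenAI-compatible HTTP endpoint (vLLM does by default; check each engine's docs), so a single client script gives a rough apples-to-apples speed check. A minimal sketch, assuming a server is already running locally; the URL, model id, and prompt are placeholders:

```python
# Rough throughput probe against an OpenAI-compatible /v1/completions
# endpoint. The URL, model id, and prompt are assumed placeholders.
import time
import requests

URL = "http://localhost:8000/v1/completions"
payload = {
    "model": "mistralai/Mistral-7B-Instruct-v0.2",  # example model id
    "prompt": "Explain KV caching in one paragraph.",
    "max_tokens": 256,
    "temperature": 0.0,
}

start = time.time()
resp = requests.post(URL, json=payload, timeout=300)
resp.raise_for_status()
elapsed = time.time() - start

# OpenAI-compatible servers report token counts in the `usage` field.
tokens = resp.json()["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.2f}s -> {tokens / elapsed:.1f} tok/s")
```

Single-request timing like this mostly measures latency; batched, concurrent load is what actually separates the engines, so treat it as a smoke test rather than a benchmark.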

    2024-01-09 TGI Inferencing Cost

    • https://www.reddit.com/r/LocalLLaMA/comments/192silz/llm_comparison_using_tgi_mistral_falcon7b/
    • https://blog.salad.com/llm-comparison-tgi-benchmark/
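
The cost figures in posts like these reduce to simple arithmetic: hardware dollars per hour divided by tokens generated per hour. A sketch with illustrative numbers (the rates below are made up, not taken from the linked benchmarks):

```python
# Back-of-the-envelope serving cost. All numbers are illustrative
# placeholders, not figures from the linked posts.
def cost_per_million_tokens(gpu_dollars_per_hour: float,
                            tokens_per_second: float) -> float:
    tokens_per_hour = tokens_per_second * 3600
    return gpu_dollars_per_hour / tokens_per_hour * 1_000_000

# e.g. a $1.20/hr GPU sustaining 450 tok/s aggregate throughput:
print(f"${cost_per_million_tokens(1.20, 450):.3f} per 1M tokens")  # ~$0.741
```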
