Speed collection: https://huggingface.co/collections/leonardlin/speed-6583d7a3b02f38ef348139ef
Good recent summary of techniques: https://vgel.me/posts/faster-inference/
Exponentially Faster Language Modelling
- https://arxiv.org/abs/2311.10770
- “we provide high-level CPU code achieving 78x speedup over the optimized baseline feedforward implementation, and a PyTorch implementation delivering 40x speedup over the equivalent batched feedforward inference”
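The speedup in that paper comes from fast feedforward (FFF) networks: the dense FFN is replaced by a balanced binary tree, inference descends one root-to-leaf path, and only a single leaf "expert" is evaluated, so cost scales with tree depth rather than layer width. A minimal NumPy sketch of that conditional descent (parameter shapes and names here are hypothetical, not from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d, depth = 8, 3            # input width, tree depth -> 2**depth leaves

# Toy parameters: one decision vector per internal node,
# one small output expert per leaf (shapes are illustrative).
node_w = rng.standard_normal((2**depth - 1, d))
leaf_w = rng.standard_normal((2**depth, d, d))

def fff_forward(x):
    """Descend the decision tree, evaluating only one leaf expert.

    A dense mixture over 2**depth experts would touch all of them;
    here inference costs `depth` dot products plus one leaf matmul.
    """
    node = 0
    for _ in range(depth):
        go_right = node_w[node] @ x > 0           # hard branch at inference
        node = 2 * node + (2 if go_right else 1)  # heap-style child index
    leaf = node - (2**depth - 1)                  # leaf offset among 2**depth leaves
    return leaf_w[leaf] @ x

y = fff_forward(rng.standard_normal(d))
print(y.shape)  # (8,)
```

The reported CPU speedups come from exactly this sparsity: a BERT-scale FFF layer touches ~log₂(n) of its n neurons per token instead of all of them.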
Lookahead Decoding (parallel Jacobi-style decoding; generates and verifies n-gram guesses without a separate draft model) https://lmsys.org/blog/2023-11-21-lookahead-decoding/
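The core of lookahead-style decoding is guess-and-verify: candidate n-grams (in the real method, harvested from Jacobi iterations) are checked against the model's own greedy continuation, and the longest confirmed prefix is accepted, yielding multiple tokens per step. A toy sketch of just the verification step, with a stand-in "model" (the real algorithm batches these checks in one forward pass):

```python
# Toy greedy "model": next token is (last + 1) mod 10.
# Stands in for an LLM's argmax over the vocabulary.
def next_token(seq):
    return (seq[-1] + 1) % 10

def verify(seq, gram):
    """Return how many leading tokens of the candidate n-gram the model confirms."""
    accepted = 0
    for tok in gram:
        if next_token(seq + gram[:accepted]) != tok:
            break
        accepted += 1
    return accepted

seq = [3, 4]
pool = [[5, 6, 9], [5, 6, 7], [8, 8, 8]]   # hypothetical n-gram guess pool
best = max(pool, key=lambda g: verify(seq, g))
n = verify(seq, best)
seq += best[:n]                            # accept 3 tokens in one "step"
print(seq)  # [3, 4, 5, 6, 7]
```

Because every accepted token is re-checked against the model's own greedy choice, the output distribution is unchanged; only wall-clock decoding steps shrink.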
Medusa (adds extra decoding heads that predict several future tokens at once, verified with tree attention) https://sites.google.com/view/medusa-llm
Groq (custom LPU inference hardware)
- HN discussion: https://news.ycombinator.com/item?id=38739199
- Demo: https://chat.groq.com/
- ISCA 2022 paper: https://groq.com/wp-content/uploads/2023/05/GroqISCAPaper2022_ASoftwareDefinedTensorStreamingMultiprocessorForLargeScaleMachineLearning-1.pdf
- BittWare hardware: https://www.bittware.com/products/groq/