Inferencing
Running your LLMs locally.
AMD GPUs
As of August 2023, AMD's ROCm GPU compute software stack is available for Linux or Windows. Linux...
Nvidia GPUs
Nvidia GPUs are the most compatible hardware for AI/ML. All of Nvidia's GPUs (consumer and profes...
Replit Models
Replit has trained a very strong 3B-parameter code-completion foundation model on The Stack. On...
Apple Silicon Macs
Macs are popular with (non-ML) developers, and the combination of (potentially) large amounts of ...
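A rough rule of thumb for whether a model fits in a Mac's (unified) memory: weight size is roughly parameter count times bits per weight. A minimal sketch (the formula ignores KV cache and runtime overhead, so treat it as a lower bound):

```python
def model_ram_gb(params_b: float, bits: int) -> float:
    """Approximate weight memory (GB) for a model with params_b billion
    parameters quantized to the given bit width. Ignores KV cache and
    runtime overhead, so real usage will be somewhat higher."""
    return params_b * 1e9 * bits / 8 / 1e9

# A 7B model: ~14 GB at fp16, ~3.5 GB at 4-bit quantization.
print(model_ram_gb(7, 16))
print(model_ram_gb(7, 4))
```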
Performance
2023-08-14 Aman Sanger (cursor.so) comparing high batch throughput
2023-08-11 Optimizing laten...
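As a back-of-envelope illustration of the batching tradeoff these posts get at (all numbers here are hypothetical, and the linear-with-efficiency scaling model is a simplification): batching requests raises aggregate tokens/sec, but each individual request sees added latency.

```python
def aggregate_throughput(tps_single: float, batch: int, efficiency: float) -> float:
    """Total tokens/sec across a batch, assuming sub-linear scaling:
    single-stream speed * batch size * a scaling-efficiency factor."""
    return tps_single * batch * efficiency

# One user at 30 tok/s vs eight users at an assumed 60% scaling efficiency:
# aggregate throughput goes up, but each user's stream is slower.
print(aggregate_throughput(30.0, 1, 1.0))
print(aggregate_throughput(30.0, 8, 0.6))
```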
Airoboros LMoE
Here we experiment with running a local mixture of experts (LMoE). Released 2023-08-23: https://x.com/jon_...
llama.cpp
llama.cpp is the most popular backend for inferencing Llama models for single users. Started o...
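The typical single-user workflow is to build from source and run a quantized model from the command line. A minimal setup sketch as of the 2023 tooling (the model filename is a placeholder; the `main` binary and flags reflect the repo at that time and have since been renamed):

```shell
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
# -m: path to a quantized model file (placeholder name here)
# -p: prompt, -n: number of tokens to generate
./main -m models/llama-2-7b.Q4_K_M.gguf -p "Hello" -n 64
```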