llama.cpp is the most popular backend for running inference on Llama models for single users.

It started out as a CPU-only project but now supports GPUs, including best-in-class CUDA performance and, more recently, ROCm. It also has fallback CLBlast support, but performance there is not great.
It has its own GGUF file format (was GGMLv1-3 before), a single-file container that carries metadata alongside the model (think MOV or MKV for models). GGUF uses its own custom quantization formats and binary layout. Note that these format changes have always been breaking and incompatible, but GGUF may be flexible enough to avoid that in the future.
See ggml for other model types (some of which are getting folded into llama.cpp?)
Very active community with dozens of contributors: https://github.com/ggerganov/ggml/graphs/contributors
Many end-user client projects like LMStudio, Ollama, GPT4all, KoboldCPP, etc. use either llama.cpp or a fork of it as their backend.
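
To make the "single file metadata container" idea more concrete, here is a minimal sketch of reading the fixed-size header at the start of a GGUF file. It assumes a little-endian file of GGUF version 2 or later, where the header is the 4-byte magic `GGUF`, a 32-bit version, and two 64-bit counts (tensors and metadata key/value pairs). The names `read_gguf_header` and `GGUFHeader` are made up for illustration and are not part of llama.cpp. The sketch does not parse the metadata entries or tensor data that follow the header.

```python
import struct
from typing import BinaryIO, NamedTuple

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


class GGUFHeader(NamedTuple):
    """Hypothetical container for the fixed-size GGUF header fields."""
    version: int
    tensor_count: int
    metadata_kv_count: int


def _read_exact(stream: BinaryIO, size: int) -> bytes:
    """Read exactly `size` bytes or fail loudly on a truncated file."""
    data = stream.read(size)
    if len(data) != size:
        raise ValueError(f"truncated file: wanted {size} bytes, got {len(data)}")
    return data


def read_gguf_header(stream: BinaryIO) -> GGUFHeader:
    """Parse the GGUF header, assuming little-endian byte order and version >= 2."""
    magic = _read_exact(stream, 4)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic={magic!r})")

    (version,) = struct.unpack("<I", _read_exact(stream, 4))
    if version < 2:
        # Assumption: version 1 stored these counts as 32-bit values,
        # so this sketch refuses to guess at its layout.
        raise ValueError(f"unsupported GGUF version: {version}")

    tensor_count, metadata_kv_count = struct.unpack("<QQ", _read_exact(stream, 16))
    return GGUFHeader(version, tensor_count, metadata_kv_count)
```

With a model on disk, usage would look something like `with open("model.gguf", "rb") as f: print(read_gguf_header(f))`, where `model.gguf` is a placeholder path. Checking the magic first is a cheap way to tell a GGUF file from one of the older, incompatible formats before trying to load it.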

Mac Performance

Summary

paru -S supabase-bin
sudo supabase start