Lists of Models

We will eventually be hosting an API and list of LLMs.  In the meantime:

See Evals for potentially more models and how they compare.

lhl's opinionated list of local LLMs

I may move this section out and leave just a recommended starting list if anyone else starts contributing. I don't do RP or need character writing/fiction; my main uses are factual Q&A and, ideally, coding/tech support.

Note: llama.cpp just switched to the GGUF format, so you should find or convert models to GGUF (TheBloke may have converted the model you want by the time you read this). This should make it much easier to use non-llama models and extended-context-window models going forward. GPTQs remain the same.
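
If you have original HF weights or older GGMLv3 files, llama.cpp ships conversion scripts. A minimal sketch (all paths and filenames here are placeholders; check the script names in your llama.cpp checkout, as they change between versions):

```shell
# Grab llama.cpp, which includes the conversion scripts.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Convert original HF-format weights straight to GGUF
# (directory path is a placeholder):
python convert.py /path/to/llama2-13b-hf --outfile llama2-13b-f16.gguf

# Or upgrade an existing GGMLv3 quant to GGUF
# (input/output names are placeholders):
python convert-llama-ggmlv3-to-gguf.py \
    --input model.ggmlv3.q4_K_M.bin \
    --output model.q4_K_M.gguf
```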

My general preference for bang-per-bit quants is either a 4-bit, 32g, act-order=True GPTQ with ExLlama, or a q4_K_M GGML with llama.cpp.
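
For reference, producing a q4_K_M quant yourself with llama.cpp's quantize tool looks roughly like this (a sketch; the file names are placeholders, and you need a built llama.cpp checkout and an f16 base model first):

```shell
# Build llama.cpp, then quantize an f16 model down to q4_K_M.
# "llama2-13b-f16.gguf" and the output name are placeholders.
make
./quantize llama2-13b-f16.gguf llama2-13b.q4_K_M.gguf Q4_K_M
```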

Last updated: 2023-08-23

  • Current best local model (any size)
  • All the top-ranked models are currently llama2-70b fine-tunes. While I didn't try them all, I recently tested most of the top leaderboard models, and Pankaj Mathur's Orca Mini V3 did the best at instruction following for a basic text-manipulation task.
  • Current best local model for 24GB GPU (eg, 3090, 4090)
  • While llama2-34b has yet to be released, new llama2-13b models have largely overtaken llama-30b on the leaderboards. Due to their extended (4K vs 2K token) native context window, I think the llama2-13bs should be preferred for most usage. I don't have strong opinions on the "best" models atm, but I'd give a few a try:
  • Current best local model for 16GB GPU or Apple Silicon Mac
  • You can try any llama2-13b fine-tune
• With the release of CodeLlama, the landscape for coding assistants has changed: