
Search Results

29 total results found

Getting Started

HOWTO Guides

Large Language Models (LLMs) are a type of generative AI that powers chatbot systems like ChatGPT. If you've never tried one, you can try many of these for free (although most of this site is aimed at those with more familiarity with these types of systems). Al...

Lists of Models


We will eventually be hosting an API and list of LLMs. In the meantime: Open LLMs The GitHub repo is currently the most actively maintained list of open models (that can be used commercially) along with some other resources awesome-marketing-datasc...


HOWTO Guides

Resources on deciding what hardware to use for powering your local LLMs. Regularly maintained resources: Tim Dettmers keeps a relatively up-to-date guide of recommendations: Which GPU(s) to Get for Deep Learning: My Experience and Advice for Using GPUs in...


HOWTO Guides Inferencing

As of August 2023, AMD's ROCm GPU compute software stack is available for Linux or Windows. Linux Testing was done with a Radeon VII (16GB HBM2 VRAM, gfx906) on Arch Linux Officially Supported GPUs for ROCm 5.6 are: Radeon VII, Radeon Pro VII, V620, W6800, and...

Nvidia GPUs

HOWTO Guides Inferencing

Nvidia GPUs are the most compatible hardware for AI/ML. All of Nvidia's GPUs (consumer and professional) support CUDA, and basically all popular ML libraries and frameworks support CUDA. The biggest limitation on which LLMs you can run will be how much GP...
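As a rough rule of thumb for how GPU memory caps model size, here is a minimal back-of-the-envelope sketch; the 20% overhead factor (for KV cache and activations) is a loose assumption, not a measured value:

```python
def estimate_vram_gb(n_params_billions: float, bits_per_weight: float,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate for inference: weight bytes times an overhead
    factor. The 1.2 overhead is an assumed fudge factor for KV cache and
    activations, not a measured constant."""
    weight_bytes = n_params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 7B model at 4-bit quantization comes out to roughly 4.2 GB
print(round(estimate_vram_gb(7, 4), 1))
```

This is only a sanity check for "will it fit at all"; real usage also grows with context length.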

List of Evals


MosaicML Model Gauntlet - 34 benchmarks in 6 categories HuggingFace Open LLM Leaderboard (warning: their MMLU results are wrong, throwing off the whole ranking) LMSys Chatbot Arena Leaderboard ...

Code Evaluation


Running human-eval:



This site runs on BookStack, a PHP-based Wiki/documentation software. While there are other documentation generators I considered (mdBook, MkDocs), I wanted something that could allow collaboration/contribution without managing Github pull requests, which didn'...

Replit Models

HOWTO Guides Inferencing

Replit has trained a very strong 3B parameter code-completion foundation model on The Stack. One fine-tune beats WizardCoder-15B (a StarCoder fine-tune) on human-eval, making it probably the strongest open code-completion model as of July 2023. 2023-07-12: Sad...

Improving LLM Quality

LLMs Research

Model Architecture Mixture of Experts / Ensemble Zoph, Barret, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, Jeff Dean, Noam Shazeer, and William Fedus. “ST-MoE: Designing Stable and Transferable Sparse Expert Models.” arXiv, April 29, 2022. https://doi....

Learning Resources

LLMs Research

Getting Started If you're starting from nothing, just go to Wikipedia and start reading: https://...

ChatGPT Code Interpreter

HOWTO Guides

After several months in beta, OpenAI made the Code Interpreter available to all ChatGPT Plus users starting the week of July 10, 2023: This is an extremely powerful tool for both programmers and non-program...

Apple Silicon Macs

HOWTO Guides Inferencing

Macs are popular with (non-ML) developers, and the combination of (potentially) large amounts of unified GPU memory and decent memory bandwidth is appealing. llama.cpp started as a project to run inference of LLaMA models on Apple Silicon (CPUs). For non-tech...

Code Assistants


We'll probably move this somewhere else, but I figure it might be useful to put this in public somewhere since I'm researching Coding Assistants to help w/ a refactor of a largish code base. I'm looking for practical tools for production use here, and less of ...


HOWTO Guides Inferencing

llama.cpp is the most popular backend for single-user inference of Llama models. It started out CPU-only, but now supports GPUs, including best-in-class CUDA performance and, recently, ROCm support. It also has fallback CLBlast support, but performance on t...


HOWTO Guides Inferencing

2023-08-14 Aman Sanger ( comparing high batch throughput
2023-08-11 Optimizing latency: mlc, ctranslate2, vllm, tgi on an A6000 at batch 1, but focused on serving
2023-08-09 [Survey] Supported Hardwares and Speed: MLC LLM speeds for all their har...

Transcription Test


This project was done 2023-08-21. Code checked in here: Here's the README, which is basically a full writeup of a simple test that transcribes audio and uses a variety of LLMs to lightly edit the output: transcribe This is a s...



WhisperX

WhisperX is the current best version of Whisper. conda create --name whisperx python=3.10 conda activate whisperx pip install git+ # Test File wget

Airoboros LMoE

HOWTO Guides Inferencing

Here we experiment w/ getting a local mixture of experts. Released 2023-08-23: Code: Setup # env conda create -n airoboros mamba env config vars set CUDA_VISIBLE_DE...

Quantization Overview

LLMs Quantization

How does quantisation affect model output? - 15 basic tests on different quant levels EXL2 (ExLlamaV2) Based on GPTQ, but iteratively selects quantization levels against calibration data and averages bit depth to target an arbitr...
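The mixed-precision idea behind EXL2 can be sketched as choosing per-layer bit widths so that their parameter-weighted average hits a target bits-per-weight; the layer sizes and bit choices below are hypothetical illustration, not EXL2's actual algorithm:

```python
def average_bpw(layer_sizes, layer_bits):
    """Parameter-weighted average bits-per-weight across layers.
    layer_sizes: parameter count per layer (hypothetical values here).
    layer_bits: chosen quantization bit width per layer."""
    total_bits = sum(n * b for n, b in zip(layer_sizes, layer_bits))
    return total_bits / sum(layer_sizes)

# Two equally sized layers at 4-bit and 6-bit average out to 5.0 bpw
print(average_bpw([1000, 1000], [4, 6]))
```

The real quantizer also measures per-layer error on calibration data to decide which layers get the extra bits; this sketch only shows the bookkeeping that lets an arbitrary average target be met.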