Search Results
36 total results found
HOWTO Guides
Step-by-step guides for running various ML tasks.
Evals
LLMs
Logbook
Inferencing
Running your LLMs locally.
Research
Quantization
Getting Started
Large Language Models (LLMs) are a type of generative AI that powers chatbot systems like ChatGPT. If you've never tried one, you can try many of these for free (although most of this site is aimed at those with more familiarity with these types of systems). Al...
Lists of Models
We will eventually be hosting an API and list of LLMs. In the meantime: Open LLMs - the GitHub repo is currently the most actively maintained list of open models (that can be used commercially), along with some other resources awesome-marketing-datasc...
Hardware
Resources on deciding what hardware to use for powering your local LLMs. Relatively maintained resources: Tim Dettmers keeps a relatively up-to-date guide of recommendations: Which GPU(s) to Get for Deep Learning: My Experience and Advice for Using GPUs in...
AMD GPUs
As of August 2023, AMD's ROCm GPU compute software stack is available for Linux or Windows. Linux: testing was done with a Radeon VII (16GB HBM2 VRAM, gfx906) on Arch Linux. Officially supported GPUs for ROCm 5.6 are: Radeon VII, Radeon Pro VII, V620, W6800, and...
Nvidia GPUs
Nvidia GPUs are the most compatible hardware for AI/ML. All of Nvidia's GPUs (consumer and professional) support CUDA, and basically all popular ML libraries and frameworks support CUDA. The biggest limitation on which LLMs you can run will be how much GP...
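As a back-of-the-envelope illustration of that VRAM limit (my own sketch, not from the linked page), the dominant term is just parameter count times bytes per weight, plus some overhead for the KV cache and activations:

```python
def vram_gb(params_billion: float, bits_per_weight: float,
            overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GiB: weight storage times a ~20%
    overhead factor for KV cache and activations. The overhead
    factor is an illustrative assumption, not a measured value."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 2**30

# Under these assumptions, a 7B model needs roughly 15.6 GiB at
# fp16 but only ~3.9 GiB at 4-bit, which is why quantization
# matters so much for consumer cards.
print(f"{vram_gb(7, 16):.1f}")  # fp16
print(f"{vram_gb(7, 4):.1f}")   # 4-bit
```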
List of Evals
MosaicML Model Gauntlet - 34 benchmarks in 6 categories; HuggingFace Open LLM Leaderboard - warning: their MMLU results are wrong, throwing off the whole ranking: https://twitter.com/Francis_YAO_/status/1666833311279517696; LMSys Chatbot Arena Leaderboard ...
Code Evaluation
Running human-eval: https://github.com/abacaj/code-eval
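human-eval's headline metric is pass@k; independent of the linked harness, the unbiased estimator from the HumanEval paper is short enough to sketch with the stdlib (the function name here is my own):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval paper: given
    n samples per problem of which c passed the tests, the chance
    that at least one of k drawn samples passes is
    1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 2 samples per problem and 1 correct, pass@1 is 0.5.
```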
Colophon
This site runs on BookStack, a PHP-based Wiki/documentation software. While there are other documentation generators I considered (mdBook, MkDocs), I wanted something that could allow collaboration/contribution without managing GitHub pull requests, which didn'...
Replit Models
Replit has trained a very strong 3B-parameter code-completion foundational model on The Stack. One fine-tune beats WizardCoder-15B (a StarCoder fine-tune) on human-eval, making it probably the strongest open code-completion model as of July 2023. 2023-07-12: Sad...
Improving LLM Quality
Model Architecture Mixture of Experts / Ensemble Zoph, Barret, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, Jeff Dean, Noam Shazeer, and William Fedus. “ST-MoE: Designing Stable and Transferable Sparse Expert Models.” arXiv, April 29, 2022. https://doi....
Learning Resources
Getting Started If you're starting from nothing. Just go to Wikipedia and start reading: https://en.wikipedia.org/wiki/Large_language_model https://en.wikipedia.org/wiki/Foundation_models https://en.wikipedia.org/wiki/Artificial_neural_network https://...
ChatGPT Code Interpreter
In beta for several months, OpenAI made the Code Interpreter available to all ChatGPT Plus users starting the week of July 10, 2023: https://twitter.com/OpenAI/status/1677015057316872192 This is an extremely powerful tool for both programmers and non-program...
Apple Silicon Macs
Macs are popular with (non-ML) developers, and the combination of (potentially) large amounts of unified GPU memory and decent memory bandwidth is appealing. llama.cpp started as a project to run inference of LLaMA models on Apple Silicon (CPUs). For non-tech...
Code Assistants
We'll probably move this somewhere else, but I figure it might be useful to put this in public somewhere since I'm researching Coding Assistants to help w/ a refactor of a largish code base. I'm looking for practical tools for production use here, and less of ...
llama.cpp
llama.cpp is the most popular backend for inferencing Llama models for single users. It started out CPU-only, but now supports GPUs, including best-in-class CUDA performance and, recently, ROCm support. It also has fallback CLBlast support, but performance on t...
Performance
2023-08-14: Aman Sanger (cursor.so) comparing high batch throughput
2023-08-11: Optimizing latency - mlc, ctranslate2, vllm, tgi; A6000, batch 1, but focused on serving
2023-08-09: [Survey] Supported Hardwares and Speed - MLC LLM speeds for all their har...
Transcription Test
This project was done 2023-08-21. Code checked in here: https://github.com/AUGMXNT/transcribe Here's the README, which is basically a full writeup of a simple test to transcribe audio and use a variety of LLMs to lightly edit the output: transcribe This is a s...
Speech-to-Text
WhisperX
WhisperX is the current best version of Whisper.
conda create --name whisperx python=3.10
conda activate whisperx
pip install git+https://github.com/m-bain/whisperx.git
# Test File
wget https://github.com/ggerganov/whisper.cpp/blob/master/samples/jfk...
Airoboros LMoE
Here we experiment w/ getting a local mixture of experts. Released 2023-08-23: https://x.com/jon_durbin/status/1694360998797250856 Code: https://github.com/jondurbin/airoboros#lmoe Setup # env conda create -n airoboros mamba env config vars set CUDA_VISIBLE_DE...
Quantization Overview
How does quantisation affect model output? - 15 basic tests on different quant levels. EXL2 (ExLlamaV2): https://github.com/turboderp/exllamav2 - based off of GPTQ, but iteratively selects quants against calibration data and averages bit depth to target an arbitr...
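To make the "averages bit depth to target" idea concrete, here is a deliberately simplified toy model (not EXL2's actual optimizer, which scores candidate quants against calibration data): with only two bit widths available, the fraction of weights kept at the higher width follows directly from the target average:

```python
def hi_fraction(lo_bits: float, hi_bits: float, target_bits: float) -> float:
    """Fraction of weights to store at hi_bits so that the weighted
    average bit depth equals target_bits, with the rest at lo_bits.
    A toy two-level model of mixed-precision bit-depth averaging."""
    if not lo_bits <= target_bits <= hi_bits:
        raise ValueError("target must lie between the two bit widths")
    if hi_bits == lo_bits:
        return 0.0
    return (target_bits - lo_bits) / (hi_bits - lo_bits)

# Targeting an average of 5.0 bits with 4-bit and 8-bit quants:
# keep 25% of weights at 8 bits (0.25*8 + 0.75*4 = 5.0).
```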