Getting Started
If you’re starting from nothing, just go to Wikipedia and start reading:
- https://en.wikipedia.org/wiki/Large_language_model
- https://en.wikipedia.org/wiki/Foundation_models
- https://en.wikipedia.org/wiki/Artificial_neural_network
- https://en.wikipedia.org/wiki/Machine_learning
- etc.; just keep following links and reading
You can use an LLM to help you summarize and query what you read, although to minimize hallucinations I would not use anything less capable than GPT-4 or ChatGPT with Web Browsing (or Bing Chat, which is also designed to give retrieval-augmented replies). This has the added benefit of giving you first-hand experience of what LLMs do well (or poorly).
There are plenty of resource lists for research:
- https://github.com/Hannibal046/Awesome-LLM - a good list of fundamental papers to read
- https://github.com/Mooler0410/LLMsPracticalGuide - another good resource that is a little better organized IMO and supports a survey paper https://arxiv.org/abs/2304.13712 of LLMs and their applications
- Understanding Large Language Models — A Transformative Reading List, Sebastian Raschka - rather than just a list of papers, it includes a short description of why each linked paper is important.
- Anti-hype LLM reading list - a nice condensed reading list
- A Hacker’s Guide to Language Models - 1.5h video by Jeremy Howard (fast.ai) w/ accompanying ipynb - fast.ai also has a free Practical Deep Learning course.
Announcements
For a layperson wanting to learn more, I actually think that reading the various model announcements (and using an LLM to interrogate the parts you don’t understand) is probably a decent way to get started. You might not understand everything, but they start to give you the “flavor text,” so to speak, of AI attributes, keywords, etc.:
- OpenAI GPT-4
- Meta LLaMA
- Cerebras-GPT
- MosaicML MPT-30B
- TII Falcon
- Salesforce XGen
- OpenOrca
Reading the announcements and model cards (and looking up what you don’t understand) is a great way to get up to speed fast.
Basics
Overview
- https://wandb.ai/mostafaibrahim17/ml-articles/reports/An-Overview-of-Large-Language-Models-LLMs---VmlldzozODA3MzQz
- https://wandb.ai/mostafaibrahim17/ml-articles/reports/An-Introduction-to-Transformer-Networks--VmlldzoyOTE2MjY1
Foundation Models
- https://research.ibm.com/blog/what-are-foundation-models
- https://blogs.nvidia.com/blog/2023/03/13/what-are-foundation-models/
Context Window
- https://www.linkedin.com/pulse/whats-context-window-anyway-caitie-doogan-phd/
- https://blog.gopenai.com/how-to-speed-up-llms-and-use-100k-context-window-all-tricks-in-one-place-ffd40577b4c
- https://blog.langchain.dev/auto-evaluation-of-anthropic-100k-context-window/
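The short version: a model’s context window is the maximum number of tokens (prompt plus generated output) it can attend to in one request. A minimal sketch of checking whether a document fits, assuming the tiktoken tokenizer library is installed and using a purely illustrative 8,192-token limit (real limits vary by model):

```python
# Sketch: count tokens in a document and compare against a context window.
# Assumes `pip install tiktoken`; "my_document.txt" and the 8,192 limit are
# placeholders, not a spec for any particular model.
import tiktoken

CONTEXT_WINDOW = 8192  # illustrative limit; check your model's documentation

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by recent OpenAI models
text = open("my_document.txt").read()
tokens = enc.encode(text)

print(f"{len(tokens)} tokens")
if len(tokens) > CONTEXT_WINDOW:
    print("Too long for one request: consider chunking, summarizing, or retrieval.")
```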
Learn From Scratch
For programmers who don’t know ML, it may be easier to learn by doing:
- GPT in 60 Lines of NumPy
  - Discussion: https://news.ycombinator.com/item?id=34726115
- Neural Networks: Zero to Hero
  - https://www.youtube.com/watch?v=kCc8FmEb1nY
- https://github.com/karpathy/minGPT
- From Transformer to LLM: Architecture, Training and Usage
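The common thread through these resources is scaled dot-product self-attention. As a taste of what they build toward, here is a toy NumPy sketch of a single causal attention head; this is not code from any of the linked projects, just the core operation they all implement:

```python
# Toy single-head, causal scaled dot-product self-attention in NumPy.
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    # x: [seq_len, d_model]; Wq/Wk/Wv: [d_model, d_head]
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])            # [seq_len, seq_len]
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores = np.where(mask, -1e9, scores)              # each token sees only the past
    return softmax(scores) @ v                         # weighted sum of values

# Tiny example: 4 tokens, 8-dim embeddings, 8-dim head
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(causal_self_attention(x, Wq, Wk, Wv).shape)  # (4, 8)
```

A real transformer block runs several of these heads in parallel, concatenates their outputs, and follows them with a feed-forward layer, residual connections, and layer normalization, which is exactly what the tutorials above walk through.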
Structured Courses
Deep Dive Explanations
- The Illustrated Transformer, Jay Alammar: https://jalammar.github.io/illustrated-transformer/
- Attn: Illustrated Attention, Raimi Karim
- Transformers Explained Visually, Ketan Doshi
  - Transformers Explained Visually (Part 1): Overview of Functionality
  - Transformers Explained Visually (Part 2): How it works, step-by-step
  - Transformers Explained Visually (Part 3): Multi-head Attention, deep dive
- A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsandbytes
- LLM Parameter Counting (a rough worked example follows this list)
- A Conceptual Guide to Transformers
  - Part I: https://benlevinstein.substack.com/p/a-conceptual-guide-to-transformers
  - Part II: https://benlevinstein.substack.com/p/a-conceptual-guide-to-transformers-b70
  - Part III: https://benlevinstein.substack.com/p/a-conceptual-guide-to-transformers-024
  - Part IV: https://benlevinstein.substack.com/p/how-to-think-about-large-language
  - Part V: https://benlevinstein.substack.com/p/whats-going-on-under-the-hood-of
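As a companion to the parameter-counting link above, here is a back-of-the-envelope sketch for a GPT-2-style decoder-only model. It assumes learned position embeddings, a 4x MLP expansion, and an output head tied to the token embedding, so treat it as an approximation rather than a general formula:

```python
# Rough parameter count for a GPT-2-style decoder-only transformer.
def count_params(vocab_size, d_model, n_layers, n_ctx):
    embeddings = vocab_size * d_model + n_ctx * d_model      # token + position embeddings
    attention = 4 * d_model * d_model + 4 * d_model          # Wq, Wk, Wv, Wo (+ biases)
    mlp = 8 * d_model * d_model + 5 * d_model                # 4x up/down projections (+ biases)
    layer_norms = 2 * 2 * d_model                            # two LayerNorms per block
    per_layer = attention + mlp + layer_norms
    return embeddings + n_layers * per_layer + 2 * d_model   # + final LayerNorm

# GPT-2 small (vocab 50,257, d_model 768, 12 layers, 1,024 context): ~124M parameters
print(count_params(vocab_size=50257, d_model=768, n_layers=12, n_ctx=1024))
```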
Fine-Tuning Guides
- https://erichartford.com/uncensored-models
- https://huggingface.co/blog/stackllama
- https://www.mlexpert.io/machine-learning/tutorials/alpaca-fine-tuning
- https://lightning.ai/pages/community/tutorial/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama/
- https://github.com/hiyouga/LLaMA-Efficient-Tuning
- https://github.com/Lightning-AI/lit-llama/blob/main/howto/finetune_lora.md
- https://github.com/Lightning-AI/lit-llama/blob/main/howto/finetune_adapter.md
- https://github.com/zphang/minimal-llama
- https://github.com/OpenGVLab/LLaMA-Adapter
- https://github.com/zetavg/LLaMA-LoRA-Tuner
- https://github.com/artidoro/qlora
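Most of these guides are variations on the same recipe: freeze a pretrained base model and train small low-rank adapter matrices (LoRA/QLoRA) on top of it. A minimal setup sketch using the Hugging Face peft library; the model name and hyperparameters below are placeholders, and the guides above cover data preparation and the actual training loop:

```python
# Minimal LoRA setup sketch with Hugging Face transformers + peft.
# "your-base-model" is a placeholder; r/alpha/target_modules are illustrative
# values for a LLaMA-style model, not a recommendation.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("your-base-model")
tokenizer = AutoTokenizer.from_pretrained("your-base-model")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # rank of the adapter matrices
    lora_alpha=16,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model
# ...from here, train with the usual transformers Trainer or a custom loop.
```

QLoRA (the last link above) is the same idea with the frozen base model loaded in 4-bit to cut memory further, which is what makes fine-tuning large models feasible on a single consumer GPU.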
Resource Lists
- https://www.reddit.com/r/LocalLLaMA/wiki/models/
- https://gist.github.com/rain-1/eebd5e5eb2784feecf450324e3341c8d
- https://github.com/KennethanCeyer/awesome-llm
- https://github.com/kasperjunge/LLM-Guide
- https://github.com/imaurer/awesome-decentralized-llm
- https://github.com/snehilsanyal/self_learning_llms
Latest Research
arXiv
- https://arxiv.org/list/cs.AI/recent
- https://arxiv.org/list/cs.LG/recent
AK’s Daily Papers
Papers With Code
Blogs
- Ethan Mollick’s Substack
- https://lilianweng.github.io/
- https://llm-utils.org/Home
- https://yaofu.notion.site/Yao-Fu-s-Blog-b536c3d6912149a395931f1e871370db
- https://vinija.ai/
- https://kaiokendev.github.io/
Misc
- https://www.reddit.com/r/LocalLLaMA/comments/14le4ti/tree_of_thoughts_build_in_opensource_model/
- https://www.reddit.com/r/LocalLLaMA/comments/14fvht9/new_pruning_method_wanda_can_prune_llms_to_50/
- https://github.com/openlm-research/open_llama/issues/40
- https://github.com/openlm-research/open_llama/issues/63
- https://github.com/openlm-research/open_llama/issues/65