Getting Started
If you’re starting from nothing, just go to Wikipedia and start reading:
- https://en.wikipedia.org/wiki/Large_language_model
- https://en.wikipedia.org/wiki/Foundation_models
- https://en.wikipedia.org/wiki/Artificial_neural_network
- https://en.wikipedia.org/wiki/Machine_learning
- etc.; just keep following links and reading
You can use an LLM to help you summarize and query what you read, although to minimize hallucinations I would not use anything less capable than GPT-4 or ChatGPT with Web Browsing (or Bing Chat, which is also designed to give retrieval-augmented replies). This has the added benefit of giving you first-hand experience of what LLMs do well (or poorly).
There are plenty of resource lists for research:
- https://github.com/Hannibal046/Awesome-LLM - a good list of fundamental papers to read
- https://github.com/Mooler0410/LLMsPracticalGuide - another good resource that is a little better organized IMO and supports a survey paper https://arxiv.org/abs/2304.13712 of LLMs and their applications
- Understanding Large Language Models — A Transformative Reading List, Sebastian Raschka - rather than just a list of papers, it includes a short description of why each linked paper is important.
- Anti-hype LLM reading list - a nice condensed reading list
- A Hacker’s Guide to Language Models - 1.5h video by Jeremy Howard (fast.ai) w/ accompanying ipynb - fast.ai also has a free Practical Deep Learning course.
Announcements
For a layperson wanting to learn more, I actually think that reading the various model announcements (and using an LLM to interrogate the parts you don’t understand) is probably a decent way to get started. You might not understand everything, but they start to give you the “flavor text,” so to speak, of AI attributes, keywords, etc.:
- OpenAI GPT-4
- Meta LLaMA
- Cerebras-GPT
- MosaicML MPT-30B
- TII Falcon
- Salesforce XGen
- OpenOrca
Reading the announcements and model cards (and looking up what you don’t understand) is a great way to get up to speed fast.
Basics
Overview
- https://wandb.ai/mostafaibrahim17/ml-articles/reports/An-Overview-of-Large-Language-Models-LLMs---VmlldzozODA3MzQz
- https://wandb.ai/mostafaibrahim17/ml-articles/reports/An-Introduction-to-Transformer-Networks--VmlldzoyOTE2MjY1
Foundation Models
- https://research.ibm.com/blog/what-are-foundation-models
- https://blogs.nvidia.com/blog/2023/03/13/what-are-foundation-models/
Context Window
- https://www.linkedin.com/pulse/whats-context-window-anyway-caitie-doogan-phd/
- https://blog.gopenai.com/how-to-speed-up-llms-and-use-100k-context-window-all-tricks-in-one-place-ffd40577b4c
- https://blog.langchain.dev/auto-evaluation-of-anthropic-100k-context-window/
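The short version: a model’s context window is the maximum number of tokens (prompt plus generated output) it can attend to in one request. A minimal sketch of checking whether a document fits, assuming the tiktoken tokenizer library is installed and using a purely illustrative 8,192-token limit (real limits vary by model):

```python
# Sketch: count tokens in a document and compare against a context window.
# Assumes `pip install tiktoken`; "my_document.txt" and the 8,192 limit are
# placeholders, not a spec for any particular model.
import tiktoken

CONTEXT_WINDOW = 8192  # illustrative limit; check your model's documentation

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by recent OpenAI models
text = open("my_document.txt").read()
tokens = enc.encode(text)

print(f"{len(tokens)} tokens")
if len(tokens) > CONTEXT_WINDOW:
    print("Too long for one request: consider chunking, summarizing, or retrieval.")
```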
Learn From Scratch
For programmers who don’t know ML, it may be easier to learn by doing:
- GPT in 60 Lines of NumPy
  - Discussion: https://news.ycombinator.com/item?id=34726115
- Neural Networks: Zero to Hero
  - https://www.youtube.com/watch?v=kCc8FmEb1nY
- https://github.com/karpathy/minGPT
- From Transformer to LLM: Architecture, Training and Usage
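The common thread through these resources is scaled dot-product self-attention. As a taste of what they build toward, here is a toy NumPy sketch of a single causal attention head; this is not code from any of the linked projects, just the core operation they all implement:

```python
# Toy single-head, causal scaled dot-product self-attention in NumPy.
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    # x: [seq_len, d_model]; Wq/Wk/Wv: [d_model, d_head]
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])            # [seq_len, seq_len]
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores = np.where(mask, -1e9, scores)              # each token sees only the past
    return softmax(scores) @ v                         # weighted sum of values

# Tiny example: 4 tokens, 8-dim embeddings, 8-dim head
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(causal_self_attention(x, Wq, Wk, Wv).shape)  # (4, 8)
```

A real transformer block runs several of these heads in parallel, concatenates their outputs, and follows them with a feed-forward layer, residual connections, and layer normalization, which is exactly what the tutorials above walk through.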
Structured Courses
Deep Dive Explanations
- The Illustrated Transformer, Jay Alammar: https://jalammar.github.io/illustrated-transformer/
- Attn: Illustrated Attention, Raimi Karim
- Transformers Explained Visually, Ketan Doshi
  - Transformers Explained Visually (Part 1): Overview of Functionality
  - Transformers Explained Visually (Part 2): How it works, step-by-step
  - Transformers Explained Visually (Part 3): Multi-head Attention, deep dive
- A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsandbytes
- LLM Parameter Counting (a rough worked example follows this list)
- A Conceptual Guide to Transformers
  - Part I: https://benlevinstein.substack.com/p/a-conceptual-guide-to-transformers
  - Part II: https://benlevinstein.substack.com/p/a-conceptual-guide-to-transformers-b70
  - Part III: https://benlevinstein.substack.com/p/a-conceptual-guide-to-transformers-024
  - Part IV: https://benlevinstein.substack.com/p/how-to-think-about-large-language
  - Part V: https://benlevinstein.substack.com/p/whats-going-on-under-the-hood-of
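As a companion to the parameter-counting link above, here is a back-of-the-envelope sketch for a GPT-2-style decoder-only model. It assumes learned position embeddings, a 4x MLP expansion, and an output head tied to the token embedding, so treat it as an approximation rather than a general formula:

```python
# Rough parameter count for a GPT-2-style decoder-only transformer.
def count_params(vocab_size, d_model, n_layers, n_ctx):
    embeddings = vocab_size * d_model + n_ctx * d_model      # token + position embeddings
    attention = 4 * d_model * d_model + 4 * d_model          # Wq, Wk, Wv, Wo (+ biases)
    mlp = 8 * d_model * d_model + 5 * d_model                # 4x up/down projections (+ biases)
    layer_norms = 2 * 2 * d_model                            # two LayerNorms per block
    per_layer = attention + mlp + layer_norms
    return embeddings + n_layers * per_layer + 2 * d_model   # + final LayerNorm

# GPT-2 small (vocab 50,257, d_model 768, 12 layers, 1,024 context): ~124M parameters
print(count_params(vocab_size=50257, d_model=768, n_layers=12, n_ctx=1024))
```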
Fine-Tuning Guides
- https://erichartford.com/uncensored-models
- https://huggingface.co/blog/stackllama
- https://www.mlexpert.io/machine-learning/tutorials/alpaca-fine-tuning
- https://lightning.ai/pages/community/tutorial/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama/
- https://github.com/hiyouga/LLaMA-Efficient-Tuning
- https://github.com/Lightning-AI/lit-llama/blob/main/howto/finetune_lora.md
- https://github.com/Lightning-AI/lit-llama/blob/main/howto/finetune_adapter.md
- https://github.com/zphang/minimal-llama
- https://github.com/OpenGVLab/LLaMA-Adapter
- https://github.com/zetavg/LLaMA-LoRA-Tuner
- https://github.com/artidoro/qlora
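Most of these guides are variations on the same recipe: freeze a pretrained base model and train small low-rank adapter matrices (LoRA/QLoRA) on top of it. A minimal setup sketch using the Hugging Face peft library; the model name and hyperparameters below are placeholders, and the guides above cover data preparation and the actual training loop:

```python
# Minimal LoRA setup sketch with Hugging Face transformers + peft.
# "your-base-model" is a placeholder; r/alpha/target_modules are illustrative
# values for a LLaMA-style model, not a recommendation.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("your-base-model")
tokenizer = AutoTokenizer.from_pretrained("your-base-model")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # rank of the adapter matrices
    lora_alpha=16,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model
# ...from here, train with the usual transformers Trainer or a custom loop.
```

QLoRA (the last link above) is the same idea with the frozen base model loaded in 4-bit to cut memory further, which is what makes fine-tuning large models feasible on a single consumer GPU.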
Resource Lists
- https://www.reddit.com/r/LocalLLaMA/wiki/models/
- https://gist.github.com/rain-1/eebd5e5eb2784feecf450324e3341c8d
- https://github.com/KennethanCeyer/awesome-llm
- https://github.com/kasperjunge/LLM-Guide
- https://github.com/imaurer/awesome-decentralized-llm
- https://github.com/snehilsanyal/self_learning_llms
Latest Research
arXiv
- https://arxiv.org/list/cs.AI/recent
- https://arxiv.org/list/cs.LG/recent
AK’s Daily Papers
Papers With Code
Blogs
- Ethan Mollick’s Substack
- https://lilianweng.github.io/
- https://llm-utils.org/Home
- https://yaofu.notion.site/Yao-Fu-s-Blog-b536c3d6912149a395931f1e871370db
- https://vinija.ai/
- https://kaiokendev.github.io/
Misc
- https://www.reddit.com/r/LocalLLaMA/comments/14le4ti/tree_of_thoughts_build_in_opensource_model/
- https://www.reddit.com/r/LocalLLaMA/comments/14fvht9/new_pruning_method_wanda_can_prune_llms_to_50/
- https://github.com/openlm-research/open_llama/issues/40
- https://github.com/openlm-research/open_llama/issues/63
- https://github.com/openlm-research/open_llama/issues/65