Discussion: https://www.reddit.com/r/LocalLLaMA/comments/1hrb1hp/a_new_microsoft_paper_lists_sizes_for_most_of_the/
Overview: https://explodingtopics.com/blog/gpt-parameters
OpenAI
GPT-4
1.76T - 8x220B MoE
- https://semianalysis.com/2023/07/10/gpt-4-architecture-infrastructure/
- https://www.reddit.com/r/MachineLearning/comments/1bi16pg/d_same_param_count_for_gpt4_from_nvidia_gtc24_as/
- https://x.com/soumithchintala/status/1671267150101721090
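The 1.76T figure is just the rumored expert count times the rumored per-expert size. A minimal sketch of that arithmetic, assuming the SemiAnalysis numbers (8 experts of ~220B parameters each) are accurate:

```python
# Rumored GPT-4 MoE parameter math (assumption: 8 experts of ~220B
# parameters each, per the SemiAnalysis / Soumith Chintala reports).
EXPERTS = 8
PARAMS_PER_EXPERT = 220e9  # ~220B parameters per expert

total = EXPERTS * PARAMS_PER_EXPERT
print(f"Total: ~{total / 1e12:.2f}T parameters")  # -> ~1.76T
```

Note this is the total parameter count; in an MoE model only the routed experts are active per token, so the compute per forward pass is much lower than a dense 1.76T model.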
GPT-4o
~200B per the Microsoft paper linked above (estimate; OpenAI has not confirmed)
GPT-4o mini
~8B per the same paper; size officially undisclosed
- https://techcrunch.com/2024/07/18/openai-unveils-gpt-4o-mini-a-small-ai-model-powering-chatgpt/
Claude 3
20B / 70B / 2T? (presumably Haiku / Sonnet / Opus; unclear whether dense or sparse transformer)
Claude 3.5 Sonnet - 175B? (per the Microsoft paper linked above)
Gemini
1.5 Pro - 1.3T?