Last Reviewed | Class of Model | Model | Notes |
---|---|---|---|
2024-01-28 | 3B | stabilityai/stablelm-zephyr-3b | 6.64 on MT-Bench. Stability NC License. |
2024-02-25 | 7B | mlabonne/AlphaMonarch-7B | Current endpoint of mlabonne’s merging experiments. High performing across a wide range of benchmarks. CC-BY-NC 4.0. |
2024-01-18 | 13B | NousResearch/Nous-Hermes-2-SOLAR-10.7B | Apache 2.0. Hermes tune of SOLAR (layer-stacked Mistral 7B). |
2024-01-18 | 4x7B MoE | mlabonne/Beyonder-4x7B-v2 | MS Research License / CC-BY-NC 4.0. A very capable-looking model that has 24.2B parameters but runs inference at ~12B (2 experts). |
2024-01-18 | 34B | jondurbin/bagel-dpo-34b-v0.2 | Yi License (Instant Commercial). Reddit Discussion. Yi 200K-based finetune. |
2024-01-18 | 8x7B | NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO | Best performing Mixtral? That’d make this the best performing Apache 2.0 licensed model available. |
2024-01-18 | 70B+ | miqudev/miqu-1-70b | Leaked alpha of mistral-medium. fp16 dequant here: 152334H/miqu-1-70b-sf. See also this test of various miqu models. |
2024-02-25 | 70B+ | abacusai/Smaug-72B-v0.1 | A DPOP tune of Qwen-72B (1.0). It topped the HF leaderboards and scores >80 on MMLU, but I haven’t seen much real-world reporting. |
2024-02-25 | 70B+ | Qwen/Qwen1.5-72B-Chat | Qwen1.5 72B is currently the highest-ranked open model on the LMSYS Chatbot Arena Leaderboard; however, it doesn’t have GQA atm (the final Qwen2 will…) |
2024-02-25 | 70B+ | wolfram/miquliz-120b-v2.0 | I haven’t tried out most of the 120B frankenmerges, but there is lots of interesting work by community members there. Reddit Discussion here. |
2024-01-19 | Coding | deepseek-ai/deepseek-coder-33b-instruct | There is also a v1.5 of the 7B just released, but if you have the resources, the 33B is going to reason better (a recent test shows even the v1.0 7B to be great). |
2024-03-21 | Coding | m-a-p/OpenCodeInterpreter-DS-33B | A DeepSeek Coder 33B fine-tune with code generation plus execution/refinement that scores higher than GPT-4 CI (!) on benchmarks. |
2024-03-21 | RAG/Function | CohereForAI/c4ai-command-r-v01 | Command-R 35B is NC-licensed, but it is a SOTA multilingual model specifically trained for RAG and tool use. |
2024-03-21 |