| Last Reviewed | Class of Model | Model | Notes |
|---|---|---|---|
| 2024-01-28 | 3B | stabilityai/stablelm-zephyr-3b | 6.64 MT-Bench. Stability NC License |
| 2024-02-25 | 7B | mlabonne/AlphaMonarch-7B | Current endpoint of mlabonne's merging experiments; high performing across a wide range of benchmarks. CC-BY-NC 4.0 |
| 2024-01-18 | 13B | NousResearch/Nous-Hermes-2-SOLAR-10.7B | Hermes tune of SOLAR (layer-stacked Mistral 7B). Apache 2.0 |
| 2024-01-18 | 4x7B MoE | mlabonne/Beyonder-4x7B-v2 | A very capable-looking model with 24.2B total parameters, but it inferences at ~12B (2 active experts). MS Research License / CC-BY-NC 4.0 |
| 2024-01-18 | 34B | jondurbin/bagel-dpo-34b-v0.2 | Yi 200K-based finetune. Reddit Discussion. Yi License (Instant Commercial) |
| 2024-01-18 | 8x7B | NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO | Best-performing Mixtral? That would make this the best-performing Apache 2.0-licensed model available. |
| 2024-01-18 | 70B+ | miqudev/miqu-1-70b | Leaked alpha of mistral-medium. fp16 dequant here: 152334H/miqu-1-70b-sf. See also this test of various miqu models |
| 2024-02-25 | 70B+ | abacusai/Smaug-72B-v0.1 | A DPOP tune of Qwen-72B (1.0) that topped the HF leaderboards and scores >80 on MMLU; however, I haven't seen much real-world reporting on it. |
| 2024-02-25 | 70B+ | Qwen/Qwen1.5-72B-Chat | Currently the highest-ranked open model on the LMSYS Chatbot Arena Leaderboard; however, it doesn't have GQA at the moment (the final Qwen2 will). |
| 2024-02-25 | 70B+ | wolfram/miquliz-120b-v2.0 | I haven't tried most of the 120B frankenmerges, but there's lots of interesting work by community members there. Reddit Discussion here |
| 2024-01-19 | Coding | deepseek-ai/deepseek-coder-33b-instruct | There is also a just-released v1.5 of the 7B, but if you have the resources, the 33B is going to reason better (a recent test shows even the v1.0 7B to be great). |
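On the "24.2B parameters, but inferences at ~12B" note for Beyonder: in a Mixtral-style MoE, attention and embedding weights are shared while only the top-k FFN experts run per token, so active parameters are much lower than total. A minimal sketch of that arithmetic, using illustrative component sizes (the split between shared and per-expert weights here is an assumption, not the model's real config):

```python
# Illustrative MoE parameter-count arithmetic, NOT Beyonder's actual config:
# shared weights (attention, embeddings) run for every token; only n_active
# of the n_experts FFN experts run per token.

def moe_params(shared_b, expert_b, n_experts, n_active):
    """Return (total, active) parameter counts in billions."""
    total = shared_b + n_experts * expert_b
    active = shared_b + n_active * expert_b
    return total, active

# Assumed split for a 4x7B merge: ~1.7B shared, ~5.625B per FFN expert.
total, active = moe_params(shared_b=1.7, expert_b=5.625, n_experts=4, n_active=2)
print(total, active)  # ~24.2B total, ~13B active per token
```

With 2 of 4 experts active, roughly half the expert weights (plus all shared weights) participate in each forward pass, which is why the model loads like a 24B but runs closer to a 12-13B.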