MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
- 2024-01-08
- https://arxiv.org/abs/2401.04081
- https://www.reddit.com/r/LocalLLaMA/comments/1924pyy/mixtral_8x7b_paper_published/
- 2023-02 Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints https://arxiv.org/abs/2212.05055