Repo: https://github.com/AUGMXNT/llm-judge/
Versioned Tests
- MT-Bench
- JA MT-Bench
- Rakuda
- ELYZA 100
- LightBlue Tasks
- Shisa Tasks
Separate Folders
Config YAML
Store Run Metadata
- config
- Output
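As a rough illustration of the "separate folders" and "store run metadata" points above, the following minimal sketch writes one run's config and metadata into its own folder, keyed by benchmark and version. It is not taken from the llm-judge repo: the function name `store_run_metadata`, the folder layout, the field names, and the use of JSON (chosen here only to stay within the Python standard library, where the outline mentions YAML) are all assumptions made for the example.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def store_run_metadata(base_dir, benchmark, version, config, run_id):
    """Write a run's config and metadata to its own folder.

    Hypothetical layout (an assumption, not the repo's actual structure):
        <base_dir>/<benchmark>/<version>/<run_id>/metadata.json
    """
    run_dir = Path(base_dir) / benchmark / version / run_id
    run_dir.mkdir(parents=True, exist_ok=True)

    metadata = {
        "benchmark": benchmark,  # e.g. a test name from the list above
        "version": version,      # version label of the test set
        "run_id": run_id,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "config": config,        # the config used for this run, stored verbatim
    }

    path = run_dir / "metadata.json"
    path.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return path
```

A call such as `store_run_metadata("runs", "ja-mt-bench", "v1", {"temperature": 0.0}, "run-001")` would create `runs/ja-mt-bench/v1/run-001/metadata.json`. Keeping the config next to the output means each result can later be traced back to the exact settings and test version that produced it.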