Repo: https://github.com/AUGMXNT/llm-judge/

Versioned Tests

  • MT-Bench
  • JA MT-Bench
  • Rakuda
  • ELYZA 100
  • LightBlue Tasks
  • Shisa Tasks

Separate Folders
Config YAML
Store Run Metadata

  • config
  • Output
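
A rough sketch of how the notes above could fit together: each run gets its own folder, keyed by test and test version, holding the config used, a metadata record, and an output directory. Everything here is an assumption for illustration, not the layout used in the llm-judge repo: the function name `record_run`, the `runs/<test>/<version>/<timestamp>-<hash>/` path scheme, the file names `config.yaml` and `metadata.json`, and the metadata fields are all hypothetical. The config is stored as raw text and hashed rather than parsed, so the sketch needs only the standard library.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_run(runs_dir: Path, test_name: str, test_version: str, config_text: str) -> Path:
    """Create a per-run folder with the config, a metadata file, and an output dir.

    Hypothetical helper: names and layout are illustrative assumptions.
    """
    started = datetime.now(timezone.utc)
    # Hash the exact config text so identical configs are easy to spot later.
    config_hash = hashlib.sha256(config_text.encode("utf-8")).hexdigest()

    # Assumed layout: <runs_dir>/<test>/<version>/<UTC timestamp>-<short hash>/
    stamp = started.strftime("%Y%m%dT%H%M%SZ")
    run_dir = runs_dir / test_name / test_version / f"{stamp}-{config_hash[:8]}"

    # exist_ok=False: fail loudly rather than overwrite an earlier run.
    (run_dir / "output").mkdir(parents=True, exist_ok=False)

    # Keep a verbatim copy of the config next to the results.
    (run_dir / "config.yaml").write_text(config_text, encoding="utf-8")

    metadata = {
        "test": test_name,
        "test_version": test_version,
        "started_utc": started.isoformat(),
        "config_sha256": config_hash,
    }
    (run_dir / "metadata.json").write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return run_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = record_run(Path(tmp), "ja-mt-bench", "v1", "model: example-model\n")
        print(sorted(p.name for p in run_dir.iterdir()))
```

Keeping the test version in the path and the config hash in the metadata is one way to make runs comparable only against runs of the same test version and config; other schemes (a single index file, a database) would serve the same purpose.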