https://arxiv.org/abs/2306.05685

24th December 2023

Key Highlights

Proposed Benchmarks

  1. MT-Bench: a multi-turn question set
  2. Chatbot Arena: a crowdsourced battle platform

Insights on LLM-as-a-Judge

Bias Mitigation

Hybrid Evaluation Framework

Public Resources

Conclusion