This project is for me to learn about and improve LLM evaluation. While working on a mathematics chatbot, I tried different model evaluation libraries (LightEval from Hugging Face and DeepEval from Confident AI, as of 12 December 2024), but they didn’t work as expected. After that, I started reading research papers and trying different evaluation methods. I have summarised the research on this Notion page and added my own thoughts. There is also a link to a GitHub repo where I try out different approaches to evaluating the mathematics chatbot.

This page contains my research, notes, and thoughts on the project. If you have any thoughts or comments, message me on LinkedIn.

GitHub: https://github.com/minettebrink/llm_eval

LinkedIn: https://www.linkedin.com/in/minette-kaunismäki-8b138b166/

X: @MinetteKaum

Research

Thoughts on what I’ve read

Summaries of the papers I’ve read, along with my thoughts on them