It started as a course on fine-tuning LLMs by Hamel Husain & Dan Becker. It then blew up into a full conference on mastering LLMs, and in the process I couldn’t keep up with the talks as they happened. I finally caught up on the recordings; several of the talks were inspiring and insightful. These were my takeaways:
From Prompt Engineering Workshop w/ John Berryman:
From Dan & Hamel in Fine-Tuning Workshop 3:
Your AI Product Needs Evals – Hamel’s Blog.
Unit tests to catch basic issues, e.g. does the output still behave when there are 0, 1, or 2 items to talk about (a minimal test sketch follows this list).
LLM-as-a-judge needs to be aligned with human judgments, i.e. the judge itself needs a mini-evaluation.
LLM-as-a-judge can be misleading due to position bias: swap A & B and check whether the verdict flips (see the second sketch after this list).
Human evals can also be misleading because raters’ expectations rise over time, so use A/B testing.
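A minimal sketch of the kind of unit test meant here; `generate_summary` and the `myapp` module are hypothetical stand-ins for your own application code, not anything from the talk:

```python
# Hypothetical unit tests for an LLM feature that writes about a list of items.
# `myapp.generate_summary` is an assumed application function, not from the talk.
import pytest

from myapp import generate_summary


@pytest.mark.parametrize("items", [[], ["apples"], ["apples", "oranges"]])
def test_handles_zero_one_and_two_items(items):
    output = generate_summary(items)
    # Cheap assertions that catch obvious failures before any LLM judging.
    assert isinstance(output, str)
    assert output.strip() != ""
    for item in items:
        assert item.lower() in output.lower()
```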
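And a sketch of the two LLM-as-a-judge sanity checks above, assuming a wrapper `ask_judge(response_a, response_b)` that returns "A" or "B"; the interface and function names are my own illustration, not the speakers’ implementation:

```python
# Sketch of two LLM-as-a-judge sanity checks: agreement with human labels,
# and position-swap consistency. `ask_judge(a, b) -> "A" | "B"` is assumed.

def agreement_rate(pairs, human_labels, ask_judge):
    """Fraction of (a, b) pairs where the judge matches the human label ('A' or 'B')."""
    hits = sum(ask_judge(a, b) == label for (a, b), label in zip(pairs, human_labels))
    return hits / len(pairs)


def swap_consistency(pairs, ask_judge):
    """Fraction of pairs where swapping A and B flips the verdict as it should."""
    consistent = 0
    for a, b in pairs:
        first = ask_judge(a, b)
        second = ask_judge(b, a)
        # A consistent judge that picked "A" first should pick "B" after the swap.
        consistent += (first == "A") == (second == "B")
    return consistent / len(pairs)
```

If the swap-consistency rate is well below 1.0, the judge is reacting to position rather than content, which is exactly the misleading case mentioned above.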
From Spellgrounds for Prodigious Prestidigitation by Bryan Bischof:
Evals are for:
Things to avoid: