Speaker

Hamel Husain

Hamel Husain is an independent AI consultant and educator, co-instructor (with Shreya Shankar) of the highest-grossing course on Maven: AI Evals for Engineers and Product Managers. His background spans machine learning, data science, and AI product consulting. He advises AI product teams on error analysis and evaluation methodology.


Key ideas

  • Look at your traces first. Reading traces is the highest-ROI activity in AI product development. Teams consistently do too little of it, and those that do it consistently find surprises.
  • Benevolent dictator. One domain expert does the open coding — not a committee. A single coherent perspective produces tractable, actionable failure mode categories.
  • Binary LLM judges. LLM-as-judge evaluators should output true/false per failure mode. Likert scale scores (1–7) are uninterpretable and force false precision.
  • Confusion matrix calibration. Agreement % alone is misleading for rare failure modes; always examine the confusion matrix before deploying a judge.
  • Evals = data science, not magic. The discipline is the same as traditional analytics applied to product. New jargon, same thinking.
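
The binary-judge and confusion-matrix points above can be illustrated with a minimal sketch. It is not taken from the talk: the function names, the labels, and the 100-trace dataset are hypothetical, chosen only to show why agreement % can look good while a judge misses every instance of a rare failure mode.

```python
from collections import Counter

def confusion_counts(human, judge):
    """Tally binary labels (True = failure mode present) into a confusion matrix."""
    pairs = Counter(zip(human, judge))
    return {
        "tp": pairs[(True, True)],    # both say the failure is present
        "fn": pairs[(True, False)],   # human saw the failure, judge missed it
        "fp": pairs[(False, True)],   # judge flagged a failure the human did not
        "tn": pairs[(False, False)],  # both say the failure is absent
    }

def rates(m):
    """Return (agreement, true positive rate, true negative rate)."""
    total = sum(m.values())
    positives = m["tp"] + m["fn"]
    negatives = m["tn"] + m["fp"]
    agreement = (m["tp"] + m["tn"]) / total
    tpr = m["tp"] / positives if positives else float("nan")
    tnr = m["tn"] / negatives if negatives else float("nan")
    return agreement, tpr, tnr

# Hypothetical data: 100 traces, 5 of which the domain expert labeled as failures.
human = [True] * 5 + [False] * 95
# A useless judge that never flags the failure mode.
judge = [False] * 100

matrix = confusion_counts(human, judge)
agreement, tpr, tnr = rates(matrix)
print(matrix)
print(f"agreement={agreement:.2f} tpr={tpr:.2f} tnr={tnr:.2f}")
```

In this made-up case the judge agrees with the expert on 95% of traces yet catches none of the five real failures, which only the confusion matrix (or the true positive rate derived from it) makes visible.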

Appearances

Source | Date | Notes
Hamel Husain and Shreya Shankar on Evals | 2025 | Live demo of error analysis methodology; benevolent dictator; LLM judge design; confusion matrix calibration

See also