Hamel Husain

Hamel Husain is an independent AI consultant and educator, co-instructor (with Shreya Shankar) of the highest-grossing course on Maven: AI Evals for Engineers and Product Managers. His background spans machine learning, data science, and AI product consulting. He advises AI product teams on error analysis and evaluation methodology.

Key ideas

Look at your traces first. Reading traces is the highest-ROI activity in AI product development. Teams consistently underinvest in it, and consistently find surprises when they do it.
Benevolent dictator. One domain expert does the open coding — not a committee. A single coherent perspective produces tractable, actionable failure mode categories.
Binary LLM judges. LLM-as-judge evaluators should output true/false per failure mode. Likert scale scores (1–7) are uninterpretable and force false precision.
Confusion matrix calibration. Agreement % alone is misleading for rare failure modes; always examine the confusion matrix before deploying a judge.
Evals = data science, not magic. The discipline is the same as traditional analytics applied to product. New jargon, same thinking.
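
To make the confusion matrix point concrete, here is a minimal sketch in Python. It assumes each trace has a human label and a binary judge label for a single failure mode, with True meaning the failure is present. The data is made up for illustration, and the function names (confusion_counts, summarize) are hypothetical, not taken from the course or from any library.

```python
from collections import Counter


def confusion_counts(human, judge):
    """Count the four confusion-matrix cells, treating True as 'failure present'."""
    counts = Counter(zip(human, judge))
    return {
        "tp": counts[(True, True)],    # judge caught a real failure
        "fn": counts[(True, False)],   # judge missed a real failure
        "fp": counts[(False, True)],   # judge flagged a good trace
        "tn": counts[(False, False)],  # judge passed a good trace
    }


def summarize(human, judge):
    """Report raw agreement alongside the true positive and true negative rates."""
    c = confusion_counts(human, judge)
    total = sum(c.values())
    positives = c["tp"] + c["fn"]
    negatives = c["tn"] + c["fp"]
    return {
        **c,
        "agreement": (c["tp"] + c["tn"]) / total,
        "tpr": c["tp"] / positives if positives else None,
        "tnr": c["tn"] / negatives if negatives else None,
    }


# Illustrative (made-up) data: 100 traces, the failure mode appears in 5 of them,
# and the judge never flags anything.
human = [True] * 5 + [False] * 95
judge = [False] * 100

result = summarize(human, judge)
print(result)
```

In this made-up case the judge agrees with the human on 95% of traces while catching none of the five real failures (true positive rate of 0). The agreement figure alone would hide that; the individual cells show it.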

Appearances

Source	Date	Notes
Hamel Husain and Shreya Shankar on Evals	2025	Live demo of error analysis methodology; benevolent dictator; LLM judge design; confusion matrix calibration
