Hamel Husain
Hamel Husain is an independent AI consultant and educator, co-instructor (with Shreya Shankar) of the highest-grossing course on Maven: AI Evals for Engineers and Product Managers. His background spans machine learning, data science, and AI product consulting. He advises AI product teams on error analysis and evaluation methodology.
Key ideas
- Look at your traces first. Reading traces is the highest-ROI activity in AI product development. Teams consistently do too little of it, and consistently find surprises when they do.
- Benevolent dictator. One domain expert does the open coding, not a committee. A single coherent perspective produces tractable, actionable failure mode categories.
- Binary LLM judges. LLM-as-judge evaluators should output a true/false verdict for each failure mode. Likert-scale scores (1–7) are uninterpretable and force false precision.
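- Confusion matrix calibration. Agreement % alone is misleading for rare failure modes, because a judge that never flags the failure still agrees with the human labels on nearly every trace. Always examine the confusion matrix before deploying a judge.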
- Evals = data science, not magic. The discipline is the same as traditional analytics applied to product. New jargon, same thinking.
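
The confusion matrix point can be illustrated with a small sketch. This is not code from the course. The function names (`confusion_counts`, `judge_report`) and the data are hypothetical, chosen only to show why raw agreement hides a useless judge when a failure mode is rare. Labels are binary, with `True` meaning "failure mode present".

```python
from collections import Counter


def confusion_counts(human, judge):
    """Count confusion-matrix cells, treating True as 'failure mode present'.

    `human` holds the domain expert's labels, `judge` the LLM judge's labels.
    """
    counts = Counter(zip(human, judge))
    return {
        "tp": counts[(True, True)],    # both say the failure is present
        "fn": counts[(True, False)],   # judge missed a real failure
        "fp": counts[(False, True)],   # judge flagged a clean trace
        "tn": counts[(False, False)],  # both say the trace is clean
    }


def judge_report(human, judge):
    """Return raw agreement alongside true positive and true negative rates."""
    c = confusion_counts(human, judge)
    total = sum(c.values())
    positives = c["tp"] + c["fn"]
    negatives = c["tn"] + c["fp"]
    return {
        **c,
        "agreement": (c["tp"] + c["tn"]) / total,
        "tpr": c["tp"] / positives if positives else None,
        "tnr": c["tn"] / negatives if negatives else None,
    }


# Hypothetical data: 100 traces, the failure mode appears in 5 of them,
# and the judge never flags anything.
human = [True] * 5 + [False] * 95
lazy_judge = [False] * 100

report = judge_report(human, lazy_judge)
print(report)
```

In this made-up case the judge agrees with the human labels on 95 of 100 traces, yet its true positive rate is zero: it catches none of the failures it was built to detect. Reporting the two rates separately makes that visible where a single agreement figure does not.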
Appearances
| Source | Date | Notes |
|---|---|---|
| Hamel Husain and Shreya Shankar on Evals | 2025 | Live demo of error analysis methodology; benevolent dictator; LLM judge design; confusion matrix calibration |