Domain Experts as Eval Builders
LLMs are general. Verification is specific. The people who know what "correct" looks like should be defining the tests — not ML engineers, not prompt hackers.