LLMs are general. Verification is specific. The people who know what "correct" looks like should be defining the tests — not ML engineers, not prompt hackers.
Generation is solved. The constraint has moved to trust. That trust layer is built by people who've spent decades in their fields, not by more parameters.
100M+ knowledge workers spend 4.3 hours per week verifying AI outputs. The companies that help domain experts encode their expertise into scalable eval systems will capture the value in that verification gap.
How domain expertise flows from tacit knowledge to scalable, defensible eval infrastructure.
A domain expert building evals operates across three layers — tacit judgment, that judgment encoded as checks, and those checks compounded into shared eval infrastructure — each layer adding defensibility on top of the last.
Find domains with high verification costs and clear experts. Give those experts simple tooling. Let them build the eval libraries. The library becomes the product.
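The "simple tooling" step can be surprisingly small. A minimal sketch, assuming a library where an expert registers named checks over model outputs — all names (`make_check`, the billing rules) are hypothetical illustrations, not a real API:

```python
# Hypothetical sketch: a domain expert encodes verification rules as
# small, composable checks; the accumulated library is the product.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalResult:
    check: str
    passed: bool
    detail: str = ""

# An eval is just a named predicate over a model output.
Check = Callable[[str], EvalResult]

def make_check(name: str, predicate: Callable[[str], bool], detail: str = "") -> Check:
    def run(output: str) -> EvalResult:
        return EvalResult(name, predicate(output), detail)
    return run

# Illustrative only: a medical-billing expert might encode rules like these.
checks: list[Check] = [
    make_check("cites_code", lambda o: "CPT" in o,
               "Billing answers must reference a CPT code"),
    make_check("no_guarantee", lambda o: "guaranteed" not in o.lower(),
               "Reimbursement can never be promised as guaranteed"),
]

def evaluate(output: str) -> list[EvalResult]:
    return [check(output) for check in checks]

results = evaluate("Use CPT 99213 for this visit; reimbursement is likely.")
print(all(r.passed for r in results))  # → True
```

The expert never touches model internals: each rule is a plain predicate, so the barrier to contribution is domain knowledge, not ML engineering.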