Notes from shipping and evaluating agents.
Sandeep B./Published May 2026/7 minute read
What Is an Agent Harness?
A practical explanation of what an agent harness is, why the loop matters, and which infrastructure layers make modern AI agents usable in production.
Sandeep B./Published April 2026/3 minute read
A Prompt Line That Broke Claude
One prompt line caused a silent production regression, and even Anthropic's best eval infrastructure didn't catch it before it shipped.
Halios Labs Case Study/Published March 2026/10 minute read
How We Improved an AI Sales Agent by 47% Using Structured Evaluation
A practical engineering case study on structured evaluation, measurable prompt optimization, and the silent data loss we almost shipped.