Notes from shipping and evaluating agents.

Sandeep B./Published May 2026/7 minute read

What Is an Agent Harness?

A practical explanation of what an agent harness is, why the loop matters, and which infrastructure layers make modern AI agents usable in production.

Sandeep B./Published April 2026/3 minute read

A Prompt Line That Broke Claude

One prompt line caused a silent production regression—even Anthropic's best eval infrastructure didn't catch it before it shipped.

Halios Labs Case Study/Published March 2026/10 minute read

How We Improved an AI Sales Agent by 47% Using Structured Evaluation

A practical engineering case study on structured evaluation, measurable prompt optimization, and the silent data loss we almost shipped.
