Agent Failures Don't Start Where They Appear
●Mar 12, 2026
In long-running agent systems, the visible failure is usually downstream from the step where the run first became unrecoverable.
Writing
Architecture, reliability, and the discipline behind production systems.
●Mar 4, 2026
Model internals can explain tokens, but agent failures are timeline failures that require causal execution history.
●Feb 26, 2026
Autonomous systems need version control for cognition: immutable traces, deterministic replay, and causal failure localization.
●Feb 23, 2026
Agents need a kernel: a runtime that treats state, failure, recovery, and observability as first-class concerns.
●Jan 26, 2026
I have 16,381 lines in my personal journal. Not because I'm particularly interesting. Because I'm particularly forgetful.
●Jan 10, 2026
A practical look at the first failure modes in growing React systems and how to avoid expensive cleanup cycles.
●Dec 1, 2025
How to structure frontend tests so teams ship fast while maintaining confidence in high-risk workflows.
●Sep 2, 2025
Architecture choices in frontend systems accumulate interest over time, for better or worse.
●May 14, 2025
Lessons from switching between fast startup loops and enterprise constraints without losing execution quality.