Agentic Coding Insights

COMPLETED March 15, 2026

Summary

Briefing: Agentic Coding Insights

Purpose: I'm a developer at a startup who wants to stay up to date with the latest best practices for agentic coding tools. Our current stack is a Python FastAPI backend with a basic JS frontend. We primarily use Claude Code, which works well but struggles with some of our frontend requirements. I'm interested in Claude Code best practices, making compaction more effective, frontend performance strategies, comparisons with Codex and Gemini, and when developers prefer IDEs over terminal-based approaches.

Key Insights

The single highest-leverage Claude Code habit is five tokens: "use red-green TDD." Simon Willison reports that starting every coding session by telling the agent how to run tests and instructing it to use red-green TDD dramatically improves output quality. He goes further with a "conformance-driven development" technique: when adding file uploads to his framework, he had Claude build a test suite that passes across Go, Node.js, Django, and Starlette, then used that generated suite as the specification for his own implementation. For your FastAPI backend, this means investing in comprehensive pytest suites before delegating work to agents, and having Claude curl the running server after implementation to catch bugs automated tests miss. Tests are now "effectively free" with agents—Willison argues they're no longer remotely optional.
My fireside chat about agentic engineering at the Pragmatic Summit
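To make the conformance-driven idea concrete, here is a minimal sketch in plain Python. It is not Willison's actual suite: the pagination contract, the function names (reference_paginate, my_paginate, run_conformance), and the cases are all hypothetical, chosen only to show one shared set of cases acting as the specification for more than one implementation.

```python
from typing import Callable, Sequence

# Hypothetical contract for illustration: paginate(items, page, size) -> list
Paginate = Callable[[Sequence[int], int, int], list]


def reference_paginate(items, page, size):
    """Stand-in for a trusted reference implementation (1-based pages)."""
    start = (page - 1) * size
    return list(items[start:start + size])


def my_paginate(items, page, size):
    """The implementation under development; it must match the reference."""
    return [x for i, x in enumerate(items) if (page - 1) * size <= i < page * size]


# Each case is (items, page, size, expected). The cases are the specification.
CASES = [
    (list(range(10)), 1, 3, [0, 1, 2]),
    (list(range(10)), 4, 3, [9]),   # last, partial page
    (list(range(10)), 5, 3, []),    # past the end
    ([], 1, 3, []),                 # empty input
]


def run_conformance(impl: Paginate) -> list[str]:
    """Return a description of every failing case; empty means conformant."""
    failures = []
    for items, page, size, expected in CASES:
        got = impl(items, page, size)
        if got != expected:
            failures.append(f"page={page} size={size}: expected {expected}, got {got}")
    return failures
```

In a real project the same cases would more likely live in a pytest suite parametrized over implementations; the point of the sketch is only that the suite is written once and every implementation has to pass it.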
Claude Code's frontend weakness is structural, not a gap that closes with the next model release. Multiple independent sources converge on the same diagnosis: AI models excel at backend coding because it's objective and functional, but frontend work involves subjective human judgment about usability, aesthetics, and responsiveness that models fundamentally struggle with. SharpTech's analysis notes that frontend code "can function and pass a unit test but it still sucks" from a usability standpoint, and AINews reports specific difficulties with CSS precision. For your JS frontend, this means you should provide extremely detailed design references, component libraries, and explicit CSS specifications in your CLAUDE.md file rather than expecting the agent to make good aesthetic judgments independently. Iterative prompting ("do better" on visual output) can help, but frontend work requires significantly more human steering than backend delegation.
Nerding Out with the Neo, Claude and the Integration Question, The End of Coding Language History
Sorting algorithms
The emerging best practice is to use different models for different stages of your workflow, not to be tool-loyal. The Compound Engineering methodology recommends Claude Haiku or Gemini Flash for brainstorming, Opus for planning, Codex for implementation, and Gemini for code review. Samuel Colvin (Pydantic creator) provides vivid characterizations: Claude Code is "Captain America"—competent and reliable; Codex is "Q from Bond"—neurotic and detail-focused; Gemini is "the Joker"—capable of incredible work but prone to deleting your files. Critically, Gemini reviews PRs in about 90 seconds versus Codex's potential 30 minutes, making it a powerful quick-feedback tool. For your startup, this suggests experimenting with Gemini CLI for fast PR reviews while keeping Claude Code as your primary implementation tool for FastAPI work.
Compound Engineering Camp: Every Step, From Scratch
⚡️Monty: the ultrafast Python interpreter by Agents for Agents — Samuel Colvin, Pydantic
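The stage-by-stage recommendation can be written down as a small lookup table. The sketch below is only an illustration: the stage names and the pick_model helper are hypothetical, and the model strings are descriptive labels taken from the text, not real API model identifiers.

```python
# Stage -> recommended model families, in order of preference.
# The strings are labels from the recommendation, not API model IDs.
STAGE_MODELS = {
    "brainstorm": ["Claude Haiku", "Gemini Flash"],
    "plan": ["Opus"],
    "implement": ["Codex"],
    "review": ["Gemini"],
}


def pick_model(stage: str, available: set[str]) -> str:
    """Return the first recommended model for a stage that is available."""
    for model in STAGE_MODELS[stage]:
        if model in available:
            return model
    raise LookupError(f"no recommended model available for stage {stage!r}")
```

Keeping the mapping in one place makes it cheap to revise when the recommendations shift.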
Run /compound before compaction: this timing insight alone could transform your long Claude Code sessions. The Compound Engineering methodology introduces a four-step loop (Plan, Work, Review, Compound) where the critical fourth step captures lessons learned and stores them as discoverable artifacts with metadata. The key operational insight is that you must run this capture step while context is fresh: "You don't want to run it after compaction," because the AI will have already forgotten earlier details. Claude Code's built-in memory system using .md files was benchmarked as the worst-performing among tested approaches, underperforming by almost 50% compared to dynamic graph-based alternatives. This means you should structure your CLAUDE.md files with the most critical information first (agents often read only the first N lines), and actively compound lessons before context degrades.
Compound Engineering Camp: Every Step, From Scratch
OpenClaw's Memory Sucks and the fix is simple — Dhravya Shah, Supermemory
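One way to keep the "critical information first" rule honest is a small check that the phrases you care about appear near the top of the file. This is a rough sketch under stated assumptions: the 40-line cutoff is an arbitrary stand-in for "the first N lines", and the function name, sample content, and required phrases are hypothetical.

```python
def missing_from_head(text: str, required: list[str], head_lines: int = 40) -> list[str]:
    """Return required phrases that do not appear in the first head_lines lines."""
    head = "\n".join(text.splitlines()[:head_lines]).lower()
    return [phrase for phrase in required if phrase.lower() not in head]


# Hypothetical file content: test commands near the top, architecture buried.
sample = "# Project\n## Test commands\npytest -q\n" + "filler\n" * 50 + "## Architecture\n"
print(missing_from_head(sample, ["test commands", "architecture"]))
# prints ['architecture']
```

A check like this could run in CI or a pre-commit hook so the file does not drift as people append to it.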
Sandboxing is the most important safety practice, and "Claude Code for the web" offers a pragmatic middle ground. Willison admits to running Claude with the --dangerously-skip-permissions flag on his Mac despite being "the world's foremost expert on why you shouldn't do that," because the convenience is compelling. His mitigation: avoid dumping untrusted instructions into the agent in that mode, and prefer "Claude Code for the web," which runs in an Anthropic-managed container and so limits worst-case damage to source code theft. The Pi Day speaker independently arrives at the same conclusion, advocating Docker containerization and noting that permission pop-ups create fatigue that leads users to bypass them entirely. For a startup handling real user data, running agents in containers and mounting only necessary data is the most practical security posture.
My fireside chat about agentic engineering at the Pragmatic Summit
Pi Day: AMA with Pi's Creator + Talks & Extensions Deep Dive
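"Mounting only necessary data" can be sketched as a helper that builds a docker run command exposing a single project directory. This is an illustration, not a setup described by either speaker: the image name, host path, and sandbox_command helper are hypothetical, and a real setup would also need to decide how credentials and network access are handled.

```python
import shlex


def sandbox_command(image: str, project_dir: str, agent_cmd: list[str]) -> list[str]:
    """Build a docker run argv that mounts only the project directory."""
    return [
        "docker", "run", "--rm", "-it",
        "-v", f"{project_dir}:/workspace",  # the only host path exposed
        "-w", "/workspace",                 # start the agent inside the mount
        image, *agent_cmd,
    ]


# Hypothetical image name and path, for illustration only.
cmd = sandbox_command("my-agent-image", "/home/me/app", ["bash"])
print(shlex.join(cmd))
```

Returning the argv as a list keeps it safe to pass to subprocess.run without shell quoting problems.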
Rewrite your AI harness every six months—the landscape changes too fast for set-and-forget tooling. Notion co-founder Simon Last emphasizes that companies get in trouble by doing "one thing and then just sticking with it," insisting you need keen awareness of current model capabilities and must redesign your harness, system, and product deeply around them. He's had a coding agent running continuously for 13 days working through tasks, and notes that no PR ships without full agent testing. His current setup is simply Claude Code or Codex CLI—he finds CLI tools "super simple" and effective. The key shift he describes is from coder to "agent manager": defining changes, specifying how they can be verified, and enlisting agents to execute, rather than writing code directly.
From Coder to Manager: Navigating the Shift to Agentic Engineering with Notion Co-Founder Simon Last
Stop Hooks and Memory Files are underutilized Claude Code features that significantly improve workflow. Stop Hooks automate follow-up actions after a task completes, while Memory Files provide Claude with persistent context across complex sessions. Meanwhile, the hooks-based memory approach (dynamically injecting under 2,000 tokens of relevant context per turn from a knowledge graph) outperforms static .md file memory by handling updates, contradiction resolution, and temporal reasoning. Sentry's approach to agent-optimized documentation—serving true markdown, stripping browser elements, optimizing link hierarchy—provides a template for making your project docs more agent-consumable. For your startup, structuring documentation specifically for agent consumption (placing critical information first, using markdown, minimizing noise) is a low-effort, high-impact optimization.
[AINews] The high-return activity of raising your aspirations for LLMs
OpenClaw's Memory Sucks and the fix is simple — Dhravya Shah, Supermemory
Optimizing Content for Agents
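The per-turn injection idea can be sketched without a knowledge graph. The version below swaps in plain keyword overlap for retrieval, which is a deliberate simplification and not the approach benchmarked in the talk; the function names are hypothetical, and the token estimate (about 4 characters per token) is a crude assumption rather than a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def select_context(query: str, memories: list[str], budget: int = 2000) -> list[str]:
    """Pick the memories sharing the most words with the query, within the budget."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(m.lower().split())), m) for m in memories]
    # Highest overlap first; memories with no overlap are dropped.
    ranked = [m for score, m in sorted(scored, key=lambda p: p[0], reverse=True) if score > 0]
    chosen, used = [], 0
    for memory in ranked:
        cost = estimate_tokens(memory)
        if used + cost <= budget:  # skip anything that would exceed the budget
            chosen.append(memory)
            used += cost
    return chosen
```

The budget check is the part that carries over to any retrieval method: whatever ranks the memories, only what fits under the cap is injected for that turn.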
Claude's 1M token context window at standard pricing is a concrete competitive advantage for large codebase work. Anthropic now offers the full 1M context window for Opus 4.6 and Sonnet 4.6 without a long-context premium, while OpenAI charges more above 272,000 tokens for GPT-5.4 and Gemini charges more above 200,000 tokens for Gemini 3.1 Pro. However, context window size alone doesn't solve the problem—"context rot" degrades model performance as context grows, which is why conscious context management (session branching, summarization, strategic compaction timing) matters more than raw window size. For cost-conscious startups processing large codebases, this pricing structure makes Claude the more economical choice for extended sessions.
1M context is now generally available for Opus 4.6 and Sonnet 4.6
Pi Day: AMA with Pi's Creator + Talks & Extensions Deep Dive
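The pricing comparison reduces to a threshold check per model. The sketch below encodes only the thresholds stated above; actual per-token prices are not given in the sources, so none are included, and the dictionary keys are labels rather than API model identifiers.

```python
# Prompt sizes above which a long-context premium applies, as stated above.
# None means no premium within the context window.
PREMIUM_THRESHOLD = {
    "Opus 4.6": None,
    "Sonnet 4.6": None,
    "GPT-5.4": 272_000,
    "Gemini 3.1 Pro": 200_000,
}


def pays_premium(model: str, prompt_tokens: int) -> bool:
    """True if a prompt of this size crosses the model's premium threshold."""
    threshold = PREMIUM_THRESHOLD[model]
    return threshold is not None and prompt_tokens > threshold
```

For example, a 250,000-token prompt would cross the threshold only for Gemini 3.1 Pro, while a 300,000-token prompt would cross it for both GPT-5.4 and Gemini 3.1 Pro.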

Emerging Patterns

The quality of the "harness" matters more than raw model intelligence, and harness instability is becoming a real pain point. Multiple sources emphasize that the control layer wrapping the LLM (the system prompt, tool integrations, and workflow orchestration) is now the primary differentiator in agentic coding, not the underlying model. SharpTech describes how model, RL environment, and harness are all improving simultaneously, while the Pi Day speaker reports that Claude Code's harness changes (not model changes) broke their workflows, altered model behavior, and felt like being "gaslit by your tools." Simon Last's recommendation to rewrite the harness every six months reflects the same reality. This suggests that investing time in understanding and customizing your agent's control layer, whether through CLAUDE.md, custom hooks, or even building on extensible platforms, yields more durable returns than chasing the latest model release.
Pi Day: AMA with Pi's Creator + Talks & Extensions Deep Dive
From Coder to Manager: Navigating the Shift to Agentic Engineering with Notion Co-Founder Simon Last
Nerding Out with the Neo, Claude and the Integration Question, The End of Coding Language History
The terminal-vs-IDE debate is converging on "use both, depending on the task." Power users like Simon Last are comfortable with CLI tools and find them "super simple," while the Pi Day speaker's journey shows that Cursor's IDE integration was the "magic moment" that made agentic coding click—they initially refused to leave their IDE for the terminal. Meanwhile, the State of Agentic Coding speakers admit to opening their IDE "the least I ever have," and Colvin complains that full-TUI agents "get in the way of my scroll." The emerging consensus isn't that one interface wins, but that backend/infrastructure work flows better in terminal agents while frontend and visual work benefits from IDE integration with live preview and file navigation. For your stack specifically, continuing with Claude Code CLI for FastAPI work while evaluating Cursor or similar for JS frontend tasks may be the optimal split.
Pi Day: AMA with Pi's Creator + Talks & Extensions Deep Dive
State of Agentic Coding #4 with Armin and Ben
From Coder to Manager: Navigating the Shift to Agentic Engineering with Notion Co-Founder Simon Last
⚡️Monty: the ultrafast Python interpreter by Agents for Agents — Samuel Colvin, Pydantic
Opus 4.6 may have traded speed for thoroughness in response to Codex's perceived precision advantage, and this trade-off matters for daily workflow. Colvin reports situations where he could complete tasks faster than the agent because Opus 4.6 now "thinks harder" and investigates longer—a behavioral shift he attributes to Anthropic reacting to feedback that Codex was more precise. Meanwhile, multiple practitioners note that all current top models are now "all really good," with the primary challenge shifting from model capability to managing your own expanded productivity. The State of Agentic Coding speakers express near-indifference to incremental model updates, saying "they're all good" and "I have no opinions on it." This suggests that optimizing your workflow, prompting strategy, and harness matters more than tracking every point release.
⚡️Monty: the ultrafast Python interpreter by Agents for Agents — Samuel Colvin, Pydantic
State of Agentic Coding #4 with Armin and Ben

Dissenting Views

Claude Code is either a reliable workhorse you can "oneshot basically everything" with, or a frustratingly imprecise tool that lies about production readiness—depending on who you ask. Simon Willison reports deep trust in Claude Code's predictability, stating he doesn't even question whether tasks will succeed anymore. In sharp contrast, the Pi Day speaker found Claude Code "overly optimistic," sometimes claiming code was production-ready when it immediately crashed, requiring constant repetition of instructions and feeling "not precise enough" for desired outcomes. This drove them to build their own agent with Pi. The likely explanation is that experience level and workflow sophistication mediate the experience: Willison's TDD-first, test-everything approach may prevent the reliability issues the Pi Day speaker encountered through less structured usage. For your team, this suggests that Claude Code's reliability is largely a function of how much verification infrastructure you build around it.
My fireside chat about agentic engineering at the Pragmatic Summit
Pi Day: AMA with Pi's Creator + Talks & Extensions Deep Dive
AI is either making software quality dramatically better or noticeably worse—and both claims have evidence behind them. One speaker argues that AI agents will make high-quality software the baseline by handling testing, documentation, security review, and polish at scale, transforming quality from a premium into table stakes. But the State of Agentic Coding hosts report the opposite observation in practice: "the quality of code is going down," with software becoming more memory/CPU intensive and exhibiting unexpected behaviors. This tension likely reflects the difference between what's possible with disciplined agentic workflows versus what actually happens when developers use these tools without sufficient verification. For your startup, this reinforces that the test-first, review-everything approach isn't just best practice—it's the defense against your own code quality degrading as you accelerate output.
AI Made Every Company 10x More Productive. The Ones Cutting Headcount Are Telling on Themselves.
State of Agentic Coding #4 with Armin and Ben

Read & Act

What to read:

My fireside chat about agentic engineering at the Pragmatic Summit — The single most actionable source for Claude Code users. Contains concrete techniques (TDD prompting, conformance-driven development, manual curl testing, sandboxing with Claude Code for the web) that you can apply to your FastAPI+JS workflow immediately.
Compound Engineering Camp: Every Step, From Scratch — Presents a complete, repeatable Plan/Work/Review/Compound methodology with specific model selection recommendations. The /compound timing insight and the multi-model workflow advice are directly relevant to optimizing your Claude Code sessions.
⚡️Monty: the ultrafast Python interpreter by Agents for Agents — Samuel Colvin, Pydantic — The most vivid and technically grounded comparative assessment of Claude Code, Codex, and Gemini from the Pydantic creator. His insights on review speed trade-offs and Opus 4.6's behavioral shifts can't be fully captured in summary.
Pi Day: AMA with Pi's Creator + Talks & Extensions Deep Dive — Worth reading for the detailed critique of Claude Code's limitations and the alternative workflow built around Pi's extensibility. Even if you don't switch tools, the context management techniques and "warm context window" strategy are directly transferable to your current setup.
OpenClaw's Memory Sucks and the fix is simple — Dhravya Shah, Supermemory — Essential if you're hitting context limits or finding Claude Code forgets decisions mid-session. The analysis of why file-based memory fails and the benchmark data showing Claude Code's memory as worst-performing provide both diagnosis and direction.

What to do:

Add TDD instructions to every Claude Code session and build conformance test suites for your FastAPI endpoints. Start each session with "here's how to run the tests" and "use red-green TDD." Go further by having Claude generate test suites that pass against reference implementations of patterns you use (e.g., authentication flows, file uploads, pagination). Then have the agent implement against those tests. Follow up by instructing the agent to start the server and use curl to exercise the API it just created—this catches bugs that unit tests miss.
Split your workflow: Claude Code CLI for FastAPI backend, an IDE-based tool like Cursor for JS frontend work. Your frontend struggles likely stem from the structural mismatch between how AI processes logic and what frontend work demands. For frontend tasks, provide detailed design references, explicit CSS specifications, and component-level test suites in your CLAUDE.md. Consider evaluating Cursor specifically for frontend work where visual feedback and live preview compensate for the agent's aesthetic blind spots.
Restructure your CLAUDE.md with the most critical information first and begin compounding lessons before context degrades. Agents often read only the first N lines of context files, so front-load your architectural decisions, coding conventions, and test commands. After completing a significant feature or debugging session, immediately capture what worked and what didn't before running compaction. If you're finding Claude Code "forgets" decisions, investigate hooks-based memory supplements or, at minimum, create dedicated per-feature markdown files that the agent can reference.
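The "start the server and curl it" step from the first action item can also be scripted, so the agent (or you) can rerun it after every change. This is a minimal sketch using only the standard library in place of curl; the smoke_check name is hypothetical, and the base URL and paths in the usage note are assumptions about a default local setup.

```python
import urllib.error
import urllib.request


def smoke_check(base_url: str, paths: list[str], timeout: float = 5.0) -> dict[str, int]:
    """GET each path and return {path: HTTP status}; 0 means no response."""
    results = {}
    for path in paths:
        try:
            with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
                results[path] = resp.status
        except urllib.error.HTTPError as err:
            results[path] = err.code  # server answered with a 4xx/5xx status
        except OSError:
            results[path] = 0         # connection refused, timeout, DNS failure
    return results
```

Against a FastAPI app served locally on port 8000, a call such as smoke_check("http://127.0.0.1:8000", ["/docs", "/openapi.json"]) would exercise the default documentation routes; your own endpoints would go in the same list.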