Agentic Coding Insights

COMPLETED March 15, 2026
Summary

Briefing: Agentic Coding Insights

Purpose: I'm a developer at a startup and want to stay up to date with the latest best practices for agentic coding tools. Our current stack is FastAPI (Python) with a basic JS frontend. We primarily use Claude Code, which works well but struggles with some of our frontend requirements. I'm interested in Claude Code best practices, making compaction more effective, frontend performance strategies, comparisons with Codex and Gemini, and when developers prefer IDEs over terminal-based approaches.

Key Insights

  • The single highest-leverage Claude Code habit is five tokens: "use red-green TDD." Simon Willison reports that starting every coding session by telling the agent how to run the tests and instructing it to use red-green TDD dramatically improves output quality. He goes further with a "conformance-driven development" technique: when adding file uploads to his framework, he had Claude build a test suite that passes against the existing file-upload implementations in Go, Node.js, Django, and Starlette, then used that generated suite as the specification for his own implementation. For your FastAPI backend, this means investing in comprehensive pytest suites before delegating work to agents, and having Claude curl the running server after implementation to catch bugs that automated tests miss. With agents, tests are now "effectively free," and Willison argues they're no longer remotely optional.
  • My fireside chat about agentic engineering at the Pragmatic Summit
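
The conformance idea can be sketched in plain Python: one table of expected behaviors is run against every implementation, and a new implementation counts as done only when it passes the same table as the reference. This is a minimal sketch under stated assumptions: the pagination example and the names `paginate_reference` and `paginate_new` are hypothetical stand-ins, and in a real project the cases would come from a suite the agent generates against existing frameworks, most likely expressed as parametrized pytest tests.

```python
from typing import Callable, List, Sequence

# Hypothetical conformance cases: (items, page, per_page, expected).
# Pages are 1-indexed; an out-of-range page returns an empty list.
CASES = [
    (list(range(10)), 1, 3, [0, 1, 2]),
    (list(range(10)), 4, 3, [9]),
    (list(range(10)), 5, 3, []),
    ([], 1, 3, []),
]

Paginator = Callable[[Sequence[int], int, int], List[int]]


def paginate_reference(items: Sequence[int], page: int, per_page: int) -> List[int]:
    """Reference behavior (stand-in for an existing framework's implementation)."""
    start = (page - 1) * per_page
    return list(items[start:start + per_page])


def paginate_new(items: Sequence[int], page: int, per_page: int) -> List[int]:
    """Our implementation, written against the cases rather than the reference code."""
    return [item for index, item in enumerate(items) if index // per_page == page - 1]


def run_conformance(impl: Paginator) -> List[str]:
    """Return failure messages; an empty list means the implementation conforms."""
    failures = []
    for items, page, per_page, expected in CASES:
        actual = impl(items, page, per_page)
        if actual != expected:
            failures.append(f"page={page} per_page={per_page}: {actual!r} != {expected!r}")
    return failures


if __name__ == "__main__":
    for impl in (paginate_reference, paginate_new):
        print(impl.__name__, run_conformance(impl) or "conforms")
```

A stub that always returns `[]` fails two of the four cases, which is the "red" step before the real implementation turns the suite green.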

  • Claude Code's frontend weakness is structural, not a gap that closes with the next model release. Multiple independent sources converge on the same diagnosis: AI models excel at backend coding because it is objective and functional, but frontend work involves subjective human judgment about usability, aesthetics, and responsiveness that models fundamentally struggle with. SharpTech's analysis notes that frontend code "can function and pass a unit test but it still sucks" from a usability standpoint, and AINews reports specific difficulties with CSS precision. For your JS frontend, this means providing extremely detailed design references, component libraries, and explicit CSS specifications in your CLAUDE.md file rather than expecting the agent to make good aesthetic judgments on its own. Iterative prompting ("do better" on visual output) can help, but frontend work requires significantly more human steering than backend delegation.

  • Nerding Out with the Neo, Claude and the Integration Question, The End of Coding Language History
  • Sorting algorithms
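
One way to make "explicit CSS specifications" checkable, instead of leaving aesthetics to the agent's judgment, is a small lint that flags color literals outside an agreed palette and hands the agent a concrete list to fix. This is an illustrative assumption rather than a technique from the sources: the palette values and the regex are hypothetical, and a real project would more likely enforce design tokens through CSS custom properties or an existing stylesheet linter.

```python
import re
from typing import Iterable, List

# Hypothetical palette; the same values would be listed in CLAUDE.md.
PALETTE = ("#1a1a2e", "#e94560", "#ffffff")

# Rough match for 3- or 6-digit hex colors such as #fff or #e94560.
# It can also match ID selectors that happen to be valid hex (e.g. #bad).
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


def normalize(color: str) -> str:
    """Lowercase and expand shorthand (#fff -> #ffffff) so equal colors compare equal."""
    color = color.lower()
    if len(color) == 4:
        color = "#" + "".join(ch * 2 for ch in color[1:])
    return color


def off_palette_colors(css: str, palette: Iterable[str] = PALETTE) -> List[str]:
    """Return hex colors used in `css` that are not part of the palette."""
    allowed = {normalize(c) for c in palette}
    return [m.group(0) for m in HEX_COLOR.finditer(css) if normalize(m.group(0)) not in allowed]


if __name__ == "__main__":
    css = ".btn { color: #E94560; background: #123456; border-color: #fff; }"
    print(off_palette_colors(css))
```

Here only `#123456` is reported: `#E94560` matches the palette case-insensitively, and `#fff` expands to the palette's `#ffffff`.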

  • The emerging best practice is to use different models for different stages of your workflow rather than staying loyal to one tool. The Compound Engineering methodology recommends Claude Haiku or Gemini Flash for brainstorming, Opus for planning, Codex for implementation, and Gemini for code review. Samuel Colvin (creator of Pydantic) offers vivid characterizations: Claude Code is "Captain America," competent and reliable; Codex is "Q from Bond," neurotic and detail-focused; Gemini is "the Joker," capable of incredible work but prone to deleting your files. Critically, Gemini reviews a PR in about 90 seconds, whereas Codex can take as long as 30 minutes, which makes Gemini a powerful quick-feedback tool. For your startup, this suggests experimenting with Gemini CLI for fast PR reviews while keeping Claude Code as your primary implementation tool for FastAPI work.

  • Compound Engineering Camp: Every Step, From Scratch
  • ⚡️Monty: the ultrafast Python interpreter by Agents for Agents — Samuel Colvin, Pydantic
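
The stage-by-stage model split amounts to a routing table. A minimal sketch, assuming you drive the stages from your own script: the stage names and tool labels simply restate the Compound Engineering recommendation and are not identifiers that any of these CLIs define. The override shows the variant suggested for your team, keeping Claude Code for implementation while Gemini handles fast reviews.

```python
from typing import Dict, Optional

# Restates the Compound Engineering recommendation; labels are descriptive only.
DEFAULT_ROUTES: Dict[str, str] = {
    "brainstorm": "Claude Haiku or Gemini Flash",
    "plan": "Claude Opus",
    "work": "Codex",
    "review": "Gemini",
}

# Hypothetical team override: Claude Code stays the FastAPI implementation tool.
TEAM_OVERRIDES: Dict[str, str] = {"work": "Claude Code"}


def route(stage: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Pick the tool for a workflow stage, applying any team overrides."""
    routes = {**DEFAULT_ROUTES, **(overrides or {})}
    if stage not in routes:
        raise ValueError(f"unknown stage: {stage!r}")
    return routes[stage]


if __name__ == "__main__":
    for stage in DEFAULT_ROUTES:
        print(stage, "->", route(stage, TEAM_OVERRIDES))
```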

  • Run /compound before compaction; this timing insight alone could transform your long Claude Code sessions. The Compound Engineering methodology introduces a four-step loop (Plan, Work, Review, Compound) in which the critical fourth step captures lessons learned and stores them as discoverable artifacts with metadata. The key operational insight is that you must run this capture step while the context is fresh: "You don't want to run it after compaction," because by then the AI has already lost the earlier details. Separately, in the benchmark presented in the Supermemory talk, Claude Code's built-in .md-file memory performed worst among the approaches tested, trailing dynamic graph-based alternatives by almost 50%. This means you should structure your CLAUDE.md files with the most critical information first (agents often read only the first N lines) and actively compound lessons before context degrades.

  • Compound Engineering Camp: Every Step, From Scratch
  • OpenClaw's Memory Sucks and the fix is simple — Dhravya Shah, Supermemory
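
The Compound step stores lessons "as discoverable artifacts with metadata." A minimal sketch of what that capture could look like, assuming lessons live as markdown files with a small front-matter header; the `docs/lessons` directory, the field names, the filename scheme, and the example lesson text are all hypothetical, not the methodology's actual format. The point is to run something like this before compaction, while the details are still in context.

```python
import re
import tempfile
from datetime import date
from pathlib import Path
from typing import List, Optional


def slugify(title: str) -> str:
    """Turn a lesson title into a filesystem-friendly slug."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def capture_lesson(
    root: Path,
    title: str,
    problem: str,
    fix: str,
    tags: List[str],
    when: Optional[date] = None,
) -> Path:
    """Write one lesson as markdown with front-matter metadata; return its path."""
    when = when or date.today()
    lessons_dir = root / "docs" / "lessons"  # hypothetical location
    lessons_dir.mkdir(parents=True, exist_ok=True)
    path = lessons_dir / f"{when.isoformat()}-{slugify(title)}.md"
    body = "\n".join([
        "---",
        f"title: {title}",
        f"date: {when.isoformat()}",
        f"tags: [{', '.join(tags)}]",
        "---",
        "",
        "## Problem",
        problem,
        "",
        "## What worked",
        fix,
        "",
    ])
    path.write_text(body, encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        p = capture_lesson(
            Path(tmp),
            "Async DB sessions leak in tests",
            problem="Tests shared one session and saw each other's rows.",
            fix="Use a per-test fixture that rolls back after each test.",
            tags=["fastapi", "pytest"],
        )
        print(p.read_text(encoding="utf-8"))
```

Dated, slugged filenames and a `tags` field make the lessons easy for both people and agents to find with a plain file search.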

  • Sandboxing is the most important safety practice, and "Claude Code for the web" offers a pragmatic middle ground. Willison admits to running Claude Code with --dangerously-skip-permissions on his Mac despite being "the world's foremost expert on why you shouldn't do that," because the convenience is compelling. His mitigations: avoid feeding untrusted instructions to the agent in that mode, and prefer "Claude Code for the web," which runs in an Anthropic-managed container and so limits the worst-case damage to theft of your source code. The Pi Day speaker independently reaches the same conclusion, advocating Docker containerization and noting that permission pop-ups create fatigue that leads users to bypass them entirely. For a startup handling real user data, running agents in containers and mounting only the data they need is the most practical security posture.

  • My fireside chat about agentic engineering at the Pragmatic Summit
  • Pi Day: AMA with Pi's Creator + Talks & Extensions Deep Dive
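
A minimal sketch of the "containers plus only the necessary mounts" posture, assuming Docker is installed and you have built an image with the agent CLI inside it; the image name `my-agent-image` and all paths are hypothetical. The function only builds the `docker run` argument list, so you can inspect it before handing it to `subprocess.run`. The project directory is the only writable mount, extra data is mounted read-only, and your home directory and credential files are never exposed.

```python
import shlex
from pathlib import Path
from typing import List, Sequence


def docker_agent_command(
    project: Path,
    image: str,
    agent_cmd: Sequence[str],
    readonly_data: Sequence[Path] = (),
    env_passthrough: Sequence[str] = (),
) -> List[str]:
    """Build a `docker run` argv that exposes only the project and named data dirs."""
    # --rm removes the container afterwards; -it gives an interactive terminal session.
    argv = ["docker", "run", "--rm", "-it"]
    # Project source: the only writable mount, also used as the working directory.
    argv += ["-v", f"{project.resolve()}:/workspace", "-w", "/workspace"]
    # Extra data the task needs, mounted read-only under /data/<name>.
    for data_dir in readonly_data:
        argv += ["-v", f"{data_dir.resolve()}:/data/{data_dir.name}:ro"]
    # Pass selected variables (e.g. an API key) through by name, not by value.
    for name in env_passthrough:
        argv += ["-e", name]
    argv.append(image)
    argv += list(agent_cmd)
    return argv


if __name__ == "__main__":
    cmd = docker_agent_command(
        Path("."), "my-agent-image", ["claude"], env_passthrough=["ANTHROPIC_API_KEY"]
    )
    print(shlex.join(cmd))  # review first, then run with subprocess.run(cmd)
```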

  • Rewrite your AI harness every six months; the landscape changes too fast for set-and-forget tooling. Notion co-founder Simon Last emphasizes that companies get into trouble by doing "one thing and then just sticking with it," and insists that you need a keen awareness of current model capabilities and must redesign your harness, system, and product deeply around them. He has had a coding agent running continuously for 13 days working through tasks, and notes that no PR ships without full agent testing. His current setup is simply Claude Code or Codex CLI; he finds CLI tools "super simple" and effective. The key shift he describes is from coder to "agent manager": defining changes, specifying how they can be verified, and enlisting agents to execute them, rather than writing the code directly.

  • From Coder to Manager: Navigating the Shift to Agentic Engineering with Notion Co-Founder Simon Last
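
Last's "agent manager" loop (define the change, specify how it can be verified, let the agent execute) becomes concrete when the verification step is written down as commands before you delegate. A minimal sketch, assuming verification is a list of commands whose exit status decides success; the `TaskSpec` structure and the example task description are hypothetical, and real commands would be things like your pytest invocation or a lint step.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskSpec:
    """A change request plus the commands that decide whether it is done."""
    description: str
    verify: List[List[str]] = field(default_factory=list)


def run_verification(spec: TaskSpec, timeout: float = 600) -> Dict[str, bool]:
    """Run each verification command and map its printable form to pass/fail."""
    results: Dict[str, bool] = {}
    for cmd in spec.verify:
        key = " ".join(cmd)
        try:
            proc = subprocess.run(cmd, capture_output=True, timeout=timeout)
            results[key] = proc.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            # Missing executables and hung commands both count as failures.
            results[key] = False
    return results


if __name__ == "__main__":
    spec = TaskSpec(
        description="Add cursor-based pagination to GET /items",
        verify=[[sys.executable, "-m", "pytest", "-q"]],
    )
    print(run_verification(spec))
```

Writing the spec before the agent starts means "done" is decided by exit codes rather than by the agent's own claim of success.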

  • Stop hooks and memory files are underutilized Claude Code features that significantly improve workflow. Stop hooks automate follow-up actions when the agent finishes a task, while memory files give Claude persistent context across complex sessions. Going further, Supermemory's hooks-based memory approach (dynamically injecting under 2,000 tokens of relevant context per turn from a knowledge graph) is reported to outperform static .md file memory because it handles updates, contradiction resolution, and temporal reasoning. Sentry's approach to agent-optimized documentation (serving true markdown, stripping browser elements, optimizing link hierarchy) provides a template for making your project docs more agent-consumable. For your startup, structuring documentation specifically for agent consumption (critical information first, markdown, minimal noise) is a low-effort, high-impact optimization.

  • [AINews] The high-return activity of raising your aspirations for LLMs
  • OpenClaw's Memory Sucks and the fix is simple — Dhravya Shah, Supermemory
  • Optimizing Content for Agents
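
The hooks-based memory approach injects a small, relevant slice of stored knowledge on each turn instead of a whole static file. A minimal sketch of the selection step only, under loud assumptions: notes are plain strings, relevance is naive keyword overlap rather than a knowledge graph, and tokens are estimated at roughly four characters each. Wiring the output into Claude Code through a hook is left out; the Claude Code hooks documentation describes which hook events can add context.

```python
import re
from typing import List, Set

TOKEN_BUDGET = 2000  # the "under 2,000 tokens per turn" figure from the talk


def estimate_tokens(text: str) -> int:
    """Crude estimate of about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def words(text: str) -> Set[str]:
    """Lowercased word set used for naive keyword overlap."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def select_context(prompt: str, notes: List[str], budget: int = TOKEN_BUDGET) -> List[str]:
    """Greedily pick the most keyword-relevant notes that fit within the budget."""
    prompt_words = words(prompt)
    scored = [(len(prompt_words & words(note)), i, note) for i, note in enumerate(notes)]
    # Highest overlap first; ties keep the original note order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    chosen: List[str] = []
    used = 0
    for score, _, note in scored:
        cost = estimate_tokens(note)
        # Skip irrelevant notes and any note that would overflow the budget.
        if score == 0 or used + cost > budget:
            continue
        chosen.append(note)
        used += cost
    return chosen
```

A relevant but oversized note is skipped rather than truncated, which keeps each injected note intact while respecting the per-turn ceiling.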

  • Claude's 1M-token context window at standard pricing is a concrete competitive advantage for large-codebase work. Anthropic now offers the full 1M context window for Opus 4.6 and Sonnet 4.6 without a long-context premium, while OpenAI charges more above 272,000 tokens for GPT-5.4 and Google charges more above 200,000 tokens for Gemini 3.1 Pro. Window size alone doesn't solve the problem, however: "context rot" degrades model performance as context grows, which is why deliberate context management (session branching, summarization, strategic compaction timing) matters more than raw window size. For cost-conscious startups processing large codebases, this pricing structure makes Claude the more economical choice for extended sessions.

  • 1M context is now generally available for Opus 4.6 and Sonnet 4.6
  • Pi Day: AMA with Pi's Creator + Talks & Extensions Deep Dive
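
The pricing difference reduces to per-model thresholds above which a request is billed at a higher long-context rate. A minimal sketch that encodes only the thresholds quoted above, assuming they apply to a request's token count; no prices are included because none are given here and they change often, the dictionary keys are descriptive labels, and current thresholds should be confirmed on each provider's pricing page.

```python
from typing import Dict, Optional

# Token thresholds above which a long-context premium applies, as quoted in this
# briefing. None means no long-context premium anywhere in the 1M window.
LONG_CONTEXT_THRESHOLDS: Dict[str, Optional[int]] = {
    "Claude Opus 4.6 / Sonnet 4.6": None,
    "GPT-5.4": 272_000,
    "Gemini 3.1 Pro": 200_000,
}


def premium_applies(tokens: int) -> Dict[str, bool]:
    """For a request of `tokens` tokens, report which models would bill a premium."""
    return {
        model: threshold is not None and tokens > threshold
        for model, threshold in LONG_CONTEXT_THRESHOLDS.items()
    }
```

For example, a 250,000-token request crosses the Gemini 3.1 Pro threshold but not the GPT-5.4 one, and no Claude threshold applies.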

Dissenting Views

  • Claude Code is either a reliable workhorse you can "oneshot basically everything" with or a frustratingly imprecise tool that lies about production readiness, depending on whom you ask. Simon Willison reports deep trust in Claude Code's predictability, stating he doesn't even question whether tasks will succeed anymore. In sharp contrast, the Pi Day speaker found Claude Code "overly optimistic," sometimes claiming code was production-ready when it immediately crashed, requiring constant repetition of instructions and feeling "not precise enough" for the desired outcomes. This drove them to build their own agent with Pi. The likely explanation is that experience level and workflow sophistication shape the outcome: Willison's TDD-first, test-everything approach may prevent the reliability issues the Pi Day speaker encountered through less structured usage. For your team, this suggests that Claude Code's reliability is largely a function of how much verification infrastructure you build around it.

  • My fireside chat about agentic engineering at the Pragmatic Summit
  • Pi Day: AMA with Pi's Creator + Talks & Extensions Deep Dive

  • AI is either making software quality dramatically better or noticeably worse, and both claims have credible backing. One speaker argues that AI agents will make high-quality software the baseline by handling testing, documentation, security review, and polish at scale, turning quality from a premium into table stakes. But the State of Agentic Coding hosts report the opposite in practice: "the quality of code is going down," with software becoming more memory- and CPU-intensive and exhibiting unexpected behaviors. This tension likely reflects the gap between what's possible with disciplined agentic workflows and what actually happens when developers use these tools without sufficient verification. For your startup, this reinforces that the test-first, review-everything approach isn't just best practice; it's your defense against code quality degrading as you accelerate output.

  • AI Made Every Company 10x More Productive. The Ones Cutting Headcount Are Telling on Themselves.
  • State of Agentic Coding #4 with Armin and Ben

Read & Act

What to do:

  • Add TDD instructions to every Claude Code session and build conformance test suites for your FastAPI endpoints. Start each session with "here's how to run the tests" and "use red-green TDD." Go further by having Claude generate test suites that pass against reference implementations of the patterns you use (e.g., authentication flows, file uploads, pagination), then have the agent implement against those tests. Follow up by instructing the agent to start the server and use curl to exercise the API it just created; this catches bugs that unit tests miss.

  • Split your workflow: Claude Code CLI for the FastAPI backend and an IDE-based tool like Cursor for JS frontend work. Your frontend struggles likely stem from the structural mismatch between how AI handles logic and what frontend work demands. For frontend tasks, put detailed design references and explicit CSS specifications in your CLAUDE.md, and point it to component-level test suites. Consider evaluating Cursor specifically for frontend work, where visual feedback and live preview compensate for the agent's aesthetic blind spots.

  • Restructure your CLAUDE.md with the most critical information first, and compound lessons before context degrades. Agents often read only the first N lines of context files, so front-load your architectural decisions, coding conventions, and test commands. After completing a significant feature or debugging session, capture what worked and what didn't immediately, before running compaction. If you find that Claude Code "forgets" decisions, investigate hooks-based memory supplements or, at minimum, create dedicated per-feature markdown files that the agent can reference.
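
The "start the server and use curl" step in the first action item can also live in the repo as a reusable check that you or the agent rerun after each change. A minimal sketch using only the standard library as a stand-in for curl, assuming your FastAPI app is already running locally; the paths and expected status codes are hypothetical. A throwaway stub server is included only so the example runs on its own; against the real app you would call `smoke_test` with its base URL.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Dict, List, Tuple


def smoke_test(base_url: str, checks: List[Tuple[str, int]]) -> Dict[str, int]:
    """GET each path; return {path: actual_status} for checks that failed.

    A status of -1 means the server could not be reached at all.
    """
    failures: Dict[str, int] = {}
    for path, expected in checks:
        try:
            with urllib.request.urlopen(base_url + path, timeout=5) as resp:
                status = resp.status
        except urllib.error.HTTPError as err:  # 4xx/5xx responses still carry a status
            status = err.code
        except urllib.error.URLError:  # connection refused, DNS failure, etc.
            status = -1
        if status != expected:
            failures[path] = status
    return failures


class _StubHandler(BaseHTTPRequestHandler):
    """Stand-in for the real app: /health returns JSON, everything else is 404."""

    def do_GET(self) -> None:
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args) -> None:  # silence per-request logging
        pass


def start_stub_server() -> Tuple[ThreadingHTTPServer, str]:
    """Start the stub on a free local port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), _StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}"


if __name__ == "__main__":
    server, url = start_stub_server()
    print(smoke_test(url, [("/health", 200), ("/missing", 404)]) or "all checks passed")
    server.shutdown()
```

Returning only the failing paths keeps the output short enough for an agent to act on directly.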