The Current State of AI Engineering

Part 2: The Tool Market
The market has fractured into four tiers. Each is at a different inflection point.
That framing matters because most of the coverage you'll read treats this as a horse race: who's winning, who's losing, what the latest benchmark says. The more useful question is what each tool actually represents and where it's heading.
The Enterprise Play
Contrary to popular opinion, GitHub Copilot is still leading the enterprise tier.
The JetBrains January 2026 AI Pulse survey of 10,000+ professional developers put Copilot at 29% workplace adoption overall. 90% of Fortune 100 companies have deployed it.¹ Paid subscribers hit 4.7 million by January 2026, up 75% year over year.² The activation curve is remarkably steep: in Accenture's enterprise deployment data, 81.4% of developers install the extension on their first day with a license, and 96% of those start accepting suggestions the same day.¹
The product strategy is clear. Issues, PRs, Actions, security scanning, and agent orchestration all converging inside a single GitHub-native surface. Custom agents via .github/agents/. Cross-agent memory. A self-reviewing coding agent. The play isn't to be the best coding assistant. It's to be the platform everything else runs on, inside an ecosystem where most enterprises already live.
GitHub announced this week that all Copilot plans are moving to usage-based billing on June 1.³ Their framing: agentic usage is becoming the default, and the old flat-rate model where a quick chat question and a multi-hour autonomous coding run cost the same wasn't sustainable. The developer community's response has been skeptical. "You will get less, but pay the same price" is the headline circulating in forums this week.⁴
Read charitably, this is honest pricing alignment with actual compute costs. Read less charitably, it signals that running frontier-model agents at enterprise scale is harder than subscriptions implied.
Meanwhile, among the developers who set the pace (the ones writing the posts that shape what next year's enterprise RFPs say), Copilot's satisfaction numbers tell a different story. The Pragmatic Engineer's February 2026 survey of 906 senior engineers found just 9% named Copilot their most-loved tool.⁵
Copilot may win the RFP. It's losing the people who decide what the next RFP should say.
The Runtime
Claude Code went from 3% workplace adoption nine months ago to 18% today. The same JetBrains survey that shows Copilot leading also shows Claude Code growing the fastest.¹ A 91% CSAT. An NPS of 54. Both the highest satisfaction scores in the category.
The Pragmatic Engineer survey put 46% of senior engineers naming Claude Code their most-loved tool. Cursor was second at 19%. Copilot was third at 9%.⁵
But the adoption numbers aren't what make Claude Code interesting. It's the architectural direction.
Claude Code isn't building a better IDE plugin. It's building the infrastructure layer that agentic workflows run on. Persistent task management. Remote headless execution. Parallel agents in isolated git worktrees. A 1 million token context window at standard pricing. The SWE-bench data shows that Claude Code scores higher on coding benchmarks than the raw Claude Opus 4.6 model it runs on.⁶ The gap is Anthropic's agent engineering: tool use patterns, retry logic, context management. Not the model weights.
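That "agent engineering" gap is easier to see in code. A minimal sketch of the two scaffolding concerns named above, retry logic and a bounded tool-use loop, with everything hypothetical: the stub model, the tool, and the loop shape are illustrations, not Anthropic's actual harness.

```typescript
// Hypothetical scaffolding sketch; no vendor's real API is shown here.
type ToolCall = { tool: string; args: string };
type ModelReply = { done: boolean; call?: ToolCall };

// Stub "model": asks for one tool call, then declares the task done.
function fakeModel(history: string[]): ModelReply {
  if (history.some((m) => m.startsWith("result:"))) return { done: true };
  return { done: false, call: { tool: "bash", args: "ls" } };
}

// Scaffolding concern #1: transient failures shouldn't kill a run.
function withRetries<T>(fn: () => T, attempts = 3): T {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try { return fn(); } catch (e) { lastErr = e; }
  }
  throw lastErr;
}

// Scaffolding concern #2: cap turns and feed tool output back as context.
function runAgent(maxTurns: number): string[] {
  const history: string[] = ["task: list files"];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = withRetries(() => fakeModel(history));
    if (reply.done) break;
    // Execute the requested tool (stubbed) and append the result.
    history.push(`result: output of ${reply.call!.tool} ${reply.call!.args}`);
  }
  return history;
}

const transcript = runAgent(10);
```

The point of the sketch is that none of this touches model weights. All of it moves benchmark scores.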
The part worth sitting with: among small companies (1 to 10 people), 75% of developers use Claude Code.⁷ Among large enterprises with 10,000+ employees, GitHub Copilot leads at 56%. That split isn't about quality. It's about procurement, compliance, and existing contracts. The developers with the most freedom to choose are choosing Claude Code by a wide margin.
Claude Code is becoming the substrate other things run on. Not just the tool a developer reaches for.
The Enterprise Push
Codex's growth curve is steep and accelerating. The desktop app launched in February 2026 with 1.6 million weekly active developers. By mid-March: 2 million. By early April: 3 million. By late April: 4 million.⁸ Within ChatGPT Business and Enterprise, Codex users grew 6x between January and April.⁹
The enterprise strategy mirrors what Copilot has done with Microsoft's distribution, but through consulting partnerships instead of bundled contracts. Accenture, PwC, Infosys, Cognizant, CGI. Codex Labs puts OpenAI experts inside organizations to run workshops and move teams from early usage to repeatable deployment.⁸ They're going after exactly the kinds of large, legacy-heavy organizations that don't organically find developer tools on their own.
The case study framing from one enterprise deployment: "The primary job of our engineering team became enabling the agents to do useful work."
That's not a productivity claim. That's a definition change. When your best engineers spend most of their time building the scaffolding and context that lets agents run effectively, the role has transformed. Whether that's exciting or alarming probably depends on what kind of engineer you are.
The Plateau
Cursor sits at 18% workplace adoption, tied with Claude Code in the JetBrains data. The trajectories are not the same.
Cursor's original edge was IDE-native integration that made AI feel natural before the rest of the market caught up. That space is compressing fast. Claude Code has terminal-native execution and runtime ambitions beyond the editor. Copilot has doubled down on IDE integration across every major environment. Codex is pushing into the desktop. The differentiated ground Cursor occupied is being approached from multiple directions.
Awareness is high at 69%, but growth has slowed.¹⁰ In satisfaction, Cursor earns 19% "most loved" in the Pragmatic Engineer survey: respectable, but 2.4x behind Claude Code. The tool lacks Copilot's enterprise distribution and Claude Code's runtime ambitions. The window is narrowing.
Cursor brought AI coding to the mainstream. It may not be the one that keeps it there.
The Open Source Groundswell
The proprietary tools aren't competing with each other in a vacuum. Open source alternatives are pulling real developer adoption, and the speed is notable.
OpenCode (opencode.ai) has crossed 150K GitHub stars in under a year.¹¹ It's an MIT-licensed, terminal-first AI coding agent that also ships a desktop app and IDE extension. The differentiator is structural: it's provider-agnostic. You can point it at Claude, OpenAI, Gemini, or local models. If you already have a GitHub Copilot subscription, you can use that too. It has built-in LSP support, a client/server architecture that lets you drive it remotely, and a Plan/Build mode that gives you a review step before the agent touches your files. Over 850 contributors. 6.5 million monthly active developers.¹¹
Pi (pi.dev) takes a different approach. Created by Mario Zechner, Pi is a monorepo containing four packages: pi-ai (a unified LLM API across providers), pi-agent-core (the agent loop and tool execution), pi-coding-agent (the CLI, session management, and SDK), and pi-tui (terminal UI components).¹² Many developers use Pi as a coding harness with its terminal user interface. The design philosophy is aggressively minimal: four tools (read, write, edit, bash), a system prompt and tool definitions that together come in under 1,000 tokens, and no safety rails by default. Zechner submitted Pi to the Terminal-Bench 2.0 leaderboard with Claude Opus 4.5 against Codex, Cursor, and Windsurf.¹² The SDK exports createAgentSession() as a programmatic entry point. It's designed to be embedded, or to be adapted to your workflows.
That embedding model is exactly how OpenClaw (github.com/openclaw/openclaw) uses Pi. OpenClaw is an open-source AI assistant that routes agent sessions through messaging gateways: WhatsApp, Telegram, Discord, Slack, iMessage. Rather than spawning Pi as a subprocess or using RPC, OpenClaw directly imports Pi's AgentSession via createAgentSession() and runs it in-process.¹³ This gives OpenClaw full control over the session lifecycle: custom tool injection, dynamic system prompts per channel and context, multi-account auth profile rotation with failover, and provider-agnostic model switching. It replaces Pi's default bash tool with its own execution layer and adds channel-specific tools for each messaging platform. Sessions persist as JSONL files with tree structure for branching.¹³
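The in-process pattern is worth sketching. This is not Pi's or OpenClaw's actual code (createAgentSession's real signature lives in the pi-coding-agent SDK); it's a schematic of the two moves described above, swapping the default bash tool for a host-controlled one and setting a per-channel system prompt, with all types and names illustrative.

```typescript
// Schematic of the embedding pattern only; Pi's real API differs.
type Tool = { name: string; run: (args: string) => string };

interface Session {
  tools: Map<string, Tool>;
  systemPrompt: string;
  send: (msg: string) => string;
}

// Stand-in for the SDK factory: the host gets a session object, not a subprocess.
function createSession(systemPrompt: string, tools: Tool[]): Session {
  const map = new Map(tools.map((t) => [t.name, t]));
  return {
    tools: map,
    systemPrompt,
    // Toy dispatch: route the message to the first tool it names.
    send: (msg: string) => {
      const tool = [...map.values()].find((t) => msg.includes(t.name));
      return tool ? tool.run(msg) : "no tool matched";
    },
  };
}

// Host-controlled execution layer replacing the default bash tool,
// plus a channel-specific tool, hypothetical stand-ins for OpenClaw's.
const sandboxedBash: Tool = { name: "bash", run: (a) => `sandboxed: ${a}` };
const telegramSend: Tool = { name: "telegram", run: (a) => `sent: ${a}` };

// Dynamic system prompt set per channel at session creation.
const session = createSession("You are replying on Telegram.", [sandboxedBash, telegramSend]);
const out = session.send("telegram hello");
```

Running in-process is what makes these swaps cheap: the host holds a live object, so tools and prompts are just values it constructs, not configuration it pushes across an RPC boundary.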
The pattern across all three is the same: open source agentic tooling isn't copying the proprietary surface. It's building modular, composable infrastructure that lets developers choose their models, their interfaces, and their workflows.
The Open-Weight Question
The most important question in the market isn't which frontier model wins.
It's whether frontier models are necessary at all.
The research is pointing toward an answer. MiRA, an RL framework built on milestone-based reward signals, boosted an open Gemma3-12B model's success rate from 6.4% to 43.0% on WebArena-Lite. That surpasses GPT-4-Turbo at 17.6% and GPT-4o at 13.9%, and the previous open-model state of the art at 38.4%.¹⁴ A 12-billion-parameter open model, with the right scaffolding, doesn't close the gap with frontier models. It surpasses them.
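Milestone-based reward is simple to state even if the training machinery isn't: instead of a binary end-of-episode signal, a trajectory earns partial credit for each intermediate subgoal it clears. A toy sketch (nothing here is MiRA's implementation; the milestone names are invented):

```typescript
// Toy illustration of milestone-based reward shaping; not MiRA's actual code.
// A web task like "buy item X" decomposes into checkable intermediate states.
const milestones = ["searched_item", "opened_product_page", "added_to_cart", "checked_out"];

// Sparse reward: 1 only if the final goal state was reached.
function sparseReward(visited: string[]): number {
  return visited.includes("checked_out") ? 1 : 0;
}

// Milestone reward: partial credit for each subgoal cleared, in order.
function milestoneReward(visited: string[]): number {
  let hit = 0;
  for (const m of milestones) {
    if (visited.includes(m)) hit++;
    else break; // milestones are sequential; stop at the first miss
  }
  return hit / milestones.length;
}

// A run that finished three of four steps still teaches the policy something.
const partialRun = ["searched_item", "opened_product_page", "added_to_cart"];
const sparse = sparseReward(partialRun);    // no signal from a near-miss
const shaped = milestoneReward(partialRun); // credit proportional to progress
```

The shaped signal is what makes RL tractable on long-horizon tasks: a near-miss produces gradient instead of silence.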
The SWE-Bench Pro data reinforces this from a different angle: the same model with a basic scaffold scores 23%. With an optimized 250-turn scaffold, it scores 45%+. That 22-point swing dwarfs the performance gap between any two frontier models.⁶
The cost picture has shifted just as fast. In December 2025, frontier-level coding performance required Opus-tier pricing. By March 2026, Gemini 3.1 Pro delivered comparable benchmark scores at less than half the cost, and MiniMax M2.5 at roughly 1/25th.⁶ Open-weight models are being deployed inside real engineering pipelines at real companies, not as experiments, but as production choices.¹⁵
If a 12B open model with the right scaffolding can outperform a generic frontier model at a fraction of the cost, the cost equation changes fundamentally. Tools built on open-weight models, customized to your codebase, your conventions, your deployment process, become compelling not just on price but on performance.
The big question for the next 12 months: does the industry keep pushing the frontier, or do open-weight models with superior scaffolding make the frontier irrelevant for most coding tasks?
Next: what the research actually says about scaffolding, and why a directory full of Markdown files might be the most important thing in AI engineering right now.
Sources
¹ JetBrains AI Pulse Survey, January 2026. n=10,000+ professional developers across 8 languages. https://blog.jetbrains.com/research/2026/04/which-ai-coding-tools-do-developers-actually-use-at-work/
² GitHub Copilot Statistics 2026. https://www.getpanto.ai/blog/github-copilot-statistics
³ GitHub Blog: "GitHub Copilot is moving to usage-based billing." April 2026. https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/
⁴ Visual Studio Magazine, April 27, 2026. https://visualstudiomagazine.com/articles/2026/04/27/devs-sound-off-on-usage-based-copilot-pricing-change-you-will-get-less-but-pay-the-same-price.aspx
⁵ The Pragmatic Engineer, AI Tooling Survey 2026. n=906 senior engineers, January–February 2026. https://newsletter.pragmaticengineer.com/p/ai-tooling-2026
⁶ Best AI for Coding (2026): Every Model Ranked by Real Benchmarks. Morph LLM. https://www.morphllm.com/best-ai-model-for-coding
⁷ Vinod Sharma, "Claude Code Is the Most-Loved Developer Tool in 2026." March 2026. https://vinodsharma.co/blog/claude-code-most-loved-developer-tool-2026
⁸ OpenAI, "Scaling Codex to enterprises worldwide." April 2026. https://openai.com/index/scaling-codex-to-enterprises-worldwide/
⁹ The Next Web, "OpenAI recruits Cognizant and CGI to take Codex into enterprise software shops worldwide." April 2026. https://thenextweb.com/news/openai-codex-enterprise-partners-cognizant-cgi
¹⁰ Cursor AI Statistics 2026. Panto AI. https://www.getpanto.ai/blog/cursor-ai-statistics
¹¹ OpenCode. https://opencode.ai / https://github.com/anomalyco/opencode
¹² Mario Zechner, "What I learned building an opinionated and minimal coding agent." November 2025. https://mariozechner.at/posts/2025-11-30-pi-coding-agent/
¹³ OpenClaw, "Pi Integration Architecture." https://github.com/openclaw/openclaw
¹⁴ MiRA: "A Subgoal-driven Framework for Improving Long-Horizon LLM Agents." arXiv:2603.19685. https://arxiv.org/pdf/2603.19685
¹⁵ MindStudio, "The Best Open-Source LLMs for Agentic Coding in 2026." April 2026. https://www.mindstudio.ai/blog/best-open-source-llms-agentic-coding-2026