IBM Steps In With 80,000 People to Plug the Holes Lovable and Cursor Left Behind. A Former Failed Engineer Walks Through the 'AI Across the Entire SDLC' Bob Announcement

After 170 Lovable vulnerabilities and Cursor wiping a production database, IBM answered the holes in vibe coding with Bob, announced 4/28. From the perspective of my own dev workflow, I dissect how it places agents across every SDLC phase, and the scale behind it: 80,000 people, 45%, and 30 days becoming 3.

“These numbers genuinely make me shiver.” From last month into last week, I’ve been writing a “trilogy of incidents” around vibe coding, one after another. The 170 Lovable vulnerabilities at the end of March. The Cursor production database deletion incident in early May. And yesterday’s (5/5) prescription article on “Secure Vibe Coding.”

Throughout, this trilogy was set in the world of individual developers and small teams. Still, the stage was always going to expand to the enterprise side. That question had been sitting in the back of my head as I wrote, and it turns out it had already been getting its answer on April 28.

Let me line up three numbers.

80,000 people. 45%. 30 days becoming 3.

These are the numbers from “Bob,” the AI development partner whose global availability IBM announced on April 28, 2026 (sources: IBM Newsroom 2026-04-28, The New Stack). In other words, IBM had already started plugging the “holes in individual development” I’d been writing about in my trilogy, at the scale of 80,000 internal employees.

I’m a former failed engineer with a customer success background. Bob is an enterprise product, so as a side-gig engineer, I’ll almost certainly never touch it directly. Even so, my single reason for writing this article today is this: once you understand what Bob is trying to solve, you can see what to add to your own development workflow.

⚠️ The article URLs from IBM’s official press release, The New Stack, DevOps.com, Artificial Intelligence News, etc. that I reference in this article are cited as confirmed at the time of writing. For final judgment, please refer to each company’s primary source.

The Trilogy’s Final Chapter: Lovable, Cursor, and IBM Bob Complete the “Don’t Break It” Prescription

[Figure: Vibe coding trilogy timeline from late March to May 2026, with four events arranged on the horizontal axis, starting with 3/31 “170 Lovable vulnerabilities (Outpost24 report).”]

Tracing the timeline, here’s how it goes.

At the end of March, Outpost24 reported that 170 vulnerabilities had been found in apps built with Lovable. That’s the source material for “Vibe Coding’s 170 Back Doors,” which I wrote on April 1. The contents were nothing but “textbook-level” issues: authentication bypass, SSRF (Server-Side Request Forgery), clickjacking, hardcoded encryption keys. The structure was plain to see: AI-written code was going into production without any human review.

In early May came the Cursor incident reported by The Register. Inside Cursor’s IDE, a Claude Opus agent executed destructive commands against a startup’s production DB. Just the week before, Cursor CEO Michael Truell had used the phrase “shaky foundations” in a Fortune interview. Put together: production collapsed the week after the CEO’s warning.

And in yesterday’s (5/5) article, I covered Forrester’s “Secure Vibe Coding.” Janet Worthington’s argument boils down to one point: “You can’t make it safe after the fact — you have to bake it in from the start of the design.”

When you organize these three points, what emerges is a three-stage structure of “breaks → warning → prescription (proposal).” The Lovable incident showed “it breaks,” the Cursor incident showed “it breaks even more,” and Forrester put “design that doesn’t break” into words.

That said, Forrester’s proposal is policy-level. Even when told “build it into every phase,” there hadn’t been a real example of what to put where, implemented at scale.

That “real example implemented at scale” is IBM Bob.

According to the announcement, IBM’s internal use started in June 2025 with 100 developers and has since expanded to over 80,000 (source: The New Stack). In other words, by the time Forrester proposed its concept in 2026, IBM had already accumulated nearly a year of internal operational data.

The order looks reversed, but structurally it’s natural to read it as “IBM moved first, and Forrester put words to it after.” The skeleton of today’s conclusion is that Bob is the scale implementation of the “prescription chapter” of the vibe coding trilogy.

What Is IBM Bob: The Three Numbers — 80,000 People, 45%, 30 Days Becoming 3

[Figure: IBM Bob’s three main numbers, shown as three circles arranged vertically. Top circle: “80,000 people,” IBM internal developers using Bob. Middle circle: “45%.”]

Let me add context to each of the three numbers.

First: 80,000 people. This is the number of developers using Bob inside IBM. The important part is that this isn’t “80,000 people at launch.” According to the announcement, it started with 100 people in June 2025, expanded incrementally, and exceeded 80,000 as of April 2026. They ran an 800x expansion in roughly 10 months while gathering internal proof data along the way.
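As a quick check of the arithmetic above, here is a minimal sketch. The start and end figures come from the announcement as quoted. The idea that growth compounded evenly every month is purely my own simplification, since the actual rollout schedule is not described.

```python
# Back-of-the-envelope check of the rollout figures quoted above.
# Assumption (mine, not IBM's): growth compounded evenly each month.
start_users = 100      # June 2025, per the announcement
end_users = 80_000     # April 2026, per the announcement
months = 10            # June 2025 -> April 2026

expansion = end_users / start_users
monthly_factor = expansion ** (1 / months)

print(f"expansion: {expansion:.0f}x")
print(f"implied monthly growth factor: {monthly_factor:.2f}")
```

Under that simplification the user base would have had to nearly double every month, which gives a feel for how aggressive an 800x expansion in 10 months is.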

Second: 45%. This is published as the average productivity improvement among Bob users. I’ll note explicitly in the source section that this is IBM’s internal survey (self-reported basis), not an externally audited figure. Please confirm the final judgment of the number with the primary source.

Third: 30 days becoming 3. This is a case study in which a Java version upgrade that typically takes 30 days was completed in 3 using Bob. The announcement presents it as a case that cut over 160 hours of engineering work. The implementer was Blue Pearl, a cloud solutions company that appeared as an IBM customer case study (source: The New Stack).

What can you see by lining up these three?

“Bob runs at scale.” That’s it, in essence. There are countless tools that work for 100 people. But there are almost no tools that maintain a 45% productivity boost while running for 80,000 people. The reason Lovable’s vulnerability problem happened was the structure of “things go soft at the scale of individual development.” Bob’s design philosophy is the exact opposite — it reads as a design where running at scale is itself the safety mechanism.

The Decisive Difference Between “AI That Writes Code” and “AI Across the Entire SDLC”

Bob’s biggest characteristic is that it exists at a different layer from “AI that writes code” like Cursor / Copilot / Claude Code.

[Figure: Comparison of “AI that writes code (Cursor / Copilot / Claude Code)” in the left column and “AI across the entire SDLC (IBM Bob)” in the right column.]

“AI that writes code” only enters a single phase called “Coding” within the SDLC (Software Development Lifecycle). Think back on our development workflow. You hear requirements, think through design, write code, test it, deploy it, operate it. Where Cursor and Claude Code seriously intervene is only the “writing code” part.

Before and after that — “thinking through design,” “designing tests,” “assembling deployment procedures,” “running operational monitoring” — humans are ultimately stitching it all together by hand. That’s why, when an individual developer ships to production, holes like the Lovable incident remain intact. The structure was that the code could be cleanly produced, but the design judgment about whether it would safely run in production was left to humans.

IBM Bob fundamentally changes this. According to the announcement document, Bob explicitly places agents across every phase of the SDLC. Specifically, it’s written like this:

Bob embeds agentic AI across the entire SDLC—from discovery and planning through design, coding, testing, deployment, and operations—coordinating specialized role-based agents, reusable skills, and governed workflows.

The important phrase here is “specialized role-based agents.” A coding agent, a design-confirmation agent, a test-design agent, a security-review agent — each operates with an independent role and coordinates with one another (source: IBM Newsroom).

Let me explain how this differs using my own example. When I build internal business tools with Cursor + Claude Code and think “is this secure?”, I go to a separate ChatGPT tab and ask “does this code have an SSRF vulnerability?” For deployment procedures, I copy-paste from past procedure documents and adjust them. I only start thinking about operational monitoring once operations actually begin.

In short, I implement the role-based agents as role-switching inside my own head. Bob does the same thing as an organized division of labor on the AI agent side. What “AI across the entire SDLC” means is that you can offload that cognitive load from your head onto coordination between agents.

What Role-Based Agents Mean: How Does an Individual Developer’s Work Change?

I think the “role-based agent” idea Bob makes visible can also be applied at the individual developer level.

If I were to consciously create “role-based” structures in my current Cursor / Claude Code environment, it would look like this.

Designer role. The role that organizes requirements and writes the data model and API boundaries first. Always passes through this before implementation.

Implementer role. The role that takes the designer’s output and writes code. This is the heart of what’s called vibe coding.

Reviewer role. The role that re-reads the implemented code in a separate chat session (or with a different AI). The point is: don’t ask in the same session as the implementer.

Security role. The role that runs checks against common vulnerability patterns in parallel with the reviewer.

Operations role. The role responsible for assembling deployment procedures and rollback procedures alongside the implementation.

In my current reality, I do all of this “by switching in my own head.” Bob’s proposal is “don’t switch — leave it to a separate agent.”
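To make the five roles above concrete, here is a minimal sketch of how I might write them down as data instead of keeping them in my head. Everything in it (the `Role` class, the `separate_session` flag, the instruction wording) is hypothetical and my own. It is not how Bob defines its agents. Marking the security role as a separate session is also my assumption, made because it runs in parallel with the reviewer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    instruction: str        # what this role is asked to do
    separate_session: bool  # True = never share a chat with the implementer


# Hypothetical definitions mirroring the five roles listed above.
ROLES = [
    Role("designer", "Organize requirements; write the data model and API boundaries first.", False),
    Role("implementer", "Write code from the designer's output.", False),
    Role("reviewer", "Re-read the implemented code critically.", True),
    Role("security", "Check the code against common vulnerability patterns.", True),
    Role("operations", "Write deployment and rollback procedures.", False),
]


def build_prompt(role: Role, artifact: str) -> str:
    """Combine a role's instruction with the artifact it should work on."""
    return f"[{role.name}] {role.instruction}\n\n---\n{artifact}"


for role in ROLES:
    # Print just the instruction line of each role's prompt.
    print(build_prompt(role, "<paste design notes or code here>").splitlines()[0])
```

Writing the roles out this way mostly serves as a checklist: if a role has no prompt of its own, it is still living in my head.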

What clicks here is the connection to Anthropic’s “Skills” feature released at the end of March, and the sub-agent mechanisms each company is advancing. After seeing the 44 undisclosed features Nagi wrote about, I personally thought “agent division of labor is doable even at the individual level.” Bob is the enterprise version of that, and the philosophy is the same.

If I were to try this starting this week, here’s how it would go.

  1. Create implementation-only sessions in Cursor. Finish design in a separate ChatGPT or Claude.ai chat first, then hand it off to Cursor.
  2. Make a Custom Prompt for code review. Templatize “if this code has 3 vulnerabilities, list them.”
  3. Have AI write deployment procedures. Always end with “write the procedure for shipping this code to production” inside Cursor.

This is a mini-implementation of “role-based agents.” It’s not complete separation, but it takes the switching that was happening in your head and makes it explicit through prompts. This is the smallest unit for mimicking, at the individual level, the structure Bob proved out with 80,000 people.
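Steps 2 and 3 above can be templatized in a few lines. This is only a sketch: the template wording beyond the two phrases quoted above, and the names `review_prompt` and `with_deploy_request`, are my own inventions and not a feature of Cursor or any other tool.

```python
# Review template built around the phrase from step 2.
REVIEW_TEMPLATE = (
    "You are reviewing code you did not write.\n"
    "If this code has {count} vulnerabilities, list them.\n"
    "For each one, name the pattern (for example SSRF or authentication bypass) "
    "and where it appears.\n\n"
    "Code:\n{code}\n"
)

# Closing request from step 3.
DEPLOY_SUFFIX = "Write the procedure for shipping this code to production."


def review_prompt(code: str, count: int = 3) -> str:
    """Fill the review template with the code to be checked."""
    return REVIEW_TEMPLATE.format(count=count, code=code)


def with_deploy_request(prompt: str) -> str:
    """Append the deployment-procedure request to any implementation prompt."""
    return f"{prompt.rstrip()}\n\n{DEPLOY_SUFFIX}"


print(review_prompt("def fetch(url): return requests.get(url)"))
print(with_deploy_request("Implement the CSV export endpoint."))
```

The review prompt is meant to be pasted into a different chat from the one that wrote the code, for the reason given in the reviewer role above.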

Why a Multi-Model Strategy: The Intent Behind Mixing Claude, Mistral, and Granite

Another point you can’t miss is that Bob is a “multi-model product,” not a “single-model product.”

[Figure: IBM Bob’s multi-model router structure. A central “Bob Router (task routing)” box branches in four directions, with a model name at each branch destination.]

The announcement document explicitly states that Bob combines Anthropic Claude, Mistral open-source models, IBM’s in-house Granite SLM, and specialized fine-tuned models. It’s a design that routes to the optimal model per task on three axes — “accuracy,” “latency,” and “cost” (sources: Medium - Adithya Giridharan, The New Stack).
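To picture what routing on those three axes could look like, here is a toy sketch. It is not Bob's router, whose internals are not described in the sources. The weighted-sum scoring, the `ModelProfile` class, the generic model names, and every score are placeholders I made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    accuracy: float  # 0..1, higher is better
    latency: float   # 0..1, higher means faster
    cost: float      # 0..1, higher means cheaper


# Placeholder scores invented for illustration only.
CANDIDATES = [
    ModelProfile("large-frontier-model", accuracy=0.9, latency=0.4, cost=0.2),
    ModelProfile("open-weights-model", accuracy=0.7, latency=0.6, cost=0.6),
    ModelProfile("small-language-model", accuracy=0.5, latency=0.9, cost=0.9),
]

AXES = ("accuracy", "latency", "cost")


def route(weights: dict[str, float], candidates=CANDIDATES) -> ModelProfile:
    """Pick the candidate with the highest weighted score for this task."""
    def score(model: ModelProfile) -> float:
        return sum(weights.get(axis, 0.0) * getattr(model, axis) for axis in AXES)
    return max(candidates, key=score)


# A task that only cares about accuracy.
print(route({"accuracy": 1.0}).name)
# A task that cares about speed and price in equal measure.
print(route({"latency": 0.5, "cost": 0.5}).name)
```

The point of the sketch is only that the caller describes the task's priorities and never names a model.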

What’s interesting here is the message attached to this strategy: IBM isn’t pretending to compete on models anymore. That context is also the central theme of the Medium explainer article.

In my reading, this is a declaration that “model selection is no longer the substance of the product.”

Releases like “ChatGPT 5 is out,” “Claude Opus 4.7 is out,” and “Gemini 3 is out” tend to jerk us individual developers around too. Bob’s design philosophy is the opposite: “Which model to use is something the router decides on its own. You don’t need to worry about it.” IBM could probably lean this hard into enterprise because it chose to compete as a system integrator, not as a model manufacturer.

What does this mean for individual development?

Right now I use “Claude Opus,” “GPT-5,” and “Gemini” by switching between them in Cursor settings, judging from my own experience which one suits each task. In short, I’m acting as a manual router myself.

If you want to mimic Bob’s philosophy, the first step is “writing down your model-switching criteria as your own rule.” Things like “design discussions go to Claude,” “code generation goes to GPT-5,” “light refactoring goes to Gemini” — put your own little internal router into writing. This is a 10-minute job, and the effect is unexpectedly large.
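Written down as code, that personal rule set is nothing more than a lookup table. The three rules are the examples from the paragraph above. The fallback model and the function name `pick_model` are my own assumptions.

```python
# Personal routing rules, using the examples from the paragraph above.
MODEL_RULES = {
    "design discussion": "Claude",
    "code generation": "GPT-5",
    "light refactoring": "Gemini",
}

# Assumption: when no rule matches, fall back to the model used for design talk.
DEFAULT_MODEL = "Claude"


def pick_model(task_type: str) -> str:
    """Look up the model for a task type, ignoring case and surrounding spaces."""
    return MODEL_RULES.get(task_type.strip().lower(), DEFAULT_MODEL)


print(pick_model("Code Generation"))
print(pick_model("writing a blog post"))
```

Any task that falls through to the default is a hint that the rule sheet is missing a line.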

3 Actions a Former Failed Engineer Can Take This Week

Let me bring this discussion down to my actual development workflow. You can’t touch Bob. But the point of today is that Bob’s design philosophy can be mini-implemented at the individual level.

Let me narrow this down to three concrete actions.

Action 1: Measure your “AI presence rate” across all 7 SDLC phases

On paper or in a notes app, write 7 columns — Discovery / Planning / Design / Coding / Testing / Deployment / Operations — and write one line per phase about “what you’re delegating to AI.” In most cases, only Coding and Testing will be filled in. The blank phases become “candidates for what to incorporate next.”
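If a notes app feels too loose, the same check fits in a few lines. This is a sketch under my own assumptions: “presence rate” is defined here simply as filled-in phases divided by 7, and the sample notes are invented.

```python
# The seven phases named in the announcement quote.
PHASES = [
    "Discovery", "Planning", "Design", "Coding",
    "Testing", "Deployment", "Operations",
]


def presence_rate(delegated: dict[str, str]) -> tuple[float, list[str]]:
    """Return the share of phases with a non-empty note, plus the blank phases."""
    blanks = [p for p in PHASES if not delegated.get(p, "").strip()]
    rate = (len(PHASES) - len(blanks)) / len(PHASES)
    return rate, blanks


# Invented sample: only Coding and Testing are delegated to AI.
notes = {
    "Coding": "Cursor writes the first draft",
    "Testing": "Claude Code generates unit tests",
}

rate, blanks = presence_rate(notes)
print(f"AI presence rate: {rate:.0%}")
print("Candidates to incorporate next:", blanks)
```

With only Coding and Testing filled in, the rate is 2 out of 7, and the remaining five phases come back as the candidate list.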

Action 2: Split into at least 3 role-based chat sessions

Work with three separate windows open: a design session, an implementation session, and a review session. If you keep design, implementation, and review in the same session, the reviewer AI tends to get pulled along by the implementer AI and make soft judgments. This is the smallest unit for mimicking Bob’s “role-based agent” philosophy at an individual level.
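The separation can be pictured as three independent message histories where only the finished artifact crosses a boundary. This is a conceptual sketch: the `Session` class and its `handoff` method are hypothetical, and no real chat API is being called.

```python
class Session:
    """One chat window with its own isolated message history."""

    def __init__(self, role: str) -> None:
        self.role = role
        self.history: list[str] = []

    def add(self, message: str) -> None:
        self.history.append(message)

    def handoff(self) -> str:
        """Return only the latest artifact, not the conversation behind it."""
        return self.history[-1]


design = Session("design")
implementation = Session("implementation")
review = Session("review")

design.add("Requirements discussion ...")
design.add("Data model and API boundaries (final)")

# The implementer receives the finished design, not the design debate.
implementation.add(design.handoff())
implementation.add("First attempt, with the implementer's reasoning ...")
implementation.add("def handler(request): ...  # final code")

# The reviewer sees the final code only, so it cannot be swayed by the
# implementer's justifications.
review.add(implementation.handoff())

print(review.history)
```

In practice the handoff is just a copy-paste between windows. The sketch only makes visible what should and should not travel with it.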

Action 3: Compile model-selection rules into a one-pager for yourself

Make a one-page correspondence table of task type × model, like “design discussion = Claude,” “code generation = GPT-5,” “light completion = Cursor built-in.” This will also serve as the footing for judgment when future Cursor or Claude Code features come out.

These three can be started this week without buying Bob. As I wrote this article, I was reviewing my own workflow design and noticed there were three blank phases. Operations, Deployment, and Discovery (requirement discovery). The three I’m worst at were the blanks — a result fitting for someone in the failed-engineer camp.

Summary: The Day We Saw What Comes “Next” After Vibe Coding

  • On April 28, IBM announced global availability of “Bob,” its AI development partner
  • Internal proof data: 80,000 people, 45% productivity gain, the Java upgrade case (Blue Pearl) where 30 days became 3
  • Bob is not “AI that writes code” but “AI that places agents across every SDLC phase”
  • Two design philosophies are core: role-based agents and a multi-model router
  • It can be read as the “scale implementation of the prescription” for the trilogy of Lovable, Cursor, and Forrester’s Secure Vibe Coding
  • Even at the individual development level, the philosophy can be mimicked with three things: measuring SDLC presence rate, separating role-based sessions, and writing model-selection rules into text

“I genuinely think the era has changed.” A year and a half after vibe coding started catching on, I have the feeling that the contours of “the next phase” are finally becoming visible. The Lovable and Cursor incidents showed “it breaks,” Forrester’s Secure Vibe Coding put “design that doesn’t break” into words, and IBM Bob proved out “implementation that doesn’t break” with 80,000 people.

We individual developers and side-gig engineers can’t buy Bob. Even so, the direction of replacing the role-switching in your own head with division of labor between agents is something you can mimic right now. The moment I publish this article, I plan to review my own Cursor setup. Starting from separating the design session and the implementation session.

People with experience of failure are actually probably good at this “role-based” way of thinking. The self-awareness that “I’m too weak to do everything alone” is likely the starting point of agent division of labor. Maybe we, the failed-engineer crowd, will adapt to a Bob-like worldview faster than professional engineers — I’ll close today carrying that small bit of hope.


Sources & References

⚠️ The 45% productivity gain is IBM’s internal survey figure (self-reported basis), not a third-party audited figure. The Java upgrade case where 30 days became 3 is based on the case study of an IBM-officially-announced customer (Blue Pearl). For final judgment of each figure, please refer to the primary source.


  • 2026-04-01 “Vibe Coding’s 170 Back Doors. A Former Failed Engineer Seriously Investigates the Case Where Security Flaws Were Found in 10.3% of Lovable-Built Apps”: /en/blog/g2026040100003701/
  • 2026-04-03 “The Day Cursor’s CEO Admitted ‘Our Tool Builds Shaky Foundations.’ The Next Question Vibe Coding Faces, Posed by the Zero-Click Vulnerability CurXecute”: /en/blog/g2026040300004001/
  • 2026-05-04 “The Day Cursor’s AI Agent Erased a Production Database. The Event That Happened the Week After the CEO Warned ‘The Foundation Is Crumbling’”: /en/blog/g2026050400013501/
  • 2026-05-05 “The Blueprint to Write Before It Breaks. After Lovable’s 170 Vulnerabilities and the Cursor Production DB Incident, a Former Failed Engineer Tries Applying Forrester’s Proposed ‘Secure Vibe Coding’ to His Own Development Workflow”: /en/blog/g2026050500013801/
Written by ゲン (CS × Vibe Coder)

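Honestly, I gave up on being an engineer once. At the development company I joined as a new graduate, I was surrounded by monstrously talented people and realized, “Ah, I’m not on this side.” After that I switched to customer success and spent 10 years there. But then I met Cursor and Claude Code, and everything changed. The code doesn’t have to be perfect. If I can write code that makes my own work easier, that’s enough. On weekends I unwind in the sauna while thinking about the next tool I’ll build.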