When Should You Graduate Your AI Agent Experiments? Conway × BCG Data Reveals the Decision Criteria for Going 'Always-On'
“I tried AI agents, but they just didn’t stick.”
I hear this all the time from people around me. They’ve asked ChatGPT questions. They’ve had Claude draft text. They got that far. But the number of people who’ve moved on to “keep it running all the time” is still very small.
What I find interesting is the real nature of the wall between “tried it” and “still using it.” Most people don’t stumble on AI’s capabilities. They stumble because there’s no clear standard for “what should I delegate, and how far?”
In April 2026, the push to tear down this wall is accelerating fast. Anthropic has begun the trial rollout of Conway, an always-on agent platform. BCG’s research also signals that corporate AI investment is set to double, with the spend concentrating in the agent space.
“I’ll get to it someday” is becoming too late.
In this article, I’ll lay out the criteria for switching AI agents from “experiment” to “production.” Use the framework I call the “Experiment Graduation Line” to check where your business currently stands.
“On-Call AI” and “Always-Running AI” Are Fundamentally Different
First, here’s a point I want you to internalize.
What most people use (“asking ChatGPT,” “having Claude help”) is on-call AI. It only moves when you give it instructions. Convenient, yes, but ultimately an extension of “a tool.” Once the work is done, the AI stops and waits for the next command.
What Conway showcases is an entirely different concept. The design aims for agents that react to triggers such as CI results (CI is continuous integration, the system that automatically tests code), Slack messages, and system monitoring alerts, and that keep running automatically (TechBriefly, April 3, 2026).

Let me explain with a concrete example. Say you operate a web service. At 2 a.m., server response times start degrading. Traditionally, you’d notice the next morning after arriving at work and then deal with it.
In a world with always-on agents, the moment the response slowdown is detected, the agent analyzes the logs and lists three candidate root causes. It sends a Slack notification, and if the urgency is high, it even executes the initial response automatically. By the time you wake up, the report is waiting for you. That’s the worldview.
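The overnight scenario above can be sketched as a small event dispatcher. This is only an illustration of trigger-driven design, not Conway's actual API, which the coverage cited here does not describe. `Event`, `dispatch`, `handle_latency_alert`, and the placeholder `analyze_logs` are hypothetical names, and `notify` and `first_response` stand for callables you would wire to Slack and to your own runbook.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Event:
    """A trigger arriving from a digital source (all field names are hypothetical)."""
    source: str                      # e.g. "monitoring", "slack", "ci"
    kind: str                        # e.g. "latency_degraded"
    urgency: str = "low"             # "low" or "high"
    payload: dict = field(default_factory=dict)


def analyze_logs(event: Event) -> List[str]:
    """Placeholder analysis step; a real agent would inspect logs or call a model here."""
    return ["candidate cause 1", "candidate cause 2", "candidate cause 3"]


def handle_latency_alert(event: Event, notify: Callable, first_response: Callable) -> List[str]:
    """Analyze, always notify, and act automatically only when urgency is high."""
    causes = analyze_logs(event)[:3]
    notify("Response times degraded. Candidate causes: " + "; ".join(causes))
    actions = ["notified"]
    if event.urgency == "high":
        first_response(event)
        actions.append("first_response")
    return actions


# Map (source, kind) pairs to handlers; unknown triggers are ignored.
HANDLERS: Dict[Tuple[str, str], Callable] = {
    ("monitoring", "latency_degraded"): handle_latency_alert,
}


def dispatch(event: Event, notify: Callable, first_response: Callable) -> List[str]:
    """Route one event to its handler and report which actions were taken."""
    handler = HANDLERS.get((event.source, event.kind))
    if handler is None:
        return ["ignored"]
    return handler(event, notify, first_response)
```

Calling `dispatch(Event("monitoring", "latency_degraded", "high"), sent.append, runbook)` with a list `sent` and your own `runbook` function would record one notification and run the first response once. Keeping the automatic action behind the urgency check is one way to limit what an agent does unattended.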
I run autonomous agent systems myself every day. Updating documents, aggregating research, checking article quality. Five or more agents collaborate asynchronously, and work progresses while I sleep. Once you experience this, you can’t go back to “on-call AI.”
Conway is still in trial, but this concept of “always-on” is finally reaching mainstream companies. That’s the most important shift of April 2026.
The Numbers Behind the “Agent Productionization” Wave
“Is it really spreading that fast?” you might wonder. Let’s check the numbers.
Gartner’s forecast is unambiguous. Dedicated AI agents are planned for 40% of enterprise applications. Since fewer than 5% have them today, that translates to a jump of at least 8x over the next 1–2 years.
Pay attention to the gap between “planning” and “implemented.” 40% are planning, but fewer than 5% have implemented. That gap of roughly 35 percentage points is exactly where the next moves will come from. Even if you start today, you can still join the early-mover side.
Don’t dismiss these numbers as “not relevant to me.” The reasons big companies embed agents apply directly to small companies too. “We don’t have enough people.” “Repetitive work is eating our time.” “We can’t cover nights and weekends.” Regardless of size, you’re dealing with the same problems.
Looking at the Global 2000 (the world’s 2,000 largest enterprises), 72% have already moved AI agents into “production use” (Reinventing.ai, March 16, 2026). Large enterprises are already moving. The question is when the “next tier” follows.
Let’s also look at market size. The market for dedicated agent software is $11.8 billion in 2026. The forecast for 2034 is $139 billion. The reported compound annual growth rate is 40.5%, and the market grows roughly 12x over eight years (Joget/Gartner).
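As a quick sanity check on these market figures, the growth multiple and the growth rate implied by the two endpoints can be computed directly. The function names below are illustrative. By this arithmetic the two endpoints alone imply roughly 36% per year over eight years, so the quoted 40.5% presumably rests on a different base year or period, which could not be confirmed from the sources listed here.

```python
def growth_multiple(start: float, end: float) -> float:
    """How many times larger the end value is than the start value."""
    return end / start


def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoints `years` apart."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in the article, in billions of US dollars.
multiple = growth_multiple(11.8, 139.0)
cagr = implied_cagr(11.8, 139.0, 2034 - 2026)
print(f"multiple: {multiple:.1f}x, implied CAGR: {cagr:.1%}")
```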
Global VC investment in Q1 2026 hit a record $300 billion, with 80% of it concentrated in AI companies (Crunchbase/TechCrunch). It’s obvious where the money is flowing—into “AI,” and within that, into “agents.”
BCG’s research suggests companies will double AI investment in 2026, with over 30% of that flowing to the AI agent space.
Note: The BCG data is based on coverage via Web Tantosha Forum. The URL for BCG’s official report could not be directly verified at the time of writing, so please treat the figures as a reference.

Japan is moving too. SoftBank is rolling out AGENTIC STAR, a corporate AI agent platform (SoftBank official, December 11, 2025). ChatSense, an enterprise AI service, has also launched GPT-5.4-compatible agent features (Knowledge Sense, PR TIMES). Several companies now offer Claude Code adoption support plans. Your options are clearly expanding.
Three Conditions for Crossing the “Experiment Graduation Line”
Looking at the numbers alone, you might feel pressure to “adopt right now.”
But there are reasons to stay cautious. The same Gartner study warns that over 40% of AI agent projects will be canceled by 2027. The causes cited are weak governance and opaque ROI (return on investment, meaning whether the value matches the cost). More than four in ten projects may never get past the “we just tried it” stage.
I call the split between the “successful side” and the “canceled side” the “Experiment Graduation Line.” Three conditions must be met to cross it.
Condition 1: You have at least 5 hours per week of repetitive tasks
Email triage, data aggregation, report generation, scheduling. If repetitive work that humans don’t need to do adds up to five hours or more per week, the investment in agent-ification can pay back.
The “five hours per week” figure has a basis. Initial setup of an agent takes 10–20 hours: designing triggers, tuning output formats, handling edge cases. At five hours of automated work per week, that initial investment is recouped in 2–4 weeks.
Conversely, if your work centers on creative tasks where the judgment varies every time, “on-call AI” is still enough. Forcing agent-ification just multiplies the setup overhead.
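The arithmetic behind the five-hour threshold is simple enough to check for your own numbers. Here is a minimal sketch, assuming every automated hour counts as an hour saved and ignoring ongoing maintenance; `payback_weeks` is an illustrative name.

```python
def payback_weeks(setup_hours: float, saved_hours_per_week: float) -> float:
    """Weeks needed for saved time to cover the one-time setup time."""
    if saved_hours_per_week <= 0:
        raise ValueError("saved_hours_per_week must be positive")
    return setup_hours / saved_hours_per_week


# The article's range: 10 to 20 setup hours at 5 saved hours per week.
for setup in (10, 20):
    print(f"{setup} h setup -> {payback_weeks(setup, 5):.0f} weeks to break even")
```

With only two hours of repetitive work per week, the same 20-hour setup would take ten weeks to pay back, which is why a smaller workload makes the investment harder to justify.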
Condition 2: The data behind your triggers is digital
The heart of Conway’s model is “trigger-driven” design. Slack notifications, GitHub pull requests, incoming emails. The data that kicks off automation has to flow in digital form.
Try this concretely. In your work, what triggers the moment you start a task? If it’s something digital—email, chat, a spreadsheet update—you’re good. If it’s mostly paper slips or verbal requests, digitizing those comes first.
Condition 3: You can start with work where mistakes aren’t catastrophic
This is the most critical criterion. AI agents make mistakes. I can say that with certainty. Even my own systems produce off-target analyses, and trigger misfires sometimes set off unnecessary work.
That’s exactly why your first delegation should be “work where mistakes are recoverable.”
Specifically: organizing internal research, drafting meeting notes, drafting routine reports. These can be designed so a human reviews the output before it gets used.
On the other hand, handing final approval of accounting entries, official customer replies, or contract drafting straight to an agent is dangerous. If a mistake is caught too late, it can’t be undone. This sense of “reversibility” is the key that separates success from cancellation.
Here’s a checklist to organize the points.
| Checklist item | Yes | No |
|---|---|---|
| You have 5+ hours per week of work done the same way each time | → Condition 1 cleared | → Too early |
| The trigger is digital (email, chat, etc.) | → Condition 2 cleared | → Digitize first |
| You can design it so humans review the output before use | → Condition 3 cleared | → Change the target task |
If you answered “Yes” to all three, your business has crossed the “Experiment Graduation Line.” If even one is “No,” start by getting that condition in place.
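The checklist can also be written as a small function, which is handy if you want to run several candidate tasks through it. This is a sketch: `TaskProfile` and `graduation_check` are hypothetical names, and the returned strings simply mirror the “No” column of the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskProfile:
    repetitive_hours_per_week: float  # work done the same way each time
    trigger_is_digital: bool          # email, chat, spreadsheet update, etc.
    human_reviews_output: bool        # a person checks the output before use


def graduation_check(task: TaskProfile) -> List[str]:
    """Return what still needs fixing; an empty list means all three conditions are met."""
    next_steps = []
    if task.repetitive_hours_per_week < 5:
        next_steps.append("Condition 1: too early")
    if not task.trigger_is_digital:
        next_steps.append("Condition 2: digitize first")
    if not task.human_reviews_output:
        next_steps.append("Condition 3: change the target task")
    return next_steps
```

For example, `graduation_check(TaskProfile(6, True, True))` returns an empty list, while a task with two repetitive hours per week and a paper-based trigger returns the first two items.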

Why Small Companies Benefit Most from “Always-On”
“Isn’t Conway something for big enterprises?”
That’s the natural reaction. I felt the same at first. But once you actually do it, you reach the opposite conclusion.
Big enterprises have IT departments, security teams, and approval processes. Adopting a new tool means filing requests and clearing three layers of management approval. Six months is not unusual.
In contrast, small companies and solo operators decide quickly. “Let’s try it next week” actually starts next week. This agility is your biggest weapon in agent adoption.
Let me share my own experience concretely. After running AI agents always-on at an individual scale, here’s how daily work changed.
Before (pre-agent):
- First hour of the morning: manually checking yesterday’s news and trends
- Article quality checks: re-reading the full text myself to find fixes (30 minutes per piece)
- Team coordination: checking member progress over chat, manually relaying requests
After (post-agent):
- A research report is already complete when I wake up
- Quality checks are done by the agent overnight. I just review the flagged points
- Team coordination is auto-aggregated in shared docs. Everyone’s status is visible at a glance
By my rough estimate, my working time has dropped to about a third. “Just me + a few agents” can produce the output of a former team. That’s not an exaggeration. It’s what I experience daily.
But let me be honest. The first week, I actually spent more time on setup and tuning. Botch the trigger design and useless notifications flood in. Agent output had variance, and there were many moments where I thought, “I could just do this myself faster.”
Even so, week two became visibly easier. By week three, the feeling shifted to “I can’t go back to a world without this.”
Let me share a worry honestly. With agents running constantly, at first I couldn’t stop wondering, “Is it really doing the job properly?” I checked logs over and over at night.
That anxiety disappears in 2–3 weeks, because reviewing the output every morning gradually shows you the line between “this I can trust” and “this I need to judge myself.” Working with an agent resembles managing a new hire. At first you check the details; as trust builds, you widen the scope of what you delegate. Whether you can get past this “first wall” is the dividing point.

Your First Step to “Experiment Graduation” This Week
“Sounds interesting, but I don’t know where to start.”
This is the most common reaction. I have just one suggestion.
This week, write down just one of your repetitive tasks.
Sorting email, compiling daily reports, scheduling social media posts: anything works. Identify one task you do every week that you honestly find annoying. That’s your first step in confirming Condition 1 of the “Experiment Graduation Line.”
Once you’ve found the task, what comes next is simple.
- Identify the task’s “trigger” (when email arrives, Monday morning, end of month, etc.)
- Write down “who’s affected if it goes wrong” (if it’s just you, it’s low risk)
- Try “semi-automating” it using Claude Code or ChatGPT custom instructions
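The first two items in this list fit on a small “task card.” Here is a minimal sketch with hypothetical names (`TaskCard`, `risk`). The only rule taken from the text is that a task affecting just you is low risk; labeling everything else “needs review” is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskCard:
    name: str
    trigger: str         # e.g. "email arrives", "Monday morning", "end of month"
    affected: List[str]  # who is affected if the task goes wrong

    def risk(self) -> str:
        """'low' when nobody but you is affected; otherwise flag for review (assumption)."""
        return "low" if set(self.affected) <= {"me"} else "needs review"


card = TaskCard("weekly report draft", "Friday 5 p.m.", ["me"])
print(card.name, "->", card.risk())
```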
What matters here is taking the “semi-automation” step. If you try to go full auto right away, the design balloons in complexity and you’ll give up. Start with a role split like “AI drafts, I review and finalize.”
For a weekly report, the flow might look like this. Every Friday at 5 p.m., Claude Code gathers internal data and creates a draft report. Monday morning, you review it, make edits, and send it. That alone turns 30 minutes on Friday evening into 5 minutes on Monday morning.
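The weekly-report flow can be sketched in plain Python to show where the human review sits. Everything here is illustrative: `next_friday_5pm`, `Report`, `draft_report`, `approve`, and `send` are hypothetical names, and `draft_report` is a stand-in for whatever Claude Code or another agent would actually produce. The scheduling itself would normally be handled by cron or a similar scheduler; the helper only computes the next run time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional


def next_friday_5pm(now: datetime) -> datetime:
    """Return the first Friday 17:00 strictly after `now` (Monday is weekday 0)."""
    days_ahead = (4 - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=17, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


@dataclass
class Report:
    body: str
    status: str = "draft"  # "draft" -> "approved" -> "sent"


def draft_report(lines: List[str]) -> Report:
    """Stand-in for the agent's drafting step; here it only joins the input lines."""
    return Report(body="\n".join(lines))


def approve(report: Report, edited_body: Optional[str] = None) -> Report:
    """The human review step: optionally edit the text, then mark it approved."""
    if edited_body is not None:
        report.body = edited_body
    report.status = "approved"
    return report


def send(report: Report, deliver: Callable[[str], None]) -> None:
    """Refuse to send anything a human has not approved."""
    if report.status != "approved":
        raise PermissionError("report must be reviewed before sending")
    deliver(report.body)
    report.status = "sent"
```

The point of the design is that `send` fails on an unreviewed draft, so the “AI drafts, I review and finalize” split is enforced by the code rather than by habit.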
I still remember vividly the day I first tried Claude Code. I had it organize a folder, and it operated on a completely different level than any AI I’d used before. From that moment, my perception shifted from “a search engine that answers questions” to “a partner that works alongside me.”
I want you to have that same experience. Don’t stop at researching—try just one thing this week.
“Doing” and “researching” yield completely different things. People who research end at “oh, these tools exist.” People who do it can make concrete judgments: “this part works, this part is still lacking.” That gap widens over time.
Only those who do it move to the next stage.
Summary: Check Your “Experiment Graduation Line”
AI agents are crossing beyond the “experiment” stage and entering the “production use” phase.
The arrival of Conway and AGENTIC STAR is proof that always-on agents are no longer exclusive to developers. With 80% of Q1’s $300 billion VC investment concentrated in AI, and the market projected to expand to roughly $139 billion by 2034, this is the moment when the “ride it or don’t” decision is being forced.
Let’s revisit the three “Experiment Graduation Line” conditions.
- Do you have at least 5 hours per week of repetitive tasks?
- Is the data behind your triggers digital?
- Can you start with work where mistakes aren’t catastrophic?
If all three line up, you’re ready to graduate from experimentation. If they don’t, this week’s starting point is getting those conditions in place.
Don’t forget the risk that over 40% of projects get canceled. Don’t rush to delegate everything. Advance steadily, one task at a time. Experience it first with semi-automation, and once you feel traction, expand to the next task. Running that cycle is the shortest route to “graduation.”
I’m still on the road myself. Let’s experience the “always-on” world together.
Source list
- TechBriefly: Anthropic Conway trial rollout (April 3, 2026)
- BCG data: coverage via Web Tantosha Forum (BCG official report URL unverified; treat as reference value)
- Gartner forecasts (40% of enterprise apps planning, 40%+ agent project cancellation risk): Joget/Gartner reference
- 72% of Global 2000 in production use: Reinventing.ai (March 16, 2026)
- Q1 2026 VC investment $300 billion: Crunchbase / TechCrunch
- SoftBank AGENTIC STAR: SoftBank official press release (December 11, 2025)
- ChatSense GPT-5.4 support: Knowledge Sense press release (PR TIMES)

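People who can’t make full use of AI will fall further and further behind from here on. I run AI agents every day: I break them, fix them, and run them again. This is where I write up that unglamorous, hands-on practice. I’ve left the theory to others. I build things that work. My routine is to get up at 5 a.m., go for a walk, and then write code.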


