Claude Code Just Got Eyes: A Complete Walkthrough for Letting Computer Use Handle Your Screen

I use Claude Code every day. It writes code for me, edits my files, runs my tests. For any work that fits inside a terminal, it’s already faster than I am.

But one thing kept nagging at me. Faced with apps that only run as a GUI, Claude was completely powerless. Opening a browser and clicking a button, navigating a desktop app’s menus, verifying behavior in a simulator: all of that, I had to do myself.

On March 23, 2026, Anthropic added a “Computer Use” feature to Claude Code. It’s labeled a research preview, but if you’re on Pro or Max, you can try it today.

When I first heard “operate native apps from the CLI,” honestly, I was skeptical. But once I actually set it up and ran it, it exceeded my expectations. In this article, I’m sharing every step I took from zero to working setup, along with all the snags I hit.

I burned three hours on this — I want you to get through it in 15 minutes.

The flow of this article is: ① Understanding the mechanism → ② Preparation and setup → ③ Live demo → ④ Gotchas. You can read it in any order, but for your first time, going through it sequentially will get you running fastest.

What it means for AI to have “eyes” — let’s understand the mechanism first

When you hear “AI operates the screen,” it might sound like magic. The mechanism is simple.

Computer Use runs in a 3-step loop.

Take a screenshot — Claude captures the screen
Analyze the image — recognizes UI elements at the pixel level and decides the next action
Execute the action — clicks, types, scrolls, etc.

It repeats this loop until the result reaches the goal. In other words, it reproduces what a human does with a mouse and keyboard, one screenshot at a time.

Diagram of the Computer Use loop. Cycle of screenshot capture → image analysis → action execution → re-screenshot

The key point is that it operates by understanding “what the screen looks like.” Even GUI-only apps with no API or CLI can be operated by reading button positions and text from the image.
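To make the loop concrete, here is a toy simulation in bash. It is only a sketch of the control flow described above, not Claude Code’s actual implementation: the “screen” is a plain counter, and capture_screen, decide_action, perform_action, and run_loop are all hypothetical names I made up for illustration.

```bash
# Toy simulation of the screenshot -> analyze -> act loop.
# The "screen" is a plain counter, and every function name is a
# hypothetical stand-in, not part of Claude Code.

screen_counter=0   # what the pretend GUI currently shows

capture_screen() {   # step 1: take a "screenshot" of the current state
  echo "counter=${screen_counter}"
}

decide_action() {    # step 2: read the screenshot, pick the next action
  local shot="$1" goal="$2"
  local seen="${shot#counter=}"
  if [ "$seen" -lt "$goal" ]; then
    echo "click_plus"
  else
    echo "done"
  fi
}

perform_action() {   # step 3: act on the GUI
  if [ "$1" = "click_plus" ]; then
    screen_counter=$((screen_counter + 1))
  fi
}

run_loop() {
  local goal="$1" max_steps="${2:-20}" step=0 shot action
  while [ "$step" -lt "$max_steps" ]; do
    shot="$(capture_screen)"
    action="$(decide_action "$shot" "$goal")"
    if [ "$action" = "done" ]; then
      break
    fi
    perform_action "$action"
    step=$((step + 1))
  done
  echo "finished after ${step} actions, screen shows ${screen_counter}"
}

run_loop 3
```

The max_steps cap is a design choice worth noticing: a loop that decides its next move from what the screen looks like has no guarantee of ever reaching the goal, so a sketch like this needs an upper bound on retries.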

Here’s a concrete list of what it can do.

Launching and operating native apps: Open Xcode, build, and automate verification all the way through the simulator
Browser operations: Filling forms and clicking buttons in web apps
File drag & drop: Supported as of the Q1 update
Multi-monitor recognition: Handles operations spanning multiple screens
Clipboard operations: Automating copy & paste

The wall of “things you could only do via GUI” can now be broken through from the CLI. With this mechanism in your head, let’s move on to preparation.

Before you start: from checking requirements to setup

Three conditions to check

To try Computer Use, there are three conditions. Miss any one and it won’t launch.

Condition 1: Pro or Max plan

It doesn’t work on the free or Team plan. You need either the Pro plan ($20/month) or the Max plan (from $100/month). If you’re already on Pro, there’s no additional cost.

Condition 2: macOS (as of April 2026)

At the moment, only macOS is supported. Windows and Linux aren’t supported yet, though the official documentation notes that a Windows version is planned. On macOS, you’ll need to grant Accessibility and Screen Recording permissions.

Condition 3: Claude Code v2.1.85 or higher

If your version is too old, the feature won’t appear. Check it in your terminal.

# Check version
claude --version
# Example output: 2.1.92 (Claude Code)

If it’s below v2.1.85, update it.

# Update via npm
npm update -g @anthropic-ai/claude-code
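If you’d rather not compare version numbers by eye, the check can be scripted. This is a minimal sketch in pure bash; version_ge and extract_version are hypothetical helper names of mine, and the only assumption about the CLI is that the output of claude --version contains a version of the form X.Y.Z somewhere.

```bash
# Returns success (exit 0) when version $1 >= version $2.
# Compares major.minor.patch numerically, field by field.
version_ge() {
  local IFS=.
  local -a have=($1) need=($2)
  local i h n
  for i in 0 1 2; do
    h="${have[i]:-0}"
    n="${need[i]:-0}"
    if [ "$h" -gt "$n" ]; then return 0; fi
    if [ "$h" -lt "$n" ]; then return 1; fi
  done
  return 0
}

# Prints the first X.Y.Z-looking token found in arbitrary version output.
extract_version() {
  local re='([0-9]+\.[0-9]+\.[0-9]+)'
  if [[ "$1" =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"
  fi
}

# Usage sketch: only runs on a machine where `claude` is installed.
if command -v claude >/dev/null 2>&1; then
  installed="$(extract_version "$(claude --version)")"
  if version_ge "$installed" "2.1.85"; then
    echo "OK: ${installed} meets the 2.1.85 minimum"
  else
    echo "Too old: ${installed}, update before looking for the toggle"
  fi
fi
```

The field-by-field numeric comparison matters here: a plain string comparison would rank 2.1.9 above 2.1.85, which is exactly the kind of mistake that hides an outdated install.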

Full setup walkthrough: from launch to “AI grabbing your screen” in 15 minutes

Once all three conditions are met, it’s on to setup. I’ll explain assuming macOS.

Step 1: Enable Computer Use

Launch Claude Code and open the settings.

# Launch Claude Code
claude

# Open settings (run inside a session)
/config

In the settings menu, there’s a “Computer Use” toggle. Enable it. Settings are saved per project, so once you turn it on, it stays on for next time.

Screen capture showing the Computer Use toggle being turned on in Claude Code's settings screen

Step 2: Grant macOS permissions

The first time Computer Use tries to operate your screen, macOS will request two permissions.

Accessibility: Needed for Claude to click, type, and scroll
Screen Recording: Needed for Claude to see the screen

Grant both from “System Settings → Privacy & Security.”

System Settings → Privacy & Security → Accessibility
→ Enable your terminal app (Terminal / iTerm2 / Warp, etc.)

System Settings → Privacy & Security → Screen Recording
→ Enable your terminal app

A heads-up here. After granting Screen Recording permission, you may need to restart your terminal. If you get a “Permission denied” error despite granting permission, close the terminal and reopen it. I burned 20 minutes on this one.
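If you switch between several terminals, it’s easy to grant the permissions to the wrong one. Here’s a rough sketch for working out which app your current shell is running inside, based on the TERM_PROGRAM environment variable. The value table is an assumption based on what these terminals commonly export, and terminal_app_name is a hypothetical helper, so extend it for your own setup.

```bash
# Maps a TERM_PROGRAM value to the app name to look for in the
# Accessibility and Screen Recording lists.
# The mapping is an assumption, not an official table.
terminal_app_name() {
  case "${1:-}" in
    Apple_Terminal) echo "Terminal" ;;
    iTerm.app)      echo "iTerm" ;;
    WarpTerminal)   echo "Warp" ;;
    vscode)         echo "Visual Studio Code" ;;
    "")             echo "unknown (TERM_PROGRAM is not set)" ;;
    *)              echo "$1" ;;   # fall back to the raw value
  esac
}

echo "Grant permissions to: $(terminal_app_name "${TERM_PROGRAM:-}")"
```

Run it in the same window where you launch claude. The permissions belong to the app hosting the shell, so that’s the entry to enable, and the one to quit and reopen afterward.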

Step 3: Your first Computer Use run

Once permissions are granted, all you need to do is give an instruction in natural language inside a session.

> Open Safari and search for "Claude Code Computer Use" on Google

On the first run, you’ll see a confirmation dialog asking “which app to operate.”

Claude wants to control: Safari
[Allow for this session] [Deny]

If you pick “Allow for this session,” Safari can be operated during that session. Close the session and permission resets.

Three safety mechanisms worth knowing

I get the fear of “handing the screen over to AI.” I was nervous at first too. Thanks to the following three safety mechanisms, it stays within the boundary of “operating only the apps you’ve allowed, while you’re watching.”

App isolation: During operation, apps you haven’t permitted are automatically hidden. No worry about unintended apps being touched
Terminal exclusion: Your terminal window is excluded from screenshots. The contents of your instructions and your API keys don’t end up in the captured image
Automatic restoration: When Claude’s operation turn ends, the hidden apps are automatically restored

Three Computer Use safety mechanisms. App isolation, terminal exclusion, and automatic restoration illustrated in three columns

Putting it to work: “write code, build it, verify on screen” — all in one command

Now that setup is done, I want to show you how this actually plays out.

Case 1: Verifying a web app

I asked Claude Code to build a simple counter app in React, run it in the browser, and verify it works.

> Build a counter app in React, start it with npm start,
> and click the button 3 times in the browser to verify it works

Here’s what Claude did.

Ran npx create-react-app counter-app (CLI)
Wrote the counter code in src/App.js (CLI)
Started the dev server with npm start (CLI)
Computer Use takes over here: The browser opens, the counter appears
Clicked the “+” button three times and verified via screenshot that the number incremented
Reported: “Counter correctly increased from 0 to 3”

The cycle of “write code → build → verify on screen” closed inside a single command.

Until now, this required a round-trip: “have it write the code → open the browser myself → verify behavior myself → tell Claude the result.” That round-trip goes to zero.

Case 2: Operating a desktop app

The other thing I tried was operating Finder (macOS’s file manager).

> Open Finder and create a new folder called
> "claude-test" in the Documents folder

Claude moved through these steps.

Launched Finder (Computer Use)
Clicked “Documents” in the sidebar (Computer Use)
Right-click → New Folder (Computer Use)
Typed “claude-test” as the folder name and pressed Enter (Computer Use)
Confirmed via screenshot that the folder was created

If it were CLI-only, mkdir ~/Documents/claude-test would have finished it. But what’s important in this example is the proof that “AI can understand and execute GUI operation steps.” The same thing works for GUI-only apps that can’t be reduced to a CLI.
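Since this particular result also exists on disk, you can cross-check Claude’s screenshot-based confirmation from the terminal. A minimal sketch, where verify_folder is a hypothetical helper name and the path is the one from the prompt above:

```bash
# Succeeds only if the given path exists and is a directory.
verify_folder() {
  if [ -d "$1" ]; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# "|| true" keeps the script going even when the folder is absent.
verify_folder "$HOME/Documents/claude-test" || true
```

For GUI-only apps there’s often no such second source of truth, which is one more reason to keep watching the screen while Claude works.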

A series of three screenshots showing Claude Code operating Finder to create a new folder

Build your baseline with /powerup: what’s inside the 18 lessons

There’s a feature you’ll want to know about alongside Computer Use: the /powerup command.

On April 1, 2026, Claude Code v2.1.90 added this interactive tutorial feature (see the Claude Code release notes). You can learn Claude Code’s major features with animations, right inside the terminal.

# Launch /powerup
> /powerup
# Use arrow keys to pick a lesson → Enter to start

As of April 2026, there are 18 lessons. Here are some of the topics they cover.

Context management: How to use CLAUDE.md, how to pass project information
Hooks: A mechanism to auto-run shell commands before/after tool execution
MCP: Configuring connections to external tools
Subagents: How to split tasks and run them in parallel
/loop command: How to set up periodic execution and monitoring

This is perfect for people who find “reading documentation a chore.” Without leaving the terminal, you can learn features while watching the actual demo. The fact that it’s available to all users — Pro, Max, or free — is also nice.

The Hooks lesson was especially useful for me. While watching the lesson, I set up a hook that auto-runs lint whenever Claude edits a file, and it took 5 minutes. If I’d been reading docs, it would have taken 30 minutes.

All the gotchas, exposed: the 5 walls I hit and how I got past them

This might be the real meat of the article. Both setup and operation “work if you follow the steps,” but there’s always a moment when the steps don’t work. Here are the 5 walls I hit, so you know about them before you run into them.

Wall 1: Screen Recording permission isn’t taking effect

Symptom: You turned on the permission, but you still get “Screen recording permission not granted.”

Cause: Changes to macOS permissions may require restarting the app.

Fix: Fully quit your terminal app and restart it. Not “close window” — quit it with Cmd+Q.

Wall 2: You don’t notice your version is too old

Symptom: The Computer Use toggle isn’t in the settings screen.

Cause: Your Claude Code version is below v2.1.85.

Fix:

# Check the current version
claude --version

# Update via npm
npm update -g @anthropic-ai/claude-code

# Check the version again
claude --version

If your npm global install isn’t on the right path, an old version can linger. Check the path with which claude.
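which claude only shows the first match. To spot a stale copy lingering elsewhere, it helps to list every match in lookup order. In bash, type -a claude does this for you; the sketch below just spells out the same lookup so you can see what’s happening. find_all_on_path is a hypothetical helper name.

```bash
# Prints every executable with the given name found on PATH, in lookup
# order. The first line printed is the one your shell actually runs.
find_all_on_path() {
  local name="$1" dir
  local IFS=:
  for dir in $PATH; do
    # An empty PATH entry means the current directory.
    if [ -x "${dir:-.}/$name" ] && [ ! -d "${dir:-.}/$name" ]; then
      echo "${dir:-.}/$name"
    fi
  done
}

find_all_on_path claude
```

If this prints more than one line, the copy you just updated may not be the one that runs. Either remove the older copy or reorder your PATH so the updated one comes first.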

Wall 3: The target app can’t be found

Symptom: You said “Open Safari” but got back “Cannot find application.”

Cause: The app name may not be exact. On macOS it’s “Safari,” but third-party apps sometimes need their full name.

Fix: Check the app names inside /Applications/ (third-party apps) and /System/Applications/ (most apps bundled with macOS), and instruct using the exact name.

# List apps
ls /Applications/ /System/Applications/
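When the list is long, a case-insensitive search saves some scrolling. This is a small sketch with a hypothetical helper, find_app_names, that looks for .app bundles whose names contain a query string. Searching /System/Applications as well is my own assumption about where the bundled apps live on your macOS version.

```bash
# Lists .app bundles in the given directories whose names contain the
# query (case-insensitive), printed without the .app suffix.
find_app_names() {
  local query="$1"
  shift
  local dir path name
  for dir in "$@"; do
    for path in "$dir"/*.app; do
      # Skip the unexpanded pattern when a directory has no .app bundles.
      [ -e "$path" ] || continue
      name="$(basename "$path" .app)"
      if printf '%s\n' "$name" | grep -qiF -- "$query"; then
        echo "$name"
      fi
    done
  done
}

find_app_names "code" /Applications /System/Applications
```

Whatever this prints is the exact spelling to use in your instruction, for example “Visual Studio Code” rather than “VS Code.”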

Wall 4: Operations stop midway

Symptom: While Claude is operating the screen, a popup or notification appears and interrupts the operation.

Cause: macOS notifications and system dialogs interfere with Computer Use’s screen recognition.

Fix: Turn on “Do Not Disturb” before operating. On recent macOS versions it’s a Focus mode that you can enable from Control Center. It took me three redos to figure this out.

Wall 5: Japanese input is flaky

Symptom: When typing Japanese text, the IME conversion candidates get in the way and you can’t input correctly.

Cause: Computer Use sometimes can’t accurately recognize the IME (Japanese input method) conversion window.

Fix: Before giving an instruction that involves Japanese text, switch the input source to English so the IME candidate window never opens. Then have Claude paste the Japanese text from the clipboard instead of typing it, which is more stable.

> Open a text editor and paste the Japanese text
> from the clipboard
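For the paste approach to work, the text has to be on the clipboard first. One way to put it there from the terminal is pbcopy, which ships with macOS. The sketch below wraps it in a hypothetical copy_text helper. The CLIP_CMD override is my own addition so the function can be tried on machines without pbcopy.

```bash
# Puts text on the clipboard so it can be pasted instead of typed.
# pbcopy ships with macOS; CLIP_CMD is a hypothetical override.
copy_text() {
  local cmd="${CLIP_CMD:-pbcopy}"
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo "clipboard command not found: $cmd" >&2
    return 1
  fi
  # printf avoids the trailing newline that echo would add.
  printf '%s' "$1" | "$cmd"
}

# Usage sketch: only runs where pbcopy exists (that is, on macOS).
if command -v pbcopy >/dev/null 2>&1; then
  copy_text "こんにちは、世界"
fi
```

If the pasted text comes out garbled, check that your shell’s locale is a UTF-8 one (for example via the LANG variable) before blaming Computer Use.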

Just knowing these five in advance should cut the stress significantly.

Wrap-up: a CLI agent with “eyes,” and what’s next

What I felt after using Computer Use is that it’s an extension of the “vibe coding” we’ve had so far.

Vibe coding was the style of “instructing in natural language and having code written for you.” Add Computer Use, and the scope of instructions widens from “write code” to “operate the screen.” Write code, build, run, verify. The whole cycle can now turn on natural language.

I’ve taken to calling this “vibe operating.”

I used to walk away from code. I thought I couldn’t match a pro engineer. After meeting Claude Code, I felt like a master engineer had taken up residence inside me. And now, with Computer Use added, it’s as if that master engineer said “I’ll handle the screen operations too.”

That said, don’t forget this is still a research preview. Operation accuracy isn’t 100%, and complex GUI actions can fail. There’s still some flakiness in Japanese environments.

Rather than “an all-purpose automation tool,” it fits better as “a new experiment that extends the CLI.” Don’t expect perfection. Approach it with the mindset of “lucky if it works.”

According to Pragmatic Engineer’s February 2026 survey, Claude Code was picked as the “most-loved tool” by 46% of developers. Computer Use will likely push that rating even higher.

Let me sum up the key points.

What it can do: Operate native app GUIs from the CLI. Runs on a screenshot → analyze → operate loop
What you need: Pro or Max plan, macOS, Claude Code v2.1.85 or higher
Setup: Turn on Computer Use → grant macOS permissions → instruct in natural language. Done in 15 minutes
Gotchas: Permissions need a restart to apply. Turn off notifications. Japanese input is more stable via the clipboard
Current status: Research preview. Not perfect, but well worth trying as an extension of the CLI

If you haven’t tried it yet, start with /powerup today. Going through the 18 lessons to grasp Claude Code’s overall features, then moving on to Computer Use, is the fastest route.

An AI that only “wrote code” has reached the point of “operating the screen.” What will it be able to do next? I want to watch that change from the front row, and I’ll keep writing here about everything I experience.

For someone who once walked away from code, the future where I can build products together with AI feels one step closer.