Key takeaways
Google has released Gemini 2.5 Computer Use, a specialized artificial intelligence model designed to autonomously navigate web browsers and interact with digital interfaces, marking a significant step toward AI agents capable of completing complex online tasks without human intervention.
The model, announced October 7, 2025, represents Google's entry into the emerging field of computer-controlling AI, joining competitors OpenAI and Anthropic in developing systems that can operate software through visual understanding rather than traditional programming interfaces.
How does the technology work?
Gemini 2.5 Computer Use operates through an iterative feedback loop that mimics human-computer interaction. The model receives a user's request along with screenshots of the current screen and a history of recent actions.
It then analyzes this information and generates specific user interface actions such as clicking buttons, typing text, scrolling pages, or manipulating dropdown menus.
After each action executes, a new screenshot returns to the model, allowing it to assess the results and determine the next step. This cycle continues until the task completes, an error occurs, or safety protocols halt the process.
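The loop described above can be sketched in a few lines of Python. This is a minimal illustration only: the model call, screenshot capture, and action executor are stubbed with hypothetical helpers (`take_screenshot`, `propose_action`, `execute_action`), not the real Gemini API client.

```python
def take_screenshot(step):
    """Stub: stands in for capturing the current browser screenshot."""
    return f"screenshot_{step}"

def propose_action(goal, screenshot, history):
    """Stub for the model call. A real client would send the goal, the
    screenshot, and the recent action history to the Computer Use model
    and parse the proposed UI action from its response."""
    if len(history) >= 3:  # pretend the task finishes after three actions
        return {"type": "done"}
    return {"type": "click", "x": 100, "y": 200 + 10 * len(history)}

def execute_action(action):
    """Stub: a real client would drive the browser via an automation
    library and report the outcome back to the model."""
    return {"ok": True}

def run_agent(goal, max_steps=10):
    """Iterate screenshot -> model -> action until done, error, or limit."""
    history = []
    for step in range(max_steps):
        shot = take_screenshot(step)
        action = propose_action(goal, shot, history)
        if action["type"] == "done":      # model signals task completion
            return history
        result = execute_action(action)   # client performs the UI action
        if not result["ok"]:              # halt on execution error
            break
        history.append((action, result))  # outcome feeds the next turn
    return history
```

Calling `run_agent("fill out the sign-up form")` runs the simulated loop and returns the list of executed actions; in production, the safety checks described later in this article would run before each `execute_action` call.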
The model builds on the visual understanding and reasoning capabilities of Gemini 2.5 Pro, Google's flagship AI system released in March 2025. Earlier versions have powered features in Google's Project Mariner and AI Mode search capabilities.
Google claims the model outperforms competing solutions on multiple web and mobile control benchmarks, including Online-Mind2Web and WebVoyager, while maintaining lower latency.
The company optimized the system primarily for web browser control, though it shows promise for mobile user interface tasks on the AndroidWorld benchmark. Desktop operating system control remains unsupported in the current version.
The technology supports 13 distinct actions, including clicking, typing, scrolling, drag-and-drop operations, navigating browser history, and executing keyboard combinations.
Notably, the system can fill out forms, manipulate interactive elements like filters and dropdowns, and operate behind login screens.
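On the client side, the model's proposed actions must be mapped to browser automation calls. The sketch below shows one plausible dispatch pattern covering the action categories named above; the action names and argument fields here are illustrative assumptions, and the actual API schema may differ.

```python
# Hypothetical handlers for a few of the action categories the model emits.
def click(args):
    return f"clicked at ({args['x']}, {args['y']})"

def type_text(args):
    return f"typed {args['text']!r}"

def scroll(args):
    return f"scrolled {args['direction']}"

def drag_and_drop(args):
    return f"dragged {args['source']} to {args['target']}"

def key_combination(args):
    return "pressed " + "+".join(args["keys"])

# Dispatch table: action name -> handler (names are assumptions, not the
# official schema).
HANDLERS = {
    "click": click,
    "type_text": type_text,
    "scroll": scroll,
    "drag_and_drop": drag_and_drop,
    "key_combination": key_combination,
}

def dispatch(action):
    """Route a model-proposed action to its handler; reject unknown ones."""
    handler = HANDLERS.get(action["name"])
    if handler is None:
        raise ValueError(f"unsupported action: {action['name']}")
    return handler(action.get("args", {}))
```

Rejecting unrecognized action names, as `dispatch` does, is one natural place to enforce the per-action safety confirmations discussed below.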
Early adoption and real-world applications
Several companies and internal Google teams have begun testing the technology with positive results. Google's payments platform team reported that the model successfully recovers over 60 percent of failed test executions, significantly reducing engineering time previously spent on manual fixes.
Early testers include Autotab, an AI agent platform, which reported performance improvements of up to 18 percent on complex data-parsing tasks in its most difficult evaluations.
Poke.com, a proactive AI assistant provider, noted the model operates approximately 50 percent faster than competing solutions during interface interactions.
Google demonstrated the system's capabilities with two example tasks.
In one demonstration, the AI extracted pet owner information from a sign-up form, transferred specific data to a customer relationship management system, and scheduled an appointment with specified parameters. In another example, the model organized digital sticky notes on a virtual board by dragging them into appropriate categories.
Safety and security measures
Recognizing the risks inherent in AI systems that control computer interfaces, Google built multiple safety layers into the model. A per-step safety service evaluates each proposed action before execution.
Developers can configure system-level instructions requiring user confirmation or automatic refusal for high-risk actions, including attempts to harm system integrity, compromise security, bypass CAPTCHA, or control medical devices.
The company conducted extensive safety evaluations and red-teaming exercises in partnership with internal Google safety, security, and responsibility teams.
These assessments align with Google's AI Principles and address three primary risk categories: intentional adversarial misuse, unintentional model failures during benign use, and potential release of sensitive information.
Google acknowledges the model shares limitations common to large language models, including potential hallucinations and difficulties with causal understanding, complex logical deduction, and counterfactual reasoning.
The model's knowledge cutoff date is January 2025.
Market context and competition
Google's announcement follows similar moves by competitors in the AI industry.
Anthropic released computer use capabilities in its Claude AI model in 2024, and OpenAI enhanced its ChatGPT Agent feature just one day before Google's announcement.
However, unlike those competitors' offerings, which can control entire computer operating systems, Google's current implementation focuses exclusively on web browser control.
The timing reflects the broader industry push toward "agentic AI" systems capable of taking autonomous action on behalf of users.
Industry analysts predict the agentic AI market will grow from approximately $5.1 billion in 2025 to $47 billion by 2030, with 33 percent of enterprise software applications expected to integrate agentic AI by 2028.
Availability and pricing
Gemini 2.5 Computer Use is available now in public preview through the Gemini API on Google AI Studio and Vertex AI.
Developers can access demonstration environments hosted by Browserbase, a startup specializing in headless web browsers for AI applications.
The pricing structure mirrors the standard Gemini 2.5 Pro model, with token-based billing. Input tokens cost $1.25 per million tokens for prompts under 200,000 tokens, rising to $2.50 per million for longer prompts.
Output tokens are priced at $10 per million at the standard tier and $15 per million at the long-prompt tier. Unlike Gemini 2.5 Pro, which offers a free tier, Computer Use requires paid access from the outset.
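A quick back-of-the-envelope estimate from those quoted rates can be computed as follows. This is an approximation under the assumption that both input and output rates are set by whether the prompt exceeds the 200,000-token threshold; the provider's exact billing rules govern in practice.

```python
def estimate_cost(input_tokens, output_tokens):
    """Estimated USD cost of one request at the quoted per-token rates.

    Assumes the higher tier applies when the prompt exceeds 200,000
    tokens (an assumption; see the provider's billing rules)."""
    long_prompt = input_tokens > 200_000
    input_rate = 2.50 if long_prompt else 1.25     # $ per 1M input tokens
    output_rate = 15.00 if long_prompt else 10.00  # $ per 1M output tokens
    return (input_tokens * input_rate
            + output_tokens * output_rate) / 1_000_000
```

For example, a request with 50,000 input tokens (a few screenshots plus history) and 2,000 output tokens would cost roughly $0.08 at the standard tier.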
Developers can access reference implementations and documentation through Google's developer resources and provide feedback through the Google Developer Forum to help shape the model's development roadmap.