AWS WorkSpaces Now Lets AI Agents Operate Legacy Desktop Applications Without APIs

AWS has announced that Amazon WorkSpaces can now serve as managed virtual desktops for AI agents, letting them operate legacy desktop applications through computer vision and input simulation without requiring application modernization or API integration.

The problem it addresses is widespread. According to a 2024 Gartner report, 75% of organizations run legacy applications that lack modern APIs, and 71% of Fortune 500 companies operate critical processes on mainframe systems without adequate programmatic access. For these organizations, deploying AI agents has meant choosing between expensive modernization projects and delaying adoption entirely.

WorkSpaces takes a different approach: give the agent the same desktop a human employee uses. The agent authenticates through IAM, connects to a WorkSpaces instance at a unique pre-signed URL, and interacts with applications by taking screenshots (computer vision), clicking, typing, and scrolling (computer input). The application doesn't know an agent is driving it. Nothing about the software needs to be modified.
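The screenshot-then-act cycle described above can be illustrated with a minimal sketch. This is not the WorkSpaces API: the `Desktop` interface, `Action` type, `FakeDesktop`, and `scripted_policy` are all hypothetical names invented for illustration, and a real agent would replace the scripted policy with a call to a vision model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Protocol


@dataclass
class Action:
    """One simulated input event (hypothetical shape, for illustration only)."""
    kind: str          # "click", "type", or "scroll"
    x: int = 0
    y: int = 0
    text: str = ""


class Desktop(Protocol):
    """Hypothetical desktop interface: observe via screenshot, act via input."""
    def screenshot(self) -> bytes: ...
    def perform(self, action: Action) -> None: ...


def run_agent(desktop: Desktop,
              decide: Callable[[bytes], Optional[Action]],
              max_steps: int = 20) -> int:
    """Capture the screen, ask the policy for the next action, apply it.

    Stops when the policy returns None or max_steps is reached.
    Returns the number of actions performed.
    """
    steps = 0
    for _ in range(max_steps):
        frame = desktop.screenshot()
        action = decide(frame)
        if action is None:
            break
        desktop.perform(action)
        steps += 1
    return steps


@dataclass
class FakeDesktop:
    """In-memory stand-in for a remote desktop; records every action."""
    log: List[Action] = field(default_factory=list)

    def screenshot(self) -> bytes:
        # The "screen" is just a label that changes after each action.
        return f"frame-{len(self.log)}".encode()

    def perform(self, action: Action) -> None:
        self.log.append(action)


def scripted_policy(frame: bytes) -> Optional[Action]:
    """Stand-in for a vision model: maps known frames to fixed actions."""
    script = {
        b"frame-0": Action("click", x=120, y=80),
        b"frame-1": Action("type", text="refill"),
    }
    return script.get(frame)


if __name__ == "__main__":
    desktop = FakeDesktop()
    performed = run_agent(desktop, scripted_policy)
    print(performed, [a.kind for a in desktop.log])
```

Every action requires a fresh screenshot in this loop, which is why the step count, and not only the per-step accuracy, drives the cost of a vision-based agent.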

Figure: WorkSpaces screenshot (Source: AWS News Blog post)

Chris Noon, Director at Nuvens Consulting, described the value for regulated industries in the announcement:

"WorkSpaces lets our clients give AI agents the same secure, governed desktop environment their employees already use. No custom API integrations, full audit trails, and enterprise-grade isolation out of the box. For regulated industries, that's not a nice-to-have, it's the baseline."

The Model Context Protocol (MCP) integration is what makes this framework-agnostic. WorkSpaces exposes a managed MCP endpoint, meaning any agent framework that speaks MCP, including LangChain, CrewAI, and Strands Agents, can connect. AWS demonstrated the capability with a Strands agent built on Amazon Bedrock handling a prescription refill workflow inside a sample pharmacy system: looking up the patient record, searching for the medication, placing the order, and confirming the refill, all without an API.

The security model inherits everything enterprises already have in place for human WorkSpaces environments. Agents run within isolated WorkSpaces instances, not on local machines or internal networks. CloudTrail captures all activity for audit. CloudWatch provides observability. AWS recommends giving each agent a unique IAM identity to distinguish agentic actions from human activity. Desktop screen resolution, image format, and agent capabilities (computer input, computer vision, screenshot storage) are all configurable per stack.

The cost question is the obvious skeptical angle. Reflex, an AI coding company, recently published benchmark research showing that a vision agent consumed roughly 500,000 input tokens to complete a task that an API agent handled in 12,000 tokens, a 45x cost difference. Palash Awasthi, Reflex's head of growth, argued that:

"Better vision models reduce error rates per screenshot, but they do not reduce the number of screenshots required to reach the relevant data."

The vision agent also took 17 minutes compared to 20 seconds for the API path. Awasthi acknowledged that better models will eventually lower costs, but maintained that vision-based agents will always require more steps than API-based alternatives.
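The ratios implied by the figures quoted above can be checked with a few lines of arithmetic. The inputs below are the rounded numbers from this article; the raw token ratio comes out slightly under the reported 45x, which may reflect rounding in "roughly 500,000" or a cost measure other than raw input tokens (that interpretation is an assumption, not something the source states).

```python
# Figures as quoted in the article (rounded values).
vision_tokens = 500_000      # input tokens, vision agent
api_tokens = 12_000          # input tokens, API agent
vision_seconds = 17 * 60     # 17 minutes
api_seconds = 20             # 20 seconds

token_ratio = vision_tokens / api_tokens
time_ratio = vision_seconds / api_seconds

print(f"token ratio: {token_ratio:.1f}x")
print(f"time ratio:  {time_ratio:.1f}x")
```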

That tradeoff is precisely the point AWS is making: computer-use agents and APIs solve fundamentally different problems. When an API exists, agents should use it. But much enterprise software, including legacy ERP systems, thick-client applications, and proprietary tools, simply has no API access.

For those applications, a 45x more expensive agent may still be cheaper than a multi-year modernization project. The question for each organization is whether the workflow automation value justifies the token cost at their specific scale. The ephemeral nature of cloud desktops helps with cost management: organizations can spin up a WorkSpaces instance for a specific task and shut it down when the agent is done, rather than maintaining always-on infrastructure.
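One way to frame that question is a rough break-even estimate. The sketch below is illustrative only: the token price, task volume, and modernization cost are hypothetical placeholders, not figures from AWS, Reflex, or any pricing page, and the model ignores desktop instance charges, output tokens, and engineering time.

```python
def annual_token_cost(tokens_per_task: int,
                      tasks_per_year: int,
                      usd_per_million_tokens: float) -> float:
    """Yearly input-token spend for one workflow."""
    return tokens_per_task * tasks_per_year / 1_000_000 * usd_per_million_tokens


def break_even_years(modernization_cost_usd: float,
                     vision_annual_usd: float,
                     api_annual_usd: float) -> float:
    """Years until the extra yearly spend on the vision agent
    equals the one-time cost of building an API."""
    extra_per_year = vision_annual_usd - api_annual_usd
    if extra_per_year <= 0:
        return float("inf")  # the vision path is never more expensive
    return modernization_cost_usd / extra_per_year


# Hypothetical inputs: replace with your own figures.
PRICE = 3.0                 # USD per million input tokens (assumed)
TASKS = 10_000              # tasks per year (assumed)
MODERNIZATION = 1_000_000   # one-time API project cost in USD (assumed)

vision = annual_token_cost(500_000, TASKS, PRICE)
api = annual_token_cost(12_000, TASKS, PRICE)
years = break_even_years(MODERNIZATION, vision, api)

print(f"vision: ${vision:,.0f}/yr, api: ${api:,.0f}/yr, "
      f"break-even after {years:.1f} years")
```

Under these assumed inputs the token premium stays small relative to the project cost; at much higher task volumes the balance shifts, which is the scale dependence the paragraph above describes.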

Microsoft is pursuing a similar approach with Windows 365 for AI agents, creating a parallel category of cloud desktop services in which AI systems operate software through the UI rather than APIs.

WorkSpaces agent access is available in preview in US East (N. Virginia, Ohio), US West (Oregon), Canada (Central), Europe (Frankfurt, Ireland, Paris, London), and Asia Pacific (Tokyo, Mumbai, Sydney, Seoul, Singapore). A GitHub repository containing sample code is now available.
