AWESOME COMPUTER USE // CURATED RESOURCES

MODEL	PARAMS	HIGHLIGHTS	LINKS
NORTHSTAR CUA FAST (TZAFON AI)	4B	RL-trained with GRPO on synthetic environments. Excels at error recovery and multi-turn interactions. 55% on OSWorld Chrome tasks.	HuggingFace API
UI-TARS	7B/72B	Native GUI agent with System-2 reasoning. 24.6% on OSWorld.	Paper GitHub
AGUVIS	7B/72B	Unified pure vision GUI agent across platforms.	Paper GitHub
COGAGENT	18B	High-resolution cross-module attention for GUI understanding.	Paper GitHub
SEECLICK	7B	Visual GUI agent with element grounding capabilities.	Paper GitHub
SHOWUI	2B	Lightweight vision-language-action model for UI grounding.	GitHub
FERRET-UI	-	Apple's grounded mobile UI understanding with multimodal LLMs.	Paper GitHub
OMNIPARSER	-	Microsoft's vision-based screen parser for GUI agents.	Website GitHub
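
GRPO (Group Relative Policy Optimization), mentioned for Northstar above, avoids a learned value network by scoring several rollouts of the same task and normalizing each reward against its group. The sketch below shows only that advantage step, as a generic illustration rather than Tzafon's training code; the function name, the use of the population standard deviation, and the epsilon value are our own choices (implementations differ on these details).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each rollout's reward within its group.

    All rewards must come from rollouts of the same task (one "group").
    """
    if not rewards:
        raise ValueError("need at least one rollout")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps keeps the division finite when every rollout got the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four attempts at one task, scored 1.0 (success) or 0.0 (failure).
    adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    print([round(a, 3) for a in adv])
```

When every rollout in a group succeeds, or every one fails, all advantages are zero and that group adds no policy-gradient signal, so task difficulty relative to the current policy matters when building synthetic environments.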

< 03 >

DEVELOPER TOOLS

// SDKS, FRAMEWORKS, SANDBOXING

< 03.1 > SDKS & APIS

LIGHTCONE SDK FEATURED

The API and runtime for Northstar CUA. Supports task-based automation, custom agent loops, and OpenAI-compatible endpoints. Available for Python (pip install tzafon) and Node.js (npm install @tzafon/lightcone).

TZAFON AI

CLAUDE COMPUTER USE API

Anthropic's official computer use documentation and API reference.

ANTHROPIC
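
For orientation, a minimal sketch of the client side of Anthropic's computer-use loop: the model returns `tool_use` blocks whose `input` names an action (such as `screenshot`, `mouse_move`, `left_click`, or `type`), and the client executes each one and replies with a `tool_result` carrying the same `tool_use_id`. The tool type `computer_20241022` and beta flag `computer-use-2024-10-22` are from the original October 2024 release; newer tool versions exist, so check the current docs. The executor callback is hypothetical (your own VM or sandbox driver), and messages are modeled as plain dicts so the sketch runs without an API key. Real screenshot results are image blocks; text is used here for brevity.

```python
from typing import Any, Callable

# Tool definition sent with each request (original 2024-10-22 beta version;
# newer versions exist, check Anthropic's docs before relying on these strings).
COMPUTER_TOOL = {
    "type": "computer_20241022",
    "name": "computer",
    "display_width_px": 1024,
    "display_height_px": 768,
}
BETA_FLAG = "computer-use-2024-10-22"

# Hypothetical executor: takes a tool input dict, performs the action, returns text.
Executor = Callable[[dict[str, Any]], str]


def handle_tool_uses(content: list[dict[str, Any]], execute: Executor) -> list[dict[str, Any]]:
    """Run every tool_use block in an assistant message and build tool_result blocks."""
    results = []
    for block in content:
        if block.get("type") != "tool_use":
            continue  # text blocks need no reply
        try:
            output, is_error = execute(block["input"]), False
        except Exception as exc:  # report failures so the model can try to recover
            output, is_error = f"action failed: {exc}", True
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": output,
            "is_error": is_error,
        })
    return results


if __name__ == "__main__":
    # Shape of an assistant message's content (no API call is made).
    message_content = [
        {"type": "text", "text": "I'll click the Save button."},
        {"type": "tool_use", "id": "toolu_01", "name": "computer",
         "input": {"action": "mouse_move", "coordinate": [512, 300]}},
        {"type": "tool_use", "id": "toolu_02", "name": "computer",
         "input": {"action": "left_click"}},
    ]

    def fake_execute(tool_input: dict[str, Any]) -> str:
        # Stand-in for a real sandbox driver (hypothetical).
        if tool_input["action"] in ("mouse_move", "left_click"):
            return f"ok: {tool_input['action']}"
        raise ValueError(f"unsupported action: {tool_input['action']}")

    for result in handle_tool_uses(message_content, fake_execute):
        print(result["tool_use_id"], result["content"], result["is_error"])
```

In the real loop, the returned list becomes the content of the next `user` message, and the cycle repeats until the model stops requesting tools.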

BROWSER USE

High-level framework for building browser automation agents with LLMs.

OPEN SOURCE

< 03.2 > AGENT FRAMEWORKS

LIGHTCONE AGENT HARNESS FEATURED

Full desktop automation loop with sandbox execution. Apache 2.0 licensed.

TZAFON AI

COMPUTER USE OOTB

Out-of-the-box computer use implementation.

SHOWLAB

SELF-OPERATING COMPUTER

Framework for multimodal AI computer control.

OTHERSIDEAI

OPEN INTERPRETER

Natural language interface for computer control.

OPEN SOURCE

GRUNTY

Lightweight computer use agent.

OPEN SOURCE

CRADLE

Empowering foundation agents towards general computer control. [Paper]

BAAI

AGENT-E

Foundational design principles in agentic systems.

RESEARCH

OS-COPILOT

Generalist computer agents with self-improvement.

RESEARCH

< 03.3 > SANDBOXING & EXECUTION

E2B DESKTOP SANDBOX

Secure cloud sandboxes for running GUI agents.

E2B

OSWORLD DOCKER

Containerized environments for agent evaluation.

XLANG-AI

AGENTSTUDIO

Toolkit for building general virtual agents.

RESEARCH

< 03.4 > PLATFORM-SPECIFIC

CLAUDE COMPUTER USE FOR MACOS

Anthropic computer use adapted for Mac.

MACOS

FAZM

MIT-licensed voice-controlled AI agent for macOS using accessibility APIs.

MACOS

APPAGENT

Multimodal agents as smartphone users. [Paper] [GitHub]

MOBILE

< 03.5 > OTHER PROJECTS

OPENADAPT

Open source process automation.

OPEN-INTERFACE

Natural language computer interface.

BYTEBOT

AI-powered automation bot.

WEBMARKER

Visual element marking for web agents.

CUA

Computer use agent framework.

UI-ACT

UI interaction agent.

< 04 >

RESEARCH PAPERS

// CATEGORIZED ARCHIVE

> TECHNICAL DEEP DIVES // 1 PAPER

Training VLM for CUA

Deep dive into why SFT saturates, how positional encoding affects click accuracy (40% to 80% improvement), and why multi-turn RL enables robust error recovery.

TZAFON AI

> SURVEY // 2 PAPERS

AI Agents for Computer Use: A Review of Instruction-based Computer Control, GUI Automation, and Operator Assistants

Comprehensive survey of the field.

2025

OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use

Survey on multimodal agents for computing devices.

> MODELING & ARCHITECTURE // 19 PAPERS

EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience

Self-improving CUA through synthetic data generation and iterative RL. 56.7% on OSWorld.

FUDAN ET AL., 2025

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Native GUI agent with System-2 reasoning. State-of-the-art on 10+ benchmarks.

BYTEDANCE, 2025

Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents

Self-adaptive agents in realistic environments.

2025

PC Agent: While You Sleep, AI Works

A cognitive journey into the digital world.

2024

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Pure vision approach without HTML parsing.

2024

Agent S: An Open Agentic Framework that Uses Computers Like a Human

Experience-augmented hierarchical planning.

2024

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

Guided Monte Carlo Tree Search combined with self-critique and iterative DPO fine-tuning.

2024

OSCAR: Operating System Control via State-Aware Reasoning

State-aware reasoning and re-planning.

2024

AgentStore: Scalable Integration of Heterogeneous Agents

Specialized generalist computer assistant.

2024

Agent Workflow Memory

Memory systems for agent workflows.

2024

Web Agents with World Models

Learning environment dynamics for web navigation.

2024

AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents

Simple but effective baseline.

2024

Tree Search for Language Model Agents

Tree search methods for LLM agents.

2024
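
Tree search for LM agents can take several forms; one simple form is best-first search over environment states ranked by a value estimate, under a fixed expansion budget. The sketch below is that generic algorithm, not the listed paper's implementation. `expand`, `value`, and `is_goal` are hypothetical callbacks that a real agent would back with model-sampled actions, a learned or prompted value function, and a task-completion check.

```python
import heapq
import itertools
from typing import Callable, Hashable, Iterable, Optional, TypeVar

S = TypeVar("S", bound=Hashable)


def best_first_search(
    start: S,
    expand: Callable[[S], Iterable[S]],
    value: Callable[[S], float],
    is_goal: Callable[[S], bool],
    budget: int = 20,
) -> Optional[S]:
    """Repeatedly expand the highest-value frontier state until a goal is found
    or `budget` expansions have been spent. Returns the goal state or None."""
    counter = itertools.count()  # tie-breaker so states are never compared directly
    frontier = [(-value(start), next(counter), start)]  # heapq is a min-heap
    seen = {start}
    expansions = 0
    while frontier and expansions < budget:
        _, _, state = heapq.heappop(frontier)
        if is_goal(state):
            return state
        expansions += 1
        for child in expand(state):
            if child in seen:
                continue  # avoid re-queuing states already reached
            seen.add(child)
            heapq.heappush(frontier, (-value(child), next(counter), child))
    return None


if __name__ == "__main__":
    # Toy problem: reach 5 from 0 with steps of +1 or +2, valuing closeness to 5.
    print(best_first_search(0, lambda n: [n + 1, n + 2], lambda n: -abs(5 - n), lambda n: n == 5))
```

A live browser cannot be copied like an integer, so revisiting a frontier state usually means resetting the environment and replaying actions; that cost is why an explicit budget matters.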

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum RL

Self-evolving curriculum reinforcement learning.

TSINGHUA, 2024

ECLAIR: Enterprise sCaLe AI for woRkflows

Enterprise-scale AI workflows.

STANFORD, 2024

GitHub

SeeAct: GPT-4V(ision) is a Generalist Web Agent, if Grounded

Visual grounding for generalist web agents.

OSU, 2024

GitHub

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

LMM-powered web navigation.

2024

ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories

Actionable insights from trajectories.

NEURIPS 2024

CogAgent: A Visual Language Model for GUI Agents

High-resolution cross-module attention.

TSINGHUA, 2023

> GUI GROUNDING // 8 PAPERS

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Coordinate-free visual grounding approach.

2025

Attention-driven GUI Grounding

Leveraging pretrained MLLMs without fine-tuning.

AAAI 2025

Navigating the Digital World as Humans Do

Universal visual grounding for GUI agents.

2024

OS-ATLAS: Foundation Action Model for Generalist GUI Agents

Foundation model for GUI grounding.

ICLR 2025

OmniParser for Pure Vision Based GUI Agent

Vision-based GUI parsing.

MICROSOFT, 2024

Ferret-UI 2: Universal User Interface Understanding Across Platforms

Cross-platform UI understanding.

APPLE, 2024

SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

GUI grounding pre-training.

ACL 2024

Set-of-Mark (SoM) Prompting

Unleashing visual grounding in GPT-4V.

MICROSOFT, 2023

GitHub
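
Set-of-Mark prompting overlays numbered marks on candidate regions of a screenshot so the model can answer with a mark ID instead of raw pixel coordinates. The sketch below covers only the bookkeeping around that idea, assuming element boxes already come from some detector or parser: it assigns IDs, renders an optional text legend, and maps the model's chosen ID back to a click point. Drawing the marks onto the image is omitted, and the `[id]` answer format and all names here are assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    left: int
    top: int
    right: int
    bottom: int
    label: str = ""

    def center(self) -> tuple[int, int]:
        return ((self.left + self.right) // 2, (self.top + self.bottom) // 2)


def assign_marks(boxes: list[Box]) -> dict[int, Box]:
    """Number candidate regions 1..N; these IDs are what gets drawn on the screenshot."""
    return {i: box for i, box in enumerate(boxes, start=1)}


def legend(marks: dict[int, Box]) -> str:
    """Optional text legend to accompany the marked image in the prompt."""
    return "\n".join(f"[{i}] {box.label}" for i, box in marks.items())


def resolve_answer(answer: str, marks: dict[int, Box]) -> Optional[tuple[int, int]]:
    """Map a reply such as 'Click [2]' to the center of mark 2, or None if no valid mark."""
    match = re.search(r"\[(\d+)\]", answer)
    if match is None:
        return None
    box = marks.get(int(match.group(1)))
    return box.center() if box is not None else None


if __name__ == "__main__":
    marks = assign_marks([Box(10, 10, 110, 40, "File menu"), Box(200, 500, 300, 540, "Save button")])
    print(legend(marks))
    print(resolve_answer("I will click [2]", marks))
```

Asking for a mark ID turns precise coordinate regression into a multiple-choice selection, which is the main reason SoM helps models that ground poorly in pixel space.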

> AGENT DATA & TRAJECTORY SYNTHESIS // 7 PAPERS

Explorer: Robust Collection of Interactable GUI Elements

GUI element collection.

2025

Explorer: Scaling Exploration-driven Web Trajectory Synthesis

Exploration-driven trajectory synthesis for multimodal web agents.

2025

OS-Genesis: Automating GUI Agent Trajectory Construction

Reverse task synthesis.

ACL 2025

AgentTrek: Agent Trajectory Synthesis via Guiding Replay

Web tutorials for trajectory synthesis.

ICLR 2025

AndroidLab: Training and Benchmarking Android Autonomous Agents

Android agent training data.

2024

GUI-World: A GUI-oriented Dataset for Multimodal LLM-based Agents

Comprehensive GUI dataset.

2024

Synatra: Turning Indirect Knowledge into Direct Demonstrations

Digital agents at scale.

NEURIPS 2024

> BENCHMARKS & EVALUATION // 11 PAPERS

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks

The definitive benchmark: 369 tasks in real computer environments, with support for Ubuntu, Windows, and macOS. Best model at publication: 12.24% vs. 72.36% for humans.

NEURIPS 2024

Website GitHub

AndroidWorld: A Dynamic Benchmarking Environment

Dynamic Android environment benchmark.

2024

GitHub

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Windows-specific benchmark.

2024

WebArena

Realistic web environment with functional evaluation.

VisualWebArena

Multimodal web agent benchmark.

Mind2Web

Large-scale web agent dataset.

ScreenSpot

GUI grounding benchmark across platforms.

Pro Version
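
GUI grounding benchmarks in the ScreenSpot family score a prediction as correct when the predicted click point falls inside the target element's bounding box, and report accuracy as the fraction of correct predictions. A minimal sketch of that metric, assuming boxes given as (left, top, right, bottom) in the same pixel space as the predictions; the split names and the per-split breakdown are illustrative choices, not an official evaluation script.

```python
from collections import defaultdict
from typing import Iterable

Point = tuple[float, float]
BBox = tuple[float, float, float, float]  # (left, top, right, bottom)


def point_in_box(point: Point, box: BBox) -> bool:
    """True if the point lies inside the box (edges count as inside)."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def click_accuracy(samples: Iterable[tuple[str, Point, BBox]]) -> dict[str, float]:
    """Accuracy overall and per split (e.g. 'mobile-text', 'web-icon').

    Each sample is (split_name, predicted_point, ground_truth_box).
    """
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for split, pred, gt in samples:
        correct = point_in_box(pred, gt)
        for key in (split, "overall"):
            totals[key] += 1
            hits[key] += int(correct)
    return {key: hits[key] / totals[key] for key in totals}


if __name__ == "__main__":
    samples = [
        ("web-icon", (50, 50), (0, 0, 100, 100)),     # inside: correct
        ("web-icon", (150, 50), (0, 0, 100, 100)),    # outside: wrong
        ("mobile-text", (10, 10), (5, 5, 20, 20)),    # inside: correct
    ]
    print(click_accuracy(samples))
```

Small icons leave little room for error under this point-in-box rule, which is why icon splits typically score lower than text splits.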

CRAB: Cross-environment Agent Benchmark

Cross-environment benchmark for multimodal language model agents.

2024

MobileAgentBench

Efficient benchmark for mobile LLM agents.

2024

Spider2-V: Automating Data Science Workflows

Data science automation benchmark.

NEURIPS 2024

ScienceBoard: Scientific Workflows Evaluation

Multimodal agents in realistic scientific workflows.

2025

> SAFETY & SECURITY // 5 PAPERS

Attacking Vision-Language Computer Agents via Pop-ups

Adversarial attacks through pop-up injection.

2024

MobileSafetyBench: Evaluating Safety of Autonomous Agents

Mobile device control safety.

2024

GuardAgent: Safeguard LLM Agent via Knowledge-Enabled Reasoning

Guard agent architecture.

2024

EIA: Environmental Injection Attack for Privacy Leakage

Privacy attacks on web agents.

2024

Adversarial Attacks on Multimodal Agents

Comprehensive attack analysis.

2024

< 06 >

RESOURCES

// VIDEOS, BLOGS, TUTORIALS

VIDEOS & TALKS

Claude | Computer use for coding

Official Anthropic tutorial

YOUTUBE

Claude | Computer use for automating operations

Operations automation demo

YOUTUBE

Claude | Computer use for orchestrating tasks

Task orchestration walkthrough

YOUTUBE

LLMs as Computer Users: An Overview

Comprehensive slide deck

FIGMA

BLOGS & ARTICLES

// INDUSTRY PERSPECTIVES

AI is about to completely change how you use computers

Bill Gates on AI agents

GATESNOTES

When you give a Claude a mouse

Analysis of computer use implications

ETHAN MOLLICK

Claude's agentic future

Frontier models and agency

NATHAN LAMBERT

// TECHNICAL ANALYSIS

Training VLM for CUA

Deep technical analysis of VLM training for computer use

TZAFON AI

Initial explorations of Computer Use

Technical exploration

SIMON WILLISON

Notes on Anthropic's Computer Use Ability

Implementation notes

COMPOSIO

// TUTORIALS & GUIDES

Computer Use by Anthropic: 5-Minute Setup Guide

Quick start guide

GLAMA.AI

Automating macOS using Claude Computer Use

macOS-specific tutorial

GLAMA.AI

Anthropic Computer Use: Automate Your Desktop With Claude 3.5

DataCamp tutorial

DATACAMP

Instant Claude Computer Use Demo

Docker-based demo

LABEX
