Nova-CUA | Jaden Varkey

No ChatGPT/Gemini manuals, no command syntax, no tutorials.
Just say what you want — and your AI does it.

That's the vision we had when building Nova-CUA — a JARVIS-like desktop agent that turns natural language into real actions on your computer.

"Open LinkedIn, search for ML internships, and save them."
"Open Firefox and buy me the latest AirPods from Amazon."
"Launch YouTube and play Barcelona's latest match highlights."

All without lifting a finger.

Together with my teammates Archeet Shah and Peter Bui, I designed Nova-CUA, a desktop automation agent that combines LLMs with GUI grounding. It interprets your instructions, generates a plan, and operates your desktop directly to carry out the task.

Our workflow included:

Gemini 2.5 for planning and for generating the code that executes each task.
InternVL-4B for GUI grounding: locating on-screen elements such as buttons, icons, and text so the agent can interact with them.
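To make the plan, ground, act split above concrete, here is a minimal sketch of how such a loop could be wired. It is not Nova-CUA's actual code. plan_task, ground_element, Step and Executor are all hypothetical names. The planner and grounding model are replaced by stubs rather than real Gemini or InternVL API calls. The "screenshot" is a plain dict of element descriptions to (x, y) coordinates, and the executor only logs actions instead of moving the mouse.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One planned action (hypothetical step format, assumed for this sketch)."""
    action: str      # "click" or "type"
    target: str      # natural-language description of the on-screen element
    text: str = ""   # text to type for "type" steps


def plan_task(instruction: str) -> list[Step]:
    """Hypothetical stand-in for the LLM planner.

    A real planner would prompt the model with the instruction; this stub
    returns a fixed plan for one example request.
    """
    if "youtube" in instruction.lower():
        return [
            Step("type", "browser address bar", "youtube.com"),
            Step("type", "YouTube search box", "Barcelona highlights"),
            Step("click", "first video result"),
        ]
    return []


def ground_element(screenshot: dict[str, tuple[int, int]], target: str) -> tuple[int, int]:
    """Hypothetical stand-in for the grounding model.

    A real grounder would read a screenshot image and return pixel
    coordinates for the described element; here a dict lookup plays that role.
    """
    return screenshot[target]


@dataclass
class Executor:
    """Records actions instead of driving the real mouse and keyboard."""
    log: list[str] = field(default_factory=list)

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click({x}, {y})")

    def type_text(self, text: str) -> None:
        self.log.append(f"type({text!r})")


def run(instruction: str, screenshot: dict[str, tuple[int, int]], executor: Executor) -> list[str]:
    """Plan once, then ground and execute each step in order.

    A real agent would capture a fresh screenshot before every step, because
    the screen changes; a single fixed dict stands in for all frames here.
    """
    for step in plan_task(instruction):
        x, y = ground_element(screenshot, step.target)
        executor.click(x, y)  # click the element (also focuses text fields)
        if step.action == "type":
            executor.type_text(step.text)
    return executor.log


# Usage with made-up coordinates for the elements the plan refers to.
screen = {
    "browser address bar": (640, 52),
    "YouTube search box": (700, 120),
    "first video result": (400, 300),
}
log = run("Launch YouTube and play Barcelona's latest match highlights", screen, Executor())
print(log)
```

Keeping planning and grounding behind separate functions mirrors the two-model split described above. Either stub could be swapped for a real model call without touching the loop.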

The most rewarding part for me was learning how multi-agent workflows are structured and how to integrate LLMs with GUI grounding for end-to-end automation.