VOICE-FIRST · MACOS NATIVE

If you can say it, your Mac can do it.

Speak to launch as many specialized AI agents as you need. They perform tasks in real time in any application or website on your Mac.

Instantly spawn an agent to:

  • Copy and save a workflow between apps
  • Watch a YouTube tutorial, then execute it
  • Guide you when onboarding to a new app
  • Find information you need on a web page
  • Stage the setup in the background while you read developer docs

Ditto sees and adapts to anything on your screen in real time, guiding you or taking control depending on what you request.

Hands-free and voice-first, for anything you need done on your desktop.

< 1s

From voice command to the first cursor movement on your screen.

100+

MCP tools connected — Mail, Calendar, browser, IDE, every app you use.

Works with Claude and OpenAI plans

YouTube Tutorial Agent

Watches → mimics on your Mac

Browser Agent

Find, fill, click — voice-driven

Workflow Agent

Build once, replay anywhere

A starter agent that captures and replays workflows across your installed apps.

Voice

"Hey Ditto..." and your Mac listens.

Wake-word detection runs on-device, and raw audio never leaves your Mac. Ditto transcribes locally, snapshots your live screen, and reasons about the request before any agent moves.

Demo video — coming soon

Wake-word → STT → agent reasoning → first cursor moves. The full voice loop in 30 seconds.

demo-wake-word-flow.mp4

01

Wake word fires locally

"Hey Ditto" is detected by an on-device model. Nothing is sent off the Mac before the wake word fires, and raw audio stays local even afterward.

02

Live screen + transcript captured

Ditto snapshots what's visible (with privacy redaction) and the transcript becomes the agent's task brief.

03

Sub-agents dispatch

A planning agent picks which scoped sub-agents (Mail / Calendar / Browser / Research / Notes / Summary) handle the work. You see them go.

04

Result spoken back

When the agents finish, Ditto synthesizes the result and reads it back. ElevenLabs TTS or system voice — your pick.
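
For the curious, the four steps above can be sketched in a few lines of Python. This is an illustration only, not Ditto's implementation: `VoiceLoop`, `TaskBrief`, the `plan` and `run_agent` callables, and the text-based wake-phrase check are all hypothetical stand-ins (the real wake word is detected from audio by an on-device model).

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

WAKE_PHRASE = "hey ditto"  # phrase from the page; matching on text is a simplification


@dataclass
class TaskBrief:
    transcript: str        # what was said after the wake phrase
    screen_snapshot: str   # stand-in for a redacted screen capture


@dataclass
class VoiceLoop:
    plan: Callable[[TaskBrief], list]        # hypothetical planner: brief -> sub-agent names
    run_agent: Callable[[str, TaskBrief], str]  # hypothetical runner: returns result text
    trace: list = field(default_factory=list)

    def handle(self, utterance: str, screen: str) -> Optional[str]:
        text = utterance.strip()
        # Step 01: ignore everything that does not start with the wake phrase.
        if not text.lower().startswith(WAKE_PHRASE):
            return None
        self.trace.append("wake")
        # Step 02: the transcript plus the snapshot become the task brief.
        request = text[len(WAKE_PHRASE):].lstrip(" ,.")
        brief = TaskBrief(transcript=request, screen_snapshot=screen)
        self.trace.append("brief")
        # Step 03: the planner picks scoped sub-agents, which then run in order.
        results = []
        for agent in self.plan(brief):
            self.trace.append(f"dispatch:{agent}")
            results.append(self.run_agent(agent, brief))
        # Step 04: results are combined into a single reply to speak back.
        self.trace.append("speak")
        return " ".join(results)
```

In this sketch an utterance without the wake phrase returns `None` and leaves the trace empty, which mirrors the idea that nothing moves until "Hey Ditto" is heard.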

Shadow Cursors

See your sub-agents work.

Every Ditto sub-agent gets its own colored shadow cursor on screen. Not abstractions — actual pointers you can watch click, type, and read across native apps and web tabs. The cursors moving on this page are doing exactly what they'd do on your Mac.

Multi-Application Workflow Creation

Speak a goal and Ditto chains agents across Mail, Calendar, Slack, and your IDE, then captures the workflow so it replays on demand.

Real-time Application Guidance

Live shadow cursors point, click, and explain as you onboard. Ditto reads the screen and shows you what to do.

Multi-tasking Across Applications

Spawn parallel sub-agents — one drafting in Notion, one cross-referencing your inbox, one updating your calendar. All at once.

E2E Application Testing

Validate user flows across your real app, browser, and APIs — Ditto runs the test like a person would.

Mail

Reads, drafts, sends, archives in Apple Mail or Gmail.

Calendar

Schedules, reschedules, finds free slots.

Ditto demo — coming soon

Multiple shadow cursors moving in parallel across the desktop — one per sub-agent — color-coded by role.

demo-shadow-cursors.mp4 · screenshot-shadow-cursors.png

Ditto Skills

What Ditto does — beyond the tool calls.

Ditto is more than a thin wrapper around MCP servers. Speak your intent and Ditto builds workflows, mentors you through unfamiliar apps, runs end-to-end UX tests, and streams structured telemetry, all from your voice.

Workflow

Build workflows on the fly

Speak a multi-step task and Ditto builds a live workflow. Linear → Slack → Calendar in one prompt. Saved for next time.

demo-workflow-builder.mp4

Mentor

"Show me how"

Stuck in Concur, Workday, anything? Ditto takes the cursor, walks you through each click with annotations, then hands control back.

demo-show-me-how.mp4

UX Testing

End-to-end UX testing

Hand Ditto a flow. It clicks, types, navigates, verifies — like a real user. QA, regressions, demo recordings, onboarding paths.

demo-e2e-testing.mp4

Telemetry

Telemetry on tap

Every shadow-cursor action streams structured telemetry to Loki / Datadog / Honeycomb / your webhook.

demo-telemetry-stream.mp4
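
As a rough idea of what "structured telemetry" could look like on the receiving end, here is a minimal Python sketch that turns one cursor action into a single JSON line. The `CursorEvent` fields and both helper functions are assumptions made for illustration; the page does not publish Ditto's actual event schema.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass
class CursorEvent:
    agent: str      # sub-agent that owns the shadow cursor, e.g. "Mail"
    action: str     # "click", "type", "scroll", ...
    target: str     # human-readable description of the UI element
    timestamp: str  # ISO 8601 timestamp in UTC


def make_event(agent: str, action: str, target: str,
               now: Optional[datetime] = None) -> CursorEvent:
    """Build an event, stamping it with the current UTC time unless one is given."""
    moment = now or datetime.now(timezone.utc)
    return CursorEvent(agent, action, target, moment.isoformat())


def to_json_line(event: CursorEvent) -> str:
    """Serialize one event as a single JSON line with stable key order."""
    return json.dumps(asdict(event), sort_keys=True)
```

One event per line keeps the stream easy to forward to a log pipeline or a webhook without extra framing.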

MCP-Connected Tools

Your apps, your desktop, your real-time personal assistants.

Ditto speaks the Model Context Protocol natively. Any MCP server you connect (Mail, Calendar, Browser, Files, Contacts, Slack, Linear, GitHub) becomes a tool every sub-agent can call. No bespoke integrations to maintain.

Mail

Apple Mail + Gmail. Sending always requires explicit approval.

Calendar

Google + iCloud. Find slots, schedule, decline.

Browser

Chrome MCP. Forms, scrapes, multi-step flows.

Files

Finder + Spotlight. Open, move, rename, search.

Applications

Reads any open app on your Mac via the Accessibility API and Screen Recording, then clicks, types, and navigates like a person.

Any MCP server

Slack, Linear, GitHub, Notion, Stripe — drop in.
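
Under the hood, MCP is built on JSON-RPC 2.0, and a tool invocation is a `tools/call` request carrying the tool's name and arguments. The Python sketch below builds such a message. The helper `tool_call_request` and the tool name `calendar_find_slot` in the usage line are hypothetical; real tool names come from whichever MCP server you connect.

```python
import itertools
import json

# Monotonic request ids, as JSON-RPC expects each request to be identifiable.
_ids = itertools.count(1)


def tool_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical usage: ask a calendar server for a 30-minute slot.
example = tool_call_request("calendar_find_slot", {"duration_minutes": 30})
```

Because every server answers the same request shape, a sub-agent only needs the tool's name and arguments, which is why no per-app integration code is required.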

Demo video — coming soon

"Hey Ditto, draft a reply to my last email from Sarah, and find a 30-min slot tomorrow." Watch the Mail and Calendar cursors split the work in real time.

demo-mail-calendar-mcp.mp4

Product Surfaces

See what Ditto is doing — while it works.

Ditto demo — coming soon

Ditto's notes & memory surface — captured snippets, retrievable by voice, filed by tag.

demo-notes-memory.mp4 · screenshot-notes-memory.png
Ditto demo — coming soon

Ditto performing native macOS actions — clicking, typing, scrolling — driven by voice intent.

demo-desktop-execution.mp4 · screenshot-desktop-execution.png
Demo video — coming soon

Browser cursor scraping product specs across multiple tabs, condensing into a doc.

demo-browser-research.mp4

License + Pricing

One license. Bring your own keys.

Ditto is sold as a desktop license. You pay a flat monthly rate for the app + sub-agent runtime; you bring your own Anthropic / OpenAI / ElevenLabs keys for model calls.

Personal
$29 / month

For one operator on one Mac.

  • Wake-word + push-to-talk
  • All shadow-cursor agents
  • Built-in MCP tools
  • Local-only memory + notes
  • Lifetime updates

Get Ditto Personal

Team
$59 / seat / month

Workspaces, approval policy, audit trail.

  • Everything in Personal
  • Shared workflow library
  • Approval queues + RBAC
  • Audit log + redaction policy
  • Runner registry

Talk to us

Pricing is a placeholder and subject to change before public launch. Contact us for enterprise pricing.

FAQ

Honest answers — before you install.

Does Ditto need my Anthropic API key?

Yes. Ditto is bring-your-own-keys for model calls (Anthropic Claude, optionally OpenAI, ElevenLabs). Paste them in Settings on first run; they're stored in the macOS Keychain and never leave your device except to authenticate your requests to those providers.

What permissions does Ditto request?

macOS Accessibility (so shadow cursors can move + click), Microphone (wake word + push-to-talk), Screen Recording (live screen context), Speech Recognition (on-device wake word), Apple Events (open Settings deep-links during onboarding). All standard TCC prompts; revocable any time in System Settings → Privacy & Security.

What happens if I revoke microphone access mid-session?

Ditto detects the revocation and immediately suspends wake-word + push-to-talk capture. The menu-bar status switches to "Mic disabled" and a banner tells you how to re-grant. Sub-agents already in flight finish; no new voice input is captured.
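
The behavior described above amounts to a small state machine, sketched here in Python. `VoiceSession` and its method names are hypothetical, and the "Listening" label is an assumption; only the "Mic disabled" status text comes from this page.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceSession:
    mic_granted: bool = True
    status: str = "Listening"                      # assumed label for the normal state
    in_flight: list = field(default_factory=list)  # sub-agent tasks already running

    def on_mic_permission_changed(self, granted: bool) -> None:
        """React to the permission flipping; running tasks are left untouched."""
        self.mic_granted = granted
        self.status = "Listening" if granted else "Mic disabled"

    def accept_voice_input(self, transcript: str) -> bool:
        """Start a new task only while microphone access is granted."""
        if not self.mic_granted:
            return False
        self.in_flight.append(transcript)
        return True

    def finish(self, transcript: str) -> None:
        """Mark a running task as completed."""
        self.in_flight.remove(transcript)
```

Revoking access changes what the session will accept next, but it does not cancel work that was already dispatched.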

Can I bring my own LLM?

Today: Anthropic Claude (default) and OpenAI GPT-4o (optional), with a local LLM via Ollama planned as a future fallback for the orchestrator path. The Ditto voice loop is currently Claude-only because of vision + tool-use stability. A local-LLM voice path is on the v2 roadmap.

Mac only?

For now, yes: macOS 14.2+. Ditto is a native Swift app (~17K LOC) built on ScreenCaptureKit, AVAudioEngine, on-device Speech, and the macOS Accessibility APIs, none of which are portable to other OSes. Windows / Linux versions aren't planned.

Does any audio leave my Mac?

No raw audio. Wake-word detection runs on-device. Once the wake word fires, your post-wake-word transcript text (not audio) goes to Claude's API along with a redacted screen snapshot. ElevenLabs receives only the response text Ditto speaks back. Set TTS to system-voice in Settings if you'd rather keep voice synthesis local too.