VOICE-FIRST · MACOS NATIVE

If you can say it, your Mac can do it.

Ditto puts a real-time, screen-aware AI on every screen of your Mac. Watch a YouTube tutorial, and Ditto follows along. Read developer docs, and Ditto builds it with you. Take a course, and Ditto practices alongside you. Like Pokémon's Ditto, it adapts to whatever is on your screen: it works inside the apps you have open and acts on what you say out loud. Hands-free, voice-first, MCP-connected.

Ditto demo — coming soon

Mail agent shadow cursor reading the inbox, drafting a reply, and waiting for explicit approval before sending.


Voice

"Hey Ditto..." and your Mac listens.

Wake-word detection runs on-device, and raw audio never leaves your Mac. Ditto transcribes locally, snapshots your live screen, and reasons about the request before any agent moves.

Demo video — coming soon

Wake-word → STT → agent reasoning → first cursor moves. The full voice loop in 30 seconds.

01

Wake word fires locally

"Hey Ditto" is detected by an on-device model. Only what you say after the wake word is transcribed, and raw audio stays on the Mac.

02

Live screen + transcript captured

Ditto snapshots what's visible (with privacy redaction) and the transcript becomes the agent's task brief.

03

Sub-agents dispatch

A planning agent picks which scoped sub-agents (Mail / Calendar / Browser / Research / Notes / Summarizer) handle the work. You see them go.

04

Result spoken back

When the agents finish, Ditto synthesizes the result and reads it back. ElevenLabs TTS or system voice — your pick.
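
The four steps above can be sketched as a small pipeline. This is an illustration only, not Ditto's implementation: every name here (`WAKE_WORD`, `TaskBrief`, `plan`, `run_voice_loop`) is hypothetical, the keyword matching stands in for whatever the real planning agent does, and the snapshot is a plain string rather than a redacted screen capture.

```python
from dataclasses import dataclass

# Hypothetical names throughout; this is not Ditto's actual API.
WAKE_WORD = "hey ditto"

# Toy routing table: keywords that suggest each scoped sub-agent.
AGENT_KEYWORDS = {
    "mail": ("email", "reply", "inbox"),
    "calendar": ("slot", "schedule", "meeting"),
    "browser": ("form", "website", "tab"),
    "research": ("search", "find out", "sources"),
    "notes": ("note", "remember"),
    "summarizer": ("summarize", "brief"),
}


@dataclass
class TaskBrief:
    transcript: str       # text produced by local speech-to-text
    screen_snapshot: str  # stand-in for a redacted screen capture


def heard_wake_word(utterance: str) -> bool:
    """Step 01: stand-in for the on-device wake-word model."""
    return utterance.lower().startswith(WAKE_WORD)


def plan(brief: TaskBrief) -> list[str]:
    """Step 03: pick sub-agents whose keywords appear in the transcript."""
    text = brief.transcript.lower()
    return [
        agent
        for agent, words in AGENT_KEYWORDS.items()
        if any(word in text for word in words)
    ]


def run_voice_loop(utterance: str, screen_snapshot: str) -> str:
    """Run steps 01-04 and return the text that would be spoken back."""
    if not heard_wake_word(utterance):
        return ""  # nothing is captured before the wake word
    # Step 02: the post-wake-word transcript and snapshot become the brief.
    transcript = utterance[len(WAKE_WORD):].lstrip(" ,.")
    brief = TaskBrief(transcript, screen_snapshot)
    agents = plan(brief)
    # Step 04: in a real app this string would go to TTS.
    return f"Dispatched {', '.join(agents) or 'no agents'} for: {brief.transcript}"
```

Calling `run_voice_loop("Hey Ditto, draft a reply to my last email from Sarah, and find a 30-min slot tomorrow.", "<redacted>")` routes the request to the `mail` and `calendar` entries of the toy table, while an utterance without the wake word returns an empty string.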

Shadow Cursors

See your sub-agents work.

Every Ditto sub-agent gets its own colored shadow cursor on screen. Not abstractions — actual pointers you can watch click, type, and read across native apps and web tabs. The cursors moving on this page are doing exactly what they'd do on your Mac.

Mail

Reads, drafts, and archives in Apple Mail or Gmail. Sends only after explicit approval.

Calendar

Schedules, reschedules, finds free slots.

Browser

Drives Chrome/Safari via Chrome MCP — forms, scrapes, multi-step flows.

Research

Searches the web, summarizes, cites sources.

Notes

Captures, recalls, files into Ditto's memory.

Summarizer

Condenses long threads, transcripts, docs into briefs.

Ditto demo — coming soon

Multiple shadow cursors moving in parallel across the desktop — one per sub-agent — color-coded by role.


Ditto Skills

What Ditto does — beyond the tool calls.

Ditto is more than a thin wrapper around MCP servers. Speak intent and Ditto builds workflows, mentors you through unfamiliar apps, runs end-to-end UX tests, and streams structured telemetry — all from your voice.

Workflow

Build workflows on the fly

Speak a multi-step task and Ditto builds a live workflow. Linear → Slack → Calendar in one prompt. Saved for next time.

Mentor

"Show me how"

Stuck in Concur, Workday, anything? Ditto takes the cursor, walks each click with annotations, hands control back.

UX Testing

End-to-end UX testing

Hand Ditto a flow. It clicks, types, navigates, verifies — like a real user. QA, regressions, demo recordings, onboarding paths.

Telemetry

Telemetry on tap

Every shadow-cursor action streams structured telemetry to Loki / Datadog / Honeycomb / your webhook.
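
To make "structured telemetry" concrete, here is a rough sketch of what one shadow-cursor event could look like as a JSON line. The field names (`agent`, `action`, `target`, `timestamp`) and the `CursorEvent` type are assumptions for illustration; the page does not specify Ditto's actual event schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CursorEvent:
    """Hypothetical event shape; not Ditto's documented schema."""
    agent: str        # which sub-agent's shadow cursor acted
    action: str       # e.g. "click", "type", "scroll"
    target: str       # human-readable description of the UI element
    timestamp: float  # Unix time in seconds


def to_json_line(event: CursorEvent) -> str:
    """Serialize one event as a single JSON line with stable key order."""
    return json.dumps(asdict(event), sort_keys=True)


event = CursorEvent("mail", "click", "Reply button", 1700000000.0)
print(to_json_line(event))
```

One JSON object per line is a shape that log pipelines and generic webhooks can usually ingest without a custom parser, which is why it is used for this sketch.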


MCP-Connected Tools

Your apps, your protocol, your data.

Ditto speaks Model Context Protocol natively. Any MCP server you connect — Mail, Calendar, Browser, Files, Contacts, Slack, Linear, GitHub — becomes a tool every sub-agent can call. No bespoke integrations to maintain.

📬

Mail

Apple Mail + Gmail. Sends always behind explicit approval.

📅

Calendar

Google + iCloud. Find slots, schedule, decline.

🌐

Browser

Chrome MCP. Forms, scrapes, multi-step flows.

📁

Files

Finder + Spotlight. Open, move, rename, search.

👤

Contacts

Look up people, draft outreach, cross-reference.

+

Any MCP server

Slack, Linear, GitHub, Notion, Stripe — drop in.
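
For readers curious what a tool call looks like on the wire: MCP is built on JSON-RPC 2.0, and a client invokes a tool with a `tools/call` request. The sketch below builds such a request and refuses gated tools without approval, mirroring the "sends always behind explicit approval" rule. The tool names (`mail.send`, `calendar.find_slot`), the `REQUIRES_APPROVAL` set, and `build_tool_call` are hypothetical; real tool names come from each server, and this is not Ditto's internal code.

```python
import json
from itertools import count

# Hypothetical gated tools; real tool names are defined by each MCP server.
REQUIRES_APPROVAL = {"mail.send"}
_request_ids = count(1)  # JSON-RPC requests need a unique id


def build_tool_call(name: str, arguments: dict, approved: bool = False) -> str:
    """Return a JSON-RPC 2.0 `tools/call` request as a JSON string.

    Raises PermissionError for gated tools unless `approved` is True.
    """
    if name in REQUIRES_APPROVAL and not approved:
        raise PermissionError(f"{name} needs explicit user approval")
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)
```

With this gate, `build_tool_call("calendar.find_slot", {"minutes": 30})` succeeds immediately, while `build_tool_call("mail.send", {...})` raises until it is called with `approved=True`.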

Demo video — coming soon

"Hey Ditto, draft a reply to my last email from Sarah, and find a 30-min slot tomorrow." Watch the Mail and Calendar cursors split the work in real time.


Product Surfaces

See what Ditto is doing — while it works.

Ditto demo — coming soon

Ditto's notes & memory surface — captured snippets, retrievable by voice, filed by tag.

Ditto demo — coming soon

Ditto performing native macOS actions — clicking, typing, scrolling — driven by voice intent.

Demo video — coming soon

Browser cursor scraping product specs across multiple tabs, condensing into a doc.


License + Pricing

One license. Bring your own keys.

Ditto is sold as a desktop license. You pay a flat monthly rate for the app + sub-agent runtime; you bring your own Anthropic / OpenAI / ElevenLabs keys for model calls.

Personal
$29 / month

For one operator on one Mac.

  • Wake-word + push-to-talk
  • All shadow-cursor agents
  • Built-in MCP tools
  • Local-only memory + notes
  • Lifetime updates
Get Ditto Personal
Team
$59 / seat / month

Workspaces, approval policy, audit trail.

  • Everything in Personal
  • Shared workflow library
  • Approval queues + RBAC
  • Audit log + redaction policy
  • Runner registry
Talk to us

Pricing shown is a placeholder and may change before public launch. Contact us for enterprise pricing.

FAQ

Honest answers — before you install.

Does Ditto need my Anthropic API key?

Yes. Ditto is bring-your-own-keys for model calls (Anthropic Claude, optionally OpenAI, ElevenLabs). Paste them in Settings on first run; they're stored in the macOS Keychain and never leave your device except to authenticate requests to those providers.

What permissions does Ditto request?

macOS Accessibility (so shadow cursors can move + click), Microphone (wake word + push-to-talk), Screen Recording (live screen context), Speech Recognition (on-device wake word), Apple Events (open Settings deep-links during onboarding). All standard TCC prompts; revocable any time in System Settings → Privacy & Security.

What happens if I revoke microphone access mid-session?

Ditto detects the revocation and immediately suspends wake-word + push-to-talk capture. The menu-bar status switches to "Mic disabled" and a banner tells you how to re-grant. Sub-agents already in flight finish; no new voice input is captured.

Can I bring my own LLM?

Today: Anthropic Claude (default) and OpenAI GPT-4o (optional), with a local LLM via Ollama planned as a future fallback for the orchestrator path. The Ditto voice loop is currently Claude-only because of vision + tool-use stability; a local-LLM voice path is on the v2 roadmap.

Mac only?

For now, yes: macOS 14.2+. Ditto is a native Swift app (~17K LOC) built on ScreenCaptureKit, AVAudioEngine, on-device Speech, and the macOS Accessibility APIs, none of which are portable to other OSes. Windows / Linux versions aren't planned.

Does any audio leave my Mac?

No raw audio. Wake-word detection runs on-device. Once the wake word fires, your post-wake-word transcript text (not audio) goes to Claude's API along with a redacted screen snapshot. ElevenLabs receives only the response text Ditto speaks back. Set TTS to system-voice in Settings if you'd rather keep voice synthesis local too.