AI Generation

Generate video, images, speech, and music through MCP tools powered by leading AI providers.

Supported media types

  • Video — Google Veo, Runway Gen-3
  • Image — Google Imagen, OpenAI DALL-E, FLUX (coming soon)
  • Speech — Google TTS, OpenAI TTS, ElevenLabs
  • Music — Suno (coming soon)

Usage via MCP

Ask your agent to generate media and insert it into the timeline:

  • "Generate 5 seconds of cinematic B-roll showing a city at sunset"
  • "Create a voiceover for this script using a warm female voice"
  • "Generate a thumbnail image for this video"
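
Behind a prompt like the first one above, the agent sends an MCP tool call. MCP uses JSON-RPC 2.0, and tools are invoked with the tools/call method. The sketch below shows roughly what such a request could look like. The tool name generate_video and the argument names prompt and duration_seconds are assumptions made for illustration, not documented VisionDraft identifiers; check the tool list your MCP client reports for the actual names.

```python
import json
from itertools import count

# Simple incrementing JSON-RPC request ids.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool and argument names, used only for illustration.
request = build_tool_call(
    "generate_video",
    {
        "prompt": "Cinematic B-roll showing a city at sunset",
        "duration_seconds": 5,
    },
)

print(json.dumps(request, indent=2))
```

In normal use you never write this request yourself; the agent builds it from your prompt.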

Provider selection

For each request, VisionDraft automatically selects the best available provider based on media type, quality, and latency. On desktop, set your preferences in Settings → Providers. On cloud, the platform defaults apply.
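
To make the trade-off concrete, here is a minimal sketch of how a selection based on media type, quality, and latency could work. It is not VisionDraft's actual logic, which is not documented here. The provider names, the scores, the 0.5 latency weight, and the select_provider function are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Provider:
    name: str
    media_type: str         # "video", "image", "speech", or "music"
    quality: float          # placeholder score in [0, 1]; higher is better
    latency_s: float        # placeholder typical latency in seconds
    available: bool = True  # False for providers that are not yet usable


def select_provider(
    providers: Sequence[Provider],
    media_type: str,
    preferred: Optional[str] = None,
    max_latency_s: float = 60.0,
) -> Optional[Provider]:
    """Pick a provider for media_type, or return None if none is available.

    A user preference wins when that provider is available. Otherwise the
    choice is the highest quality score minus a latency penalty.
    """
    candidates = [
        p for p in providers if p.media_type == media_type and p.available
    ]
    if not candidates:
        return None

    if preferred is not None:
        for p in candidates:
            if p.name == preferred:
                return p

    def score(p: Provider) -> float:
        # Assumed weighting: latency costs at most half a quality point.
        latency_penalty = min(p.latency_s / max_latency_s, 1.0)
        return p.quality - 0.5 * latency_penalty

    return max(candidates, key=score)


# Hypothetical catalog; names and numbers are illustrative only.
CATALOG = [
    Provider("provider-a", "video", quality=0.9, latency_s=45.0),
    Provider("provider-b", "video", quality=0.8, latency_s=30.0),
    Provider("provider-c", "music", quality=0.9, latency_s=20.0, available=False),
]

default_choice = select_provider(CATALOG, "video")
pinned_choice = select_provider(CATALOG, "video", preferred="provider-a")
music_choice = select_provider(CATALOG, "music")
```

With these placeholder numbers, the default choice for video is provider-b, because its lower latency outweighs its lower quality score. Pinning provider-a as the preference overrides that, and the music request returns None because its only provider is marked unavailable.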

See Provider setup for API key configuration on desktop.