AI Generation
Generate video, images, speech, and music through MCP tools powered by leading AI providers.
Supported media types
- Video — Google Veo, Runway Gen-3
- Image — Google Imagen, OpenAI DALL-E, FLUX (coming soon)
- Speech — Google TTS, OpenAI TTS, ElevenLabs
- Music — Suno (coming soon)
Usage via MCP
Ask your agent to generate media and insert it into the timeline:
- "Generate 5 seconds of cinematic B-roll showing a city at sunset"
- "Create a voiceover for this script using a warm female voice"
- "Generate a thumbnail image for this video"
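Behind a prompt like these, the agent sends an MCP `tools/call` request to the server. The following is a minimal sketch of what such a request might look like on the wire. The tool name `generate_video` and the arguments `prompt` and `duration_seconds` are hypothetical, chosen only for illustration; the actual tool names and input schemas are whatever the server reports via `tools/list`.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and argument names (assumptions, not a documented API).
request = build_tool_call(
    1,
    "generate_video",
    {
        "prompt": "Cinematic B-roll showing a city at sunset",
        "duration_seconds": 5,
    },
)

print(json.dumps(request, indent=2))
```

In practice the agent builds this request for you; the sketch only shows how a natural-language prompt could map onto a structured tool call.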
Provider selection
VisionDraft automatically selects a provider from those available, based on media type, quality, and latency. On desktop, set your preferences in Settings → Providers; on cloud, the platform defaults apply.
See Provider setup for API key configuration on desktop.
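To make the selection criteria concrete, here is a rough sketch of how a selector could weigh media type, quality, and latency. It is not VisionDraft's actual logic, which is not documented here. The `Provider` fields, the rule that a user preference wins when that provider is available, the quality-before-latency ordering, and all provider names and scores are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    media_type: str        # "video", "image", "speech", or "music"
    quality: float         # placeholder score, higher is better
    latency_s: float       # placeholder typical latency in seconds
    available: bool = True


def select_provider(providers, media_type, preferred=None):
    """Pick a provider for media_type, or return None if none is usable."""
    # Keep only providers that handle this media type and are usable now.
    candidates = [
        p for p in providers if p.media_type == media_type and p.available
    ]
    if not candidates:
        return None
    # Assumed rule: an explicit user preference wins if it is available.
    if preferred is not None:
        for p in candidates:
            if p.name == preferred:
                return p
    # Assumed rule: highest quality first, lower latency breaks ties.
    return max(candidates, key=lambda p: (p.quality, -p.latency_s))


# Placeholder providers and scores, not measurements of real services.
PROVIDERS = [
    Provider("provider-a", "speech", quality=0.8, latency_s=1.5),
    Provider("provider-b", "speech", quality=0.9, latency_s=3.0),
    Provider("provider-c", "speech", quality=0.9, latency_s=2.0, available=False),
    Provider("provider-d", "image", quality=0.7, latency_s=4.0),
]

default_choice = select_provider(PROVIDERS, "speech")
preferred_choice = select_provider(PROVIDERS, "speech", preferred="provider-a")
fallback_choice = select_provider(PROVIDERS, "speech", preferred="provider-c")
```

With these placeholder values, the default pick is `provider-b`, a preference for `provider-a` is honored, and a preference for the unavailable `provider-c` falls back to the ranked choice.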