ElevenLabs: Instant Voice Cloning From 10 Seconds of Audio
ElevenLabs now supports Instant Voice Cloning from as little as 10 seconds of audio through its API. Upload a clean audio sample, get a voice_id back in under 3 seconds, and use it immediately for text-to-speech generation.
The friction of adding a custom voice to an app just collapsed. Ten seconds of audio is easy to collect from any user, podcast clip, or spokesperson recording. This unlocks personalized TTS at scale: onboarding flows, AI assistants, content localization.
POST an audio file to /v1/voices/add with name and files[] parameters. The API returns a voice_id immediately. Pass that voice_id to /v1/text-to-speech/{voice_id} for synthesis. Minimum sample: 10 seconds of clear speech, no background noise.
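The two calls above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not ElevenLabs' official client: the base URL `https://api.elevenlabs.io`, the `xi-api-key` header, the multipart field name `files` (written `files[]` above), the JSON body `{"text": ...}`, and the helper names `add_voice_request`, `tts_request`, and `clone_and_speak` are all assumptions to check against the current API reference. The network step uses the third-party `requests` package.

```python
import json
from pathlib import Path

# Assumed public base URL; confirm against the ElevenLabs API reference.
API_BASE = "https://api.elevenlabs.io"


def add_voice_request(api_key, name):
    """Describe the multipart POST that registers a cloned voice."""
    return {
        "url": f"{API_BASE}/v1/voices/add",
        "headers": {"xi-api-key": api_key},  # assumed auth header name
        "data": {"name": name},              # plain form field
        "file_field": "files",               # assumed multipart field name
    }


def tts_request(api_key, voice_id, text):
    """Describe the JSON POST that synthesizes speech with a voice_id."""
    return {
        "url": f"{API_BASE}/v1/text-to-speech/{voice_id}",
        "headers": {"xi-api-key": api_key, "Content-Type": "application/json"},
        "body": json.dumps({"text": text}),  # assumed minimal request body
    }


def clone_and_speak(api_key, name, sample_path, text, out_path="speech.audio"):
    """Upload one sample, then synthesize text with the returned voice_id."""
    import requests  # third-party; imported lazily so the builders work without it

    add = add_voice_request(api_key, name)
    with open(sample_path, "rb") as sample:
        files = [(add["file_field"], (Path(sample_path).name, sample))]
        resp = requests.post(add["url"], headers=add["headers"],
                             data=add["data"], files=files, timeout=60)
    resp.raise_for_status()
    voice_id = resp.json()["voice_id"]

    tts = tts_request(api_key, voice_id, text)
    audio = requests.post(tts["url"], headers=tts["headers"],
                          data=tts["body"], timeout=60)
    audio.raise_for_status()
    # The response body is assumed to be raw audio bytes.
    Path(out_path).write_bytes(audio.content)
    return voice_id
```

A call such as `clone_and_speak(API_KEY, "demo-voice", "sample.wav", "Hello there")` would run both steps with a real key and sample file. Keeping the request builders separate from the network call makes the URLs and payloads easy to inspect or unit-test without touching the API.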
ElevenLabs instant voice cloning: 10 seconds in, production-ready voice out. The barrier to personalized audio just hit the floor.