You can create a voice in two ways:

  • Create a voice from a description

  • Upload or record your voice

The method you choose depends on whether you want to generate a new voice style or reproduce a specific speaker.


Create a voice from a description

Select Create a voice from a description to generate a voice based on a short written description of the speaker.

This method is useful for quickly exploring and testing different voice styles. The system generates several sample voices that you can preview before you create the voice.

Because the voice is generated from text, the quality and accuracy of the result depend on how clearly the description defines the speaker.

If you need a voice based on a specific person or want a highly consistent voice across projects, consider creating a voice using Upload Audio or Record Audio instead.


Writing effective voice descriptions

The description determines how the generated voice will sound. It defines the speaker’s characteristics, such as language, accent, personality, and speaking style.

Clear descriptions usually produce better results. Including details such as gender, age range, tone, pacing, or emotional style helps the system generate a voice that matches your intent.

Short descriptions can also work well when you need a neutral or general-purpose voice. For example, a simple prompt like “confident female training instructor” may already produce a suitable result.

Choose the level of detail based on your goal. A distinctive character voice may require more description, while a standard narrator may only need a few key attributes.


How to structure a voice description

For more predictable results, it helps to include a few key elements in your description.

Start by specifying the language and regional accent of the speaker. Then define the gender and approximate age range.

Next, describe the persona or role of the speaker, along with a few words that capture the emotional tone of the voice.

Finally, add one or two short sentences explaining how the voice should sound and be delivered. You can describe the tone, pacing, clarity, or speaking style.

Example:

A native Spanish speaker with a neutral Latin American accent. Female, around 35–45.

Persona: corporate trainer. Tone: confident, supportive, professional.

Warm and clear voice with steady pacing and precise articulation. Speaks in an instructional style that emphasizes key points while remaining approachable.
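If you write many descriptions, it can help to keep the elements in a fixed order. The sketch below is a hypothetical helper, not part of the product: `VoiceSpec` and `build_voice_description` are illustrative names. It assembles the elements in the order recommended above (language and accent, gender and age, persona and tone, then delivery) and, with the values from the example, reproduces that example description.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceSpec:
    """Hypothetical container for the elements of a voice description."""

    language: str   # e.g. "Spanish"
    accent: str     # e.g. "neutral Latin American"
    gender: str     # e.g. "Female"
    age_range: str  # e.g. "35–45"
    persona: str    # e.g. "corporate trainer"
    tone: list[str] = field(default_factory=list)  # a few emotional-tone words
    delivery: str = ""  # one or two sentences on sound and delivery


def _article(word: str) -> str:
    """Pick "a" or "an" from the first letter (a simple heuristic)."""
    return "an" if word and word[0].lower() in "aeiou" else "a"


def build_voice_description(spec: VoiceSpec) -> str:
    """Assemble the elements in the recommended order, one paragraph per group."""
    # 1. Language and regional accent, then gender and approximate age range.
    speaker = (
        f"A native {spec.language} speaker with {_article(spec.accent)} "
        f"{spec.accent} accent. {spec.gender}, around {spec.age_range}."
    )
    # 2. Persona or role, plus a few words for the emotional tone (if given).
    role = f"Persona: {spec.persona}."
    if spec.tone:
        role += f" Tone: {', '.join(spec.tone)}."
    paragraphs = [speaker, role]
    # 3. One or two sentences on how the voice should sound and be delivered.
    if spec.delivery:
        paragraphs.append(spec.delivery)
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    example = VoiceSpec(
        language="Spanish",
        accent="neutral Latin American",
        gender="Female",
        age_range="35–45",
        persona="corporate trainer",
        tone=["confident", "supportive", "professional"],
        delivery=(
            "Warm and clear voice with steady pacing and precise articulation. "
            "Speaks in an instructional style that emphasizes key points "
            "while remaining approachable."
        ),
    )
    print(build_voice_description(example))
```

The output is plain text that you paste into the description field; the helper only enforces ordering and does not judge whether the description is clear.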


Description tips

  • Avoid describing audio effects such as reverb, echo, phone, or tape

  • Clearly specify the language and regional accent

  • Focus on describing the speaker and delivery style, rather than technical audio terms
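The tips above can be turned into a rough pre-check before you submit a description. This is a minimal keyword-based sketch, assuming only the four effect terms named in the tips; `check_description` and `AUDIO_EFFECT_TERMS` are hypothetical names. It flags those terms and warns when the word "accent" is missing, but it cannot tell whether the language is specified or whether the description is clear.

```python
import re

# Effect terms named in the tips above; a real list would likely be longer.
AUDIO_EFFECT_TERMS = ("reverb", "echo", "phone", "tape")


def check_description(description: str) -> list[str]:
    """Return warnings for a voice description using simple keyword heuristics."""
    lowered = description.lower()
    warnings = []
    for term in AUDIO_EFFECT_TERMS:
        # Whole-word match with an optional plural ("echoes", "tapes"); words that
        # merely contain the term, such as "telephone" or "phonetic", are not flagged.
        if re.search(rf"\b{term}(?:s|es)?\b", lowered):
            warnings.append(f"mentions the audio effect '{term}'")
    # Crude check that a regional accent is stated at all.
    if "accent" not in lowered:
        warnings.append("does not specify a regional accent")
    return warnings


if __name__ == "__main__":
    print(check_description(
        "Deep male narrator with lots of reverb, like an old phone recording"
    ))
```

A keyword check like this produces false positives and misses, so treat its warnings as prompts to reread the description rather than as rules.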


Upload or record your voice

Instant voice cloning from uploaded or recorded audio is powerful, but the quality of the result depends heavily on the quality of the input.

A clean, natural recording will produce a voice that sounds authentic and stable. A poor recording will limit the outcome, no matter how advanced the model is.


Minimum requirements

To achieve reliable results, make sure the source audio meets the following criteria:

  • 30 to 90 seconds of clean speech

  • No background noise

  • No music

  • No reverb or echo

  • Natural tone of voice, not shouting or whispering

The goal is to capture how the speaker normally sounds in everyday conversation.
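Of these criteria, only the length can be checked mechanically; noise, music, reverb, and tone still need to be judged by ear. The sketch below uses Python's standard `wave` module to confirm that a clip falls within the 30 to 90 second range. It assumes the audio is an uncompressed WAV file (other formats would need converting first), and `check_duration` is a hypothetical helper name.

```python
import wave

# Duration bounds taken from the minimum requirements above.
MIN_SECONDS = 30.0
MAX_SECONDS = 90.0


def wav_duration_seconds(path: str) -> float:
    """Return the length of a WAV file in seconds (frame count / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_duration(path: str) -> tuple[bool, float]:
    """Return (within range, duration) for the 30 to 90 second requirement."""
    seconds = wav_duration_seconds(path)
    return MIN_SECONDS <= seconds <= MAX_SECONDS, seconds
```

Passing this check only means the clip is the right length; a 60-second recording with background music still fails the other requirements.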


Ideal recording setup

For best results, record in a controlled environment:

  • A quiet room with soft surfaces that absorb echo, such as curtains, sofas, or carpets

  • A good USB microphone instead of a laptop’s built-in mic

Modern smartphones also have surprisingly good microphones and can produce excellent results when used properly.


Mobile recording

Recording on a mobile device is fine if:

  • The room is quiet

  • There is no echo or background noise

  • The speaker talks clearly and naturally

  • The recording is made in a voice memo app at the highest available quality

A short, clean recording in a quiet space will generally outperform a longer recording made in a noisy environment. Prioritize clarity over length, and natural delivery over an acted or exaggerated performance.