Speech Generation

The model parameter is fixed to seed-tts-1.0.

Authorization

BearerAuth

Authorization: Bearer <token>

Authentication for the model relay API. Request header: Authorization: Bearer <token>.

In: header

Request Body

application/json

model*string

A speech synthesis model.

Default"seed-tts-1.0"

input*string

Text content to read.

voice*string

Voice key, e.g. zh_female_cancan_mars_bigtts. Must be a non-empty string; accepted values are validated against the business configuration.

Default"zh_female_cancan_mars_bigtts"

format?string

Output audio format, e.g. mp3, wav.

speed?number

Speech speed. The accepted range is defined by the interface description or the backend configuration.

Response Body

audio/mpeg

curl -X POST "https://api.tokaify.com/v1/audio/speech" \
  -H "Authorization: Bearer $TOKAIFY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seed-tts-1.0",
    "input": "Enter text that needs to be processed.",
    "voice": "zh_female_cancan_mars_bigtts",
    "format": "json",
    "speed": 1
  }'

Request Parameters

Field	Type	Required	Default	Description
`model`	string	Yes	None	Fixed to `seed-tts-1.0`.
`input`	string	Yes	None	Text to synthesize. The Volcano Engine online speech synthesis interface generally recommends no more than 1024 bytes per request for standard voices; for long text or cloned voices, follow the provider configuration.
`voice`	string	No	`zh_female_cancan_mars_bigtts`	ByteDance voice key, for example `zh_female_cancan_mars_bigtts`. Actual availability depends on the account authorization.
`response_format`	string	No	`pcm`	Output audio format mapped to the official `audio.encoding`. Supports `mp3`, `wav`, `pcm`, and `ogg_opus`; `wav` is usually not used for streaming scenarios.
`speed`	number	No	`1`	Speech speed mapped to the official `audio.speed_ratio`. Accepted range is `[0.2, 3]`, where `1` means normal speed.
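
The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the official API: the helper name `build_speech_payload` is hypothetical, and it assumes the 1024-byte recommendation is measured on the UTF-8 encoding of `input`, which the table does not state.

```python
# Values taken from the parameter table; the helper itself is illustrative.
ALLOWED_FORMATS = {"mp3", "wav", "pcm", "ogg_opus"}
MAX_INPUT_BYTES = 1024  # a recommendation for standard voices, not a confirmed hard limit
DEFAULT_VOICE = "zh_female_cancan_mars_bigtts"


def build_speech_payload(text, voice=DEFAULT_VOICE, response_format="mp3", speed=1.0):
    """Validate the arguments and return the JSON request body as a dict."""
    if not text:
        raise ValueError("input must be a non-empty string")
    # Assumption: the byte count refers to the UTF-8 encoded text.
    if len(text.encode("utf-8")) > MAX_INPUT_BYTES:
        raise ValueError("input exceeds the recommended 1024 bytes")
    if not voice:
        raise ValueError("voice must be a non-empty string")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0.2 <= speed <= 3:
        raise ValueError("speed must be within [0.2, 3]")
    return {
        "model": "seed-tts-1.0",  # fixed for this endpoint
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

The returned dict can be passed as the `json=` argument of `requests.post`. Rejecting bad values locally avoids a round trip, but the server remains the authority on what is accepted.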

Example Code

curl https://api.tokaify.com/v1/audio/speech \
  -H "Authorization: Bearer $TOKAIFY_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "model": "seed-tts-1.0",
    "input": "Hello, this is a voice clip generated by Seed TTS 1.0.",
    "voice": "zh_female_cancan_mars_bigtts",
    "response_format": "mp3",
    "speed": 1
  }'

import requests

response = requests.post(
    "https://api.tokaify.com/v1/audio/speech",
    headers={"Authorization": "Bearer YOUR_TOKAIFY_API_KEY"},
    json={
        "model": "seed-tts-1.0",
        "input": "Hello, this is a voice clip generated by Seed TTS 1.0.",
        "voice": "zh_female_cancan_mars_bigtts",
        "response_format": "mp3",
        "speed": 1,
    },
)
response.raise_for_status()
with open("speech.mp3", "wb") as file:
    file.write(response.content)

import { writeFile } from "node:fs/promises";

const response = await fetch("https://api.tokaify.com/v1/audio/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.TOKAIFY_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "seed-tts-1.0",
    input: "Hello, this is a voice clip generated by Seed TTS 1.0.",
    voice: "zh_female_cancan_mars_bigtts",
    response_format: "mp3",
    speed: 1,
  }),
});

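// Fail early on an HTTP error instead of writing an error body to speech.mp3.
if (!response.ok) {
  throw new Error(`Speech request failed with status ${response.status}`);
}

const audio = await response.arrayBuffer();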
await writeFile("speech.mp3", Buffer.from(audio));
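
Text longer than the 1024 bytes recommended for standard voices has to be sent as several requests. The sketch below shows one way to split it. The function name `split_text_by_bytes` is hypothetical, and it assumes the limit applies to the UTF-8 encoding of the text. It cuts at character boundaries only, so a real integration would probably prefer to break at sentence boundaries.

```python
def split_text_by_bytes(text, max_bytes=1024):
    """Split text into chunks whose UTF-8 encoding is at most max_bytes each."""
    if max_bytes < 4:
        # A single UTF-8 character can take up to 4 bytes.
        raise ValueError("max_bytes must be at least 4")
    chunks = []
    current = ""
    current_size = 0
    for char in text:
        size = len(char.encode("utf-8"))
        # Start a new chunk when adding this character would exceed the limit.
        if current_size + size > max_bytes:
            chunks.append(current)
            current, current_size = "", 0
        current += char
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as the `input` of its own request, with the resulting audio files played or joined in order.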

Notes

voice maps to the ByteDance voice key. If omitted, the default voice zh_female_cancan_mars_bigtts is used. To control volume, pitch, emotion, or language, use the official fields volume_ratio, pitch_ratio, emotion, and language; exact support depends on the provider configuration and selected voice.
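
As an illustration of the optional fields named above, the sketch below adds only the options that were explicitly set to a request body. It assumes these fields are accepted as top-level keys of the JSON body, which this page does not confirm. The helper name `with_voice_options` and the value `"happy"` in the usage line are hypothetical.

```python
def with_voice_options(payload, volume_ratio=None, pitch_ratio=None,
                       emotion=None, language=None):
    """Return a copy of payload extended with the options that are not None."""
    options = {
        "volume_ratio": volume_ratio,
        "pitch_ratio": pitch_ratio,
        "emotion": emotion,
        "language": language,
    }
    # Assumption: the provider reads these as top-level request fields.
    extra = {key: value for key, value in options.items() if value is not None}
    return {**payload, **extra}


base = {
    "model": "seed-tts-1.0",
    "input": "Hello, this is a voice clip generated by Seed TTS 1.0.",
    "voice": "zh_female_cancan_mars_bigtts",
}
# "happy" is a placeholder; supported emotions depend on the selected voice.
body = with_voice_options(base, emotion="happy")
```

Leaving unset options out of the body lets the provider apply its own defaults for the selected voice.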