Speech Generation
seed-tts-1.0 Speech Generation API
The model parameter is fixed to seed-tts-1.0.
Authorization
BearerAuth
Authentication for the model relay API. Request header: Authorization: Bearer <token>.
In: header
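As a minimal sketch of the header described above, the helper below builds the Bearer header from an API key. The function name `auth_headers` is hypothetical, and reading the key from the `TOKAIFY_API_KEY` environment variable simply follows the variable name used in this page's own examples.

```python
import os


def auth_headers(api_key=None):
    """Build request headers carrying the Bearer token (hypothetical helper)."""
    # Fall back to the environment variable used in this page's examples.
    key = api_key or os.environ.get("TOKAIFY_API_KEY")
    if not key:
        raise ValueError("missing API key: pass api_key or set TOKAIFY_API_KEY")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Failing early on a missing key avoids sending a request that the server would reject anyway.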
Request Body
application/json
model: The speech synthesis model. Example: "seed-tts-1.0"
input: Text content to read.
voice: Voice to use. Must be a non-empty string, or a value validated against the business configuration. Example: "zh_female_cancan_mars_bigtts"
response_format: Output audio format, e.g. mp3, wav.
speed: Speech speed. The valid range is set by the interface description or back-end configuration.
Response Body
audio/mpeg
curl -X POST "https://api.tokaify.com/v1/audio/speech" \
  -H "Authorization: Bearer $TOKAIFY_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "model": "seed-tts-1.0",
    "input": "Enter text that needs to be processed.",
    "voice": "zh_female_cancan_mars_bigtts",
    "response_format": "mp3",
    "speed": 1
  }'
Request Parameters
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | None | Fixed to seed-tts-1.0. |
| input | string | Yes | None | Text to synthesize. For standard voices, the Volcano Engine online speech synthesis interface generally recommends no more than 1024 bytes per request; for long text or cloned voices, follow the provider configuration. |
| voice | string | No | zh_female_cancan_mars_bigtts | ByteDance voice key, for example zh_female_cancan_mars_bigtts. Actual availability depends on the account authorization. |
| response_format | string | No | pcm | Output audio format, mapped to the official audio.encoding. Supports mp3, wav, pcm, and ogg_opus; wav is usually not used in streaming scenarios. |
| speed | number | No | 1 | Speech speed, mapped to the official audio.speed_ratio. Valid range is [0.2, 3], where 1 means normal speed. |
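The constraints in the table can be checked on the client before a request is sent. The following is a sketch only: `check_request` is a hypothetical helper, not part of any SDK, and it assumes the 1024-byte recommendation is measured on the UTF-8 encoding of the text, which the table does not state.

```python
ALLOWED_FORMATS = {"mp3", "wav", "pcm", "ogg_opus"}
MAX_INPUT_BYTES = 1024  # recommended limit for standard voices, not a hard rule


def check_request(payload):
    """Return a list of problems found in a request payload (empty if none)."""
    problems = []
    if payload.get("model") != "seed-tts-1.0":
        problems.append("model must be 'seed-tts-1.0'")

    text = payload.get("input")
    if not isinstance(text, str) or not text:
        problems.append("input must be a non-empty string")
    elif len(text.encode("utf-8")) > MAX_INPUT_BYTES:
        # Assumption: bytes are counted on the UTF-8 encoding.
        problems.append("input exceeds the recommended 1024 bytes")

    fmt = payload.get("response_format", "pcm")  # pcm is the documented default
    if fmt not in ALLOWED_FORMATS:
        problems.append(f"unsupported response_format: {fmt}")

    speed = payload.get("speed", 1)  # 1 is the documented default
    is_number = isinstance(speed, (int, float)) and not isinstance(speed, bool)
    if not is_number or not 0.2 <= speed <= 3:
        problems.append("speed must be a number in [0.2, 3]")
    return problems
```

Note that the limit is in bytes rather than characters, so under the UTF-8 assumption Chinese text reaches it after roughly a third as many characters as ASCII text.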
Example Code
curl https://api.tokaify.com/v1/audio/speech \
  -H "Authorization: Bearer $TOKAIFY_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "model": "seed-tts-1.0",
    "input": "Hello, this is a voice clip generated by Seed TTS 1.0.",
    "voice": "zh_female_cancan_mars_bigtts",
    "response_format": "mp3",
    "speed": 1
  }'

import requests

response = requests.post(
    "https://api.tokaify.com/v1/audio/speech",
    headers={"Authorization": "Bearer YOUR_TOKAIFY_API_KEY"},
    json={
        "model": "seed-tts-1.0",
        "input": "Hello, this is a voice clip generated by Seed TTS 1.0.",
        "voice": "zh_female_cancan_mars_bigtts",
        "response_format": "mp3",
        "speed": 1,
    },
)
response.raise_for_status()
with open("speech.mp3", "wb") as file:
    file.write(response.content)

import { writeFile } from "node:fs/promises";

const response = await fetch("https://api.tokaify.com/v1/audio/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.TOKAIFY_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "seed-tts-1.0",
    input: "Hello, this is a voice clip generated by Seed TTS 1.0.",
    voice: "zh_female_cancan_mars_bigtts",
    response_format: "mp3",
    speed: 1,
  }),
});
// Stop on HTTP errors so an error body is not saved as audio.
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}
const audio = await response.arrayBuffer();
await writeFile("speech.mp3", Buffer.from(audio));

Notes
voice maps to the ByteDance voice key. If omitted, the default voice zh_female_cancan_mars_bigtts is used. To control volume, pitch, emotion, or language, use the official fields volume_ratio, pitch_ratio, emotion, and language; exact support depends on the provider configuration and selected voice.
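One way to assemble a request body with these optional fields is sketched below. `build_payload` is a hypothetical helper, and the sketch assumes the relay accepts the official fields as top-level JSON keys next to model and input, which this page does not confirm.

```python
# Official field names listed in the note above.
OPTIONAL_FIELDS = ("volume_ratio", "pitch_ratio", "emotion", "language")


def build_payload(text, voice=None, **options):
    """Assemble a request body, leaving out anything that was not set."""
    payload = {"model": "seed-tts-1.0", "input": text}
    if voice is not None:
        # Omitting voice lets the server apply its default voice.
        payload["voice"] = voice
    for name, value in options.items():
        if name not in OPTIONAL_FIELDS:
            raise ValueError(f"unknown optional field: {name}")
        if value is not None:
            # Assumption: optional fields are sent as top-level keys.
            payload[name] = value
    return payload
```

Leaving unset fields out of the body, rather than sending null values, keeps the server-side defaults in effect.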