
Speech Generation

seed-tts-1.0 Speech Generation API

The model parameter is fixed to seed-tts-1.0.

POST /v1/audio/speech

Authorization

BearerAuth

Authorization: Bearer <token>

Authentication for the model relay interface. Request header: Authorization: Bearer <token>.

In: header

Request Body

application/json

model* string

A speech synthesis model.

Default: "seed-tts-1.0"

input* string

Text to synthesize into speech.

voice* string

Voice to use, e.g. zh_female_cancan_mars_bigtts. Must be a non-empty string; the value may also be validated against the business configuration.

Default: "zh_female_cancan_mars_bigtts"

format? string

Output audio format, e.g. mp3, wav.

speed? number

Speech speed. The accepted range is defined by the interface description or the backend configuration.

Response Body

audio/mpeg

curl -X POST "https://api.tokaify.com/v1/audio/speech" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seed-tts-1.0",
    "input": "Enter text that needs to be processed.",
    "voice": "zh_female_cancan_mars_bigtts",
    "format": "mp3",
    "speed": 1
  }'

Response: binary audio data.

Request Parameters

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | None | Fixed to seed-tts-1.0. |
| input | string | Yes | None | Text to synthesize. Following the Volcano Engine online speech synthesis interface, keep each request to no more than 1024 bytes for standard voices; for long text or cloned voices, follow the provider configuration. |
| voice | string | No | zh_female_cancan_mars_bigtts | ByteDance voice key, for example zh_female_cancan_mars_bigtts. Actual availability depends on the account authorization. |
| response_format | string | No | pcm | Output audio format, mapped to the official audio.encoding. Supports mp3, wav, pcm, and ogg_opus; wav is usually not used in streaming scenarios. |
| speed | number | No | 1 | Speech speed, mapped to the official audio.speed_ratio. Range: [0.2, 3], where 1 is normal speed. |
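
The constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Tokaify API: the helper names `split_text` and `build_speech_payload` are hypothetical, the 1024-byte figure is treated here as a hard cap on UTF-8 bytes (the parameter description only calls it a general recommendation for standard voices), and splitting on character boundaries rather than sentence boundaries is a simplification.

```python
SUPPORTED_FORMATS = {"mp3", "wav", "pcm", "ogg_opus"}  # values listed for response_format
MAX_INPUT_BYTES = 1024  # recommended per-request size for standard voices


def split_text(text: str, max_bytes: int = MAX_INPUT_BYTES) -> list[str]:
    """Split text into chunks whose UTF-8 encoding fits within max_bytes."""
    if max_bytes < 4:
        # A single UTF-8 character can take up to 4 bytes.
        raise ValueError("max_bytes must be at least 4")
    chunks, current, size = [], [], 0
    for char in text:
        char_size = len(char.encode("utf-8"))
        if current and size + char_size > max_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(char)
        size += char_size
    if current:
        chunks.append("".join(current))
    return chunks


def build_speech_payload(
    text: str,
    voice: str = "zh_female_cancan_mars_bigtts",
    response_format: str = "pcm",
    speed: float = 1.0,
) -> dict:
    """Validate the documented constraints and return a request body."""
    if not text:
        raise ValueError("input must be a non-empty string")
    if len(text.encode("utf-8")) > MAX_INPUT_BYTES:
        raise ValueError("input exceeds 1024 bytes; split it with split_text()")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0.2 <= speed <= 3:
        raise ValueError("speed must be within [0.2, 3]")
    return {
        "model": "seed-tts-1.0",
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }


# One payload per chunk; each payload would be sent as a separate request.
payloads = [build_speech_payload(chunk, response_format="mp3")
            for chunk in split_text("你好，世界。" * 100)]
```

Each resulting payload corresponds to one request, so the audio segments would need to be concatenated by the caller. How well that works depends on the chosen format, which this page does not cover.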

Example Code

cURL

curl https://api.tokaify.com/v1/audio/speech \
  -H "Authorization: Bearer $TOKAIFY_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "model": "seed-tts-1.0",
    "input": "Hello, this is a voice clip generated by Seed TTS 1.0.",
    "voice": "zh_female_cancan_mars_bigtts",
    "response_format": "mp3",
    "speed": 1
  }'

Python

import requests

response = requests.post(
    "https://api.tokaify.com/v1/audio/speech",
    headers={"Authorization": "Bearer YOUR_TOKAIFY_API_KEY"},
    json={
        "model": "seed-tts-1.0",
        "input": "Hello, this is a voice clip generated by Seed TTS 1.0.",
        "voice": "zh_female_cancan_mars_bigtts",
        "response_format": "mp3",
        "speed": 1,
    },
)
response.raise_for_status()
with open("speech.mp3", "wb") as file:
    file.write(response.content)

JavaScript (Node.js)

import { writeFile } from "node:fs/promises";

const response = await fetch("https://api.tokaify.com/v1/audio/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.TOKAIFY_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "seed-tts-1.0",
    input: "Hello, this is a voice clip generated by Seed TTS 1.0.",
    voice: "zh_female_cancan_mars_bigtts",
    response_format: "mp3",
    speed: 1,
  }),
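});

// Fail loudly on HTTP errors instead of writing an error payload to speech.mp3.
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const audio = await response.arrayBuffer();
await writeFile("speech.mp3", Buffer.from(audio));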
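
The examples above always write to speech.mp3. A slightly more defensive variant is sketched below using only the Python standard library. It rests on several assumptions: the helper names `output_path`, `looks_like_error`, and `save_speech` are hypothetical, the mapping from `response_format` to a file extension is an illustrative choice, and treating a JSON `Content-Type` as an error payload is a guess about how the relay reports failures, since this page documents only the `audio/mpeg` response.

```python
import json
import os
import urllib.request

API_URL = "https://api.tokaify.com/v1/audio/speech"

# Illustrative mapping only; the API itself does not define file extensions.
EXTENSIONS = {"mp3": ".mp3", "wav": ".wav", "pcm": ".pcm", "ogg_opus": ".ogg"}


def output_path(stem: str, response_format: str) -> str:
    """Pick a file name whose extension matches the requested format."""
    return stem + EXTENSIONS[response_format]


def looks_like_error(content_type: str) -> bool:
    """Assumption: a JSON body means an error message, not audio."""
    return content_type.lower().startswith("application/json")


def save_speech(text: str, stem: str = "speech", response_format: str = "mp3") -> str:
    """Request speech for text and write the audio bytes to disk."""
    payload = {
        "model": "seed-tts-1.0",
        "input": text,
        "voice": "zh_female_cancan_mars_bigtts",
        "response_format": response_format,
        "speed": 1,
    }
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['TOKAIFY_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request, timeout=60) as response:
        content_type = response.headers.get("Content-Type", "")
        body = response.read()
    if looks_like_error(content_type):
        raise RuntimeError(f"expected audio, got {content_type}: {body[:200]!r}")
    path = output_path(stem, response_format)
    with open(path, "wb") as file:
        file.write(body)
    return path
```

Calling `save_speech("Hello", response_format="wav")` would write speech.wav if the request succeeds and `TOKAIFY_API_KEY` is set in the environment.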

Notes

voice maps to the ByteDance voice key. If omitted, the default voice zh_female_cancan_mars_bigtts is used. To control volume, pitch, emotion, or language, use the official fields volume_ratio, pitch_ratio, emotion, and language; exact support depends on the provider configuration and selected voice.
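
As an illustration of the optional controls mentioned above, the sketch below adds them to a request body. It assumes they are passed as top-level JSON fields alongside `voice` and `speed`, which this page does not confirm. The helper name `with_voice_controls` and the sample values are hypothetical, and unsupported fields may be ignored or rejected depending on the provider configuration and the selected voice.

```python
# Field names taken from the note above; their placement in the body is an assumption.
OPTIONAL_CONTROLS = ("volume_ratio", "pitch_ratio", "emotion", "language")


def with_voice_controls(payload: dict, **controls) -> dict:
    """Return a copy of payload with the given optional controls added."""
    unknown = set(controls) - set(OPTIONAL_CONTROLS)
    if unknown:
        raise ValueError(f"unsupported control(s): {sorted(unknown)}")
    # Skip controls left as None so they are not sent at all.
    extras = {name: value for name, value in controls.items() if value is not None}
    return {**payload, **extras}


base = {
    "model": "seed-tts-1.0",
    "input": "Hello, this is a voice clip generated by Seed TTS 1.0.",
    "voice": "zh_female_cancan_mars_bigtts",
    "response_format": "mp3",
    "speed": 1,
}

# Sample values are placeholders, not documented defaults or limits.
payload = with_voice_controls(base, volume_ratio=1.2, pitch_ratio=1.0, emotion=None)
```

The helper returns a new dictionary and leaves `base` unchanged, so the same base body can be reused across requests with different controls.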
