market/skills/minimax-tts/SKILL.md
xyx 4f7037a6b6 fix: replace hardcoded ~/.desirecore paths with ${DESIRECORE_ROOT} variable (#16)
## Summary

- Replace all hardcoded `~/.desirecore/` and `$HOME/.desirecore/` paths in skill files with the
`${DESIRECORE_ROOT}/` variable
- Bump manifest.json version to 1.2.1

## Why

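In dev mode `DESIRECORE_HOME=~/.desirecore-dev`, so the hardcoded paths made skills read the wrong port file and directories. The main repository's
`variable-substitutor.ts` replaces `${DESIRECORE_ROOT}` with the actual root directory at runtime.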

## Test plan

- [ ] After starting with `npm run dev`, trigger any skill and confirm the port path resolves to
`~/.desirecore-dev/agent-service.port`
- [ ] In prod mode, confirm the path is `~/.desirecore/agent-service.port`

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-05-29 15:36:19 +08:00

7.0 KiB

---
name: minimax-tts
description: >-
  Use this skill when the user wants to convert text to speech using MiniMax's T2A (Text-to-Audio) API. Supports multiple voice styles, emotional control, and voice cloning. Use when 用户提到 语音合成、文字转语音、TTS、朗读、 读出来、生成语音、生成音频、文本转音频、配音、念出来、MiniMax 语音。
license: Complete terms in LICENSE.txt
version: 1.2.1
type: procedural
risk_level: low
status: enabled
disable-model-invocation: true
provider: minimax
tags:
  - media
  - audio
  - tts
  - speech
  - minimax
requires:
  tools:
    - Bash
metadata:
  author: desirecore
  updated_at: 2026-04-25
  i18n:
    default_locale: en-US
    source_locale: zh-CN
    locales:
      - zh-CN
      - en-US
    zh-CN:
      name: MiniMax 语音合成
      short_desc: 基于 MiniMax Speech-02 的文本转语音技能
      description: >-
        Use this skill when the user wants to convert text to speech using MiniMax's T2A (Text-to-Audio) API. Supports multiple voice styles, emotional control, and voice cloning. Use when 用户提到 语音合成、文字转语音、TTS、朗读、 读出来、生成语音、生成音频、文本转音频、配音、念出来、MiniMax 语音。
      body: ./SKILL.zh-CN.md
      source_hash: sha256:455a2ee6365958c2
      translated_by: human
    en-US:
      name: MiniMax Text-to-Speech
      short_desc: Text-to-speech skill powered by MiniMax Speech-02
      description: >-
        Use this skill when the user wants to convert text to speech using MiniMax's T2A (Text-to-Audio) API. Supports multiple voice styles, emotional control, and voice cloning. Use when the user mentions text-to-speech, TTS, read aloud, read it out, generate speech, generate audio, text-to-audio, voiceover, narrate it, MiniMax voice.
      body: ./SKILL.md
      source_hash: sha256:455a2ee6365958c2
      translated_by: human
      translated_at: 2026-05-03
market:
  icon: '<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none"><rect x="3" y="3" width="18" height="18" rx="3" stroke="#007AFF" stroke-width="1.5" fill="#007AFF" fill-opacity="0.1"/><path d="M8 9v6M11 7v10M14 10v4M17 8v8" stroke="#007AFF" stroke-width="2" stroke-linecap="round"/></svg>'
  category: media
  maintainer:
    name: DesireCore Official
    verified: true
  channel: latest
  listed: false
---

minimax-tts Skill

Mandatory Rules (violations will cause feature failure)

  1. Must access agent-service over HTTPS at https://127.0.0.1:${PORT}, with -k to skip certificate verification
  2. Use Bash curl throughout — do not use the HttpRequest tool or Python

Complete Execution Flow

Prerequisites

  • The user has configured a MiniMax Media Provider with an API Key under Resources → Compute
  • agent-service is running
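
The second prerequisite can be checked before any request is sent. The following is a minimal sketch, assuming the port file contains nothing but the port number; the function name `read_agent_port` and its explicit root-directory argument are hypothetical and not part of the skill.

```shell
# Hypothetical helper: print the agent-service port stored under a root directory.
# Returns non-zero when the port file is missing or does not hold a number.
read_agent_port() {
  local root=$1
  local port_file="${root}/agent-service.port"
  local port

  if [ ! -r "$port_file" ]; then
    echo "port file not found: $port_file (is agent-service running?)" >&2
    return 1
  fi

  # Strip whitespace such as a trailing newline.
  port=$(tr -d '[:space:]' < "$port_file")

  case $port in
    ''|*[!0-9]*)
      echo "unexpected port file content: $port" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$port"
}
```

Inside the skill this would be called as `PORT=$(read_agent_port "${DESIRECORE_ROOT}") || exit 1`, so a missing port file stops the flow early instead of producing a malformed URL.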

Voice Selection Guide

voice_id Characteristics Use Cases
male-qn-qingse Young male voice Narration, podcasts
female-shaonv Young female voice Audiobooks, dialogue
female-yujie Mature female voice Professional broadcasting
presenter_male Male anchor voice News, formal occasions
presenter_female Female anchor voice News, formal occasions

Generate Speech

MiniMax TTS returns JSON (containing an audio URL or hex data); use "json" for responseType.

# Read the agent-service port (quoted so a root path containing spaces still works)
PORT=$(cat "${DESIRECORE_ROOT}/agent-service.port")
# Forward the T2A request to MiniMax through the local media proxy
curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
  -H "Content-Type: application/json" \
  -d '{
    "providerId": "provider-minimax-media-001",
    "endpoint": "/t2a_v2",
    "body": {
      "model": "speech-02-hd",
      "text": "Text content to convert to speech",
      "voice_setting": {
        "voice_id": "male-qn-qingse",
        "speed": 1.0,
        "vol": 1.0,
        "pitch": 0
      },
      "audio_setting": {
        "format": "mp3",
        "sample_rate": 32000
      }
    },
    "responseType": "json"
  }'
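
User-supplied text often contains double quotes or line breaks, which would break the hand-written JSON in the request above. The sketch below shows one way to escape the text in plain bash before building the body. The helper names `json_escape` and `build_tts_payload` are hypothetical, and only the most common characters are handled (other control characters are not).

```shell
# Escape a string for use inside a JSON string literal (bash parameter expansion).
# Handles backslash, double quote, newline, carriage return and tab only.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\r'/\\r}
  s=${s//$'\t'/\\t}
  printf '%s' "$s"
}

# Hypothetical helper: print the media-proxy request body for a text and voice.
build_tts_payload() {
  local text voice
  text=$(json_escape "$1")
  voice=$(json_escape "${2:-male-qn-qingse}")
  cat <<EOF
{"providerId":"provider-minimax-media-001","endpoint":"/t2a_v2","body":{"model":"speech-02-hd","text":"${text}","voice_setting":{"voice_id":"${voice}","speed":1.0,"vol":1.0,"pitch":0},"audio_setting":{"format":"mp3","sample_rate":32000}},"responseType":"json"}
EOF
}

# Possible usage (not executed here):
#   build_tts_payload "$TEXT" | curl -sk -X POST \
#     "https://127.0.0.1:${PORT}/api/media-proxy" \
#     -H "Content-Type: application/json" -d @-
```

Piping the body to curl with `-d @-` keeps the text out of the command line, so no second layer of shell quoting is needed.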

Response Handling

MiniMax TTS returns JSON that, depending on the request parameters, contains either an audio URL or hex-encoded audio data:

URL format response (recommended; requires "output_format": "url" at the top level of body, since "format" in audio_setting selects the audio codec such as "mp3"):

{
  "success": true,
  "data": {
    "data": {
      "audio": {
        "audio_url": "https://...",
        "status": 1
      }
    },
    "base_resp": { "status_code": 0, "status_msg": "success" }
  },
  "statusCode": 200
}

Hex format response (default):

{
  "success": true,
  "data": {
    "data": {
      "audio": {
        "data": "hex-encoded audio data...",
        "status": 1
      }
    },
    "extra_info": {
      "audio_length": 12345,
      "audio_sample_rate": 32000,
      "audio_size": 67890
    }
  },
  "statusCode": 200
}
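
Because the two response shapes differ only in the field under `audio`, the caller has to tell them apart before choosing a download path. The sketch below does this with grep and sed only, matching the shapes shown above. The helper names are hypothetical, and the pattern matching assumes the values contain no escaped double quotes; a real JSON parser would be more robust where one is available.

```shell
# Print the first string value stored under the given key in JSON read from stdin.
# Pattern-based, so it assumes the value contains no escaped double quotes.
extract_json_string() {
  local key=$1
  grep -o "\"${key}\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" \
    | head -n 1 \
    | sed -e 's/^[^:]*:[[:space:]]*"//' -e 's/"$//'
}

# Report which audio representation a media-proxy response carries: url, hex or none.
audio_kind() {
  local response=$1
  if [ -n "$(printf '%s' "$response" | extract_json_string audio_url)" ]; then
    echo url
  elif [ -n "$(printf '%s' "$response" | extract_json_string data)" ]; then
    echo hex
  else
    echo none
  fi
}
```

The `data` lookup only matches a key whose value is a string, so the enclosing `"data": { ... }` objects are skipped and only the hex payload is returned.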

Download and Upload to media-store

Audio URLs have a time limit, so they must be downloaded immediately and saved to the local media-store.

URL format:

PORT=$(cat "${DESIRECORE_ROOT}/agent-service.port")
AUDIO_URL="audio_url from the response"
curl -sL "$AUDIO_URL" -o /tmp/minimax-tts.mp3 && \
curl -sk -X POST "https://127.0.0.1:${PORT}/api/media/upload" \
  -F "file=@/tmp/minimax-tts.mp3;type=audio/mpeg"

Hex format:

PORT=$(cat "${DESIRECORE_ROOT}/agent-service.port")
HEX_DATA="hex data from the response"
printf '%s' "$HEX_DATA" | xxd -r -p > /tmp/minimax-tts.mp3 && \
curl -sk -X POST "https://127.0.0.1:${PORT}/api/media/upload" \
  -F "file=@/tmp/minimax-tts.mp3;type=audio/mpeg"

Extract the mediaId field from the JSON response.
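
For the hex path, a quick sanity check before decoding avoids uploading a corrupt file when the response held an error message instead of audio. This is a sketch under the assumption that valid audio arrives as one unbroken hex string; the helper names are hypothetical.

```shell
# Return success only if the argument is a non-empty, even-length hex string.
is_valid_hex() {
  local hex=$1
  case $hex in
    ''|*[!0-9a-fA-F]*) return 1 ;;
  esac
  [ $(( ${#hex} % 2 )) -eq 0 ]
}

# Decode validated hex into a file, using the same xxd call as the hex path above.
decode_hex_audio() {
  local hex=$1 out=$2
  if ! is_valid_hex "$hex"; then
    echo "response does not look like hex audio" >&2
    return 1
  fi
  printf '%s' "$hex" | xxd -r -p > "$out"
}
```

A call such as `decode_hex_audio "$HEX_DATA" /tmp/minimax-tts.mp3` would then replace the bare `xxd` pipeline.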

Display the Result

Reference it in your reply using the dc-media protocol (the frontend will automatically detect the audio extension and render a player):

![Speech synthesis result](dc-media://replace-this-with-mediaId)
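
The reference line can be produced directly from the upload response. The sketch below assumes the response carries `mediaId` as a JSON string value, which this document does not spell out; the helper name `media_reference` and the default alt text are illustrative.

```shell
# Hypothetical helper: turn an upload response into a dc-media Markdown reference.
# Assumes the response carries "mediaId" as a JSON string value.
media_reference() {
  local response=$1
  local alt="${2:-Speech synthesis result}"
  local media_id

  media_id=$(printf '%s' "$response" \
    | grep -o '"mediaId"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed -e 's/^[^:]*:[[:space:]]*"//' -e 's/"$//')

  if [ -z "$media_id" ]; then
    echo "no mediaId in upload response" >&2
    return 1
  fi
  printf '![%s](dc-media://%s)\n' "$alt" "$media_id"
}
```

Failing loudly when `mediaId` is absent keeps a broken `dc-media://` link out of the reply.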

Parameter Reference

| Parameter | Description | Default / Notes |
| --- | --- | --- |
| model | Model | "speech-02-hd" (HD) or "speech-02-turbo" (fast) |
| text | Text to convert | Max 10000 characters |
| voice_setting.voice_id | Voice persona | "male-qn-qingse" |
| voice_setting.speed | Speaking speed | 1.0 |
| voice_setting.vol | Volume | 1.0 |
| voice_setting.pitch | Pitch | 0 |
| audio_setting.format | Audio format | "mp3" |
| audio_setting.sample_rate | Sample rate | 32000 |
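
The model names and the text limit listed above can be checked locally before a request is sent. This is a minimal sketch that accepts only the two models this document names and treats 10000 characters as the upper bound; the function name is hypothetical.

```shell
# Hypothetical pre-flight check based on the parameter reference above.
validate_tts_request() {
  local model=$1 text=$2

  case $model in
    speech-02-hd|speech-02-turbo) ;;
    *) echo "unsupported model: $model" >&2; return 1 ;;
  esac

  if [ -z "$text" ]; then
    echo "text is empty" >&2
    return 1
  fi

  # ${#text} counts characters in a UTF-8 locale and bytes in the C locale.
  if [ "${#text}" -gt 10000 ]; then
    echo "text has ${#text} characters; the limit is 10000" >&2
    return 1
  fi
}
```

Rejecting empty text here mirrors the 400 case in the error list, but without spending a network round trip.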

Special Syntax

MiniMax TTS supports inserting pause markers in the text:

  • <#0.5#> — pause for 0.5 seconds
  • <#2#> — pause for 2 seconds
  • Valid range: 0.01 ~ 99.99 seconds

Example: "Hello<#1#>welcome to DesireCore"
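
Pause markers can be generated rather than typed by hand, which makes the valid range easy to enforce. The sketch below accepts 0.01 to 99.99 seconds; limiting the value to two decimal places is an assumption of this sketch, and the function name is hypothetical.

```shell
# Print a pause marker such as <#0.5#> for a duration in seconds.
# Accepts 0.01 to 99.99; the two-decimal cap is an assumption of this sketch.
pause_marker() {
  local seconds=$1

  if ! printf '%s' "$seconds" | grep -Eq '^[0-9]{1,2}(\.[0-9]{1,2})?$'; then
    echo "invalid pause duration: $seconds" >&2
    return 1
  fi

  # The pattern already caps the value at 99.99; awk rejects anything below 0.01.
  if ! awk -v s="$seconds" 'BEGIN { exit !(s + 0 >= 0.01) }'; then
    echo "pause duration out of range: $seconds" >&2
    return 1
  fi

  printf '<#%s#>' "$seconds"
}
```

A text with a one-second pause can then be assembled as `TEXT="Hello$(pause_marker 1)welcome to DesireCore"`.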

Error Handling

  • success: false + statusCode: 400: empty text or malformed parameters
  • success: false + statusCode: 401: invalid API Key
  • success: false + statusCode: 429: rate limited
  • success: false + error: "未找到匹配的供应商" ("No matching provider found"): MiniMax Media Provider not configured
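
The cases above can be folded into a single lookup so that a failed request yields a readable diagnosis. This sketch matches the response text with grep and assumes `statusCode` is a bare number, as in the sample responses; the function name is hypothetical.

```shell
# Hypothetical helper: map a media-proxy response to a short diagnosis.
explain_tts_error() {
  local response=$1
  local code

  if ! printf '%s' "$response" | grep -Eq '"success"[[:space:]]*:[[:space:]]*false'; then
    echo "ok"
    return 0
  fi

  # The provider error is reported as a message, so check it before the status code.
  if printf '%s' "$response" | grep -qF '未找到匹配的供应商'; then
    echo "MiniMax Media Provider not configured"
    return 0
  fi

  code=$(printf '%s' "$response" \
    | grep -Eo '"statusCode"[[:space:]]*:[[:space:]]*[0-9]+' \
    | head -n 1 \
    | grep -Eo '[0-9]+$')

  case $code in
    400) echo "empty text or malformed parameters" ;;
    401) echo "invalid API Key" ;;
    429) echo "rate limited" ;;
    *)   echo "unrecognized failure (statusCode: ${code:-missing})" ;;
  esac
}
```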

Notes

  • For text exceeding 3000 characters, streaming output is recommended; proxy mode does not yet support streaming, so split the text into segments instead (see the last note)
  • Returned audio_url is valid for 24 hours
  • Unless the user specifies otherwise, default to speech-02-hd + male-qn-qingse + 1.0x speed
  • For long text, split it into segments of no more than 3000 characters each
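
The segmentation in the last note can be sketched in bash as follows. The cuts here are purely positional, so a production splitter would likely prefer sentence boundaries to avoid breaking words mid-phrase. The function name and the `TTS_CHUNKS` array are hypothetical.

```shell
# Split text into pieces of at most max_len characters (default 3000).
# Results are stored in the global array TTS_CHUNKS.
split_tts_text() {
  local text=$1
  local max_len=${2:-3000}
  local offset=0

  # Guard against a zero or negative length, which would loop forever.
  [ "$max_len" -gt 0 ] || return 1

  TTS_CHUNKS=()
  while [ "$offset" -lt "${#text}" ]; do
    TTS_CHUNKS+=("${text:offset:max_len}")
    offset=$(( offset + max_len ))
  done
}
```

Each element of `TTS_CHUNKS` can then be sent as its own request, for example with `for chunk in "${TTS_CHUNKS[@]}"; do ...; done`, and the resulting audio files uploaded one by one.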