mirror of https://git.openapi.site/https://github.com/desirecore/market.git synced 2026-06-06 04:30:42 +08:00

xyx dfd0cb62cf fix(i18n): sync dashscope-image-gen & xiaomi-tts zh-CN body with local (#13)

## Summary

- Update the remote `SKILL.zh-CN.md` with the latest Chinese body content from the local `defaults/global-skills/`
- Update the zh-CN `source_hash` to reflect the latest content
- Unlock the en-US `translated_by` so that CI can automatically re-translate the English body

## Changed files

- `skills/dashscope-image-gen/SKILL.zh-CN.md`: Chinese body synced to the latest local version
- `skills/dashscope-image-gen/SKILL.md`: update source_hash + unlock en-US
- `skills/xiaomi-tts/SKILL.zh-CN.md`: Chinese body synced to the latest local version
- `skills/xiaomi-tts/SKILL.md`: update source_hash + unlock en-US

2026-05-13 19:38:31 +08:00

xiaomi-tts Skill

Mandatory Rules (violations cause failure)

Must access agent-service over HTTPS — use https://127.0.0.1:${PORT} with -k to skip certificate verification
Must upload to media-store via /api/media/upload — /tmp is only a transient download/decode location, never use a local path as the final output
Must use the dc-media:// protocol to display audio — the only form the frontend can render correctly
Use Bash curl throughout — do not use the HttpRequest tool or Python
Use the /chat/completions endpoint — Xiaomi MiMo TTS speaks OpenAI-compatible chat format

Model Selection

| Model | Characteristics | When to use |
| --- | --- | --- |
| mimo-v2.5-tts | Standard TTS, multiple preset voices | Default, regular speech synthesis |
| mimo-v2.5-tts-voicedesign | Custom voice design | When you need a voice generated from a description |
| mimo-v2.5-tts-voiceclone | Voice cloning | When you need to clone a specific voice (reference audio required) |

Default rule: if the user does not specify a model, use mimo-v2.5-tts.
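
The selection table and default rule above can be sketched as a small Bash helper. The function name `pick_model` and its two arguments are hypothetical, and the precedence (reference audio before voice description) is an assumption; the skill itself does not define an order.

```shell
# pick_model REF_AUDIO VOICE_DESC   (hypothetical helper, not part of the skill)
# REF_AUDIO:  path to reference audio, or empty if the user supplied none
# VOICE_DESC: free-text voice description, or empty if the user supplied none
pick_model() {
  local ref_audio="${1:-}" voice_desc="${2:-}"
  if [ -n "$ref_audio" ]; then
    # Cloning requires reference audio, so its presence selects the clone model
    echo "mimo-v2.5-tts-voiceclone"
  elif [ -n "$voice_desc" ]; then
    echo "mimo-v2.5-tts-voicedesign"
  else
    # Default rule: nothing specified, use the standard model
    echo "mimo-v2.5-tts"
  fi
}
```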

Voice Selection

Preset Voices

| voice_id | Name | Characteristics |
| --- | --- | --- |
| default_zh | Default Chinese | General-purpose Chinese female voice |
| default_en | Default English | General-purpose English female voice |
| mimo_default | MiMo Default | MiMo's signature voice |
| Bingtang | Bingtang | Sweet female voice |
| Moli | Moli | Soft, gentle female voice |
| Suda | Suda | Young male voice |
| Baihua | Baihua | Mature male voice |
| Mia | Mia | English female voice |
| Chloe | Chloe | English female voice |
| Milo | Milo | English male voice |
| Dean | Dean | English male voice |

Default rule: if the user does not specify a voice, pick one by the language of the text: Bingtang for Chinese, Mia for English.
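
One rough way to apply the voice default automatically is sketched below. The helper name `pick_voice` is hypothetical, and the language check is a crude assumption: any non-ASCII byte is treated as Chinese, so accented Latin text would also be routed to Bingtang.

```shell
# pick_voice TEXT   (hypothetical helper; heuristic, not a real language detector)
pick_voice() {
  local text="${1:-}" non_ascii
  # Delete every ASCII byte and count what remains; UTF-8 Chinese leaves bytes behind
  non_ascii=$(( $(printf '%s' "$text" | LC_ALL=C tr -d '\000-\177' | wc -c) ))
  if [ "$non_ascii" -gt 0 ]; then
    echo "Bingtang"   # default voice for Chinese text
  else
    echo "Mia"        # default voice for English text
  fi
}
```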

Full Execution Flow (strictly three steps)

Prerequisites

The user has configured a Xiaomi MiMo provider in Resource Manager → Compute and filled in an API Key
agent-service is running

Step 1: Call the TTS API

Generate speech by posting to /api/media-proxy, which forwards the request body to the provider's /chat/completions endpoint.

Important: messages must use the assistant role (not user); the text to synthesize goes in the assistant message's content.

PORT=$(cat ~/.desirecore/agent-service.port)
curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "xiaomi",
    "serviceType": "tts",
    "endpoint": "/chat/completions",
    "body": {
      "model": "mimo-v2.5-tts",
      "messages": [
        {
          "role": "assistant",
          "content": "Replace this with the text to synthesize"
        }
      ],
      "voice": "Bingtang",
      "audio": {"format": "mp3"}
    },
    "responseType": "json"
  }'

Example response:

{
  "success": true,
  "data": {
    "id": "chatcmpl-...",
    "choices": [
      {
        "index": 0,
        "message": {
          "role": "assistant",
          "audio": {
            "data": "base64-encoded audio data...",
            "format": "mp3"
          }
        },
        "finish_reason": "stop"
      }
    ]
  },
  "statusCode": 200
}

Pull the base64-encoded audio data from data.choices[0].message.audio.data.
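
The request in this step embeds the text in a single-quoted JSON literal, which breaks as soon as the text contains a double quote, backslash, or newline. A minimal sketch of building the same payload with escaping is shown below. The helper names `json_escape` and `build_tts_request` are hypothetical, and the escaping covers only the common cases (backslash, double quote, newline, carriage return, tab), not every control character.

```shell
# json_escape STRING   (hypothetical helper; Bash parameter expansion, common escapes only)
json_escape() {
  local s="${1:-}"
  s=${s//\\/\\\\}       # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}       # double quote
  s=${s//$'\n'/\\n}     # newline
  s=${s//$'\r'/\\r}     # carriage return
  s=${s//$'\t'/\\t}     # tab
  printf '%s' "$s"
}

# build_tts_request TEXT [VOICE]   (hypothetical helper)
# Prints the media-proxy payload from Step 1 with TEXT as the assistant message.
build_tts_request() {
  local text voice
  text=$(json_escape "${1:-}")
  voice=$(json_escape "${2:-Bingtang}")
  printf '{"provider":"xiaomi","serviceType":"tts","endpoint":"/chat/completions","body":{"model":"mimo-v2.5-tts","messages":[{"role":"assistant","content":"%s"}],"voice":"%s","audio":{"format":"mp3"}},"responseType":"json"}\n' "$text" "$voice"
}

# Usage (requires a running agent-service, as in Step 1):
#   PORT=$(cat ~/.desirecore/agent-service.port)
#   curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
#     -H "Content-Type: application/json" \
#     -d "$(build_tts_request 'She said "hello"' Mia)"
```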

Step 2: Decode and upload to media-store

The audio comes back as base64; decode it to a temporary file, then upload that file to the local media-store.

Recommended approach (write the full response to a file first to avoid overlong shell arguments):

PORT=$(cat ~/.desirecore/agent-service.port)
# Send the request and save the full response to a file
curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "xiaomi",
    "serviceType": "tts",
    "endpoint": "/chat/completions",
    "body": {
      "model": "mimo-v2.5-tts",
      "messages": [{"role": "assistant", "content": "Text to synthesize"}],
      "voice": "Bingtang",
      "audio": {"format": "mp3"}
    },
    "responseType": "json"
  }' > /tmp/xiaomi-tts-response.json

# Extract and decode the base64 audio data
jq -r '.data.choices[0].message.audio.data' /tmp/xiaomi-tts-response.json | base64 -d > /tmp/xiaomi-tts.mp3

# Upload to media-store
curl -sk -X POST "https://127.0.0.1:${PORT}/api/media/upload" \
  -F "file=@/tmp/xiaomi-tts.mp3;type=audio/mpeg"

Read the mediaId field from the upload response's JSON (format xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.mp3).

Step 3: Render the audio via the dc-media protocol

In your reply text, write Markdown syntax directly:

![TTS result](dc-media://replace-with-mediaId)

For example: ![TTS: Hello world](dc-media://a1b2c3d4-e5f6-47a8-b9c0-d1e2f3a4b5c6.mp3)

The frontend detects the .mp3 extension and renders an audio player.
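
A small guard can catch the mistake of passing a local path instead of a mediaId before the Markdown is emitted. The sketch below is hypothetical: the helper name `render_audio_markdown` is not part of the skill, and the pattern assumes the mediaId is a hex UUID followed by `.mp3` or `.wav` (the source only shows the `.mp3` form).

```shell
# render_audio_markdown MEDIA_ID [ALT_TEXT]   (hypothetical helper)
# Prints the dc-media Markdown line, or fails if MEDIA_ID does not look like a mediaId.
render_audio_markdown() {
  local media_id="${1:-}" alt="${2:-TTS result}"
  # Assumed shape: 8-4-4-4-12 hex digits plus an audio extension
  if ! printf '%s' "$media_id" |
      grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\.(mp3|wav)$'; then
    echo "unexpected mediaId: $media_id" >&2
    return 1
  fi
  printf '![%s](dc-media://%s)\n' "$alt" "$media_id"
}
```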

Parameter Mapping

Request body parameters (inside `body`)

| Parameter | Description | Default |
| --- | --- | --- |
| `model` | Model name | "mimo-v2.5-tts" |
| `messages[0].role` | Must be "assistant" | "assistant" (fixed) |
| `messages[0].content` | Text to synthesize | required |
| `voice` | Voice ID | "Bingtang" (Chinese) / "Mia" (English) |
| `audio.format` | Audio format | "mp3" (also accepts "wav") |

User intent mapping

| User intent | Parameter |
| --- | --- |
| Sweet / cute | voice: "Bingtang" |
| Gentle / refined | voice: "Moli" |
| Young male | voice: "Suda" |
| Mature male | voice: "Baihua" |
| English female | voice: "Mia" or "Chloe" |
| English male | voice: "Milo" or "Dean" |
| High fidelity / lossless | audio.format: "wav" |
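
The intent mapping above could be expressed as a lookup, sketched here in Bash. The helper names and the intent keywords (`sweet`, `young-male`, and so on) are hypothetical labels for illustration. Where the table offers two voices, the sketch simply picks the first one.

```shell
# voice_for_intent INTENT   (hypothetical helper; keywords are illustrative labels)
voice_for_intent() {
  case "${1:-}" in
    sweet|cute)      echo "Bingtang" ;;
    gentle|refined)  echo "Moli" ;;
    young-male)      echo "Suda" ;;
    mature-male)     echo "Baihua" ;;
    english-female)  echo "Mia" ;;    # the table also allows "Chloe"
    english-male)    echo "Milo" ;;   # the table also allows "Dean"
    *)               return 1 ;;      # unknown intent: caller falls back to the default voice
  esac
}

# audio_format_for_intent INTENT   (hypothetical helper)
audio_format_for_intent() {
  case "${1:-}" in
    high-fidelity|lossless) echo "wav" ;;
    *)                      echo "mp3" ;;  # default format
  esac
}
```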

Error Handling

success: false + error: "No matching provider": Xiaomi MiMo provider not configured or disabled
success: false + error: "API Key not configured": API Key missing
statusCode: 401: API Key invalid or expired
statusCode: 429: rate limited, retry later
statusCode: 400: bad parameters (e.g. unknown voice, empty text)
statusCode: 403: model not activated or insufficient permission
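
The cases above can be turned into a user-facing message with a helper along these lines. This is a sketch under assumptions: the name `explain_tts_error` is hypothetical, and it expects the caller to have already extracted `statusCode` and `error` from the response JSON (for example with jq).

```shell
# explain_tts_error STATUS_CODE [ERROR_STRING]   (hypothetical helper)
explain_tts_error() {
  local status="${1:-}" error="${2:-}"
  # Error strings reported with success: false take priority over the status code
  case "$error" in
    *"No matching provider"*)
      echo "Xiaomi MiMo provider not configured or disabled"; return 0 ;;
    *"API Key not configured"*)
      echo "API Key missing"; return 0 ;;
  esac
  case "$status" in
    400) echo "bad parameters (e.g. unknown voice, empty text)" ;;
    401) echo "API Key invalid or expired" ;;
    403) echo "model not activated or insufficient permission" ;;
    429) echo "rate limited, retry later" ;;
    *)   echo "unrecognized error (statusCode: $status)" ;;
  esac
}
```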

Notes

Calls are synchronous, typically 3–15 seconds depending on text length
Audio is returned as base64, so URL expiry is not a concern, but watch shell argument length on long responses
For long text, split it into segments (no more than ~500 chars each), then synthesize, upload, and render each segment separately
When the user doesn't specify, default to mimo-v2.5-tts + auto-selected voice by language + mp3
Token Plan keys (prefix tp-) use the https://token-plan-cn.xiaomimimo.com/v1 endpoint
Pay-as-you-go keys use the https://api.xiaomimimo.com/v1 endpoint
media-proxy picks the correct endpoint based on configuration; the skill does not need to differentiate
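
The long-text note above can be sketched as a fixed-width splitter. The helper name `split_text` is hypothetical, and two assumptions apply: Bash counts characters rather than bytes only in a UTF-8 locale, and cutting at a fixed width may break a sentence mid-way, so a real implementation would probably prefer punctuation boundaries.

```shell
# split_text TEXT [MAX_CHARS]   (hypothetical helper; Bash substring expansion)
# Prints TEXT in consecutive segments of at most MAX_CHARS characters, one per line.
# Assumes TEXT itself contains no newlines.
split_text() {
  local text="${1:-}" max="${2:-500}" i=0
  while [ "$i" -lt "${#text}" ]; do
    printf '%s\n' "${text:i:max}"
    i=$((i + max))
  done
}

# Usage: run the three steps once per segment
#   split_text "$LONG_TEXT" 500 | while IFS= read -r segment; do
#     ...call the TTS API, upload, and render "$segment"...
#   done
```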
