mirror of https://git.openapi.site/https://github.com/desirecore/market.git synced 2026-06-06 05:50:41 +08:00

xyx a582f0f4f9 fix: add per-skill version bumps for the skills touched by the #16 path replacement (#22)

## Background

#16 (4f7037a) bulk-replaced `~/.desirecore` with `${DESIRECORE_ROOT}` in
16 SKILL.md files but only bumped `manifest.json`, leaving every
per-skill `version` untouched.

Clients sync by semver on the per-skill `version` in the SKILL.md
frontmatter. An unchanged version is treated as "no update" and skipped
forever, so upgraded users' global skills stay frozen on the
pre-replacement content, out of sync with what is published.
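The skip behaviour described above can be illustrated with a minimal bash sketch. It is not the client's actual sync code: `version_is_newer` is a hypothetical helper, and it assumes plain `MAJOR.MINOR.PATCH` versions with no pre-release or build suffixes.

```shell
# Hypothetical helper, for illustration only (not the real client logic).
# Succeeds (exit 0) only when the published version is strictly newer than
# the installed one. Assumes plain MAJOR.MINOR.PATCH versions.
version_is_newer() {
  local IFS=.
  local -a have=($1) want=($2)   # split "1.2.3" into (1 2 3)
  local i
  for i in 0 1 2; do
    if (( ${want[i]:-0} > ${have[i]:-0} )); then return 0; fi
    if (( ${want[i]:-0} < ${have[i]:-0} )); then return 1; fi
  done
  return 1   # identical versions: treated as "no update" and skipped
}

# Content changed upstream but the version did not: the update is skipped.
if version_is_newer "1.0.0" "1.0.0"; then echo "sync"; else echo "skip"; fi
# After a patch bump the same comparison triggers a sync.
if version_is_newer "1.0.0" "1.0.1"; then echo "sync"; else echo "skip"; fi
```

Under a check of this kind, a content change that leaves `version` untouched is indistinguishable from no change at all, which is why each affected skill needs its own bump.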

## Changes

- Patch-bump (+1) each of the **14 in-manifest skills** that #16 touched and that have not been version-bumped since
- `manifest.json` 1.2.2 → 1.2.3 (following the #16 convention that content changes also bump the manifest)
- Skip the retired skills `minimax-image-gen` / `minimax-tts` (not in builtin-skills.json, not distributed)
- The diff touches version lines only; skill bodies are unchanged

2026-06-03 17:46:13 +08:00


name, description, license, version, type, risk_level, status, disable-model-invocation, provider, tags, requires, metadata, market

xiaomi-tts Skill

Mandatory Rules (violations cause failure)

- Must access agent-service over HTTPS: use https://127.0.0.1:${PORT} with -k to skip certificate verification
- Must upload to media-store via /api/media/upload: /tmp is only a transient download/decode location; never use a local path as the final output
- Must use the dc-media:// protocol to display audio: it is the only form the frontend can render correctly
- Use Bash curl throughout: do not use the HttpRequest tool or Python
- Use the /chat/completions endpoint: Xiaomi MiMo TTS speaks the OpenAI-compatible chat format

Model Selection

| Model | Characteristics | When to use |
| --- | --- | --- |
| mimo-v2.5-tts | Standard TTS, multiple preset voices | Default, regular speech synthesis |
| mimo-v2.5-tts-voicedesign | Custom voice design | When you need a voice generated from a description |
| mimo-v2.5-tts-voiceclone | Voice cloning | When you need to clone a specific voice (reference audio required) |

Default rule: if the user does not specify a model, use mimo-v2.5-tts.

Voice Selection

Preset Voices

| voice_id | Name | Characteristics |
| --- | --- | --- |
| default_zh | Default Chinese | General-purpose Chinese female voice |
| default_en | Default English | General-purpose English female voice |
| mimo_default | MiMo Default | MiMo's signature voice |
| Bingtang | Bingtang | Sweet female voice |
| Moli | Moli | Soft, gentle female voice |
| Suda | Suda | Young male voice |
| Baihua | Baihua | Mature male voice |
| Mia | Mia | English female voice |
| Chloe | Chloe | English female voice |
| Milo | Milo | English male voice |
| Dean | Dean | English male voice |

Default rule: use Bingtang for Chinese text and Mia for English text; if the user doesn't specify, pick automatically by content language.
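One way to apply the default rule automatically is sketched below. `pick_voice` is a hypothetical helper, and the language check is a deliberately crude assumption: any non-ASCII byte in the text is taken to mean Chinese. English text containing curly quotes or accented letters would be misclassified, so treat this as a starting point only.

```shell
# Hypothetical helper: pick a default voice from the text's language.
# Heuristic (an assumption, not part of the skill): any non-ASCII byte
# means the text is Chinese; pure ASCII means English.
pick_voice() {
  local non_ascii
  # Delete every ASCII byte; whatever remains is non-ASCII content.
  non_ascii=$(printf '%s' "$1" | LC_ALL=C tr -d '\000-\177')
  if [ -n "$non_ascii" ]; then
    echo "Bingtang"   # default voice for Chinese text
  else
    echo "Mia"        # default voice for English text
  fi
}

VOICE=$(pick_voice "你好，世界")
echo "$VOICE"
```

An explicit voice requested by the user should always take precedence over this fallback.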

Full Execution Flow (strictly three steps)

Prerequisites

- The user has configured a Xiaomi MiMo provider in Resource Manager → Compute and filled in an API Key
- agent-service is running

Step 1: Call the TTS API

Generate speech via media-proxy's /chat/completions endpoint.

Important: messages must use the assistant role (not user); the text to synthesize goes in the assistant message's content.

PORT=$(cat "${DESIRECORE_ROOT}/agent-service.port")
curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "xiaomi",
    "serviceType": "tts",
    "endpoint": "/chat/completions",
    "body": {
      "model": "mimo-v2.5-tts",
      "messages": [
        {
          "role": "assistant",
          "content": "Replace this with the text to synthesize"
        }
      ],
      "voice": "Bingtang",
      "audio": {"format": "mp3"}
    },
    "responseType": "json"
  }'

Example response:

{
  "success": true,
  "data": {
    "id": "chatcmpl-...",
    "choices": [
      {
        "index": 0,
        "message": {
          "role": "assistant",
          "audio": {
            "data": "base64-encoded audio data...",
            "format": "mp3"
          }
        },
        "finish_reason": "stop"
      }
    ]
  },
  "statusCode": 200
}

Pull the base64-encoded audio data from data.choices[0].message.audio.data.

Step 2: Decode and upload to media-store

The audio comes back as base64; decode it and save to the local media-store.

Recommended approach (write the full response to a file first to avoid overlong shell arguments):

PORT=$(cat "${DESIRECORE_ROOT}/agent-service.port")
# Send the request and save the full response to a file
curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "xiaomi",
    "serviceType": "tts",
    "endpoint": "/chat/completions",
    "body": {
      "model": "mimo-v2.5-tts",
      "messages": [{"role": "assistant", "content": "Text to synthesize"}],
      "voice": "Bingtang",
      "audio": {"format": "mp3"}
    },
    "responseType": "json"
  }' > /tmp/xiaomi-tts-response.json

# Extract and decode the base64 audio data
jq -r '.data.choices[0].message.audio.data' /tmp/xiaomi-tts-response.json | base64 -d > /tmp/xiaomi-tts.mp3

# Upload to media-store
curl -sk -X POST "https://127.0.0.1:${PORT}/api/media/upload" \
  -F "file=@/tmp/xiaomi-tts.mp3;type=audio/mpeg"

Take the mediaId field from the upload's JSON response (format xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.mp3).

Step 3: Render the audio via the dc-media protocol

In your reply text, write Markdown syntax directly:

![TTS result](dc-media://replace-with-mediaId)

For example: ![TTS: Hello world](dc-media://a1b2c3d4-e5f6-47a8-b9c0-d1e2f3a4b5c6.mp3)

The frontend detects the .mp3 extension and renders an audio player.
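Building the Markdown line can be scripted once the mediaId is known. The sketch below uses a hypothetical helper, `render_audio_markdown`, that simply formats a label and a mediaId into the syntax shown above. It assumes neither value contains `]` or `)`.

```shell
# Hypothetical helper: format a label and a mediaId as dc-media Markdown.
# Assumes neither argument contains "]" or ")".
render_audio_markdown() {
  local label="$1" media_id="$2"
  printf '![%s](dc-media://%s)\n' "$label" "$media_id"
}

render_audio_markdown "TTS: Hello world" "a1b2c3d4-e5f6-47a8-b9c0-d1e2f3a4b5c6.mp3"
# → ![TTS: Hello world](dc-media://a1b2c3d4-e5f6-47a8-b9c0-d1e2f3a4b5c6.mp3)
```

The mediaId keeps its extension, since the frontend relies on it to choose the audio player.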

Parameter Mapping

Request body parameters (inside `body`)

| Parameter | Description | Default |
| --- | --- | --- |
| `model` | Model name | "mimo-v2.5-tts" |
| `messages[0].role` | Must be "assistant" | "assistant" (fixed) |
| `messages[0].content` | Text to synthesize | required |
| `voice` | Voice ID | "Bingtang" (Chinese) / "Mia" (English) |
| `audio.format` | Audio format | "mp3" (also accepts "wav") |

User intent mapping

| User intent | Parameter |
| --- | --- |
| Sweet / cute | voice: "Bingtang" |
| Gentle / refined | voice: "Moli" |
| Young male | voice: "Suda" |
| Mature male | voice: "Baihua" |
| English female | voice: "Mia" or "Chloe" |
| English male | voice: "Milo" or "Dean" |
| High fidelity / lossless | audio.format: "wav" |
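When `audio.format` is switched to "wav", the file extension and the upload MIME type in Step 2 need to change with it. A minimal sketch, assuming the hypothetical helper name `audio_upload_type` and assuming media-store accepts `audio/wav` (only `audio/mpeg` appears in the original flow):

```shell
# Hypothetical helper: map audio.format to the MIME type used on upload.
# "audio/wav" being accepted by media-store is an assumption.
audio_upload_type() {
  case "$1" in
    mp3) echo "audio/mpeg" ;;
    wav) echo "audio/wav" ;;
    *)   echo "unsupported audio format: $1" >&2; return 1 ;;
  esac
}

FORMAT="wav"
echo "file=@/tmp/xiaomi-tts.${FORMAT};type=$(audio_upload_type "$FORMAT")"
# → file=@/tmp/xiaomi-tts.wav;type=audio/wav
```

The printed string is the value that would be passed to curl's `-F` option in the upload step.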

Error Handling

- success: false + error: "No matching provider": Xiaomi MiMo provider not configured or disabled
- success: false + error: "API Key not configured": API Key missing
- statusCode: 401: API Key invalid or expired
- statusCode: 429: rate limited, retry later
- statusCode: 400: bad parameters (e.g. unknown voice, empty text)
- statusCode: 403: model not activated or insufficient permission
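The cases above can be folded into a single lookup so that a failed call yields a readable message. This is a sketch: `describe_tts_error` is a hypothetical helper, and matching the `error` field by substring is an assumption about how the message is worded in practice.

```shell
# Hypothetical helper: turn statusCode and error into a readable message.
# $1 = statusCode, $2 = error string (optional). Substring matching on the
# error text is an assumption.
describe_tts_error() {
  local status="$1" error="${2:-}"
  case "$error" in
    *"No matching provider"*)
      echo "Xiaomi MiMo provider not configured or disabled"; return ;;
    *"API Key not configured"*)
      echo "API Key missing"; return ;;
  esac
  case "$status" in
    400) echo "bad parameters (e.g. unknown voice, empty text)" ;;
    401) echo "API Key invalid or expired" ;;
    403) echo "model not activated or insufficient permission" ;;
    429) echo "rate limited, retry later" ;;
    *)   echo "unexpected error (statusCode: ${status})" ;;
  esac
}

describe_tts_error 429
# → rate limited, retry later
```

Both inputs can be read from the saved response with jq, for example `jq -r '.statusCode'` and `jq -r '.error // empty'` on /tmp/xiaomi-tts-response.json.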

Notes

- Calls are synchronous, typically 3–15 seconds depending on text length
- Audio is returned as base64, so URL expiry is not a concern, but watch shell argument length on long responses
- For long text, split into segments (no more than ~500 chars each), then upload and render each segment
- When the user doesn't specify, default to mimo-v2.5-tts + auto-selected voice by language + mp3
- Token Plan keys (prefix tp-) use the https://token-plan-cn.xiaomimimo.com/v1 endpoint
- Pay-as-you-go keys use the https://api.xiaomimimo.com/v1 endpoint
- media-proxy picks the correct endpoint based on configuration; the skill does not need to differentiate
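The long-text note above can be sketched as a fixed-size splitter. `split_text` is a hypothetical helper that requires bash, and it cuts at exact character counts rather than at sentence boundaries, so a real implementation would likely prefer breaking on punctuation. It also assumes the input contains no newlines, because each segment is printed on its own line.

```shell
# Hypothetical helper (bash only): split text into segments of at most
# $2 characters (default 500), one segment per output line.
# Cuts at fixed offsets, not sentence boundaries; input must not contain
# newlines. Character counting follows the current locale.
split_text() {
  local text="$1" max="${2:-500}" offset=0
  while [ "$offset" -lt "${#text}" ]; do
    printf '%s\n' "${text:offset:max}"
    offset=$((offset + max))
  done
}

# Each segment would then go through Steps 1 to 3 on its own.
split_text "abcdefghij" 4
# → abcd
# → efgh
# → ij
```

In a UTF-8 locale bash counts characters, not bytes, so Chinese text is not cut in the middle of a character.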
