mirror of https://git.openapi.site/https://github.com/desirecore/market.git synced 2026-06-06 05:50:41 +08:00

xyx 0cb3758669 fix: complete i18n CI validation for dashscope-image-gen and xiaomi-tts (#4)

## Summary of changes

Fixes the i18n CI validation for dashscope-image-gen and xiaomi-tts, completes the English
translations, and also fixes the source_hash drift in other stale skills.

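### dashscope-image-gen / xiaomi-tts (main line of this PR)
- `name` field changed from Chinese to the directory name (CI rule-1 requires lowercase ASCII + hyphens).
- Completed the `metadata.i18n` block: `locales`, `zh-CN` (with body pointing to
SKILL.zh-CN.md), and `en-US` (with description / body=./SKILL.md).
- Added `SKILL.zh-CN.md` (the zh-CN body file).
- **Root SKILL.md rewritten as the English body** (matching the content of SKILL.zh-CN.md),
translated by hand in this PR; `default_locale=en-US`, `source_locale=zh-CN`, consistent with the
docs/I18N.md convention: root SKILL.md = default_locale body (en-US), SKILL.zh-CN.md =
source_locale body (zh-CN).
- Both locales locked with `translated_by: human` and the correct `source_hash`.
- Content quality fixes: flow heading "strictly follow these two steps" changed to "strictly follow
these three steps"; wording of mandatory rule 2 made precise (/tmp is only a transit location); in
the xiaomi-tts user intent mapping table, `response_format` changed to `audio.format` to match the
request body parameter table; zh-CN.description changed to pure Chinese.
- locale header corrected from the shell-escape leftover `<\!--` to the standard `<!-- locale: zh-CN -->`.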

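### Collateral: 6 skills already stale on main (to keep the translate workflow from failing)
- `manage-skills` / `minimax-music-gen` / `minimax-video-gen` /
`skill-creator` / `web-access`: `en-US.source_hash` recomputed as the actual hash of the current
zh-CN source; `translated_by` changed from `ai:claude-opus-4-7` to `human`
to lock the existing translations against being overwritten by automatic re-translation.
- `markdown`: corrected `en-US.source_hash` (previously the placeholder `sha256:0000000000000000`).
- The `en-US` translation content of these skills is unchanged; only the metadata is corrected.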

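### scripts/i18n/translate.py robustness improvements
- No longer retries on 413 Payload Too Large (the payload will not get smaller, so a retry only wastes time).
- The main loop catches RuntimeError, records a single skill's failure in `plan["errors"]`, and
continues with the next skill, so that one large file does not fail the whole workflow.
- In `--check` mode, plans that contain errors also exit 1 (previously only needs_translation was
checked, and a broad except swallowed the exception, producing a false pass).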

## Test plan

- [x] `i18n-validate` passes
- [x] `i18n-translate --check` shows every skill as `up-to-date` or `human-locked,
skipping`
- [x] `validate` / `translate` / `wait-for-copilot-review` all green on CI
- [ ] All Copilot review conversations resolved
- [ ] Squash merge

---------

Co-authored-by: yi-ge <a@wyr.me>

2026-05-13 12:57:25 +08:00

9.1 KiB

name, description, license, version, type, risk_level, status, disable-model-invocation, provider, tags, requires, metadata, market

xiaomi-tts Skill

Mandatory Rules (violations cause failure)

1. Must access agent-service over HTTPS: use https://127.0.0.1:${PORT} with -k to skip certificate verification
2. Must upload to media-store via /api/media/upload: /tmp is only a transient download/decode location; never use a local path as the final output
3. Must use the dc-media:// protocol to display audio: it is the only form the frontend can render correctly
4. Use Bash curl throughout; do not use the HttpRequest tool or Python
5. Use the /chat/completions endpoint: Xiaomi MiMo TTS speaks the OpenAI-compatible chat format

Model Selection

| Model | Characteristics | When to use |
|---|---|---|
| mimo-v2.5-tts | Standard TTS, multiple preset voices | Default, regular speech synthesis |
| mimo-v2.5-tts-voicedesign | Custom voice design | When you need a voice generated from a description |
| mimo-v2.5-tts-voiceclone | Voice cloning | When you need to clone a specific voice (reference audio required) |

Default rule: if the user does not specify a model, use mimo-v2.5-tts.
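
The selection rule above can be sketched as a small shell helper. The function name `pick_model` and its two arguments (a voice description and a reference-audio path) are hypothetical, introduced only for illustration; the skill itself only states when each model applies.

```shell
# Hypothetical helper: choose a model from what the user supplied.
# $1 = voice description text (may be empty)
# $2 = path to reference audio (may be empty)
pick_model() {
  if [ -n "$2" ]; then
    printf '%s\n' "mimo-v2.5-tts-voiceclone"   # reference audio given: clone that voice
  elif [ -n "$1" ]; then
    printf '%s\n' "mimo-v2.5-tts-voicedesign"  # description given: design a voice
  else
    printf '%s\n' "mimo-v2.5-tts"              # nothing specified: default model
  fi
}
```

For example, `pick_model "" ""` prints the default model name, which can then be substituted into the `model` field of the request body.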

Voice Selection

Preset Voices

| voice_id | Name | Characteristics |
|---|---|---|
| default_zh | Default Chinese | General-purpose Chinese female voice |
| default_en | Default English | General-purpose English female voice |
| mimo_default | MiMo Default | MiMo's signature voice |
| Bingtang | Bingtang | Sweet female voice |
| Moli | Moli | Soft, gentle female voice |
| Suda | Suda | Young male voice |
| Baihua | Baihua | Mature male voice |
| Mia | Mia | English female voice |
| Chloe | Chloe | English female voice |
| Milo | Milo | English male voice |
| Dean | Dean | English male voice |

Default rule: if the user doesn't specify a voice, pick one by content language: Bingtang for Chinese text, Mia for English text.
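
One rough way to automate the language-based default is sketched below. It rests on a deliberately crude assumption: any non-ASCII byte in the text means "Chinese". Accented Latin text would be misclassified, so treat this as a starting point only. The function name `pick_voice` is hypothetical.

```shell
# Hypothetical helper: pick the default voice from the text's content.
# Heuristic (an assumption, not part of the skill): text containing any
# non-ASCII byte is treated as Chinese, everything else as English.
pick_voice() {
  # Delete all 7-bit ASCII bytes and count what is left.
  non_ascii=$(printf '%s' "$1" | LC_ALL=C tr -d '\000-\177' | wc -c)
  if [ "$non_ascii" -gt 0 ]; then
    printf '%s\n' "Bingtang"   # default Chinese voice
  else
    printf '%s\n' "Mia"        # default English voice
  fi
}
```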

Full Execution Flow (strictly three steps)

Prerequisites

- The user has configured a Xiaomi MiMo provider in Resource Manager → Compute and filled in an API Key
- agent-service is running
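
Before Step 1 it can help to confirm that the port file used throughout this skill exists and holds a plausible port. The sketch below is an illustration under assumptions: the function name `agent_base_url` and the `PORT_FILE` override are hypothetical, and a readable port file does not by itself prove that agent-service is accepting requests.

```shell
# Hypothetical helper: print the agent-service base URL, or fail with a message.
# PORT_FILE is an illustrative override; the default is the path used in this skill.
agent_base_url() {
  port_file="${PORT_FILE:-$HOME/.desirecore/agent-service.port}"
  if [ ! -r "$port_file" ]; then
    echo "agent-service port file not found: $port_file" >&2
    return 1
  fi
  # Strip whitespace/newlines, then require digits only.
  port=$(tr -d '[:space:]' < "$port_file")
  case "$port" in
    ''|*[!0-9]*)
      echo "unexpected port value: $port" >&2
      return 1
      ;;
  esac
  printf 'https://127.0.0.1:%s\n' "$port"
}
```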

Step 1: Call the TTS API

Generate speech via media-proxy's /chat/completions endpoint.

Important: messages must use the assistant role (not user); the text to synthesize goes in the assistant message's content.

PORT=$(cat ~/.desirecore/agent-service.port)
curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "xiaomi",
    "serviceType": "tts",
    "endpoint": "/chat/completions",
    "body": {
      "model": "mimo-v2.5-tts",
      "messages": [
        {
          "role": "assistant",
          "content": "Replace this with the text to synthesize"
        }
      ],
      "voice": "Bingtang",
      "audio": {"format": "mp3"}
    },
    "responseType": "json"
  }'

Example response:

{
  "success": true,
  "data": {
    "id": "chatcmpl-...",
    "choices": [
      {
        "index": 0,
        "message": {
          "role": "assistant",
          "audio": {
            "data": "base64-encoded audio data...",
            "format": "mp3"
          }
        },
        "finish_reason": "stop"
      }
    ]
  },
  "statusCode": 200
}

Pull the base64-encoded audio data from data.choices[0].message.audio.data.
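
The Step 1 request embeds the text inside a single-quoted JSON literal, which breaks as soon as the text contains quotes, backslashes, or newlines. A possible workaround, sketched here, is to build the payload with jq (already used in Step 2). The function name `build_tts_payload` and its argument order are hypothetical; the field names are the ones shown in the request above.

```shell
# Hypothetical helper: build the media-proxy request JSON with jq so that
# special characters in the text are escaped correctly.
# $1 = text to synthesize, $2 = voice (default Bingtang), $3 = format (default mp3)
build_tts_payload() {
  jq -n \
    --arg text "$1" \
    --arg voice "${2:-Bingtang}" \
    --arg fmt "${3:-mp3}" \
    '{
      provider: "xiaomi",
      serviceType: "tts",
      endpoint: "/chat/completions",
      body: {
        model: "mimo-v2.5-tts",
        messages: [{role: "assistant", content: $text}],
        voice: $voice,
        audio: {format: $fmt}
      },
      responseType: "json"
    }'
}

# Possible usage (not executed here): pipe the payload to curl via stdin.
#   build_tts_payload "$TEXT" Mia |
#     curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
#       -H "Content-Type: application/json" --data-binary @- \
#       > /tmp/xiaomi-tts-response.json
```

Reading the body from stdin with `--data-binary @-` also keeps long text out of the command line.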

Step 2: Decode and upload to media-store

The audio comes back as base64; decode it and save to the local media-store.

Recommended approach (write the full response to a file first to avoid overlong shell arguments):

PORT=$(cat ~/.desirecore/agent-service.port)
# Send the request and save the full response to a file
curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "xiaomi",
    "serviceType": "tts",
    "endpoint": "/chat/completions",
    "body": {
      "model": "mimo-v2.5-tts",
      "messages": [{"role": "assistant", "content": "Text to synthesize"}],
      "voice": "Bingtang",
      "audio": {"format": "mp3"}
    },
    "responseType": "json"
  }' > /tmp/xiaomi-tts-response.json

# Extract and decode the base64 audio data
jq -r '.data.choices[0].message.audio.data' /tmp/xiaomi-tts-response.json | base64 -d > /tmp/xiaomi-tts.mp3

# Upload to media-store
curl -sk -X POST "https://127.0.0.1:${PORT}/api/media/upload" \
  -F "file=@/tmp/xiaomi-tts.mp3;type=audio/mpeg"

Take the mediaId field from the JSON response (format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.mp3).

Step 3: Render the audio via the dc-media protocol

In your reply text, write Markdown syntax directly:

![TTS result](dc-media://replace-with-mediaId)

For example: ![TTS: Hello world](dc-media://a1b2c3d4-e5f6-47a8-b9c0-d1e2f3a4b5c6.mp3)

The frontend detects the .mp3 extension and renders an audio player.
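
Because a local path must never be used as the final output, a small guard before emitting the Markdown can catch mistakes. This is a sketch only: both function names are hypothetical, and the pattern assumes a lowercase-hex UUID followed by an extension, as in the example above. Loosen it if the service returns a different shape.

```shell
# Hypothetical helper: succeed only if $1 looks like "<uuid>.<ext>".
# Assumes lowercase hex, matching the example mediaId in this section.
is_media_id() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\.[a-z0-9]+$'
}

# Hypothetical helper: print the dc-media Markdown line.
# $1 = label text, $2 = mediaId returned by /api/media/upload
dc_media_markdown() {
  if ! is_media_id "$2"; then
    echo "not a mediaId (local path?): $2" >&2
    return 1
  fi
  printf '![%s](dc-media://%s)\n' "$1" "$2"
}
```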

Parameter Mapping

Request body parameters (inside `body`)

| Parameter | Description | Default |
|---|---|---|
| `model` | Model name | "mimo-v2.5-tts" |
| `messages[0].role` | Must be "assistant" | "assistant" (fixed) |
| `messages[0].content` | Text to synthesize | required |
| `voice` | Voice ID | "Bingtang" (Chinese) / "Mia" (English) |
| `audio.format` | Audio format | "mp3" (also accepts "wav") |

User intent mapping

| User intent | Parameter |
|---|---|
| Sweet / cute | voice: "Bingtang" |
| Gentle / refined | voice: "Moli" |
| Young male | voice: "Suda" |
| Mature male | voice: "Baihua" |
| English female | voice: "Mia" or "Chloe" |
| English male | voice: "Milo" or "Dean" |
| High fidelity / lossless | audio.format: "wav" |
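
The mapping above can be expressed as a `case` statement. The intent keywords (`sweet`, `young-male`, and so on) and the function name `intent_to_param` are hypothetical labels chosen for this sketch; in practice the intent comes from free-form user text. Where the table offers two voices, the sketch arbitrarily picks the first.

```shell
# Hypothetical helper: map a coarse intent keyword to a request-body setting.
# Prints "voice=<id>" or "format=<fmt>"; returns 1 for unknown keywords.
intent_to_param() {
  case "$1" in
    sweet|cute)      echo "voice=Bingtang" ;;
    gentle|refined)  echo "voice=Moli" ;;
    young-male)      echo "voice=Suda" ;;
    mature-male)     echo "voice=Baihua" ;;
    english-female)  echo "voice=Mia" ;;    # "Chloe" is the listed alternative
    english-male)    echo "voice=Milo" ;;   # "Dean" is the listed alternative
    hifi|lossless)   echo "format=wav" ;;   # maps to audio.format
    *)               return 1 ;;
  esac
}
```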

Error Handling

- `success: false` + `error: "No matching provider"`: Xiaomi MiMo provider not configured or disabled
- `success: false` + `error: "API Key not configured"`: API Key missing
- `statusCode: 401`: API Key invalid or expired
- `statusCode: 429`: rate limited, retry later
- `statusCode: 400`: bad parameters (e.g. unknown voice, empty text)
- `statusCode: 403`: model not activated or insufficient permission
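
For the status codes listed above, a lookup like the following can turn a code into a user-facing hint. The function name `explain_status` is hypothetical, and the fallback message for unlisted codes is an assumption of this sketch rather than documented behaviour.

```shell
# Hypothetical helper: translate a statusCode into the explanation listed above.
explain_status() {
  case "$1" in
    400) echo "bad parameters (e.g. unknown voice, empty text)" ;;
    401) echo "API Key invalid or expired" ;;
    403) echo "model not activated or insufficient permission" ;;
    429) echo "rate limited, retry later" ;;
    *)   echo "unexpected status: $1" ;;   # not covered by the list above
  esac
}
```

The code itself could be read from the saved response with `jq -r '.statusCode' /tmp/xiaomi-tts-response.json`, as in the Step 2 extraction.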

Notes

- Calls are synchronous and typically take 3–15 seconds, depending on text length
- Audio is returned as base64, so URL expiry is not a concern, but long responses can exceed shell argument limits; write the response to a file first, as in Step 2
- For long text, split it into segments (no more than ~500 characters each), then synthesize, upload, and render each segment
- When the user doesn't specify, default to mimo-v2.5-tts + a voice auto-selected by language + mp3
- Token Plan keys (prefix tp-) use the https://token-plan-cn.xiaomimimo.com/v1 endpoint
- Pay-as-you-go keys use the https://api.xiaomimimo.com/v1 endpoint
- media-proxy picks the correct endpoint based on configuration; the skill does not need to differentiate
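
The long-text note can be approximated with a fixed-width splitter such as the one below. This is a minimal sketch with known limits: it cuts at fixed offsets rather than at sentence boundaries, and whether `length` and `substr` count characters or bytes depends on the awk implementation and locale (a byte-based awk could cut a multi-byte character in half). The function name `split_text` is hypothetical.

```shell
# Hypothetical helper: print the text in chunks of at most $2 units
# (default 500), one chunk per output line. Newlines already present in
# the text also end a chunk.
split_text() {
  printf '%s\n' "$1" | awk -v n="${2:-500}" '{
    for (i = 1; i <= length($0); i += n) print substr($0, i, n)
  }'
}
```

Each output line can then go through Steps 1 to 3 in turn, for example with `split_text "$TEXT" | while IFS= read -r segment; do ...; done`.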
