fix: complete i18n CI validation for dashscope-image-gen and xiaomi-tts (#4)

## Changes

Fixes i18n CI validation for dashscope-image-gen and xiaomi-tts, completes their
English translations, and along the way fixes `source_hash` drift in other stale skills.

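### dashscope-image-gen / xiaomi-tts (main change of this PR)
- Changed the `name` field from Chinese to the directory name (CI rule-1 requires
lowercase ASCII + hyphens).
- Completed the `metadata.i18n` block: `locales`, `zh-CN` (with `body` pointing to
SKILL.zh-CN.md), and `en-US` (with `description` and `body=./SKILL.md`).
- Added `SKILL.zh-CN.md` (the zh-CN body file).
- **Rewrote the root SKILL.md as the English body** (matching the content of
SKILL.zh-CN.md), translated by hand in this PR; `default_locale=en-US` and
`source_locale=zh-CN`, consistent with the docs/I18N.md convention: root SKILL.md =
default_locale body (en-US), SKILL.zh-CN.md = source_locale body (zh-CN).
- Locked both locales with `translated_by: human` and the correct `source_hash`.
- Content quality fixes: the flow heading "严格按此两步执行" ("strictly follow these two
steps") now reads "严格按此三步执行" ("strictly follow these three steps"); the wording of
mandatory rule 2 is more precise (`/tmp` is only a staging location); in the xiaomi-tts
user-intent mapping table, `response_format` became `audio.format`, consistent with the
request-body parameter table; `zh-CN.description` is now pure Chinese.
- Fixed the locale header from the shell-escape leftover `<\!--` to the standard
`<!-- locale: zh-CN -->`.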
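
As a rough illustration of the two format rules mentioned in this list, the sketch below checks a skill `name` against a lowercase-ASCII-plus-hyphens pattern and recognizes a well-formed locale header. Both regular expressions and both function names are assumptions made for this example; the actual rule-1 check in CI may be stricter or looser.

```python
import re
from typing import Optional

# Assumed pattern for "lowercase ASCII + hyphens"; allowing digits is a guess.
NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
# Assumed shape of the locale header, e.g. <!-- locale: zh-CN -->
LOCALE_HEADER_RE = re.compile(r"<!-- locale: ([A-Za-z]{2,3}(?:-[A-Za-z0-9]+)*) -->")


def is_valid_skill_name(name: str) -> bool:
    """Return True if name is lowercase ASCII words joined by single hyphens."""
    return NAME_RE.fullmatch(name) is not None


def parse_locale_header(first_line: str) -> Optional[str]:
    """Return the locale tag from a header line, or None if it is malformed."""
    match = LOCALE_HEADER_RE.fullmatch(first_line.strip())
    return match.group(1) if match else None
```

Under these assumptions a directory-style name such as `xiaomi-tts` passes, while a header that still carries the `<\!--` escape is rejected.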

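### Related: 6 skills already stale on main (to keep the translate workflow from failing)
- `manage-skills` / `minimax-music-gen` / `minimax-video-gen` /
`skill-creator` / `web-access`: recomputed `en-US.source_hash` as the actual hash of the
current zh-CN source; changed `translated_by` from `ai:claude-opus-4-7` to `human`
so the existing translations are locked and cannot be overwritten by automatic retranslation.
- `markdown`: corrected `en-US.source_hash` (previously the placeholder `sha256:0000000000000000`).
- The `en-US` translation content of these skills is unchanged; only the metadata is corrected.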
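
For orientation, a `source_hash` of this shape could be produced along the following lines. This is a sketch under assumptions: it hashes the raw UTF-8 bytes of the source body and keeps the first 16 hex digits, which matches the length of the values in this commit, but the repository's script may normalize the text differently before hashing. The function names are illustrative only.

```python
import hashlib


def source_hash(body: str, digits: int = 16) -> str:
    """Return 'sha256:' plus the first `digits` hex characters of the digest."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return f"sha256:{digest[:digits]}"


def is_stale(recorded_hash: str, current_body: str) -> bool:
    """A translation is stale when the recorded hash no longer matches the source."""
    return recorded_hash != source_hash(current_body)
```

With a scheme like this, a placeholder such as `sha256:0000000000000000` is reported as stale until it is replaced by the hash of the real source.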

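### scripts/i18n/translate.py robustness improvements
- No longer retries on 413 Payload Too Large (the payload will not shrink, so retrying only wastes time).
- The main loop now catches per-skill failures (the diff uses `except Exception`, which covers the
`RuntimeError` raised on 413), records the failure in `plan["errors"]`, and moves on to the next
skill, so one oversized file no longer fails the whole workflow.
- In `--check` mode, plans containing errors now also exit 1 (previously only `needs_translation`
was checked, and a broad `except` swallowed exceptions, producing false passes).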
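
The error handling described in this list amounts to the following control flow. This is a simplified sketch, not the script itself: `translate_one` is a hypothetical callable standing in for the real per-skill translation, and the plan dictionaries carry only the keys needed to show the exit-code logic.

```python
from typing import Callable


def run_check(skills: list[str], translate_one: Callable[[str], dict]) -> tuple[int, list[dict]]:
    """Collect one plan per skill; a failing skill is recorded, not fatal."""
    plans: list[dict] = []
    for skill in skills:
        try:
            plans.append(translate_one(skill))
        except Exception as exc:  # isolate the failure to this one skill
            plans.append({"skill": skill, "actions": [], "errors": [str(exc)]})

    needs = [p for p in plans if p.get("needs_translation")]
    errs = [p for p in plans if p.get("errors")]
    # Check mode fails on pending work or on recorded errors.
    exit_code = 1 if (needs or errs) else 0
    return exit_code, plans
```

The point of returning the plans alongside the exit code is that every skill gets a report entry, including the ones that failed.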

## Test plan

- [x] `i18n-validate` passes
- [x] `i18n-translate --check` reports every skill as `up-to-date` or `human-locked,
skipping`
- [x] `validate` / `translate` / `wait-for-copilot-review` are all green on CI
- [ ] All Copilot review conversations resolved
- [ ] Squash merge

---------

Co-authored-by: yi-ge <a@wyr.me>
2026-05-13 12:57:25 +08:00
committed by GitHub
parent b8101406fb
commit 0cb3758669
11 changed files with 562 additions and 163 deletions


@@ -248,6 +248,13 @@ def _post_with_retries(url: str, headers: dict, payload: dict, *, extract) -> st
         try:
             with httpx.Client(timeout=HTTP_TIMEOUT) as client:
                 resp = client.post(url, headers=headers, json=payload)
+            # Don't retry on 413: payload won't get smaller on next attempt.
+            if resp.status_code == 413:
+                raise RuntimeError(
+                    f"413 Payload Too Large from {url} — skill body too big for this backend. "
+                    f"Switch backend (TRANSLATE_BACKEND=anthropic), use a model with larger input budget, "
+                    f"or set translated_by: human to lock the locale."
+                )
             if resp.status_code == 429 or resp.status_code >= 500:
                 raise httpx.HTTPStatusError(f"{resp.status_code}", request=resp.request, response=resp)
             resp.raise_for_status()
@@ -499,11 +506,19 @@ def main(argv: list[str]) -> int:
         if not (skill_dir.is_dir() and (skill_dir / "SKILL.md").is_file()):
             continue
         for tl in target_locales:
-            plans.append(translate_skill(
-                skill_dir, tl,
-                check_only=args.check, mark_human=args.human,
-                backend=backend, model=model, endpoint=endpoint,
-            ))
+            try:
+                plans.append(translate_skill(
+                    skill_dir, tl,
+                    check_only=args.check, mark_human=args.human,
+                    backend=backend, model=model, endpoint=endpoint,
+                ))
+            except Exception as e:  # don't let one bad skill abort the entire run
+                plans.append({
+                    "skill": skill_dir.name,
+                    "target": tl,
+                    "actions": [],
+                    "errors": [f"unhandled exception: {e}"],
+                })

     needs = [p for p in plans if p.get("needs_translation")]
     errs = [p for p in plans if p.get("errors")]
@@ -514,7 +529,7 @@ def main(argv: list[str]) -> int:
         for p in errs:
             for e in p["errors"]:
                 print(f" ERROR [{p['skill']}/{p['target']}]: {e}")
-        return 1 if needs else 0
+        return 1 if (needs or errs) else 0

     print(f"Backend: {backend} Model: {model} Endpoint: {endpoint}\n")
     for p in plans:
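
The retry rule introduced in the first hunk can be illustrated without any network access. In the sketch below, `send` is a hypothetical callable that returns an HTTP status code, and the attempt limit and function name are assumptions; backoff delays are left out for brevity. Only the policy itself (retry on 429 and 5xx, fail immediately on 413) mirrors the diff.

```python
from typing import Callable


def post_with_retries(send: Callable[[], int], max_attempts: int = 3) -> int:
    """Call send() until it returns a non-retryable status or attempts run out."""
    last_status = 0
    for _ in range(max_attempts):
        last_status = send()
        if last_status == 413:
            # The payload will be the same size next time, so give up now.
            raise RuntimeError("413 Payload Too Large: not retrying")
        if last_status == 429 or last_status >= 500:
            continue  # transient failure, try again
        return last_status
    raise RuntimeError(f"giving up after {max_attempts} attempts (last status {last_status})")
```

Raising a distinct error for 413 lets the caller record a clear message for that skill instead of spending the whole retry budget on a request that cannot succeed.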


@@ -29,6 +29,24 @@ metadata:
   i18n:
     default_locale: en-US
     source_locale: zh-CN
+    locales:
+      - zh-CN
+      - en-US
+    zh-CN:
+      name: 阿里云 文生图
+      short_desc: 基于阿里云通义万相的文本生成图片技能
+      description: >-
+        当用户希望使用阿里云 DashScope 的通义万相系列模型生成图片时使用此技能。支持多种模型层级wan2.7-image-pro / wan2.7-image的文生图通过 OpenAI 兼容的 chat/completions API 同步生成图片。用户提到 生成图片、画图、文生图、创建图片、AI 绘画、生成插图、画一张、帮我画、设计图片、通义万相、万相、阿里云画图、dashscope 画图。
+      body: ./SKILL.zh-CN.md
+      source_hash: sha256:d24415cd18ebf5d2
+      translated_by: human
+    en-US:
+      name: DashScope Image Generation
+      short_desc: Text-to-image generation using Alibaba Cloud Wan (通义万相) models
+      description: "Use this skill when the user wants to generate images using Alibaba Cloud DashScope's Wan (通义万相) series models. Supports text-to-image with multiple model tiers (wan2.7-image-pro, wan2.7-image) via the OpenAI-compatible chat/completions API. Trigger keywords: generate image, draw, text-to-image, create image, AI painting, illustration, design picture, Wan, Tongyi Wanxiang, DashScope."
+      body: ./SKILL.md
+      source_hash: sha256:d24415cd18ebf5d2
+      translated_by: human
 market:
   icon: >-
     <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0
@@ -46,35 +64,35 @@ market:
channel: latest channel: latest
--- ---
# dashscope-image-gen 技能 # dashscope-image-gen Skill
## 强制规则(违反将导致功能失败) ## Mandatory Rules (violations cause failure)
1. **必须用 HTTPS 访问 agent-service** — `https://127.0.0.1:${PORT}` `-k` 跳过证书验证 1. **Must access agent-service over HTTPS** — use `https://127.0.0.1:${PORT}` with `-k` to skip certificate verification
2. **必须通过 `/api/media/upload` 上传到 media-store** — 禁止保存到本地路径 2. **Must upload to media-store via `/api/media/upload`** — `/tmp` is only a transient download/decode location, never use a local path as the final output
3. **必须使用 `dc-media://` 协议展示图片** — 唯一能让前端正确渲染的方式 3. **Must use the `dc-media://` protocol to display images** — the only form the frontend can render correctly
4. **全程使用 Bash curl** — 不要使用 HttpRequest 工具或 Python 4. **Use Bash curl throughout** — do not use the HttpRequest tool or Python
5. **使用 compatible-mode/chat/completions** — 同步调用,响应直接包含图片 URL 5. **Use compatible-mode (`/chat/completions`)** — synchronous call; the response contains the image URL directly
## 模型选择指南 ## Model Selection
| 模型 | 特点 | 适用场景 | | Model | Characteristics | When to use |
|------|------|---------| |------|------|---------|
| wan2.7-image-pro | 旗舰4K 分辨率thinking_mode | 用户要求最高画质、4K、细节丰富 | | wan2.7-image-pro | Flagship, 4K resolution, thinking_mode | User asks for top quality, 4K, or rich detail |
| wan2.7-image | 标准高画质thinking_mode | **默认首选**,无特殊要求时使用 | | wan2.7-image | Standard high quality, thinking_mode | **Default**, for unspecified requests |
**默认规则**:用户未指定模型时,使用 `wan2.7-image` **Default rule**: if the user does not specify a model, use `wan2.7-image`.
## 完整执行流程(严格按此两步执行) ## Full Execution Flow (strictly three steps)
### 前置条件 ### Prerequisites
- 用户已在资源管理器-算力中配置阿里云 DashScope Provider 并填写 API Key - The user has configured an Alibaba Cloud DashScope provider in Resource Manager → Compute and filled in an API Key
- agent-service 正在运行 - agent-service is running
### 第一步:调用文生图 API同步 ### Step 1: Call the text-to-image API (synchronous)
通过 media-proxy compatible-mode 端点生成图片,响应直接包含图片 URL Generate the image via media-proxy's compatible-mode endpoint; the response includes the image URL directly:
```bash ```bash
PORT=$(cat ~/.desirecore/agent-service.port) PORT=$(cat ~/.desirecore/agent-service.port)
@@ -90,7 +108,7 @@ curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
{ {
"role": "user", "role": "user",
"content": [ "content": [
{"type": "text", "text": "这里替换为图片描述(建议英文效果更好)"} {"type": "text", "text": "Replace this with the image description (English usually gives better results)"}
] ]
} }
] ]
@@ -99,7 +117,7 @@ curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
}' }'
``` ```
**响应示例** **Example response**:
```json ```json
{ {
"success": true, "success": true,
@@ -126,39 +144,39 @@ curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
} }
``` ```
`data.output.choices[0].message.content` 中找到 `type: "image"` 的项,提取其 `image` URL Locate the item with `type: "image"` inside `data.output.choices[0].message.content` and extract its `image` URL.
### 第二步:下载并上传到 media-store ### Step 2: Download and upload to media-store
图片 URL 有时效,必须立即下载并保存到本地 media-store The image URL is time-limited; download and persist it to the local media-store immediately:
```bash ```bash
PORT=$(cat ~/.desirecore/agent-service.port) PORT=$(cat ~/.desirecore/agent-service.port)
IMAGE_URL="第一步响应中的 image URL" IMAGE_URL="image URL from step 1's response"
curl -sL "$IMAGE_URL" -o /tmp/dashscope-gen.png && \ curl -sL "$IMAGE_URL" -o /tmp/dashscope-gen.png && \
curl -sk -X POST "https://127.0.0.1:${PORT}/api/media/upload" \ curl -sk -X POST "https://127.0.0.1:${PORT}/api/media/upload" \
-F "file=@/tmp/dashscope-gen.png;type=image/png" -F "file=@/tmp/dashscope-gen.png;type=image/png"
``` ```
从 JSON 响应中提取 `mediaId` 字段(格式如 `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.png`)。 Pick the `mediaId` field from the JSON response (format `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.png`).
### 第三步:用 dc-media 协议展示图片 ### Step 3: Render the image via the dc-media protocol
在你的回复文本中直接写 Markdown 图片语法: In your reply text, write Markdown image syntax directly:
``` ```
![图片描述](dc-media://这里替换为mediaId) ![Image description](dc-media://replace-with-mediaId)
``` ```
例如:`![森林中的白色狐狸](dc-media://a1b2c3d4-e5f6-47a8-b9c0-d1e2f3a4b5c6.png)` For example: `![White fox in a forest](dc-media://a1b2c3d4-e5f6-47a8-b9c0-d1e2f3a4b5c6.png)`
前端会自动将 `dc-media://` 转为可访问的图片 URL 并渲染出来。 The frontend will translate `dc-media://` into a reachable image URL and render it.
## 参数映射 ## Parameter Mapping
### 尺寸选择 ### Size selection
通义万相通过 compatible-mode 调用时,尺寸通过 `size` 参数传入(放在请求体顶层): When calling Wan via compatible-mode, the size is passed as the top-level `size` parameter:
```json ```json
{ {
@@ -168,40 +186,40 @@ curl -sk -X POST "https://127.0.0.1:${PORT}/api/media/upload" \
} }
``` ```
| 用户意图 | size 参数 | | User intent | size value |
|---------|-----------| |---------|-----------|
| 正方形/头像/默认 | "1024x1024" | | Square / avatar / default | "1024x1024" |
| 横版/风景/壁纸 | "1792x1024" | | Landscape / scenery / wallpaper | "1792x1024" |
| 竖版/手机/海报 | "1024x1792" | | Portrait / mobile / poster | "1024x1792" |
### 可选参数(加入请求体顶层) ### Optional parameters (top-level body fields)
| 参数 | 说明 | | Parameter | Description |
|------|------| |------|------|
| `n` | 生成数量 1-4,默认 1 | | `n` | Number of images, 1-4, default 1 |
| `size` | 图片尺寸,如 "1024x1024" | | `size` | Image size, e.g. "1024x1024" |
## 多图生成 ## Multiple Image Generation
`n > 1` 时,`choices` 数组会有多个元素,每个 `message.content` 中都有一张图片。需要为每张图片执行下载+上传,然后逐一展示: When `n > 1`, the `choices` array contains multiple entries, each with an image inside `message.content`. Download and upload each image, then render them one by one:
``` ```
![图片1描述](dc-media://mediaId1) ![Image 1 description](dc-media://mediaId1)
![图片2描述](dc-media://mediaId2) ![Image 2 description](dc-media://mediaId2)
``` ```
## 错误处理 ## Error Handling
- `success: false` + `error: "未找到匹配的供应商"`:未配置 DashScope Provider 或未启用 - `success: false` + `error: "No matching provider"`: DashScope provider not configured or disabled
- `success: false` + `error: "未配置 API Key"`:未填写 API Key - `success: false` + `error: "API Key not configured"`: API Key missing
- `statusCode: 401`API Key 无效或已过期 - `statusCode: 401`: API Key invalid or expired
- `statusCode: 429`:频率限制,稍后重试 - `statusCode: 429`: rate limited, retry later
- `statusCode: 400` + `InvalidParameter`:参数错误(如尺寸不支持) - `statusCode: 400` + `InvalidParameter`: bad parameters (e.g. unsupported size)
- `statusCode: 403` + `AccessDenied.Unpurchased`:模型未开通,需要在阿里云控制台开通 - `statusCode: 403` + `AccessDenied.Unpurchased`: model not activated; enable it in the Alibaba Cloud console
## 注意事项 ## Notes
- 通过 compatible-mode 调用是同步的,通常 10-60 秒返回(wan2.7-image-pro 可能更长) - compatible-mode calls are synchronous and typically return in 10-60 seconds (wan2.7-image-pro can take longer)
- 结果图片 URL 有时效,必须及时下载 - Image URLs expire; download promptly
- 提示词建议用英文以获得最佳效果,中文也支持 - English prompts usually produce the best results; Chinese is also supported
- 如果用户未明确要求模型/尺寸,默认使用 `wan2.7-image` + `1024x1024` - When the user does not specify a model or size, default to `wan2.7-image` + `1024x1024`
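
To make the response shape used by this skill concrete, the following sketch pulls every image URL out of a media-proxy response, including the `n > 1` case. It is written in Python purely for illustration (the skill itself mandates Bash curl at runtime), and the helper name is hypothetical; the field path follows the example response in the skill text.

```python
def extract_image_urls(response: dict) -> list[str]:
    """Return the `image` URL of every content item whose type is 'image'."""
    if not response.get("success"):
        raise ValueError(f"media-proxy call failed: {response.get('error')}")
    urls: list[str] = []
    for choice in response["data"]["output"]["choices"]:
        for item in choice["message"]["content"]:
            if item.get("type") == "image":
                urls.append(item["image"])
    return urls
```

Each returned URL would then go through the download and upload step before being rendered with `dc-media://`.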


@@ -0,0 +1,161 @@
<!-- locale: zh-CN -->
# dashscope-image-gen 技能
## 强制规则(违反将导致功能失败)
1. **必须用 HTTPS 访问 agent-service**`https://127.0.0.1:${PORT}``-k` 跳过证书验证
2. **必须通过 `/api/media/upload` 上传到 media-store** — /tmp 仅作下载/解码中转,不可直接以本地路径作为最终输出
3. **必须使用 `dc-media://` 协议展示图片** — 唯一能让前端正确渲染的方式
4. **全程使用 Bash curl** — 不要使用 HttpRequest 工具或 Python
5. **使用 compatible-mode/chat/completions** — 同步调用,响应直接包含图片 URL
## 模型选择指南
| 模型 | 特点 | 适用场景 |
|------|------|---------|
| wan2.7-image-pro | 旗舰4K 分辨率thinking_mode | 用户要求最高画质、4K、细节丰富 |
| wan2.7-image | 标准高画质thinking_mode | **默认首选**,无特殊要求时使用 |
**默认规则**:用户未指定模型时,使用 `wan2.7-image`
## 完整执行流程(严格按此三步执行)
### 前置条件
- 用户已在资源管理器-算力中配置阿里云 DashScope Provider 并填写 API Key
- agent-service 正在运行
### 第一步:调用文生图 API同步
通过 media-proxy 的 compatible-mode 端点生成图片,响应直接包含图片 URL
```bash
PORT=$(cat ~/.desirecore/agent-service.port)
curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
-H "Content-Type: application/json" \
-d '{
"provider": "dashscope",
"serviceType": "image_gen",
"endpoint": "/chat/completions",
"body": {
"model": "wan2.7-image",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "这里替换为图片描述(建议英文效果更好)"}
]
}
]
},
"responseType": "json"
}'
```
**响应示例**
```json
{
"success": true,
"data": {
"request_id": "...",
"output": {
"choices": [
{
"message": {
"role": "assistant",
"content": [
{
"type": "image",
"image": "https://dashscope-result.oss.aliyuncs.com/..."
}
]
},
"finish_reason": "stop"
}
]
}
},
"statusCode": 200
}
```
`data.output.choices[0].message.content` 中找到 `type: "image"` 的项,提取其 `image` URL。
### 第二步:下载并上传到 media-store
图片 URL 有时效,必须立即下载并保存到本地 media-store
```bash
PORT=$(cat ~/.desirecore/agent-service.port)
IMAGE_URL="第一步响应中的 image URL"
curl -sL "$IMAGE_URL" -o /tmp/dashscope-gen.png && \
curl -sk -X POST "https://127.0.0.1:${PORT}/api/media/upload" \
-F "file=@/tmp/dashscope-gen.png;type=image/png"
```
从 JSON 响应中提取 `mediaId` 字段(格式如 `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.png`)。
### 第三步:用 dc-media 协议展示图片
在你的回复文本中直接写 Markdown 图片语法:
```
![图片描述](dc-media://这里替换为mediaId)
```
例如:`![森林中的白色狐狸](dc-media://a1b2c3d4-e5f6-47a8-b9c0-d1e2f3a4b5c6.png)`
前端会自动将 `dc-media://` 转为可访问的图片 URL 并渲染出来。
## 参数映射
### 尺寸选择
通义万相通过 compatible-mode 调用时,尺寸通过 `size` 参数传入(放在请求体顶层):
```json
{
"model": "wan2.7-image",
"size": "1024x1024",
"messages": [...]
}
```
| 用户意图 | size 参数 |
|---------|-----------|
| 正方形/头像/默认 | "1024x1024" |
| 横版/风景/壁纸 | "1792x1024" |
| 竖版/手机/海报 | "1024x1792" |
### 可选参数(加入请求体顶层)
| 参数 | 说明 |
|------|------|
| `n` | 生成数量 1-4默认 1 |
| `size` | 图片尺寸,如 "1024x1024" |
## 多图生成
`n > 1` 时,`choices` 数组会有多个元素,每个 `message.content` 中都有一张图片。需要为每张图片执行下载+上传,然后逐一展示:
```
![图片1描述](dc-media://mediaId1)
![图片2描述](dc-media://mediaId2)
```
## 错误处理
- `success: false` + `error: "未找到匹配的供应商"`:未配置 DashScope Provider 或未启用
- `success: false` + `error: "未配置 API Key"`:未填写 API Key
- `statusCode: 401`API Key 无效或已过期
- `statusCode: 429`:频率限制,稍后重试
- `statusCode: 400` + `InvalidParameter`:参数错误(如尺寸不支持)
- `statusCode: 403` + `AccessDenied.Unpurchased`:模型未开通,需要在阿里云控制台开通
## 注意事项
- 通过 compatible-mode 调用是同步的,通常 10-60 秒返回wan2.7-image-pro 可能更长)
- 结果图片 URL 有时效,必须及时下载
- 提示词建议用英文以获得最佳效果,中文也支持
- 如果用户未明确要求模型/尺寸,默认使用 `wan2.7-image` + `1024x1024`


@@ -38,9 +38,8 @@ metadata:
       description: >-
         Manage the Skill lifecycle of an Agent: import, install, update, and delete Skills via HTTP API, or directly author standards-compliant SKILL.md files via the AgentFS filesystem. Use when the user requests to install Skills, import Skills from URL/Git, author new Skills, or manage existing Skills.
       body: ./SKILL.md
-      source_hash: sha256:7f116cc5de352822
-      translated_by: ai:claude-opus-4-7
-      translated_at: '2026-05-03'
+      source_hash: sha256:e67016840ba430ae
+      translated_by: human
 market:
   icon: >-
     <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0


@@ -52,7 +52,7 @@ metadata:
         particular format. Ensures files are written via Write tool, absolute
         path is reported, and attachment is sent via SendUserMessage.
       body: ./SKILL.md
-      source_hash: sha256:0000000000000000
+      source_hash: sha256:2434b01b42d751c0
       translated_by: human
 market:
   icon: >-


@@ -45,9 +45,8 @@ metadata:
       description: >-
         Use this skill when the user wants to generate music using MiniMax's Music Generation API. Supports text-to-music with lyrics, instrumental generation, and music cover. Use when the user mentions generating music, text-to-music, AI composing, creating songs, writing a song, music generation, AI music, MiniMax music, songwriting, instrumental music, accompaniment, cover, or remake.
       body: ./SKILL.md
-      source_hash: sha256:403153a9c1da2ad9
-      translated_by: ai:claude-opus-4-7
-      translated_at: '2026-05-04'
+      source_hash: sha256:f3785e1da2fc5a11
+      translated_by: human
 market:
   icon: >-
     <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0


@@ -45,9 +45,8 @@ metadata:
       description: >-
         Use this skill when the user wants to generate videos using MiniMax's Hailuo model. Supports text-to-video, image-to-video, and subject reference. The API is asynchronous — submit a task, poll for status, then download. Use when the user mentions generating videos, text-to-video, AI video, creating videos, video generation, animation generation, MiniMax video, Hailuo, image-to-video.
       body: ./SKILL.md
-      source_hash: sha256:57314c8d07d63585
-      translated_by: ai:claude-opus-4-7
-      translated_at: '2026-05-03'
+      source_hash: sha256:3b2855b9ff2d0ef1
+      translated_by: human
 market:
   icon: >-
     <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0


@@ -39,9 +39,8 @@ metadata:
       description: >-
         Guides users to create and edit standards-compliant SKILL.md skill packages. Supports the DesireCore full format (frontmatter metadata + L0/L1/L2 layered content + scripts/references/assets) and the Claude Code basic format. Use when the user requests to create a new Skill, update an existing Skill, or package experience into a reusable Skill bundle.
       body: ./SKILL.md
-      source_hash: sha256:fa0f3136371f236c
-      translated_by: ai:claude-opus-4-7
-      translated_at: '2026-05-03'
+      source_hash: sha256:2e8b886dc0b77dd1
+      translated_by: human
 market:
   icon: >-
     <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0


@@ -61,9 +61,8 @@ metadata:
       short_desc: Web search, page fetching, logged-in browser access via CDP, research workflows
       description: A three-layer web-access toolkit — search public pages, fetch heavy pages via Jina Reader, and reach logged-in sites via Chrome CDP.
       body: ./SKILL.md
-      source_hash: sha256:0ba170b3126a0823
-      translated_by: ai:claude-opus-4-7
-      translated_at: '2026-05-03'
+      source_hash: sha256:1d044824f5ab31bc
+      translated_by: human
 market:
   icon: >-
     <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0


@@ -30,6 +30,24 @@ metadata:
   i18n:
     default_locale: en-US
     source_locale: zh-CN
+    locales:
+      - zh-CN
+      - en-US
+    zh-CN:
+      name: 小米 MiMo 语音合成
+      short_desc: 基于小米 MiMo 的文本转语音技能
+      description: >-
+        当用户希望使用小米 MiMo 的 TTS 模型mimo-v2.5-tts将文本转为语音时使用此技能。基于 OpenAI 兼容的 chat/completions API响应中携带音频。支持多种预置音色和自定义音色设计。用户提到 语音合成、文字转语音、TTS、朗读、读出来、生成语音、生成音频、文本转音频、配音、念出来、小米语音、MiMo 语音、小米 TTS。
+      body: ./SKILL.zh-CN.md
+      source_hash: sha256:2dd06b13152349e5
+      translated_by: human
+    en-US:
+      name: Xiaomi MiMo TTS
+      short_desc: Text-to-speech synthesis using Xiaomi MiMo models
+      description: "Use this skill when the user wants to convert text to speech using Xiaomi MiMo's TTS models (mimo-v2.5-tts). Built on the OpenAI-compatible chat/completions API with audio response, supporting multiple preset voices and custom voice design. Trigger keywords: text-to-speech, TTS, read aloud, narrate, generate audio, voice synthesis, MiMo voice, Xiaomi TTS."
+      body: ./SKILL.md
+      source_hash: sha256:2dd06b13152349e5
+      translated_by: human
 market:
   icon: >-
     <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0
@@ -46,58 +64,58 @@ market:
channel: latest channel: latest
--- ---
# xiaomi-tts 技能 # xiaomi-tts Skill
## 强制规则(违反将导致功能失败) ## Mandatory Rules (violations cause failure)
1. **必须用 HTTPS 访问 agent-service** — `https://127.0.0.1:${PORT}` `-k` 跳过证书验证 1. **Must access agent-service over HTTPS** — use `https://127.0.0.1:${PORT}` with `-k` to skip certificate verification
2. **必须通过 `/api/media/upload` 上传到 media-store** — 禁止保存到本地路径 2. **Must upload to media-store via `/api/media/upload`** — `/tmp` is only a transient download/decode location, never use a local path as the final output
3. **必须使用 `dc-media://` 协议展示音频** — 唯一能让前端正确渲染的方式 3. **Must use the `dc-media://` protocol to display audio** — the only form the frontend can render correctly
4. **全程使用 Bash curl** — 不要使用 HttpRequest 工具或 Python 4. **Use Bash curl throughout** — do not use the HttpRequest tool or Python
5. **使用 /chat/completions 端点** — 小米 MiMo TTS 使用 OpenAI 兼容格式 5. **Use the `/chat/completions` endpoint** — Xiaomi MiMo TTS speaks OpenAI-compatible chat format
## 模型选择指南 ## Model Selection
| 模型 | 特点 | 适用场景 | | Model | Characteristics | When to use |
|------|------|---------| |------|------|---------|
| mimo-v2.5-tts | 标准 TTS多种预置音色 | **默认首选**,常规语音合成 | | mimo-v2.5-tts | Standard TTS, multiple preset voices | **Default**, regular speech synthesis |
| mimo-v2.5-tts-voicedesign | 自定义音色设计 | 需要特定音色描述生成 | | mimo-v2.5-tts-voicedesign | Custom voice design | When you need a voice generated from a description |
| mimo-v2.5-tts-voiceclone | 声音克隆 | 需要克隆特定人声(需上传参考音频) | | mimo-v2.5-tts-voiceclone | Voice cloning | When you need to clone a specific voice (reference audio required) |
**默认规则**:用户未指定模型时,使用 `mimo-v2.5-tts` **Default rule**: if the user does not specify a model, use `mimo-v2.5-tts`.
## 音色选择指南 ## Voice Selection
### 预置音色 ### Preset Voices
| voice_id | 名称 | 特点 | | voice_id | Name | Characteristics |
|----------|------|------| |----------|------|------|
| default_zh | 默认中文 | 中文通用女声 | | default_zh | Default Chinese | General-purpose Chinese female voice |
| default_en | 默认英文 | 英文通用女声 | | default_en | Default English | General-purpose English female voice |
| mimo_default | MiMo 默认 | MiMo 特色音色 | | mimo_default | MiMo Default | MiMo's signature voice |
| Bingtang | 冰糖 | 甜美女声 | | Bingtang | Bingtang | Sweet female voice |
| Moli | 茉莉 | 温柔女声 | | Moli | Moli | Soft, gentle female voice |
| Suda | 苏打 | 年轻男声 | | Suda | Suda | Young male voice |
| Baihua | 白桦 | 成熟男声 | | Baihua | Baihua | Mature male voice |
| Mia | Mia | 英文女声 | | Mia | Mia | English female voice |
| Chloe | Chloe | 英文女声 | | Chloe | Chloe | English female voice |
| Milo | Milo | 英文男声 | | Milo | Milo | English male voice |
| Dean | Dean | 英文男声 | | Dean | Dean | English male voice |
**默认规则**:中文内容用 `Bingtang`,英文内容用 `Mia`,用户未指定时按内容语言自动选择。 **Default rule**: use `Bingtang` for Chinese text and `Mia` for English text; if the user doesn't specify, pick automatically by content language.
## 完整执行流程(严格按此三步执行) ## Full Execution Flow (strictly three steps)
### 前置条件 ### Prerequisites
- 用户已在资源管理器-算力中配置小米 MiMo Provider 并填写 API Key - The user has configured a Xiaomi MiMo provider in Resource Manager → Compute and filled in an API Key
- agent-service 正在运行 - agent-service is running
### 第一步:调用 TTS API ### Step 1: Call the TTS API
通过 media-proxy/chat/completions 端点生成语音。 Generate speech via media-proxy's `/chat/completions` endpoint.
**重要**messages 必须使用 `assistant` role(不是 user要合成的文本放在 assistant 消息的 content 中。 **Important**: `messages` must use the `assistant` role (not `user`); the text to synthesize goes in the assistant message's content.
```bash ```bash
PORT=$(cat ~/.desirecore/agent-service.port) PORT=$(cat ~/.desirecore/agent-service.port)
@@ -112,7 +130,7 @@ curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
"messages": [ "messages": [
{ {
"role": "assistant", "role": "assistant",
"content": "这里替换为要合成的文本内容" "content": "Replace this with the text to synthesize"
} }
], ],
"voice": "Bingtang", "voice": "Bingtang",
@@ -122,7 +140,7 @@ curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
}' }'
``` ```
**响应示例** **Example response**:
```json ```json
{ {
"success": true, "success": true,
@@ -134,7 +152,7 @@ curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
"message": { "message": {
"role": "assistant", "role": "assistant",
"audio": { "audio": {
"data": "base64编码的音频数据...", "data": "base64-encoded audio data...",
"format": "mp3" "format": "mp3"
} }
}, },
@@ -146,17 +164,17 @@ curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
} }
``` ```
`data.choices[0].message.audio.data` 提取 base64 编码的音频数据。 Pull the base64-encoded audio data from `data.choices[0].message.audio.data`.
### 第二步:解码并上传到 media-store ### Step 2: Decode and upload to media-store
音频以 base64 返回,需要解码后保存到本地 media-store The audio comes back as base64; decode it and save to the local media-store.
**推荐方式**(先保存完整响应到文件,避免 shell 参数过长): **Recommended approach** (write the full response to a file first to avoid overlong shell arguments):
```bash ```bash
PORT=$(cat ~/.desirecore/agent-service.port) PORT=$(cat ~/.desirecore/agent-service.port)
# 将完整请求和响应保存到文件 # Save the full request and response to a file
curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \ curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
-H "Content-Type: application/json" \ -H "Content-Type: application/json" \
-d '{ -d '{
@@ -165,74 +183,74 @@ curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
"endpoint": "/chat/completions", "endpoint": "/chat/completions",
"body": { "body": {
"model": "mimo-v2.5-tts", "model": "mimo-v2.5-tts",
"messages": [{"role": "assistant", "content": "要合成的文本"}], "messages": [{"role": "assistant", "content": "Text to synthesize"}],
"voice": "Bingtang", "voice": "Bingtang",
"audio": {"format": "mp3"} "audio": {"format": "mp3"}
}, },
"responseType": "json" "responseType": "json"
}' > /tmp/xiaomi-tts-response.json }' > /tmp/xiaomi-tts-response.json
# 提取 base64 音频数据并解码 # Extract and decode the base64 audio data
cat /tmp/xiaomi-tts-response.json | jq -r '.data.choices[0].message.audio.data' | base64 -d > /tmp/xiaomi-tts.mp3 cat /tmp/xiaomi-tts-response.json | jq -r '.data.choices[0].message.audio.data' | base64 -d > /tmp/xiaomi-tts.mp3
# 上传到 media-store # Upload to media-store
curl -sk -X POST "https://127.0.0.1:${PORT}/api/media/upload" \ curl -sk -X POST "https://127.0.0.1:${PORT}/api/media/upload" \
-F "file=@/tmp/xiaomi-tts.mp3;type=audio/mpeg" -F "file=@/tmp/xiaomi-tts.mp3;type=audio/mpeg"
``` ```
从 JSON 响应中提取 `mediaId` 字段(格式如 `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.mp3`)。 Pick the `mediaId` field from the JSON response (format `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.mp3`).
### 第三步:用 dc-media 协议展示音频 ### Step 3: Render the audio via the dc-media protocol
在你的回复文本中直接写 Markdown 语法: In your reply text, write Markdown syntax directly:
``` ```
![语音合成结果](dc-media://这里替换为mediaId) ![TTS result](dc-media://replace-with-mediaId)
``` ```
例如:`![TTS: 你好世界](dc-media://a1b2c3d4-e5f6-47a8-b9c0-d1e2f3a4b5c6.mp3)` For example: `![TTS: Hello world](dc-media://a1b2c3d4-e5f6-47a8-b9c0-d1e2f3a4b5c6.mp3)`
前端会自动检测 `.mp3` 扩展名并渲染为音频播放器。 The frontend detects the `.mp3` extension and renders an audio player.
## 参数映射 ## Parameter Mapping
### 请求体参数(放在 body 中) ### Request body parameters (inside `body`)
| 参数 | 说明 | 默认值 | | Parameter | Description | Default |
|------|------|--------| |------|------|--------|
| `model` | 模型名称 | "mimo-v2.5-tts" | | `model` | Model name | "mimo-v2.5-tts" |
| `messages[0].role` | **必须为 "assistant"** | "assistant"(固定) | | `messages[0].role` | **Must be "assistant"** | "assistant" (fixed) |
| `messages[0].content` | 要合成的文本 | 必填 | | `messages[0].content` | Text to synthesize | required |
| `voice` | 音色 ID | "Bingtang"(中文)/ "Mia"(英文) | | `voice` | Voice ID | "Bingtang" (Chinese) / "Mia" (English) |
| `audio.format` | 音频格式 | "mp3"(可选 "wav" | | `audio.format` | Audio format | "mp3" (also accepts "wav") |
### 用户意图映射 ### User intent mapping
| 用户意图 | 参数选择 | | User intent | Parameter |
|---------|---------| |---------|---------|
| 甜美/可爱 | voice: "Bingtang" | | Sweet / cute | voice: "Bingtang" |
| 温柔/知性 | voice: "Moli" | | Gentle / refined | voice: "Moli" |
| 年轻男声 | voice: "Suda" | | Young male | voice: "Suda" |
| 成熟男声 | voice: "Baihua" | | Mature male | voice: "Baihua" |
| 英文女声 | voice: "Mia" "Chloe" | | English female | voice: "Mia" or "Chloe" |
| 英文男声 | voice: "Milo" "Dean" | | English male | voice: "Milo" or "Dean" |
| 高音质/无损 | response_format: "wav" | | High fidelity / lossless | audio.format: "wav" |
## Error Handling
- `success: false` + `error: "No matching provider"`: Xiaomi MiMo provider not configured or disabled
- `success: false` + `error: "API Key not configured"`: API Key missing
- `statusCode: 401`: API Key invalid or expired
- `statusCode: 429`: rate limited, retry later
- `statusCode: 400`: bad parameters (e.g. unknown voice, empty text)
- `statusCode: 403`: model not activated or insufficient permission
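One way to act on these codes is a small lookup that turns a `statusCode` into a hint and flags the only case the list marks as retryable. The function names `explain_status` and `should_retry` are hypothetical, and the messages are copied from the list above rather than from any API.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: translate a statusCode from the media-proxy response
# into the hint given in the error-handling list.
explain_status() {
  case "$1" in
    400) echo "bad parameters (e.g. unknown voice, empty text)" ;;
    401) echo "API Key invalid or expired" ;;
    403) echo "model not activated or insufficient permission" ;;
    429) echo "rate limited, retry later" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Only 429 is described as worth retrying; everything else needs a fix first.
should_retry() {
  [ "$1" = "429" ]
}

explain_status 401
if should_retry 429; then echo "retry after a pause"; fi
```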
## Notes
- Calls are synchronous and typically return in 3-15 seconds, depending on text length
- Audio is returned as base64, so URL expiry is not a concern, but watch shell argument length on long responses
- For long text, split into segments (no more than ~500 chars each), then upload and render each segment
- When the user doesn't specify, default to `mimo-v2.5-tts` + auto-selected voice by language + `mp3`
- Token Plan keys (prefix `tp-`) use the `https://token-plan-cn.xiaomimimo.com/v1` endpoint
- Pay-as-you-go keys use the `https://api.xiaomimimo.com/v1` endpoint
- media-proxy picks the correct endpoint based on configuration; the skill does not need to differentiate
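The segmenting advice for long text can be sketched with Bash substring expansion. This is a naive fixed-width split under stated assumptions: the function name `split_text` is hypothetical, the input contains no newlines, and Bash runs in a UTF-8 locale so that `${#text}` counts characters rather than bytes. It does not try to break at sentence boundaries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split text into chunks of at most $2 characters
# (default 500), printing one chunk per line.
split_text() {
  local text="$1" size="${2:-500}" i=0
  while [ "$i" -lt "${#text}" ]; do
    # ${text:i:size} takes up to `size` characters starting at offset `i`
    printf '%s\n' "${text:i:size}"
    i=$((i + size))
  done
}

# Small demonstration with a chunk size of 3
split_text "abcdefg" 3
```

Each printed line would then go through the synthesize, upload, and render steps separately.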


@@ -0,0 +1,192 @@
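<!-- locale: zh-CN -->
# xiaomi-tts skill
## Mandatory rules (violating them breaks the feature)
1. **Access agent-service over HTTPS**: use `https://127.0.0.1:${PORT}` with `-k` to skip certificate verification
2. **Upload to media-store through `/api/media/upload`**: /tmp is only a staging area for downloading/decoding; a local path must not be used as the final output
3. **Render audio with the `dc-media://` protocol**: it is the only way the frontend renders it correctly
4. **Use Bash curl throughout**: do not use the HttpRequest tool or Python
5. **Use the /chat/completions endpoint**: Xiaomi MiMo TTS uses the OpenAI-compatible format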
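## Model selection guide
| Model | Characteristics | Use case |
|------|------|---------|
| mimo-v2.5-tts | Standard TTS with multiple preset voices | **Default choice** for regular speech synthesis |
| mimo-v2.5-tts-voicedesign | Custom voice design | Generating a voice from a specific description |
| mimo-v2.5-tts-voiceclone | Voice cloning | Cloning a specific human voice (requires uploading reference audio) |
**Default rule**: when the user does not specify a model, use `mimo-v2.5-tts`.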
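## Voice selection guide
### Preset voices
| voice_id | Name | Characteristics |
|----------|------|------|
| default_zh | Default Chinese | General-purpose Chinese female voice |
| default_en | Default English | General-purpose English female voice |
| mimo_default | MiMo default | MiMo signature voice |
| Bingtang | 冰糖 | Sweet female voice |
| Moli | 茉莉 | Gentle female voice |
| Suda | 苏打 | Young male voice |
| Baihua | 白桦 | Mature male voice |
| Mia | Mia | English female voice |
| Chloe | Chloe | English female voice |
| Milo | Milo | English male voice |
| Dean | Dean | English male voice |
**Default rule**: use `Bingtang` for Chinese content and `Mia` for English content; when the user does not specify a voice, choose automatically by content language.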
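The default rule needs some way to tell Chinese from English content. A crude heuristic, shown here only as a sketch, is to treat any text containing non-ASCII bytes as Chinese. The function name `default_voice` is hypothetical, and the heuristic misclassifies accented Latin text such as "café".

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the default voice from the content language.
# Heuristic (an assumption, not the skill's method): any non-ASCII byte
# means the text is treated as Chinese.
default_voice() {
  local rest
  # Delete every ASCII byte; whatever remains is non-ASCII.
  rest=$(printf '%s' "$1" | LC_ALL=C tr -d '\000-\177')
  if [ -n "$rest" ]; then
    echo "Bingtang"
  else
    echo "Mia"
  fi
}

default_voice "Hello world"
default_voice "你好世界"
```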
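## Full execution flow (follow these three steps strictly)
### Prerequisites
- The user has configured the Xiaomi MiMo Provider and entered an API Key under Resource Manager > Compute (资源管理器-算力)
- agent-service is running
### Step 1: Call the TTS API
Generate speech through the media-proxy /chat/completions endpoint.
**Important**: messages must use the `assistant` role, not `user`; the text to synthesize goes in the content of the assistant message.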
```bash
# Read the port that agent-service is listening on
PORT=$(cat ~/.desirecore/agent-service.port)
# -k skips certificate verification for the local HTTPS endpoint
curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "xiaomi",
    "serviceType": "tts",
    "endpoint": "/chat/completions",
    "body": {
      "model": "mimo-v2.5-tts",
      "messages": [
        {
          "role": "assistant",
          "content": "Replace with the text to synthesize"
        }
      ],
      "voice": "Bingtang",
      "audio": {"format": "mp3"}
    },
    "responseType": "json"
  }'
```
**Response example**:
```json
{
  "success": true,
  "data": {
    "id": "chatcmpl-...",
    "choices": [
      {
        "index": 0,
        "message": {
          "role": "assistant",
          "audio": {
            "data": "base64-encoded audio data...",
            "format": "mp3"
          }
        },
        "finish_reason": "stop"
      }
    ]
  },
  "statusCode": 200
}
```
Extract the base64-encoded audio data from `data.choices[0].message.audio.data`.
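The request in this step hardcodes the text inside a single-quoted JSON literal, which breaks as soon as the text contains quotes or newlines. A possible alternative, sketched here under the assumption that `jq` is installed, is to let `jq -n --arg` do the JSON escaping. The function name `build_tts_request` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the media-proxy request body with jq so that the
# text is JSON-escaped correctly.
#   $1 = text, $2 = voice (default Bingtang), $3 = format (default mp3)
build_tts_request() {
  jq -n --arg text "$1" --arg voice "${2:-Bingtang}" --arg fmt "${3:-mp3}" '{
    provider: "xiaomi",
    serviceType: "tts",
    endpoint: "/chat/completions",
    body: {
      model: "mimo-v2.5-tts",
      messages: [{role: "assistant", content: $text}],
      voice: $voice,
      audio: {format: $fmt}
    },
    responseType: "json"
  }'
}
```

The result can then be piped into the same curl call, for example `build_tts_request 'He said "hi"' Mia | curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" -H "Content-Type: application/json" -d @-`, where `-d @-` tells curl to read the body from standard input.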
### Step 2: Decode and upload to media-store
The audio is returned as base64 and must be decoded before it is saved to the local media-store.
**Recommended approach** (save the full response to a file first to avoid overly long shell arguments):
```bash
PORT=$(cat ~/.desirecore/agent-service.port)
# Send the request and save the full response to a file
curl -sk -X POST "https://127.0.0.1:${PORT}/api/media-proxy" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "xiaomi",
    "serviceType": "tts",
    "endpoint": "/chat/completions",
    "body": {
      "model": "mimo-v2.5-tts",
      "messages": [{"role": "assistant", "content": "Text to synthesize"}],
      "voice": "Bingtang",
      "audio": {"format": "mp3"}
    },
    "responseType": "json"
  }' > /tmp/xiaomi-tts-response.json
# Extract the base64 audio data and decode it
jq -r '.data.choices[0].message.audio.data' /tmp/xiaomi-tts-response.json | base64 -d > /tmp/xiaomi-tts.mp3
# Upload to media-store
curl -sk -X POST "https://127.0.0.1:${PORT}/api/media/upload" \
  -F "file=@/tmp/xiaomi-tts.mp3;type=audio/mpeg"
```
Extract the `mediaId` field from the JSON response (formatted like `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.mp3`).
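Before using the extracted value, it can help to check that it really looks like a `mediaId`, since `jq -r` prints the string `null` when the field is missing. The sketch below assumes the `x` placeholders stand for lowercase hexadecimal digits, as in the sample ID shown in this document; the function name `is_media_id` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if $1 matches the documented mediaId shape
# (8-4-4-4-12 characters plus ".mp3"). Lowercase hex is an assumption based on
# the sample ID in the document.
is_media_id() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\.mp3$'
}

if is_media_id "a1b2c3d4-e5f6-47a8-b9c0-d1e2f3a4b5c6.mp3"; then
  echo "mediaId looks valid"
fi
```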
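### Step 3: Render the audio via the dc-media protocol
In your reply text, write Markdown syntax directly:
```
![TTS result](dc-media://replace-with-mediaId)
```
For example: `![TTS: Hello world](dc-media://a1b2c3d4-e5f6-47a8-b9c0-d1e2f3a4b5c6.mp3)`
The frontend automatically detects the `.mp3` extension and renders an audio player.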
## Parameter Mapping
### Request body parameters (inside `body`)
| Parameter | Description | Default |
|------|------|--------|
| `model` | Model name | "mimo-v2.5-tts" |
| `messages[0].role` | **Must be "assistant"** | "assistant" (fixed) |
| `messages[0].content` | Text to synthesize | required |
| `voice` | Voice ID | "Bingtang" (Chinese) / "Mia" (English) |
| `audio.format` | Audio format | "mp3" (also accepts "wav") |
### User intent mapping
| User intent | Parameter |
|---------|---------|
| Sweet / cute | voice: "Bingtang" |
| Gentle / refined | voice: "Moli" |
| Young male | voice: "Suda" |
| Mature male | voice: "Baihua" |
| English female | voice: "Mia" or "Chloe" |
| English male | voice: "Milo" or "Dean" |
| High fidelity / lossless | audio.format: "wav" |
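## Error Handling
- `success: false` + `error: "未找到匹配的供应商"` (no matching provider): Xiaomi MiMo Provider not configured or not enabled
- `success: false` + `error: "未配置 API Key"` (API Key not configured): API Key not filled in
- `statusCode: 401`: API Key invalid or expired
- `statusCode: 429`: rate limited, retry later
- `statusCode: 400`: bad parameters (e.g. the voice does not exist, the text is empty)
- `statusCode: 403`: model not activated or insufficient permission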
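## Notes
- Calls are synchronous and typically return in 3-15 seconds, depending on text length
- Audio is returned as base64, so there is no external URL expiry to worry about, but watch the shell argument length limit when the payload is large
- For long text, synthesize in segments (no more than 500 characters each), then upload and render each segment in turn
- If the user does not explicitly request a voice or format, default to `mimo-v2.5-tts` + a voice chosen by language + `mp3`
- Token Plan keys (prefix `tp-`) use the `https://token-plan-cn.xiaomimimo.com/v1` endpoint
- Pay-as-you-go keys use the `https://api.xiaomimimo.com/v1` endpoint
- media-proxy automatically picks the correct endpoint based on configuration; the skill does not need to differentiate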