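feat: skills i18n migration (schemaVersion 1.1, no backward compatibility) (#1)

* feat: skills i18n migration (schemaVersion 1.1, no backward compatibility)

Migrate all 21 skills, 1 agent, and manifest/categories to the schemaVersion 1.1
i18n structure, together with a CI AI translation pipeline (GitHub Models) and a local toolchain.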

## Key changes

### Data structures (breaking, schemaVersion 1.0 → 1.1)
- SKILL.md: top-level name becomes an ASCII slug (== directory name, per the agentskills.io
  specification); the Chinese display name/short_desc/description all move into metadata.i18n.<locale>
- agents/<id>/agent.json: shortDesc/fullDesc/tags/persona.{role,traits} move into
  i18n.<locale>; changelog[].changes becomes a { <locale>: string[] } object
- categories.json: each category's label/description moves into i18n.<locale>; only
  color/icon remain at the top level
- manifest.json: add supportedLocales / defaultLocale; top-level description moves into
  i18n.<locale>

### Body file structure
- Root SKILL.md = frontmatter + default_locale (en-US) body
- SKILL.<locale>.md = Markdown body for each locale (first line <!-- locale: xx --> acts as a self-check)
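
The locale header self-check described above could be sketched as follows. This is only an illustration: `header_locale` and `body_matches_filename` are hypothetical helper names, and the regex is an assumption modelled on the `<!-- locale: xx -->` format, not the logic in validate-i18n.py.

```python
import re
from typing import Optional

# Assumed header format: an HTML comment on the very first line of the body file.
LOCALE_HEADER = re.compile(r"\A<!--\s*locale:\s*([A-Za-z-]+)\s*-->")
BODY_FILENAME = re.compile(r"SKILL\.([A-Za-z-]+)\.md")


def header_locale(body_text: str) -> Optional[str]:
    """Return the locale declared in the first-line header, or None if absent."""
    m = LOCALE_HEADER.match(body_text)
    return m.group(1) if m else None


def body_matches_filename(filename: str, body_text: str) -> bool:
    """True when SKILL.<locale>.md declares that same locale in its header."""
    m = BODY_FILENAME.fullmatch(filename)
    if not m:
        return False  # root SKILL.md (default locale) carries no locale suffix
    return header_locale(body_text) == m.group(1)
```

A mismatch between the file name and the header (for example a zh-CN body saved as SKILL.en-US.md) is the kind of error such a check is meant to catch.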

### Toolchain (scripts/i18n/)
- glossary.json: zh→en glossary + do_not_translate allowlist
- schema/skill-frontmatter.schema.json: i18n frontmatter JSON Schema
- validate-i18n.py: 8 validation rules (name compliance / locale completeness / hash consistency, etc.)
- translate.py: dual backend (GitHub Models / Anthropic), sha256-based incremental translation
- migrate.py: one-shot migration script (legacy format → i18n structure)

### CI (.github/workflows/)
- i18n-validate.yml: runs validate + translate --check on PRs
- i18n-translate.yml: on PRs, translates missing locales with GitHub Models (default
  openai/gpt-5-mini) and appends a commit automatically; can be switched to Claude via ANTHROPIC_API_KEY

### Docs
- docs/I18N.md: author contribution guide (schema reference / submission workflow / FAQ)
- README.md: add a multilingual section

## Verification

- uv run scripts/i18n/validate-i18n.py: OK, 49 files, 0 errors
- uv run scripts/i18n/translate.py --check: 0 stale locales
- Heading counts for all 21 skills match exactly between zh-CN and en-US (largest: 66 = 66)
- skills-ref spec validation: all pass (top-level name is an ASCII slug + single description field)
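
The heading-count alignment check above can be approximated with the sketch below. It reuses the `HEADING_RE` pattern from translate.py; `headings_aligned` is a hypothetical name, and the two sample bodies are made up for illustration.

```python
import re

# Same pattern as HEADING_RE in translate.py: one to six '#' then whitespace and text.
HEADING_RE = re.compile(r"^(#{1,6})\s+\S", re.MULTILINE)


def heading_count(text: str) -> int:
    """Count Markdown ATX headings (lines starting with 1-6 '#' characters)."""
    return len(HEADING_RE.findall(text))


def headings_aligned(source_body: str, target_body: str) -> bool:
    """True when both bodies contain the same number of headings."""
    return heading_count(source_body) == heading_count(target_body)


# Illustrative bodies only; real skills are much longer.
zh_body = "# 标题\n\n正文\n\n## 小节\n"
en_body = "# Title\n\nBody\n\n## Section\n"
print(headings_aligned(zh_body, en_body))
```

Note that this pattern also counts `# comment` lines inside fenced code blocks, so the count is a structural proxy rather than a true heading parse; that is harmless as long as code blocks are copied verbatim into the translation.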

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

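* fix(i18n): fix 6 issues raised in PR #1 review

- schema: relax the translated_by regex to ^(human|ai:[A-Za-z0-9._:/-]+)$ so it accepts
  backend:model forms such as 'ai:github:openai/gpt-5-mini' (the CI translation output format)
- README + docs/I18N.md: correct the misleading "CI uses the Claude API" description; the default is
  GitHub Models (openai/gpt-5-mini) + GITHUB_TOKEN, with an optional switch to Anthropic
- skills/minimax-tts/SKILL.md & SKILL.zh-CN.md: remove a stray closing ``` fence that broke
  rendering of the Markdown that followed
- skills/docx/SKILL.md: restore the • Unicode escape example lost during translation,
  aligning it with the zh-CN version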

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 00:26:33 +08:00
committed by GitHub
parent 1c107a9344
commit 1f7c8b9673
59 changed files with 10533 additions and 2014 deletions

227
scripts/i18n/glossary.json Normal file

@@ -0,0 +1,227 @@
{
"$schema": "./schema/glossary.schema.json",
"_comment": "DesireCore Market i18n glossary. Used by translate.py to keep terminology consistent across skills. Add domain-specific entries as new skills are added.",
"version": "1.0.0",
"default_source_locale": "zh-CN",
"default_target_locale": "en-US",
"terms": {
"zh-CN_to_en-US": {
"智能体": "Agent",
"技能": "Skill",
"工具": "Tool",
"市场": "Market",
"工作流": "workflow",
"黄金样本": "gold sample",
"触发词": "trigger keyword",
"落地页": "landing page",
"仪表盘": "Dashboard",
"海报": "poster",
"登录态": "logged-in session",
"联网访问": "Web Access",
"网页抓取": "page fetching",
"网络搜索": "web search",
"联网搜索": "web search",
"调研": "research",
"公文": "official document",
"备忘录": "memo",
"合同": "contract",
"信函": "letter",
"信函模板": "letter template",
"报告": "report",
"发票": "invoice",
"目录": "table of contents",
"页眉页脚": "header and footer",
"页码": "page number",
"脚注": "footnote",
"尾注": "endnote",
"修订": "tracked changes",
"批注": "comment",
"签名": "signature",
"表单": "form",
"字段": "field",
"电子表格": "spreadsheet",
"工作表": "worksheet",
"数据透视表": "pivot table",
"图表": "chart",
"幻灯片": "slide",
"演示文稿": "presentation",
"幻灯片母版": "slide master",
"环境配置": "environment setup",
"运行环境": "runtime",
"容器": "container",
"沙箱": "sandbox",
"镜像": "image",
"权限": "permission",
"白名单": "allowlist",
"黑名单": "blocklist",
"凭据": "credential",
"密钥": "secret",
"存储桶": "bucket",
"对象存储": "object storage",
"上传": "upload",
"下载": "download",
"拷贝": "copy",
"迁移": "migration",
"校验": "validation",
"回退": "fallback",
"回滚": "rollback",
"审批": "approval",
"审计": "audit",
"灰度": "canary release",
"发布": "release",
"通道": "channel",
"稳定通道": "stable channel",
"金丝雀通道": "canary channel",
"维护者": "maintainer",
"贡献者": "contributor",
"作者": "author",
"风险等级": "risk level",
"图片生成": "image generation",
"语音合成": "text-to-speech",
"视频生成": "video generation",
"邮箱操作": "Email Operations",
"邮件": "email",
"收件箱": "inbox",
"草稿": "draft",
"标签": "label",
"分类": "category",
"附件": "attachment",
"自动回复": "auto-reply",
"群发": "bulk send",
"智能回复": "smart reply",
"团队管理": "team management",
"技能管理": "skills management",
"调度": "orchestration",
"中枢": "central",
"中枢调度器": "central orchestrator",
"数字员工": "digital worker",
"人格": "persona",
"原则": "principle",
"持久化": "persistence",
"系统中枢": "system core",
"节流": "throttling",
"限流": "rate limiting",
"重试": "retry",
"幂等": "idempotency",
"命令行": "CLI",
"脚本": "script",
"二进制": "binary",
"依赖": "dependency",
"包管理器": "package manager",
"镜像源": "registry mirror",
"节省 Token": "save tokens",
"节省成本": "save cost",
"安装": "install",
"卸载": "uninstall",
"升级": "upgrade",
"降级": "downgrade",
"热更新": "hot-reload",
"冷启动": "cold start",
"调试": "debug",
"远程调试": "remote debugging",
"断点": "breakpoint",
"排查": "troubleshoot",
"技能创建器": "skill creator",
"技能创作": "skill authoring",
"技能模板": "skill template",
"前端设计": "Frontend Design",
"网页设计": "web design",
"UI 设计": "UI design",
"界面设计": "interface design",
"组件": "component",
"美化": "polish",
"样式": "style",
"主题": "theme",
"配色": "color scheme",
"字体": "font",
"字号": "font size",
"栅格": "grid",
"暗色模式": "dark mode",
"亮色模式": "light mode",
"无障碍": "accessibility",
"响应式": "responsive",
"动效": "motion / animation",
"交互": "interaction",
"用户体验": "UX",
"用户界面": "UI",
"渲染": "render",
"构建": "build",
"打包": "bundle",
"热重载": "hot reload"
}
},
"do_not_translate": [
"anthropic",
"claude",
"DesireCore",
"AgentFS",
"MCP",
"Claude Code",
"Anthropic",
"Markdown",
"YAML",
"JSON",
"SVG",
"PDF",
"DOCX",
"PPTX",
"XLSX",
"S3",
"MinIO",
"MiniMax",
"OpenAI",
"Kling",
"Jina",
"Reader",
"Playwright",
"CDP",
"Chrome",
"GitHub",
"Gmail",
"Outlook",
"IMAP",
"SMTP",
"WSL",
"WSL2",
"Docker",
"Podman",
"Kubernetes",
"Node.js",
"Python",
"TypeScript",
"JavaScript",
"Bash",
"BeautifulSoup",
"LibreOffice",
"Poppler",
"Pandoc",
"Tesseract",
"Unicode",
"UTF-8",
"REST",
"API",
"URL",
"HTTP",
"HTTPS",
"OAuth",
"JWT",
"Lucide",
"React",
"Vue",
"Next.js",
"Tailwind",
"CSS",
"HTML",
"SKILL.md",
"frontmatter",
"metadata"
],
"preserve_patterns": [
"^```[\\s\\S]*?```$",
"`[^`\\n]+`",
"https?://[^\\s)]+",
"<[a-zA-Z][\\s\\S]*?>",
"\\$[A-Z_][A-Z0-9_]*",
"\\b[a-z][a-zA-Z0-9_]*\\.[a-z][a-zA-Z0-9_]*\\b"
]
}
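
One way the `preserve_patterns` list could be used is to mask protected spans before sending text to the model and restore them afterwards. The sketch below is an assumption about usage, not the behaviour of translate.py (which, in the portion shown here, relies on prompt rules instead); `mask`, `unmask`, and the `__KEEP_n__` token format are hypothetical.

```python
import re

# Two of the preserve_patterns from glossary.json: inline code and URLs.
PRESERVE_PATTERNS = [r"`[^`\n]+`", r"https?://[^\s)]+"]


def mask(text: str, patterns: list[str]) -> tuple[str, list[str]]:
    """Replace every protected span with a numbered token; return text and spans."""
    saved: list[str] = []
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))

    def stash(m: re.Match) -> str:
        saved.append(m.group(0))
        return f"__KEEP_{len(saved) - 1}__"

    return combined.sub(stash, text), saved


def unmask(text: str, saved: list[str]) -> str:
    """Put the original spans back in place of their tokens."""
    return re.sub(r"__KEEP_(\d+)__", lambda m: saved[int(m.group(1))], text)


source = "运行 `uv run x.py` 并访问 https://example.com/docs 查看"
masked, spans = mask(source, PRESERVE_PATTERNS)
print(masked)
print(unmask(masked, spans) == source)
```

The design trade-off is that masking guarantees byte-identical code and URLs, at the cost of relying on the model to leave the tokens untouched.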

235
scripts/i18n/migrate.py Executable file

@@ -0,0 +1,235 @@
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10"
# dependencies = ["ruamel.yaml>=0.18"]
# ///
"""One-shot migration: convert legacy SKILL.md frontmatter to i18n format.

For each skill directory:
  1. Read SKILL.md frontmatter (legacy format with Chinese top-level name).
  2. Move legacy `name` -> metadata.i18n.<source>.name (default source: zh-CN).
  3. Move legacy `market.short_desc` -> metadata.i18n.<source>.short_desc.
  4. Set top-level `name` to the directory name (ASCII slug).
  5. Add metadata.i18n.{default_locale=en-US, source_locale=<src>, locales=[src]}.
  6. Move existing body to SKILL.<source>.md (with `<!-- locale: <src> -->` header).
  7. Replace root SKILL.md body with a translation-pending placeholder.
  8. Compute source_hash for the source locale.

DOES NOT TRANSLATE: translate.py (the CI script) fills in the en-US body & i18n block
afterwards. After migration, the skill is structurally valid but only has source_locale
content; en-US locale is added by translate.py on the next CI run.

Usage:
    scripts/i18n/migrate.py --dry-run                    # preview, default
    scripts/i18n/migrate.py --apply                      # write changes
    scripts/i18n/migrate.py --apply skills/web-access    # one skill
    scripts/i18n/migrate.py --apply --source zh-CN       # set source locale (default zh-CN)
"""

from __future__ import annotations

import argparse
import hashlib
import json
import re
import sys
from datetime import date
from io import StringIO
from pathlib import Path
from typing import Any

from ruamel.yaml import YAML
from ruamel.yaml.scalarstring import LiteralScalarString, FoldedScalarString

REPO_ROOT = Path(__file__).resolve().parents[2]
DEFAULT_LOCALE = "en-US"

PLACEHOLDER_BODY = (
    "<!-- TRANSLATION PENDING: this body will be auto-translated from "
    "metadata.i18n.<source_locale>.body by scripts/i18n/translate.py on the next CI run. -->\n"
    "\n"
    "# {dir_name}\n"
    "\n"
    "_Translation pending. See `{source_body}` for the source-language version._\n"
)

FRONTMATTER_RE = re.compile(r"^---\s*\n(.*?)\n---\s*\n(.*)$", re.DOTALL)


def make_yaml() -> YAML:
    y = YAML()
    y.indent(mapping=2, sequence=4, offset=2)
    y.width = 4096
    y.preserve_quotes = True
    return y


def load_frontmatter(text: str) -> tuple[Any, str]:
    m = FRONTMATTER_RE.match(text)
    if not m:
        raise ValueError("File is missing YAML frontmatter")
    yaml_text, body = m.group(1), m.group(2)
    yaml = make_yaml()
    fm = yaml.load(yaml_text)
    return fm, body


def dump_frontmatter(fm: Any, body: str) -> str:
    yaml = make_yaml()
    buf = StringIO()
    yaml.dump(fm, buf)
    return f"---\n{buf.getvalue()}---\n\n{body.lstrip()}"


def source_hash(body: str, i18n_strings: dict[str, str]) -> str:
    h = hashlib.sha256()
    h.update(body.encode("utf-8"))
    h.update(b"\x00")
    h.update(json.dumps(i18n_strings, sort_keys=True, ensure_ascii=False).encode("utf-8"))
    return f"sha256:{h.hexdigest()[:16]}"


def migrate_skill(skill_dir: Path, source_locale: str, default_locale: str, apply: bool) -> dict[str, Any]:
    """Return a dict describing the planned changes for this skill."""
    rel = skill_dir.relative_to(REPO_ROOT).as_posix()
    skill_md = skill_dir / "SKILL.md"
    plan: dict[str, Any] = {"skill": rel, "actions": [], "errors": []}

    if not skill_md.is_file():
        plan["errors"].append("SKILL.md not found")
        return plan

    text = skill_md.read_text(encoding="utf-8")
    try:
        fm, body = load_frontmatter(text)
    except ValueError as e:
        plan["errors"].append(str(e))
        return plan

    # Already migrated?
    metadata = fm.get("metadata") or {}
    if isinstance(metadata, dict) and "i18n" in metadata and isinstance(metadata.get("i18n"), dict):
        plan["actions"].append("already migrated, skipping")
        return plan

    legacy_name = fm.get("name", "").strip()
    legacy_short_desc = ""
    market = fm.get("market")
    if isinstance(market, dict):
        legacy_short_desc = (market.get("short_desc") or "").strip()
        # Remove short_desc from market; it has migrated.
        if "short_desc" in market:
            market.pop("short_desc", None)
    legacy_description = fm.get("description", "")

    # New top-level name = directory name
    new_name = skill_dir.name
    fm["name"] = new_name

    # Build metadata.i18n
    if not isinstance(metadata, dict):
        metadata = {}
    fm["metadata"] = metadata

    i18n_block: dict[str, Any] = {
        "default_locale": default_locale,
        "source_locale": source_locale,
        "locales": [source_locale],
    }

    # Source body file
    source_body_filename = f"SKILL.{source_locale}.md"
    source_body_path = skill_dir / source_body_filename
    source_body_text = f"<!-- locale: {source_locale} -->\n\n{body.lstrip()}"

    # Source locale strings
    src_strings = {
        "name": legacy_name or new_name,
        "short_desc": legacy_short_desc or legacy_name or new_name,
    }
    if legacy_description:
        src_strings["description"] = (
            legacy_description if isinstance(legacy_description, str) else str(legacy_description)
        )

    src_hash = source_hash(body, src_strings)

    src_block: dict[str, Any] = {
        "name": src_strings["name"],
        "short_desc": src_strings["short_desc"],
    }
    if "description" in src_strings:
        # Use folded style for long descriptions to keep frontmatter readable
        desc = src_strings["description"]
        src_block["description"] = FoldedScalarString(desc) if "\n" in desc or len(desc) > 80 else desc
    src_block["body"] = f"./{source_body_filename}"
    src_block["source_hash"] = src_hash
    src_block["translated_by"] = "human"

    i18n_block[source_locale] = src_block
    metadata["i18n"] = i18n_block

    # Plan actions
    plan["actions"].append(f"rename top-level name '{legacy_name}' -> '{new_name}'")
    plan["actions"].append(f"add metadata.i18n.{source_locale} (name, short_desc, description, body, source_hash)")
    plan["actions"].append(f"create {source_body_filename} ({len(body)} chars)")
    placeholder = PLACEHOLDER_BODY.format(dir_name=new_name, source_body=source_body_filename)
    plan["actions"].append(f"replace root SKILL.md body with translation-pending placeholder ({len(placeholder)} chars)")
    if legacy_short_desc:
        plan["actions"].append("remove market.short_desc (migrated to i18n)")

    if apply:
        # Write source body file
        source_body_path.write_text(source_body_text, encoding="utf-8")
        # Write new root SKILL.md (frontmatter + placeholder body)
        new_root = dump_frontmatter(fm, placeholder)
        skill_md.write_text(new_root, encoding="utf-8")
        plan["written"] = [
            source_body_path.relative_to(REPO_ROOT).as_posix(),
            skill_md.relative_to(REPO_ROOT).as_posix(),
        ]

    return plan


def main(argv: list[str]) -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("paths", nargs="*", help="Skill directories to migrate (default: all under skills/)")
    parser.add_argument("--apply", action="store_true", help="Write changes (default: dry-run)")
    parser.add_argument("--dry-run", action="store_true", help="Preview only (default)")
    parser.add_argument("--source", default="zh-CN", help="Source locale (default: zh-CN)")
    parser.add_argument("--default", dest="default_locale", default=DEFAULT_LOCALE,
                        help=f"Default locale (default: {DEFAULT_LOCALE})")
    args = parser.parse_args(argv)

    apply = args.apply and not args.dry_run

    if args.paths:
        targets = [Path(p).resolve() for p in args.paths]
    else:
        targets = sorted((REPO_ROOT / "skills").iterdir())
        targets = [t for t in targets if t.is_dir() and (t / "SKILL.md").is_file()]

    plans: list[dict[str, Any]] = []
    for skill_dir in targets:
        if not skill_dir.is_dir() or not (skill_dir / "SKILL.md").is_file():
            continue
        plans.append(migrate_skill(skill_dir, args.source, args.default_locale, apply))

    print(f"\n{'APPLIED' if apply else 'DRY-RUN'} migration plan ({len(plans)} skills):\n")
    for p in plans:
        if p.get("errors"):
            print(f"{p['skill']}: ERRORS: {p['errors']}")
        else:
            print(f"{p['skill']}: {len(p['actions'])} action(s)")
            for a in p["actions"]:
                print(f" - {a}")
        print()

    if not apply:
        print("Re-run with --apply to write changes.")
    else:
        print("Migration complete. Run scripts/i18n/validate-i18n.py to verify.")

    return 1 if any(p.get("errors") for p in plans) else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
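
The incremental-translation idea behind `source_hash` can be illustrated in isolation. The sketch below copies the hashing scheme from migrate.py; `is_stale` is a hypothetical helper that mirrors the hash comparison translate.py performs, and the sample strings are invented.

```python
import hashlib
import json


def source_hash(body: str, i18n_strings: dict[str, str]) -> str:
    """Hash the body and the display strings together (same scheme as migrate.py)."""
    h = hashlib.sha256()
    h.update(body.encode("utf-8"))
    h.update(b"\x00")  # separator so body and strings cannot run together
    h.update(json.dumps(i18n_strings, sort_keys=True, ensure_ascii=False).encode("utf-8"))
    return f"sha256:{h.hexdigest()[:16]}"


def is_stale(target_block: dict, body: str, i18n_strings: dict[str, str]) -> bool:
    """A target locale is stale when its recorded hash differs from the current one."""
    return target_block.get("source_hash") != source_hash(body, i18n_strings)


# Illustrative data only.
body = "# 标题\n\n正文\n"
strings = {"name": "技能", "short_desc": "简介"}
recorded = {"source_hash": source_hash(body, strings)}
print(is_stale(recorded, body, strings))            # unchanged source
print(is_stale(recorded, body + "新增段落\n", strings))  # edited source
```

Because the strings are serialized with `sort_keys=True`, the hash does not depend on key order in the frontmatter; any change to the body or to a display string produces a different prefix and triggers re-translation.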


@@ -0,0 +1,156 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://desirecore.net/market/schema/skill-frontmatter.schema.json",
"title": "DesireCore Market Skill Frontmatter (i18n)",
"description": "JSON Schema for the YAML frontmatter of a SKILL.md in the DesireCore market. Compatible with the agentskills.io specification: https://agentskills.io/specification",
"type": "object",
"required": ["name", "description", "version", "metadata"],
"additionalProperties": true,
"properties": {
"name": {
"description": "Spec-required. Must equal the parent directory name. Lowercase ASCII letters, digits, hyphens. 1-64 chars. Must not start/end with hyphen, must not contain consecutive hyphens, must not be 'anthropic' or 'claude'.",
"type": "string",
"minLength": 1,
"maxLength": 64,
"pattern": "^(?!-)(?!.*--)(?!anthropic$)(?!claude$)[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$"
},
"description": {
"description": "Spec-required. 1-1024 chars. Single field used by Claude for skill discovery. Multilingual trigger keywords are accepted (Anthropic-recommended pattern for multilingual users).",
"type": "string",
"minLength": 1,
"maxLength": 1024
},
"license": {
"type": "string"
},
"compatibility": {
"type": "string",
"maxLength": 500
},
"version": {
"description": "SemVer string. Required by DesireCore market.",
"type": "string",
"pattern": "^\\d+\\.\\d+\\.\\d+(?:-[0-9A-Za-z.-]+)?(?:\\+[0-9A-Za-z.-]+)?$"
},
"type": {
"description": "DesireCore extension. Skill behavior class.",
"type": "string",
"enum": ["procedural", "meta", "knowledge", "tool"]
},
"risk_level": {
"description": "DesireCore extension.",
"type": "string",
"enum": ["low", "medium", "high", "critical"]
},
"status": {
"description": "DesireCore extension. Whether the skill is enabled by default.",
"type": "string",
"enum": ["enabled", "disabled", "deprecated", "experimental"]
},
"disable-model-invocation": {
"type": "boolean"
},
"tags": {
"type": "array",
"items": {
"type": "string",
"pattern": "^[a-z0-9][a-z0-9-]*$"
},
"uniqueItems": true
},
"allowed-tools": {
"type": "string"
},
"metadata": {
"type": "object",
"required": ["author", "i18n"],
"additionalProperties": true,
"properties": {
"author": { "type": "string" },
"updated_at": {
"type": "string",
"pattern": "^\\d{4}-\\d{2}-\\d{2}$"
},
"i18n": { "$ref": "#/$defs/i18nBlock" }
}
},
"market": {
"type": "object",
"additionalProperties": true,
"properties": {
"icon": { "type": "string" },
"category": { "type": "string" },
"channel": {
"type": "string",
"enum": ["latest", "stable", "canary", "beta"]
},
"maintainer": {
"type": "object",
"properties": {
"name": { "type": "string" },
"verified": { "type": "boolean" }
},
"required": ["name"]
},
"compatible_agents": {
"type": "array",
"items": { "type": "string" }
}
}
}
},
"$defs": {
"bcp47Locale": {
"description": "BCP-47 locale tag: ll or ll-RR (lowercase 2-3 letter language, optional uppercase 2-letter region).",
"type": "string",
"pattern": "^[a-z]{2,3}(?:-[A-Z]{2})?$"
},
"i18nBlock": {
"type": "object",
"required": ["default_locale", "source_locale", "locales"],
"properties": {
"default_locale": { "$ref": "#/$defs/bcp47Locale" },
"source_locale": { "$ref": "#/$defs/bcp47Locale" },
"locales": {
"type": "array",
"minItems": 1,
"uniqueItems": true,
"items": { "$ref": "#/$defs/bcp47Locale" }
}
},
"additionalProperties": {
"$ref": "#/$defs/localePayload"
}
},
"localePayload": {
"description": "Per-locale display strings and body pointer. Keyed by BCP-47 locale tag.",
"type": "object",
"required": ["name", "short_desc"],
"properties": {
"name": { "type": "string", "minLength": 1, "maxLength": 200 },
"short_desc": { "type": "string", "minLength": 1, "maxLength": 300 },
"description": { "type": "string", "minLength": 1, "maxLength": 2000 },
"body": {
"description": "Relative path to the body Markdown for this locale, e.g. ./SKILL.zh-CN.md or ./SKILL.md.",
"type": "string",
"pattern": "^\\./[A-Za-z0-9._/-]+\\.md$"
},
"source_hash": {
"description": "sha256:<hex> prefix (8-64 hex chars; translate.py writes 16) of the hash over source body+i18n strings at translation time. Written by translate.py.",
"type": "string",
"pattern": "^sha256:[0-9a-f]{8,64}$"
},
"translated_by": {
"description": "'human' for human-authored content; 'ai:<backend>:<model-id>' or 'ai:<model-id>' for machine-translated content. Examples: 'ai:github:openai/gpt-5-mini', 'ai:anthropic:claude-sonnet-4-6', 'ai:claude-opus-4-7'.",
"type": "string",
"pattern": "^(human|ai:[A-Za-z0-9._:/-]+)$"
},
"translated_at": {
"type": "string",
"pattern": "^\\d{4}-\\d{2}-\\d{2}(?:T\\d{2}:\\d{2}:\\d{2}Z?)?$"
}
},
"additionalProperties": false
}
}
}
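
The `name` and `translated_by` patterns from this schema can be exercised directly with Python's `re` module, which supports the same lookahead syntax. This is a rough sketch for experimenting with the rules, not a replacement for full JSON Schema validation; `valid_name` and `valid_translated_by` are hypothetical helper names.

```python
import re

# Patterns copied from skill-frontmatter.schema.json.
NAME_RE = re.compile(r"^(?!-)(?!.*--)(?!anthropic$)(?!claude$)[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$")
TRANSLATED_BY_RE = re.compile(r"^(human|ai:[A-Za-z0-9._:/-]+)$")


def valid_name(name: str, dir_name: str) -> bool:
    """Check the slug rules: 1-64 chars, equals the directory name, matches the pattern."""
    return 1 <= len(name) <= 64 and name == dir_name and NAME_RE.fullmatch(name) is not None


def valid_translated_by(tag: str) -> bool:
    """Accept 'human' or an 'ai:<backend>:<model-id>' / 'ai:<model-id>' tag."""
    return TRANSLATED_BY_RE.fullmatch(tag) is not None


for candidate in ["web-access", "web--access", "claude", "claude-code", "-web"]:
    print(candidate, valid_name(candidate, candidate))
print(valid_translated_by("ai:github:openai/gpt-5-mini"))
```

The reserved-word lookaheads only reject the exact names 'anthropic' and 'claude'; a slug that merely starts with one of them, such as the made-up `claude-code` above, still passes.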

530
scripts/i18n/translate.py Executable file

@@ -0,0 +1,530 @@
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10"
# dependencies = ["ruamel.yaml>=0.18", "httpx>=0.27"]
# ///
"""AI translation pipeline for DesireCore market skills.

For each skill directory, ensure metadata.i18n contains every locale declared in
manifest.json/supportedLocales. When a target locale is missing or stale (its
source_hash differs from the current source body+strings hash), translate from
metadata.i18n.<source_locale>.body using an LLM.

Backends (auto-selected, in this priority):
  1. GitHub Models (default): uses GITHUB_TOKEN with `models: read` permission,
     OpenAI-compatible chat-completions API at https://models.github.ai/inference.
     Model defaults to `openai/gpt-5-mini` (configure with TRANSLATE_MODEL).
  2. Anthropic API direct: used when ANTHROPIC_API_KEY is set AND
     TRANSLATE_BACKEND=anthropic. Endpoint https://api.anthropic.com/v1/messages.
     Model should be a Claude model id (e.g. claude-sonnet-4-6).

Translations preserve:
  - Markdown structure (heading hierarchy, list ordering, tables, fences)
  - Inline code, fenced code blocks, URLs, file paths
  - SVG, HTML tags, YAML keys
  - Glossary terms from scripts/i18n/glossary.json
  - Reserved words from glossary.do_not_translate

Output:
  - Updates metadata.i18n.<target_locale>.{name,short_desc,description,source_hash,
    translated_by,translated_at}
  - For target_locale == default_locale: writes the translated body to root SKILL.md
  - Otherwise: writes SKILL.<target_locale>.md

Usage:
    GITHUB_TOKEN=... scripts/i18n/translate.py                    # all stale locales
    scripts/i18n/translate.py skills/web-access                   # one skill
    scripts/i18n/translate.py --target en-US skills/web-access    # one locale
    scripts/i18n/translate.py --check                             # dry-run, exit 1 if stale
    scripts/i18n/translate.py --human                             # mark new translations as human (lock)

Env:
    GITHUB_TOKEN           required when backend=github (CI: provided automatically)
    ANTHROPIC_API_KEY      required when TRANSLATE_BACKEND=anthropic
    TRANSLATE_BACKEND      'github' (default) | 'anthropic'
    TRANSLATE_MODEL        backend-specific model id; default depends on backend
    TRANSLATE_ENDPOINT     override endpoint URL
    TRANSLATE_MAX_RETRIES  default 3
"""

from __future__ import annotations

import argparse
import hashlib
import json
import os
import re
import sys
import time
from datetime import datetime, timezone
from io import StringIO
from pathlib import Path
from typing import Any

import httpx
from ruamel.yaml import YAML
from ruamel.yaml.scalarstring import FoldedScalarString

REPO_ROOT = Path(__file__).resolve().parents[2]
GLOSSARY_PATH = REPO_ROOT / "scripts" / "i18n" / "glossary.json"

DEFAULT_BACKEND = os.environ.get("TRANSLATE_BACKEND", "github").lower()
DEFAULT_MODEL_BY_BACKEND = {
    "github": os.environ.get("TRANSLATE_MODEL", "openai/gpt-5-mini"),
    "anthropic": os.environ.get("TRANSLATE_MODEL", "claude-sonnet-4-6"),
}
DEFAULT_ENDPOINT_BY_BACKEND = {
    "github": "https://models.github.ai/inference",
    "anthropic": "https://api.anthropic.com",
}
MAX_RETRIES = int(os.environ.get("TRANSLATE_MAX_RETRIES", "3"))
HTTP_TIMEOUT = httpx.Timeout(connect=10, read=180, write=30, pool=10)

FRONTMATTER_RE = re.compile(r"^---\s*\n(.*?)\n---\s*\n(.*)$", re.DOTALL)
HEADING_RE = re.compile(r"^(#{1,6})\s+\S", re.MULTILINE)
LOCALE_HEADER_RE = re.compile(r"^<!--\s*locale:\s*[a-zA-Z-]+\s*-->\s*\n+", re.MULTILINE)


def make_yaml() -> YAML:
    y = YAML()
    y.indent(mapping=2, sequence=4, offset=2)
    y.width = 4096
    y.preserve_quotes = True
    return y


def load_skill(skill_md: Path) -> tuple[Any, str]:
    text = skill_md.read_text(encoding="utf-8")
    m = FRONTMATTER_RE.match(text)
    if not m:
        raise ValueError(f"{skill_md}: no frontmatter")
    fm = make_yaml().load(m.group(1))
    return fm, m.group(2)


def dump_skill(fm: Any, body: str) -> str:
    yaml = make_yaml()
    buf = StringIO()
    yaml.dump(fm, buf)
    return f"---\n{buf.getvalue()}---\n\n{body.lstrip()}"


def strip_locale_header(text: str) -> str:
    return LOCALE_HEADER_RE.sub("", text, count=1)


def compute_source_hash(body: str, strings: dict[str, str]) -> str:
    h = hashlib.sha256()
    h.update(body.encode("utf-8"))
    h.update(b"\x00")
    h.update(json.dumps(strings, sort_keys=True, ensure_ascii=False).encode("utf-8"))
    return f"sha256:{h.hexdigest()[:16]}"


def heading_count(text: str) -> int:
    return len(HEADING_RE.findall(text))


def load_glossary() -> dict[str, Any]:
    if not GLOSSARY_PATH.is_file():
        return {"terms": {}, "do_not_translate": []}
    return json.loads(GLOSSARY_PATH.read_text(encoding="utf-8"))


# ----------------------------- prompt construction -----------------------------

def build_system_prompt(source_locale: str, target_locale: str, glossary: dict[str, Any]) -> str:
    terms_key = f"{source_locale}_to_{target_locale}"
    terms = glossary.get("terms", {}).get(terms_key, {})
    do_not_translate = glossary.get("do_not_translate", [])

    rules = (
        f"You are a precise technical translator for DesireCore market skill documentation.\n"
        f"Translate from {source_locale} to {target_locale}.\n\n"
        "STRICT RULES:\n"
        "1. Preserve Markdown structure exactly: heading levels, list nesting, tables, blockquotes, "
        "fenced code blocks (```...```), inline code (`...`), HTML tags, SVG, YAML keys.\n"
        "2. NEVER translate: code inside fences, inline `code`, URLs, file paths, command-line args, "
        "env vars (e.g., $FOO, ${BAR}), Python/JS identifiers, YAML/JSON keys, version numbers.\n"
        "3. Preserve exact heading text styling: '# H1', '## H2', etc.\n"
        "4. Preserve list markers: '- ', '* ', '1. '. Preserve checkbox '[ ]' and '[x]'.\n"
        "5. Preserve emoji, ASCII art (e.g. boxed diagrams), tree-view characters (├ └ │ ─).\n"
        "6. Translate body prose, table cells (text only, not code), and short heading words.\n"
        "7. Keep the output length within ~110% of the input length when possible.\n"
        "8. Do NOT add explanatory comments, translator notes, or 'Translated from...' headers.\n"
        "9. The first line may be an HTML comment '<!-- locale: ... -->'. Update its locale code "
        "to the target locale; otherwise leave the comment unchanged.\n"
    )

    glossary_lines = ["GLOSSARY (use these mappings exactly):"]
    for src, tgt in terms.items():
        # Separate source and target term so the mapping is unambiguous to the model.
        glossary_lines.append(f" {src} → {tgt}")
    if do_not_translate:
        glossary_lines.append("\nDO NOT TRANSLATE these brand/technical terms (keep verbatim):")
        glossary_lines.append(" " + ", ".join(do_not_translate))

    output_format = (
        "\n\nRESPONSE FORMAT:\n"
        "Return ONLY a single JSON object with these keys (no preamble, no code fence around the JSON):\n"
        " - body: translated Markdown body (string, may contain backticks/fences)\n"
        " - name: translated short name (string, ≤100 chars)\n"
        " - short_desc: translated short description (string, ≤200 chars)\n"
        " - description: translated long description (string, ≤2000 chars)\n"
    )

    return rules + "\n" + "\n".join(glossary_lines) + output_format


# ----------------------------- backends -----------------------------

def call_github_models(system_prompt: str, user_payload: str, model: str, endpoint: str) -> str:
    """Call GitHub Models inference API (OpenAI-compatible chat completions).

    Endpoint base: https://models.github.ai/inference
    Auth: Authorization: Bearer <GITHUB_TOKEN> (token must have `models: read` scope).
    """
    token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN")
    if not token:
        raise RuntimeError(
            "GITHUB_TOKEN (or GH_TOKEN) not set. In CI, ensure your job has `permissions: models: read`. "
            "Locally, create a fine-grained PAT with 'Models: Read' permission."
        )
    url = f"{endpoint.rstrip('/')}/chat/completions"
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_payload},
        ],
        "temperature": 0.1,
        "max_tokens": 8192,
    }
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    return _post_with_retries(url, headers, payload, extract=_extract_openai_text)


def call_anthropic(system_prompt: str, user_payload: str, model: str, endpoint: str) -> str:
    """Call Anthropic Messages API directly."""
    api_key = os.environ.get("ANTHROPIC_API_KEY")
    if not api_key:
        raise RuntimeError("ANTHROPIC_API_KEY not set")
    url = f"{endpoint.rstrip('/')}/v1/messages"
    payload = {
        "model": model,
        "max_tokens": 8192,
        "system": [{"type": "text", "text": system_prompt, "cache_control": {"type": "ephemeral"}}],
        "messages": [{"role": "user", "content": user_payload}],
        "temperature": 0.1,
    }
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    return _post_with_retries(url, headers, payload, extract=_extract_anthropic_text)


def _extract_openai_text(resp_json: dict) -> str:
    try:
        return resp_json["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as e:
        raise RuntimeError(f"Unexpected OpenAI-compatible response shape: {resp_json}") from e


def _extract_anthropic_text(resp_json: dict) -> str:
    try:
        parts = resp_json["content"]
        return "".join(p.get("text", "") for p in parts if p.get("type") == "text")
    except (KeyError, TypeError) as e:
        raise RuntimeError(f"Unexpected Anthropic response shape: {resp_json}") from e


def _post_with_retries(url: str, headers: dict, payload: dict, *, extract) -> str:
    last_err: Exception | None = None
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            with httpx.Client(timeout=HTTP_TIMEOUT) as client:
                resp = client.post(url, headers=headers, json=payload)
            if resp.status_code == 429 or resp.status_code >= 500:
                raise httpx.HTTPStatusError(f"{resp.status_code}", request=resp.request, response=resp)
            resp.raise_for_status()
            return extract(resp.json())
        except (httpx.HTTPStatusError, httpx.RequestError, json.JSONDecodeError) as e:
            last_err = e
            if attempt < MAX_RETRIES:
                wait = 2 ** attempt
                sys.stderr.write(f"[translate] retry {attempt}/{MAX_RETRIES} after {wait}s ({e})\n")
                time.sleep(wait)
    raise RuntimeError(f"Translation failed after {MAX_RETRIES} attempts: {last_err}")


def call_llm(system_prompt: str, user_payload: str, *, backend: str, model: str, endpoint: str) -> dict[str, str]:
    if backend == "github":
        text = call_github_models(system_prompt, user_payload, model, endpoint)
    elif backend == "anthropic":
        text = call_anthropic(system_prompt, user_payload, model, endpoint)
    else:
        raise RuntimeError(f"Unknown backend: {backend}")
    return parse_json_response(text)


def parse_json_response(text: str) -> dict[str, str]:
    text = text.strip()
    if text.startswith("```"):
        text = re.sub(r"^```(?:json)?\s*\n", "", text)
        text = re.sub(r"\n```\s*$", "", text)
    try:
        obj = json.loads(text)
    except json.JSONDecodeError as e:
        m = re.search(r"\{.*\}", text, re.DOTALL)
        if m:
            obj = json.loads(m.group(0))
        else:
            raise RuntimeError(f"Failed to parse model response as JSON: {e}\n--- Raw response ---\n{text[:500]}")
    for k in ("body", "name", "short_desc"):
        if k not in obj or not isinstance(obj[k], str):
            raise RuntimeError(f"Translation response missing required key '{k}'")
    obj.setdefault("description", "")
    return obj
# ----------------------------- per-skill translation -----------------------------
def translate_skill(
skill_dir: Path,
target_locale: str,
*,
check_only: bool,
mark_human: bool,
backend: str,
model: str,
endpoint: str,
) -> dict[str, Any]:
rel = skill_dir.relative_to(REPO_ROOT).as_posix()
skill_md = skill_dir / "SKILL.md"
plan: dict[str, Any] = {"skill": rel, "target": target_locale, "actions": [], "errors": []}
fm, root_body = load_skill(skill_md)
metadata = fm.get("metadata") or {}
i18n = metadata.get("i18n") if isinstance(metadata, dict) else None
if not isinstance(i18n, dict):
plan["errors"].append("metadata.i18n missing — run migrate.py first")
return plan
source_locale = i18n.get("source_locale")
default_locale = i18n.get("default_locale")
if not source_locale or not default_locale:
plan["errors"].append("i18n missing source_locale or default_locale")
return plan
if target_locale == source_locale:
plan["actions"].append("target == source, skipping")
return plan
src_block = i18n.get(source_locale) or {}
src_body_path_str = src_block.get("body")
if not src_body_path_str:
plan["errors"].append(f"i18n.{source_locale}.body not set")
return plan
src_body_file = (skill_dir / src_body_path_str.removeprefix("./")).resolve()
if not src_body_file.is_file():
plan["errors"].append(f"source body file not found: {src_body_path_str}")
return plan
src_body_text = strip_locale_header(src_body_file.read_text(encoding="utf-8"))
src_strings = {
"name": str(src_block.get("name", "")),
"short_desc": str(src_block.get("short_desc", "")),
}
if src_block.get("description"):
src_strings["description"] = str(src_block["description"])
current_hash = compute_source_hash(src_body_text, src_strings)
target_block = i18n.get(target_locale) or {}
if target_block.get("translated_by") == "human":
if target_block.get("source_hash") != current_hash:
plan["actions"].append(
f"WARN: human-translated locale {target_locale} is stale "
f"(source_hash drift). Skipping; please update manually."
)
else:
plan["actions"].append(f"locale {target_locale} is human-locked, skipping")
return plan
needs = (not target_block) or (target_block.get("source_hash") != current_hash)
if not needs:
plan["actions"].append(f"locale {target_locale} is up-to-date (hash match), skipping")
return plan
if check_only:
plan["actions"].append(f"locale {target_locale} needs translation (hash mismatch or missing)")
plan["needs_translation"] = True
return plan
payload = {
"source_locale": source_locale,
"target_locale": target_locale,
"skill_id": skill_dir.name,
"source": {
"name": src_strings["name"],
"short_desc": src_strings["short_desc"],
"description": src_strings.get("description", ""),
"body": src_body_text,
},
}
user_payload = (
"Translate the following skill content. Return ONLY the JSON object as specified.\n\n"
f"```json\n{json.dumps(payload, ensure_ascii=False)}\n```"
)
glossary = load_glossary()
system_prompt = build_system_prompt(source_locale, target_locale, glossary)
plan["actions"].append(f"calling {backend}/{model} for {target_locale} translation ...")
translated = call_llm(system_prompt, user_payload, backend=backend, model=model, endpoint=endpoint)
src_h = heading_count(src_body_text)
tgt_h = heading_count(translated["body"])
if tgt_h != src_h:
plan["errors"].append(f"heading count mismatch (source={src_h}, target={tgt_h}); rejecting")
return plan
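# setdefault avoids a KeyError when metadata.i18n has no "locales" list yet.
locales_list = i18n.setdefault("locales", [])
if target_locale not in locales_list:
locales_list.append(target_locale)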
new_block: dict[str, Any] = {
"name": translated["name"],
"short_desc": translated["short_desc"],
}
if translated.get("description"):
desc = translated["description"]
new_block["description"] = FoldedScalarString(desc) if "\n" in desc or len(desc) > 80 else desc
if target_locale == default_locale:
new_block["body"] = "./SKILL.md"
else:
new_block["body"] = f"./SKILL.{target_locale}.md"
new_block["source_hash"] = current_hash
translator_tag = "human" if mark_human else f"ai:{backend}:{model}"
new_block["translated_by"] = translator_tag
new_block["translated_at"] = datetime.now(tz=timezone.utc).strftime("%Y-%m-%d")
i18n[target_locale] = new_block
body_to_write = translated["body"]
if target_locale == default_locale:
body_to_write = LOCALE_HEADER_RE.sub("", body_to_write, count=1)
skill_md.write_text(dump_skill(fm, body_to_write), encoding="utf-8")
plan["actions"].append(f"wrote root SKILL.md with translated body ({len(body_to_write)} chars)")
else:
target_body_file = skill_dir / f"SKILL.{target_locale}.md"
if not body_to_write.startswith("<!-- locale:"):
body_to_write = f"<!-- locale: {target_locale} -->\n\n{body_to_write.lstrip()}"
target_body_file.write_text(body_to_write, encoding="utf-8")
skill_md.write_text(dump_skill(fm, root_body), encoding="utf-8")
plan["actions"].append(f"wrote {target_body_file.name} ({len(body_to_write)} chars) and updated root frontmatter")
return plan
def get_target_locales(args: argparse.Namespace) -> list[str]:
if args.target:
return [args.target]
manifest_path = REPO_ROOT / "manifest.json"
if not manifest_path.is_file():
return ["en-US"]
try:
manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
except json.JSONDecodeError:
return ["en-US"]
return list(manifest.get("supportedLocales") or ["en-US"])
def resolve_backend(args: argparse.Namespace) -> tuple[str, str, str]:
backend = (args.backend or DEFAULT_BACKEND).lower()
if backend not in ("github", "anthropic"):
raise SystemExit(f"Unknown backend '{backend}'; choose 'github' or 'anthropic'")
model = args.model or DEFAULT_MODEL_BY_BACKEND[backend]
endpoint = args.endpoint or os.environ.get("TRANSLATE_ENDPOINT") or DEFAULT_ENDPOINT_BY_BACKEND[backend]
return backend, model, endpoint
def list_github_models() -> int:
token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN")
if not token:
sys.stderr.write("ERROR: GITHUB_TOKEN/GH_TOKEN not set\n")
return 2
url = "https://models.github.ai/catalog/models"
with httpx.Client(timeout=HTTP_TIMEOUT) as c:
resp = c.get(url, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()
for m in resp.json():
print(f" {m.get('id',''):50s} {m.get('publisher','')}")
return 0
def main(argv: list[str]) -> int:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("paths", nargs="*", help="Skill directories (default: all under skills/)")
parser.add_argument("--target", help="Single target locale (default: all manifest.supportedLocales)")
parser.add_argument("--check", action="store_true", help="Report stale translations; exit 1 if any")
parser.add_argument("--human", action="store_true", help="Mark new translations as 'human' (locks against re-translation)")
parser.add_argument("--backend", choices=("github", "anthropic"), help="Override backend (default: env TRANSLATE_BACKEND or 'github')")
parser.add_argument("--model", help="Override model id")
parser.add_argument("--endpoint", help="Override API endpoint")
parser.add_argument("--list-models", action="store_true", help="List models in GitHub Models catalog and exit")
args = parser.parse_args(argv)
if args.list_models:
return list_github_models()
backend, model, endpoint = resolve_backend(args)
if not args.check:
if backend == "github" and not (os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN")):
sys.stderr.write("ERROR: GITHUB_TOKEN (or GH_TOKEN) not set for backend='github'\n")
return 2
if backend == "anthropic" and not os.environ.get("ANTHROPIC_API_KEY"):
sys.stderr.write("ERROR: ANTHROPIC_API_KEY not set for backend='anthropic'\n")
return 2
if args.paths:
targets = [Path(p).resolve() for p in args.paths]
else:
targets = sorted((REPO_ROOT / "skills").iterdir())
targets = [t for t in targets if t.is_dir() and (t / "SKILL.md").is_file()]
target_locales = get_target_locales(args)
plans: list[dict[str, Any]] = []
for skill_dir in targets:
if not (skill_dir.is_dir() and (skill_dir / "SKILL.md").is_file()):
continue
for tl in target_locales:
plans.append(translate_skill(
skill_dir, tl,
check_only=args.check, mark_human=args.human,
backend=backend, model=model, endpoint=endpoint,
))
needs = [p for p in plans if p.get("needs_translation")]
errs = [p for p in plans if p.get("errors")]
if args.check:
for p in plans:
for a in p["actions"]:
print(f" [{p['skill']}/{p['target']}] {a}")
for p in errs:
for e in p["errors"]:
print(f" ERROR [{p['skill']}/{p['target']}]: {e}")
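# Fail the check on errors too, so a skill that cannot be inspected does not pass silently.
return 1 if (needs or errs) else 0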
print(f"Backend: {backend} Model: {model} Endpoint: {endpoint}\n")
for p in plans:
print(f"{p['skill']}/{p['target']}:")
for a in p["actions"]:
print(f" - {a}")
for e in p.get("errors", []):
print(f" ✗ ERROR: {e}")
return 1 if errs else 0
if __name__ == "__main__":
sys.exit(main(sys.argv[1:]))
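
The skip/translate decision inside `translate_skill` can be read as a small pure function over the target locale block and the current source hash. The sketch below is illustrative only: `sketch_source_hash` and `decide` are hypothetical names, and the hash canonicalisation is an assumption (the real `compute_source_hash` is defined earlier in translate.py and may differ).

```python
from __future__ import annotations

import hashlib
import json


def sketch_source_hash(body: str, strings: dict[str, str]) -> str:
    """Hypothetical stand-in for compute_source_hash (assumed canonical form)."""
    canonical = json.dumps(
        {"body": body, "strings": strings}, ensure_ascii=False, sort_keys=True
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def decide(target_block: dict | None, current_hash: str) -> str:
    """Mirror the branch order used in translate_skill."""
    block = target_block or {}
    # Human translations are never overwritten; a hash drift is only reported.
    if block.get("translated_by") == "human":
        if block.get("source_hash") != current_hash:
            return "human-stale"
        return "human-locked"
    # A missing block or a hash mismatch means the locale must be (re)translated.
    if not block or block.get("source_hash") != current_hash:
        return "translate"
    return "up-to-date"


current = sketch_source_hash("# Title\n", {"name": "demo", "short_desc": "a demo"})
print(decide(None, current))
print(decide({"source_hash": current, "translated_by": "ai:github:openai/gpt-5-mini"}, current))
```

Checking the human lock before the hash comparison is what keeps `--human` translations from being replaced by machine output when the source changes.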

scripts/i18n/validate-i18n.py Executable file

@@ -0,0 +1,340 @@
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10"
# dependencies = ["pyyaml>=6.0"]
# ///
"""Validate DesireCore market i18n state.
Checks:
1. SKILL.md frontmatter top-level `name` matches parent dir name and is spec-compliant.
2. metadata.i18n.default_locale and source_locale are listed in metadata.i18n.locales.
3. Each declared locale has metadata.i18n.<locale>.{name,short_desc}.
4. metadata.i18n.<locale>.body, if present, points to an existing file; otherwise the
fallback chain must terminate at a readable root SKILL.md.
5. SKILL.<locale>.md, if it declares <!-- locale: ... -->, must match the filename locale.
6. Frontmatter parses cleanly; heading count of locale body matches source body (+/- 0).
7. categories.json's per-category i18n covers all locales declared in manifest.json.
8. Top-level description is 1-1024 chars (spec); top-level name is 1-64 chars (spec).
Exit codes:
0 = pass
1 = validation errors found
2 = unexpected runtime error / missing dependencies
Usage:
python3 scripts/i18n/validate-i18n.py # validate everything under repo root
python3 scripts/i18n/validate-i18n.py skills/web-access # validate single skill
python3 scripts/i18n/validate-i18n.py --json # machine-readable output
"""
from __future__ import annotations
import argparse
import json
import re
import sys
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Iterable
try:
import yaml
except ImportError:
sys.stderr.write("ERROR: PyYAML is required. Install with: pip install pyyaml\n")
sys.exit(2)
REPO_ROOT = Path(__file__).resolve().parents[2]
NAME_PATTERN = re.compile(r"^(?!-)(?!.*--)[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$")
RESERVED_NAMES = {"anthropic", "claude"}
LOCALE_PATTERN = re.compile(r"^[a-z]{2,3}(?:-[A-Z]{2})?$")
LOCALE_HEADER_PATTERN = re.compile(r"^<!--\s*locale:\s*([a-zA-Z-]+)\s*-->")
HEADING_PATTERN = re.compile(r"^(#{1,6})\s+\S", re.MULTILINE)
FRONTMATTER_RE = re.compile(r"^---\s*\n(.*?)\n---\s*\n(.*)$", re.DOTALL)
@dataclass
class Issue:
path: str
rule: str
message: str
severity: str = "error"
def to_dict(self) -> dict[str, str]:
return {
"path": self.path,
"rule": self.rule,
"message": self.message,
"severity": self.severity,
}
@dataclass
class Report:
issues: list[Issue] = field(default_factory=list)
def add(self, issue: Issue) -> None:
self.issues.append(issue)
@property
def has_errors(self) -> bool:
return any(i.severity == "error" for i in self.issues)
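def parse_frontmatter(text: str) -> tuple[dict[str, Any] | None, str | None, str | None]:
"""Return (frontmatter_dict, body, error).
On success error is None; on failure frontmatter_dict and body are None and
error describes the problem (empty file, missing frontmatter, or invalid YAML).
"""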
if not text.strip():
return None, None, "empty file"
m = FRONTMATTER_RE.match(text)
if not m:
return None, None, "no YAML frontmatter (file must start with '---')"
try:
fm = yaml.safe_load(m.group(1)) or {}
except yaml.YAMLError as e:
return None, None, f"YAML parse error: {e}"
if not isinstance(fm, dict):
return None, None, "frontmatter must be a YAML mapping"
return fm, m.group(2), None
def heading_count(text: str) -> int:
return len(HEADING_PATTERN.findall(text or ""))
def validate_skill(skill_dir: Path, report: Report, declared_locales: set[str] | None = None) -> None:
"""Validate one skill directory (must contain SKILL.md)."""
rel_dir = skill_dir.relative_to(REPO_ROOT).as_posix()
skill_md = skill_dir / "SKILL.md"
if not skill_md.is_file():
report.add(Issue(rel_dir, "structure", "SKILL.md not found"))
return
text = skill_md.read_text(encoding="utf-8")
fm, body, err = parse_frontmatter(text)
if err:
report.add(Issue(f"{rel_dir}/SKILL.md", "rule-6", err))
return
assert fm is not None and body is not None
name = fm.get("name", "")
description = fm.get("description", "")
# Rule 1: name spec-compliance + matches dir
if not isinstance(name, str) or not NAME_PATTERN.match(name) or len(name) > 64 or name in RESERVED_NAMES:
report.add(Issue(
f"{rel_dir}/SKILL.md", "rule-1",
f"name {name!r} is not spec-compliant (must be lowercase ASCII + hyphens, 1-64 chars, not 'anthropic'/'claude')"
))
if name != skill_dir.name:
report.add(Issue(
f"{rel_dir}/SKILL.md", "rule-1",
f"name '{name}' must equal parent dir name '{skill_dir.name}' (spec)"
))
# Rule 8: description length
if not isinstance(description, str) or not (1 <= len(description) <= 1024):
report.add(Issue(
f"{rel_dir}/SKILL.md", "rule-8",
f"description must be 1-1024 chars (got {len(description) if isinstance(description, str) else 'non-string'})"
))
# Rule 2/3/4: i18n block
metadata = fm.get("metadata") or {}
i18n = metadata.get("i18n") if isinstance(metadata, dict) else None
if not isinstance(i18n, dict):
report.add(Issue(f"{rel_dir}/SKILL.md", "rule-2", "metadata.i18n block missing"))
return
default_locale = i18n.get("default_locale")
source_locale = i18n.get("source_locale")
locales = i18n.get("locales") or []
if not isinstance(locales, list) or not all(isinstance(x, str) for x in locales):
report.add(Issue(f"{rel_dir}/SKILL.md", "rule-2", "metadata.i18n.locales must be a list of strings"))
return
locale_set = set(locales)
for tag in ("default_locale", "source_locale"):
val = i18n.get(tag)
if not isinstance(val, str) or not LOCALE_PATTERN.match(val):
report.add(Issue(f"{rel_dir}/SKILL.md", "rule-2", f"metadata.i18n.{tag} {val!r} is not a valid BCP-47 locale"))
elif val not in locale_set:
report.add(Issue(f"{rel_dir}/SKILL.md", "rule-2", f"metadata.i18n.{tag} '{val}' not present in metadata.i18n.locales"))
if declared_locales is not None:
missing = declared_locales - locale_set
if missing:
report.add(Issue(
f"{rel_dir}/SKILL.md", "rule-7",
f"manifest declares locales {sorted(declared_locales)} but skill is missing {sorted(missing)}",
severity="error"
))
# Rule 3: per-locale name/short_desc presence
source_body_text: str | None = None
for locale in locales:
if not LOCALE_PATTERN.match(locale):
report.add(Issue(f"{rel_dir}/SKILL.md", "rule-3", f"locale '{locale}' is not a valid BCP-47 tag"))
continue
payload = i18n.get(locale)
if not isinstance(payload, dict):
report.add(Issue(f"{rel_dir}/SKILL.md", "rule-3", f"metadata.i18n.{locale} block missing or not a mapping"))
continue
for required in ("name", "short_desc"):
v = payload.get(required)
if not isinstance(v, str) or not v.strip():
report.add(Issue(
f"{rel_dir}/SKILL.md", "rule-3",
f"metadata.i18n.{locale}.{required} is missing or empty"
))
# Rule 4: body file presence
body_path_str = payload.get("body")
body_text: str | None = None
if body_path_str:
if not isinstance(body_path_str, str) or not body_path_str.startswith("./"):
report.add(Issue(
f"{rel_dir}/SKILL.md", "rule-4",
f"metadata.i18n.{locale}.body must be a relative path starting with './' (got {body_path_str!r})"
))
else:
body_file = (skill_dir / body_path_str.removeprefix("./")).resolve()
if not body_file.is_file():
report.add(Issue(
f"{rel_dir}/SKILL.md", "rule-4",
f"metadata.i18n.{locale}.body points to missing file '{body_path_str}'"
))
else:
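# The root SKILL.md carries YAML frontmatter; use its parsed body so frontmatter lines are not counted as headings.
body_text = body if body_file.name == "SKILL.md" else body_file.read_text(encoding="utf-8")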
# Rule 5: locale header self-check (only when not the root SKILL.md)
if body_file.name != "SKILL.md":
first_line = body_text.splitlines()[0] if body_text else ""
m = LOCALE_HEADER_PATTERN.match(first_line)
if m and m.group(1) != locale:
report.add(Issue(
f"{rel_dir}/{body_file.name}", "rule-5",
f"file declares locale '{m.group(1)}' but is referenced as '{locale}'"
))
else:
# Fallback to root SKILL.md body (default_locale must have a usable body)
if locale == default_locale:
body_text = body
else:
# OK to omit body for non-default locales (will fall back at runtime)
pass
if locale == source_locale:
source_body_text = body_text or body # source defaults to root if not specified
# Rule 6: heading count consistency between source and other locales' bodies
if source_body_text is not None:
source_count = heading_count(source_body_text)
for locale in locales:
if locale == source_locale:
continue
payload = i18n.get(locale) or {}
body_path_str = payload.get("body")
if body_path_str:
body_file = (skill_dir / body_path_str.removeprefix("./")).resolve()
if body_file.is_file():
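# For the root SKILL.md, compare against the parsed body only (skip the YAML frontmatter).
other_text = body if body_file.name == "SKILL.md" else body_file.read_text(encoding="utf-8")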
other_count = heading_count(other_text)
if other_count != source_count:
report.add(Issue(
f"{rel_dir}/{body_file.name}", "rule-6",
f"heading count {other_count} differs from source ({source_count})",
severity="warning",
))
def validate_market_root(report: Report) -> set[str]:
"""Validate manifest.json + categories.json. Returns the declared locale set or empty."""
manifest_path = REPO_ROOT / "manifest.json"
categories_path = REPO_ROOT / "categories.json"
declared: set[str] = set()
if manifest_path.is_file():
try:
manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
except json.JSONDecodeError as e:
report.add(Issue("manifest.json", "rule-7", f"JSON parse error: {e}"))
manifest = {}
supported = manifest.get("supportedLocales") or []
if not isinstance(supported, list) or not all(isinstance(x, str) and LOCALE_PATTERN.match(x) for x in supported):
report.add(Issue("manifest.json", "rule-7", "supportedLocales must be a list of BCP-47 tags"))
else:
declared = set(supported)
default = manifest.get("defaultLocale")
if declared and default not in declared:
report.add(Issue("manifest.json", "rule-7", f"defaultLocale '{default}' not in supportedLocales"))
if categories_path.is_file() and declared:
try:
categories = json.loads(categories_path.read_text(encoding="utf-8"))
except json.JSONDecodeError as e:
report.add(Issue("categories.json", "rule-7", f"JSON parse error: {e}"))
return declared
for cat_id, cat in categories.items():
i18n = cat.get("i18n") if isinstance(cat, dict) else None
if not isinstance(i18n, dict):
report.add(Issue("categories.json", "rule-7", f"category '{cat_id}' missing i18n block"))
continue
for locale in declared:
payload = i18n.get(locale)
if not isinstance(payload, dict) or not payload.get("label"):
report.add(Issue(
"categories.json", "rule-7",
f"category '{cat_id}' missing i18n.{locale}.label"
))
return declared
def iter_skill_dirs(targets: Iterable[Path]) -> Iterable[Path]:
for t in targets:
if t.is_file() and t.name == "SKILL.md":
yield t.parent
elif t.is_dir() and (t / "SKILL.md").is_file():
yield t
elif t.is_dir():
for child in sorted(t.iterdir()):
if child.is_dir() and (child / "SKILL.md").is_file():
yield child
def main(argv: list[str]) -> int:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("paths", nargs="*", help="Skills or directories to validate (default: repo root)")
parser.add_argument("--json", action="store_true", help="Emit machine-readable JSON")
args = parser.parse_args(argv)
report = Report()
declared_locales = validate_market_root(report)
if args.paths:
targets = [Path(p).resolve() for p in args.paths]
else:
targets = [REPO_ROOT / "skills"]
for skill_dir in iter_skill_dirs(targets):
validate_skill(skill_dir, report, declared_locales=declared_locales or None)
if args.json:
json.dump([i.to_dict() for i in report.issues], sys.stdout, indent=2, ensure_ascii=False)
sys.stdout.write("\n")
else:
if not report.issues:
print("OK: no i18n issues found.")
else:
for issue in report.issues:
marker = "ERROR" if issue.severity == "error" else "WARN "
print(f"[{marker}] {issue.path} :: {issue.rule} :: {issue.message}")
errors = sum(1 for i in report.issues if i.severity == "error")
warns = sum(1 for i in report.issues if i.severity == "warning")
print(f"\n{errors} error(s), {warns} warning(s).")
return 1 if report.has_errors else 0
if __name__ == "__main__":
try:
sys.exit(main(sys.argv[1:]))
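except KeyboardInterrupt:
sys.exit(130)
except Exception as e:  # exit code 2 = unexpected runtime error, as documented in the module docstring
sys.stderr.write(f"ERROR: unexpected runtime error: {e}\n")
sys.exit(2)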
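
To make the validator's regular expressions easier to reason about, here is a small standalone sketch that exercises them. The three patterns are copied from validate-i18n.py; `is_valid_name` is a hypothetical helper added for illustration, and the sample strings are made up.

```python
import re

# Patterns copied from validate-i18n.py.
NAME_PATTERN = re.compile(r"^(?!-)(?!.*--)[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$")
LOCALE_PATTERN = re.compile(r"^[a-z]{2,3}(?:-[A-Z]{2})?$")
HEADING_PATTERN = re.compile(r"^(#{1,6})\s+\S", re.MULTILINE)


def heading_count(text: str) -> int:
    """Count lines that look like ATX headings (1 to 6 '#' followed by text)."""
    return len(HEADING_PATTERN.findall(text or ""))


def is_valid_name(name: str) -> bool:
    """Hypothetical helper: slug shape plus the 64-character limit from rule 1."""
    return bool(NAME_PATTERN.match(name)) and len(name) <= 64


for candidate in ("web-access", "Web-Access", "a--b", "-a", "a-"):
    print(f"{candidate!r:14} valid name: {is_valid_name(candidate)}")

# The locale pattern accepts only language[-REGION]; script subtags are rejected.
for tag in ("zh-CN", "en", "zh-Hans"):
    print(f"{tag!r:10} accepted: {bool(LOCALE_PATTERN.match(tag))}")

# A '#' comment inside a fenced code block is counted as a heading as well.
sample = "# Title\n\nIntro text\n\n## Usage\n\n```bash\n# a shell comment\n```\n"
print("headings:", heading_count(sample))
```

Two things stand out when running this. The locale pattern is a subset of BCP-47, so a tag such as `zh-Hans` would be reported as invalid. The heading counter is purely line-based and does not skip fenced code, which is usually harmless because source and translation contain the same code blocks.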