feat: skills i18n migration (schemaVersion 1.1, no backward compatibility) (#1)

* feat: skills i18n migration (schemaVersion 1.1, no backward compatibility)

Migrates all 21 skills, 1 agent, and manifest/categories to the schemaVersion 1.1
i18n structure, along with a CI AI-translation pipeline (GitHub Models) and a local toolchain.

## Key changes

### Data structure (breaking, schemaVersion 1.0 → 1.1)
- SKILL.md: top-level name becomes an ASCII slug (== directory name, per the agentskills.io spec);
  the Chinese display name, short_desc, and description all move into metadata.i18n.<locale>
- agents/<id>/agent.json: shortDesc/fullDesc/tags/persona.{role,traits} move into
  i18n.<locale>; changelog[].changes becomes a { <locale>: string[] } object
- categories.json: each category's label/description moves into i18n.<locale>; only
  color/icon remain at the top level
- manifest.json: adds supportedLocales / defaultLocale; top-level description moves into
  i18n.<locale>
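
As an illustration of how a consumer might read the new structure, here is a minimal sketch of a per-locale lookup. The helper name `resolve_i18n_field` and the fallback to `default_locale` are assumptions made for this example; the commit does not describe how consumers resolve missing locales. The sample data is a trimmed copy of the `metadata.i18n` block shown in the diff below.

```python
from typing import Any, Dict


def resolve_i18n_field(i18n: Dict[str, Any], field: str, locale: str) -> Any:
    """Return i18n[locale][field], falling back to the default locale.

    Hypothetical helper: the fallback order is an assumption, not
    something the commit specifies.
    """
    default_locale = i18n.get("default_locale", "en-US")
    for candidate in (locale, default_locale):
        block = i18n.get(candidate)
        if isinstance(block, dict) and field in block:
            return block[field]
    raise KeyError(f"{field!r} not found for {locale!r} or {default_locale!r}")


# Trimmed copy of the metadata.i18n block from the migrated SKILL.md
i18n = {
    "default_locale": "en-US",
    "source_locale": "zh-CN",
    "locales": ["zh-CN", "en-US"],
    "zh-CN": {"name": "联网访问", "body": "./SKILL.zh-CN.md"},
    "en-US": {"name": "Web Access", "body": "./SKILL.md"},
}

print(resolve_i18n_field(i18n, "name", "zh-CN"))  # requested locale exists
print(resolve_i18n_field(i18n, "name", "fr-FR"))  # falls back to en-US
```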

### Body file structure
- Root SKILL.md = frontmatter + default_locale (en-US) body
- SKILL.<locale>.md = the markdown body for each locale (first line <!-- locale: xx --> as a self-check)
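
The first-line self-check could look roughly like the sketch below. It is not the code in validate-i18n.py; the regular expression and the function name `body_locale` are assumptions for illustration, covering only the `<!-- locale: xx -->` marker form quoted above.

```python
import re
from typing import Optional

# Assumed marker shape: <!-- locale: zh-CN --> on the very first line.
LOCALE_MARKER = re.compile(
    r"^<!--\s*locale:\s*([A-Za-z]{2,3}(?:-[A-Za-z0-9]+)*)\s*-->\s*$"
)


def body_locale(text: str) -> Optional[str]:
    """Return the locale declared on the first line, or None if absent."""
    first_line = text.split("\n", 1)[0].rstrip("\r")
    match = LOCALE_MARKER.match(first_line)
    return match.group(1) if match else None


def marker_matches(text: str, expected_locale: str) -> bool:
    """True when the body declares exactly the locale its filename implies."""
    return body_locale(text) == expected_locale


sample = "<!-- locale: zh-CN -->\n# web-access 技能\n"
print(body_locale(sample))
print(marker_matches(sample, "en-US"))
```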

### Toolchain (scripts/i18n/)
- glossary.json: zh→en glossary plus a do_not_translate allowlist
- schema/skill-frontmatter.schema.json: JSON Schema for the i18n frontmatter
- validate-i18n.py: 8 validation rules (name compliance, locale completeness, hash consistency, etc.)
- translate.py: dual backend (GitHub Models / Anthropic), sha256-based incremental translation
- migrate.py: one-off migration script (old format → i18n structure)
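
The idea behind sha256-based incremental translation can be sketched as follows. This is an assumption-laden illustration, not the logic of translate.py: it assumes the hash covers the source-locale body text and is truncated to 16 hex characters, which is inferred only from the `sha256:0ba170b3126a0823` value format visible in the diff. The function names are hypothetical.

```python
import hashlib


def source_hash(source_body: str, hex_chars: int = 16) -> str:
    """Hash the source-locale body in the 'sha256:<hex>' form seen in the diff.

    Assumption: the digest is truncated to 16 hex characters.
    """
    digest = hashlib.sha256(source_body.encode("utf-8")).hexdigest()
    return f"sha256:{digest[:hex_chars]}"


def is_stale(source_body: str, recorded_hash: str) -> bool:
    """A translation is stale when the source changed since it was recorded."""
    return source_hash(source_body) != recorded_hash


original = "# web-access 技能\n"
recorded = source_hash(original)            # stored next to the translation
print(is_stale(original, recorded))         # unchanged source
print(is_stale(original + "new", recorded))  # edited source needs retranslation
```

Under this scheme only locales whose recorded hash no longer matches the source are retranslated, which would explain a `--check` mode that reports stale locales.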

### CI (.github/workflows/)
- i18n-validate.yml: runs validate + translate --check on PRs
- i18n-translate.yml: on PRs, translates missing locales with GitHub Models (default
  openai/gpt-5-mini) and appends a commit automatically; can switch to Claude via ANTHROPIC_API_KEY

### Docs
- docs/I18N.md: contributor guide for authors (schema notes / submission flow / FAQ)
- README.md: adds a multilingual section

## Verification

- uv run scripts/i18n/validate-i18n.py: OK, 49 files, 0 errors
- uv run scripts/i18n/translate.py --check: 0 stale locales
- Heading counts for all 21 skills match exactly between zh-CN and en-US (largest: 66 = 66)
- skills-ref spec validation: all pass (top-level name is an ASCII slug + single-field description)
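
The heading-count alignment check can be approximated with a short script. This is a sketch under stated assumptions rather than the check actually run for this commit: it counts ATX headings (`#` through `######`) and skips lines inside fenced code blocks using a simple open/close toggle, which does not handle nested or mismatched fences.

```python
import re

FENCE = re.compile(r"^\s*(```|~~~)")
HEADING = re.compile(r"^#{1,6}\s+\S")


def count_headings(markdown: str) -> int:
    """Count ATX headings, ignoring anything inside fenced code blocks."""
    in_fence = False
    count = 0
    for line in markdown.splitlines():
        if FENCE.match(line):
            in_fence = not in_fence  # simplification: any fence line toggles
            continue
        if not in_fence and HEADING.match(line):
            count += 1
    return count


zh_body = "# 标题\n\n```\n# shell comment, not a heading\n```\n## 二级标题\n"
en_body = "# Title\n\n```\n# shell comment, not a heading\n```\n## Subtitle\n"
print(count_headings(zh_body), count_headings(en_body))
```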

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

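* fix(i18n): address 6 issues from PR #1 review feedback

- schema: relax the translated_by regex to ^(human|ai:[A-Za-z0-9._:/-]+)$ so it accepts
  backend:model values such as 'ai:github:openai/gpt-5-mini' (the CI translation output format)
- README + docs/I18N.md: correct the misleading "CI uses the Claude API" description; the default is
  GitHub Models (openai/gpt-5-mini) + GITHUB_TOKEN, with Anthropic as an optional switch
- skills/minimax-tts/SKILL.md & SKILL.zh-CN.md: remove a stray closing ``` that broke
  rendering of the Markdown after it
- skills/docx/SKILL.md: restore the • Unicode escape example lost during translation,
  aligning it with the zh-CN version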
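
To see what the relaxed translated_by pattern accepts, the regex quoted above can be exercised directly. The sketch uses Python's `re.fullmatch` as a stand-in for the JSON Schema `pattern` keyword, so edge cases may differ slightly from the real validator; the helper name `is_valid_translated_by` is hypothetical.

```python
import re

# Pattern copied verbatim from the fix description above.
TRANSLATED_BY = re.compile(r"^(human|ai:[A-Za-z0-9._:/-]+)$")


def is_valid_translated_by(value: str) -> bool:
    """Check a translated_by value against the relaxed pattern."""
    return TRANSLATED_BY.fullmatch(value) is not None


for value in [
    "human",
    "ai:claude-opus-4-7",           # value used in the SKILL.md diff
    "ai:github:openai/gpt-5-mini",  # backend:model form emitted by CI
    "ai:",                          # rejected: empty model part
    "bot",                          # rejected: unknown prefix
]:
    print(f"{value!r}: {is_valid_translated_by(value)}")
```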

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 00:26:33 +08:00
committed by GitHub
parent 1c107a9344
commit 1f7c8b9673
59 changed files with 10533 additions and 2014 deletions


@@ -1,5 +1,5 @@
---
name: 联网访问
name: web-access
description: >-
Use this skill whenever the user needs to access information from the internet
— searching for current information, fetching public web pages, browsing
@@ -29,7 +29,28 @@ tags:
- playwright
metadata:
author: desirecore
updated_at: '2026-04-13'
updated_at: '2026-05-03'
i18n:
default_locale: en-US
source_locale: zh-CN
locales:
- zh-CN
- en-US
zh-CN:
name: 联网访问
short_desc: 联网搜索、网页抓取、登录态浏览器访问CDP、研究调研工作流
description: 三层联网访问工具包——搜索公开页面、Jina 优化抓取、CDP 登录态浏览器访问。
body: ./SKILL.zh-CN.md
source_hash: sha256:0ba170b3126a0823
translated_by: human
en-US:
name: Web Access
short_desc: Web search, page fetching, logged-in browser access via CDP, research workflows
description: A three-layer web-access toolkit — search public pages, fetch heavy pages via Jina Reader, and reach logged-in sites via Chrome CDP.
body: ./SKILL.md
source_hash: sha256:0ba170b3126a0823
translated_by: ai:claude-opus-4-7
translated_at: '2026-05-03'
market:
icon: >-
<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0
@@ -46,7 +67,6 @@ market:
stroke="#34C759" stroke-width="1.5" fill="#34C759"
fill-opacity="0.12"/><path d="M20.5 20.5l2 2" stroke="#34C759"
stroke-width="1.8" stroke-linecap="round"/></svg>
short_desc: 联网搜索、网页抓取、登录态浏览器访问CDP、研究调研工作流
category: research
maintainer:
name: DesireCore Official
@@ -54,38 +74,38 @@ market:
channel: latest
---
# web-access 技能
# web-access skill
## L0:一句话摘要
## L0: One-line Summary
三层联网访问工具包——搜索公开页面、Jina 优化抓取、CDP 登录态浏览器访问。
A three-layer web-access toolkit — search public pages, optimize fetches via Jina Reader, and reach login-gated sites via Chrome CDP.
## L1:概述与使用场景
## L1: Overview & Use Cases
### 能力描述
### Capability
web-access 是一个**流程型技能Procedural Skill**,提供三层互补的联网访问能力:Layer 1WebSearch + WebFetch)用于公开页面;Layer 2Jina Reader)用于 JS 渲染的重页面,默认节省 TokenLayer 3Chrome CDP)用于需要登录态的站点(小红书/B站/微博/飞书/Twitter)。
web-access is a **procedural skill** that provides three complementary layers of web access: Layer 1 (WebSearch + WebFetch) for public pages; Layer 2 (Jina Reader) for JS-rendered heavy pages, saving tokens by default; Layer 3 (Chrome CDP) for sites requiring a logged-in session (Xiaohongshu / Bilibili / Weibo / Feishu / Twitter).
### 使用场景
### Use Cases
- 用户需要搜索当前信息或研究特定主题
- 用户需要抓取公开网页内容或技术文档
- 用户需要访问登录态站点小红书、B站、微博、飞书、Twitter 等)
- 用户需要对比产品、聚合新闻或调查 API/库版本
- The user needs to search for current information or research a specific topic
- The user needs to fetch public web content or technical documentation
- The user needs to access logged-in sites (Xiaohongshu, Bilibili, Weibo, Feishu, Twitter, etc.)
- The user needs to compare products, aggregate news, or investigate API/library versions
### 核心价值
### Core Value
- **三层递进**:从轻量搜索到重度 JS 渲染到登录态访问,按需选择
- **Token 优化**Jina Reader 默认减少 50-80% Token 消耗
- **登录态复用**:通过 CDP 连接用户已登录的 Chrome无需重复登录
- **Three-layer progression**: from lightweight search to heavy JS rendering to logged-in access — pick on demand
- **Token optimization**: Jina Reader cuts token usage by 50-80% by default
- **Logged-in session reuse**: connect to the user's already-logged-in Chrome via CDP — no re-login required
## L2:详细规范
## L2: Detailed Specification
## Output Rule
When you complete a research task, you **MUST** cite all source URLs in your response. Distinguish between:
- **Quoted facts**: directly from a fetched page → cite the URL
- **Inferences**: your synthesis or analysis → mark as "(分析/推断)"
- **Inferences**: your synthesis or analysis → mark as "(analysis/inference)"
If any fetch fails, explicitly tell the user which URL failed and which fallback you used.
@@ -93,7 +113,7 @@ If any fetch fails, explicitly tell the user which URL failed and which fallback
## Prerequisites: Chrome CDP Setup (for login-gated sites)
**Only required when accessing sites that need the user's login session** (小红书/B站/微博/飞书/Twitter/知乎/公众号).
**Only required when accessing sites that need the user's login session** (Xiaohongshu / Bilibili / Weibo / Feishu / Twitter / Zhihu / WeChat Official Accounts).
### One-time setup
@@ -121,7 +141,7 @@ google-chrome \
```
After launch:
1. Manually log in to the sites you need (小红书、B站、微博、飞书 …)
1. Manually log in to the sites you need (Xiaohongshu, Bilibili, Weibo, Feishu, …)
2. Leave this Chrome window open in the background
3. Verify the debug endpoint: `curl -s http://localhost:9222/json/version` should return JSON
@@ -132,7 +152,7 @@ Before any CDP operation, always run:
curl -s http://localhost:9222/json/version | python3 -c "import sys,json; d=json.load(sys.stdin); print('CDP ready:', d.get('Browser'))"
```
If the command fails, tell the user: "请先启动 Chrome 并开启远程调试端口(见 web-access 技能的 Prerequisites 部分)。"
If the command fails, tell the user: "Please launch Chrome with the remote debugging port enabled (see the Prerequisites section of the web-access skill)."
---
@@ -151,7 +171,7 @@ User intent
│ └─→ Bash: curl -sL "https://r.jina.ai/<original-url>"
│ (Jina Reader = default for JS-rendered content, saves tokens)
├─ "Read this login-gated page" (小红书/B站/微博/飞书/Twitter/知乎/公众号)
├─ "Read this login-gated page" (Xiaohongshu/Bilibili/Weibo/Feishu/Twitter/Zhihu/WeChat)
│ └─→ 1. Verify CDP ready (curl http://localhost:9222/json/version)
│ 2. Bash: python3 script with playwright.connect_over_cdp()
│ 3. Extract content → feed to Jina Reader for clean Markdown
@@ -188,13 +208,13 @@ User intent
| Hacker News, Reddit | L1 WebFetch | Public content |
| Medium, Dev.to | L2 Jina Reader | JS-rendered, member gates |
| Twitter/X | L3 CDP (or L2 Jina with `x.com`) | Login required for full thread |
| 小红书 (xiaohongshu.com) | L3 CDP | 强制登录 |
| B站 (bilibili.com) | L3 CDP | 视频描述/评论需登录 |
| 微博 (weibo.com) | L3 CDP | 长微博需登录 |
| 知乎 (zhihu.com) | L3 CDP | 长文+评论需登录 |
| 飞书文档 (feishu.cn) | L3 CDP | 必须登录 |
| 公众号 (mp.weixin.qq.com) | L2 Jina Reader | 通常公开Jina 处理更干净 |
| LinkedIn | L3 CDP | 登录墙 |
| Xiaohongshu (xiaohongshu.com) | L3 CDP | Login required |
| Bilibili (bilibili.com) | L3 CDP | Login needed for video desc/comments |
| Weibo (weibo.com) | L3 CDP | Long posts require login |
| Zhihu (zhihu.com) | L3 CDP | Long articles + comments require login |
| Feishu Docs (feishu.cn) | L3 CDP | Login required |
| WeChat Official Accounts (mp.weixin.qq.com) | L2 Jina Reader | Usually public, Jina cleans better |
| LinkedIn | L3 CDP | Login wall |
---
@@ -284,7 +304,7 @@ PY
```
See [references/cdp-browser.md](references/cdp-browser.md) for:
- Per-site selectors (小红书/B站/微博/知乎/飞书)
- Per-site selectors (Xiaohongshu / Bilibili / Weibo / Zhihu / Feishu)
- Scrolling & lazy-load patterns
- Screenshot & form-fill recipes
- Troubleshooting connection issues
@@ -294,12 +314,12 @@ See [references/cdp-browser.md](references/cdp-browser.md) for:
## Common Workflows
Read [references/workflows.md](references/workflows.md) for detailed templates:
- 技术文档查询 (Tech docs lookup)
- 竞品对比研究 (Competitor research)
- 新闻聚合与时间线 (News aggregation)
- API/库版本调查 (Library version investigation)
- Tech docs lookup
- Competitor research
- News aggregation & timelines
- API/library version investigation
Read [references/cdp-browser.md](references/cdp-browser.md) for login-gated site recipes (小红书/B站/微博/知乎/飞书).
Read [references/cdp-browser.md](references/cdp-browser.md) for login-gated site recipes (Xiaohongshu / Bilibili / Weibo / Zhihu / Feishu).
Read [references/jina-reader.md](references/jina-reader.md) for Jina Reader positioning, rate limits, and advanced endpoints.
@@ -321,7 +341,7 @@ Read [references/jina-reader.md](references/jina-reader.md) for Jina Reader posi
## Anti-Patterns (Avoid)
-**Using WebFetch on obviously heavy sites** — Medium, Twitter, 小红书 will waste tokens or fail. Jump straight to L2/L3.
-**Using WebFetch on obviously heavy sites** — Medium, Twitter, Xiaohongshu will waste tokens or fail. Jump straight to L2/L3.
-**Launching headless Chrome instead of CDP attach** — loses user's login state, triggers anti-bot, slow cold start. Always use `connect_over_cdp()` to attach to the user's existing session.
-**Fetching one URL at a time when you need 5** — batch in a single message.
-**Trusting a single source** — cross-check ≥ 2 sources for non-trivial claims.
@@ -336,20 +356,20 @@ Read [references/jina-reader.md](references/jina-reader.md) for Jina Reader posi
## Example Interaction
**User**: "帮我抓一下这条小红书笔记的内容:https://www.xiaohongshu.com/explore/abc123"
**User**: "Grab the contents of this Xiaohongshu note for me: https://www.xiaohongshu.com/explore/abc123"
**Agent workflow**:
```
1. 识别 → 小红书是 L3 登录态站点
2. 检查 CDPcurl -s http://localhost:9222/json/version
├─ 失败 → 提示用户启动 Chrome 调试模式,终止
└─ 成功 → 继续
3. Bash: python3 connect_over_cdp 脚本 → page.goto(url) → page.content()
4. BeautifulSoup 提取 h1 title.note-content.comments
5. 返回给用户时:
- 引用原 URL
- 若内容很长,用 Jina 清洗一遍节省 token
6. 告知用户:「已通过你的登录态抓取,原链接:[xhs](url)
1. Recognize → Xiaohongshu is an L3 logged-in site
2. Check CDP: curl -s http://localhost:9222/json/version
├─ Failure → prompt the user to launch Chrome in debug mode, abort
└─ Success → continue
3. Bash: python3 connect_over_cdp script → page.goto(url) → page.content()
4. BeautifulSoup extract h1 title, .note-content, .comments
5. When returning to the user:
- Cite the original URL
- If content is long, run it through Jina to save tokens
6. Tell the user: "Fetched via your logged-in session, original link: [xhs](url)"
```
---