mirror of
https://git.openapi.site/https://github.com/desirecore/market.git
synced 2026-06-06 09:30:42 +08:00
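feat: skills i18n overhaul (schemaVersion 1.1, no backward compatibility) (#1)
* feat: skills i18n overhaul (schemaVersion 1.1, no backward compatibility)
Migrates all 21 skills, 1 agent, and manifest/categories to the schemaVersion 1.1
i18n structure, together with a CI AI-translation pipeline (GitHub Models) and a local toolchain.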
## Key changes
### Data structures (breaking, schemaVersion 1.0 → 1.1)
- SKILL.md: the top-level name is now an ASCII slug (== directory name, per the agentskills.io specification);
the Chinese display name, short_desc, and description all move into metadata.i18n.<locale>
- agents/<id>/agent.json: shortDesc/fullDesc/tags/persona.{role,traits} move into
i18n.<locale>; changelog[].changes becomes a { <locale>: string[] } object
- categories.json: each category's label/description move into i18n.<locale>; only
color/icon remain at the top level
- manifest.json: adds supportedLocales / defaultLocale; the top-level description moves into
i18n.<locale>
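Under the new layout, every localized string lives under metadata.i18n.<locale>. The following minimal Python sketch reads one such field. It assumes the frontmatter has already been parsed into a dict and that a missing locale falls back to default_locale. `localized_field` and that fallback order are illustrative assumptions, not the repository's API.

```python
from typing import Any, Optional


def localized_field(metadata: dict[str, Any], field: str, locale: str) -> Optional[str]:
    """Return metadata.i18n.<locale>.<field>, falling back to default_locale.

    Hypothetical helper: the fallback order (requested locale first, then
    default_locale) is an assumption, not documented repository behavior.
    """
    i18n = metadata.get("i18n", {})
    for candidate in (locale, i18n.get("default_locale")):
        entry = i18n.get(candidate, {}) if candidate else {}
        if isinstance(entry, dict) and field in entry:
            return entry[field]
    return None


# Shape copied from the web-access SKILL.md frontmatter in this commit.
metadata = {
    "author": "desirecore",
    "i18n": {
        "default_locale": "en-US",
        "source_locale": "zh-CN",
        "locales": ["zh-CN", "en-US"],
        "zh-CN": {"name": "联网访问"},
        "en-US": {"name": "Web Access"},
    },
}

print(localized_field(metadata, "name", "zh-CN"))  # 联网访问
print(localized_field(metadata, "name", "fr-FR"))  # Web Access
```

Falling back to default_locale instead of source_locale matches the choice of en-US as the root SKILL.md body.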
### Body file layout
- Root SKILL.md = frontmatter + default_locale (en-US) body
- SKILL.<locale>.md = the markdown body for each locale (its first line, <!-- locale: xx -->, is a self-check marker)
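The first-line marker lets a validator confirm that a body file really holds the locale its filename claims. The following is a sketch of that check. The regexes, the accepted whitespace, and the function names are assumptions, not code from validate-i18n.py.

```python
import re
from pathlib import Path

# Hypothetical checker for the "<!-- locale: xx -->" first-line marker.
# Tolerated whitespace and the locale-tag character set are assumptions.
MARKER = re.compile(r"<!--\s*locale:\s*([A-Za-z0-9-]+)\s*-->")
BODY_FILE = re.compile(r"SKILL\.([A-Za-z0-9-]+)\.md")


def marker_matches(filename: str, text: str) -> bool:
    """True if SKILL.<locale>.md starts with a marker naming the same locale."""
    body = BODY_FILE.fullmatch(filename)
    if body is None:
        # The root SKILL.md (frontmatter + en-US body) is assumed to carry no marker.
        raise ValueError(f"not a per-locale body file: {filename}")
    first_line = text.splitlines()[0].strip() if text else ""
    marker = MARKER.fullmatch(first_line)
    return marker is not None and marker.group(1) == body.group(1)


def check_body_file(path: Path) -> bool:
    """Read a body file from disk and run the marker check on it."""
    return marker_matches(path.name, path.read_text(encoding="utf-8"))


print(marker_matches("SKILL.zh-CN.md", "<!-- locale: zh-CN -->\n# web-access 技能"))  # True
print(marker_matches("SKILL.zh-CN.md", "<!-- locale: en-US -->\n# web-access skill"))  # False
```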
### Toolchain (scripts/i18n/)
- glossary.json: zh→en glossary + do_not_translate allowlist
- schema/skill-frontmatter.schema.json: JSON Schema for the i18n frontmatter
- validate-i18n.py: 8 validation rules (name compliance / locale completeness / hash consistency, etc.)
- translate.py: dual backend (GitHub Models / Anthropic), incremental translation keyed on sha256
- migrate.py: one-off migration script (old format → i18n structure)
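Incremental translation re-translates only locales whose recorded source_hash no longer matches the current source. In the diff below, the zh-CN and en-US entries share the same `sha256:0ba170b3126a0823`. That is consistent with each target recording the hash of the source it was translated from. The following sketch assumes that only the UTF-8 source body is hashed and the digest is cut to 16 hex characters. translate.py may hash different inputs.

```python
import hashlib


def source_hash(text: str, length: int = 16) -> str:
    """Truncated sha256 in the "sha256:<hex>" shape seen in the frontmatter.

    Assumption: only the UTF-8 source-locale body is hashed, cut to 16 hex chars.
    """
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"sha256:{digest[:length]}"


def stale_locales(i18n: dict, source_text: str) -> list:
    """Target locales whose recorded source_hash no longer matches the source."""
    current = source_hash(source_text)
    return [
        loc
        for loc in i18n["locales"]
        if loc != i18n["source_locale"]
        and i18n.get(loc, {}).get("source_hash") != current
    ]


body = "# web-access 技能\n..."
i18n = {
    "source_locale": "zh-CN",
    "locales": ["zh-CN", "en-US"],
    "zh-CN": {"source_hash": source_hash(body)},
    "en-US": {"source_hash": source_hash(body)},
}
print(stale_locales(i18n, body))                 # []
print(stale_locales(i18n, body + "\nNew para"))  # ['en-US']
```

An empty result corresponds to the "0 stale locale" outcome that `translate.py --check` reports in the Verification section.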
### CI (.github/workflows/)
- i18n-validate.yml: triggered on PRs; runs validate + translate --check
- i18n-translate.yml: triggered on PRs; translates missing locales with GitHub Models (default openai/gpt-5-mini)
and automatically appends a commit; can switch to Claude via ANTHROPIC_API_KEY
### Docs
- docs/I18N.md: contributor guide (schema reference / submission workflow / FAQ)
- README.md: adds a multilingual section
## Verification
- uv run scripts/i18n/validate-i18n.py: OK, 49 files, 0 errors
- uv run scripts/i18n/translate.py --check: 0 stale locales
- Heading counts for all 21 skills match strictly between zh-CN and en-US (largest: 66 = 66)
- skills-ref spec validation: all pass (top-level name is an ASCII slug + a single description field)
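The top-level name rule (an ASCII slug equal to the directory name) fits a small check. The following is a sketch of such a check. The exact slug pattern (lowercase ASCII letters and digits joined by single hyphens, at most 64 characters) is an assumption about the agentskills.io rule, and `check_skill_name` is hypothetical.

```python
import re
from pathlib import Path

# Assumed slug rule: lowercase ASCII letters/digits joined by single hyphens, max 64 chars.
SLUG = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def check_skill_name(skill_dir: Path, name: str) -> list:
    """Return problems with a top-level SKILL.md name (an empty list means OK)."""
    problems = []
    if len(name) > 64 or SLUG.fullmatch(name) is None:
        problems.append(f"name {name!r} is not an ASCII slug")
    if name != skill_dir.name:
        problems.append(f"name {name!r} does not match directory {skill_dir.name!r}")
    return problems


print(check_skill_name(Path("skills/web-access"), "web-access"))    # []
print(len(check_skill_name(Path("skills/web-access"), "联网访问")))  # 2
```

The old frontmatter value `name: 联网访问` fails both checks, which is why the diff below replaces it with `web-access`.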
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
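* fix(i18n): fix the 6 issues raised in the PR #1 review
- schema: relax the translated_by regex to ^(human|ai:[A-Za-z0-9._:/-]+)$ so that it accepts
backend:model values such as 'ai:github:openai/gpt-5-mini' (the CI translation output format)
- README + docs/I18N.md: correct the misleading "CI uses the Claude API" description; the default is
GitHub Models (openai/gpt-5-mini) + GITHUB_TOKEN, with an optional switch to Anthropic
- skills/minimax-tts/SKILL.md & SKILL.zh-CN.md: remove a stray closing ``` that garbled
the Markdown rendering of everything after it
- skills/docx/SKILL.md: restore the • Unicode escape example that was lost during translation,
bringing it back in line with the zh-CN version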
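The relaxed translated_by pattern can be exercised directly in Python. The following sketch uses the regex quoted in the fix. It calls `fullmatch` because Python's `$` also matches just before a trailing newline, so a plain `re.match` would accept `"human\n"`. The helper name is hypothetical.

```python
import re

# Pattern quoted verbatim from the schema fix.
TRANSLATED_BY = re.compile(r"^(human|ai:[A-Za-z0-9._:/-]+)$")


def is_valid_translated_by(value: str) -> bool:
    """Hypothetical helper: True if value is "human" or "ai:<backend/model id>"."""
    return TRANSLATED_BY.fullmatch(value) is not None


for value in ["human", "ai:claude-opus-4-7", "ai:github:openai/gpt-5-mini", "ai:", "human\n"]:
    print(repr(value), is_valid_translated_by(value))
```

Both `ai:claude-opus-4-7` (used in the web-access frontmatter) and the CI's `ai:github:openai/gpt-5-mini` pass. A bare `ai:` fails because the `+` quantifier requires at least one identifier character.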
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -1,5 +1,5 @@
 ---
-name: 联网访问
+name: web-access
 description: >-
 Use this skill whenever the user needs to access information from the internet
 — searching for current information, fetching public web pages, browsing
@@ -29,7 +29,28 @@ tags:
 - playwright
 metadata:
 author: desirecore
-updated_at: '2026-04-13'
+updated_at: '2026-05-03'
+i18n:
+default_locale: en-US
+source_locale: zh-CN
+locales:
+- zh-CN
+- en-US
+zh-CN:
+name: 联网访问
+short_desc: 联网搜索、网页抓取、登录态浏览器访问(CDP)、研究调研工作流
+description: 三层联网访问工具包——搜索公开页面、Jina 优化抓取、CDP 登录态浏览器访问。
+body: ./SKILL.zh-CN.md
+source_hash: sha256:0ba170b3126a0823
+translated_by: human
+en-US:
+name: Web Access
+short_desc: Web search, page fetching, logged-in browser access via CDP, research workflows
+description: A three-layer web-access toolkit — search public pages, fetch heavy pages via Jina Reader, and reach logged-in sites via Chrome CDP.
+body: ./SKILL.md
+source_hash: sha256:0ba170b3126a0823
+translated_by: ai:claude-opus-4-7
+translated_at: '2026-05-03'
 market:
 icon: >-
 <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0
@@ -46,7 +67,6 @@ market:
 stroke="#34C759" stroke-width="1.5" fill="#34C759"
 fill-opacity="0.12"/><path d="M20.5 20.5l2 2" stroke="#34C759"
 stroke-width="1.8" stroke-linecap="round"/></svg>
-short_desc: 联网搜索、网页抓取、登录态浏览器访问(CDP)、研究调研工作流
 category: research
 maintainer:
 name: DesireCore Official
@@ -54,38 +74,38 @@ market:
 channel: latest
 ---
 
-# web-access 技能
+# web-access skill
 
-## L0:一句话摘要
+## L0: One-line Summary
 
-三层联网访问工具包——搜索公开页面、Jina 优化抓取、CDP 登录态浏览器访问。
+A three-layer web-access toolkit — search public pages, optimize fetches via Jina Reader, and reach login-gated sites via Chrome CDP.
 
-## L1:概述与使用场景
+## L1: Overview & Use Cases
 
-### 能力描述
+### Capability
 
-web-access 是一个**流程型技能(Procedural Skill)**,提供三层互补的联网访问能力:Layer 1(WebSearch + WebFetch)用于公开页面;Layer 2(Jina Reader)用于 JS 渲染的重页面,默认节省 Token;Layer 3(Chrome CDP)用于需要登录态的站点(小红书/B站/微博/飞书/Twitter)。
+web-access is a **procedural skill** that provides three complementary layers of web access: Layer 1 (WebSearch + WebFetch) for public pages; Layer 2 (Jina Reader) for JS-rendered heavy pages, saving tokens by default; Layer 3 (Chrome CDP) for sites requiring a logged-in session (Xiaohongshu / Bilibili / Weibo / Feishu / Twitter).
 
-### 使用场景
+### Use Cases
 
-- 用户需要搜索当前信息或研究特定主题
-- 用户需要抓取公开网页内容或技术文档
-- 用户需要访问登录态站点(小红书、B站、微博、飞书、Twitter 等)
-- 用户需要对比产品、聚合新闻或调查 API/库版本
+- The user needs to search for current information or research a specific topic
+- The user needs to fetch public web content or technical documentation
+- The user needs to access logged-in sites (Xiaohongshu, Bilibili, Weibo, Feishu, Twitter, etc.)
+- The user needs to compare products, aggregate news, or investigate API/library versions
 
-### 核心价值
+### Core Value
 
-- **三层递进**:从轻量搜索到重度 JS 渲染到登录态访问,按需选择
-- **Token 优化**:Jina Reader 默认减少 50-80% Token 消耗
-- **登录态复用**:通过 CDP 连接用户已登录的 Chrome,无需重复登录
+- **Three-layer progression**: from lightweight search to heavy JS rendering to logged-in access — pick on demand
+- **Token optimization**: Jina Reader cuts token usage by 50–80% by default
+- **Logged-in session reuse**: connect to the user's already-logged-in Chrome via CDP — no re-login required
 
-## L2:详细规范
+## L2: Detailed Specification
 
 ## Output Rule
 
 When you complete a research task, you **MUST** cite all source URLs in your response. Distinguish between:
 - **Quoted facts**: directly from a fetched page → cite the URL
-- **Inferences**: your synthesis or analysis → mark as "(分析/推断)"
+- **Inferences**: your synthesis or analysis → mark as "(analysis/inference)"
 
 If any fetch fails, explicitly tell the user which URL failed and which fallback you used.
 
@@ -93,7 +113,7 @@ If any fetch fails, explicitly tell the user which URL failed and which fallback
 
 ## Prerequisites: Chrome CDP Setup (for login-gated sites)
 
-**Only required when accessing sites that need the user's login session** (小红书/B站/微博/飞书/Twitter/知乎/公众号).
+**Only required when accessing sites that need the user's login session** (Xiaohongshu / Bilibili / Weibo / Feishu / Twitter / Zhihu / WeChat Official Accounts).
 
 ### One-time setup
 
@@ -121,7 +141,7 @@ google-chrome \
 ```
 
 After launch:
-1. Manually log in to the sites you need (小红书、B站、微博、飞书 …)
+1. Manually log in to the sites you need (Xiaohongshu, Bilibili, Weibo, Feishu, …)
 2. Leave this Chrome window open in the background
 3. Verify the debug endpoint: `curl -s http://localhost:9222/json/version` should return JSON
 
@@ -132,7 +152,7 @@ Before any CDP operation, always run:
 curl -s http://localhost:9222/json/version | python3 -c "import sys,json; d=json.load(sys.stdin); print('CDP ready:', d.get('Browser'))"
 ```
 
-If the command fails, tell the user: "请先启动 Chrome 并开启远程调试端口(见 web-access 技能的 Prerequisites 部分)。"
+If the command fails, tell the user: "Please launch Chrome with the remote debugging port enabled (see the Prerequisites section of the web-access skill)."
 
 ---
 
@@ -151,7 +171,7 @@ User intent
 │ └─→ Bash: curl -sL "https://r.jina.ai/<original-url>"
 │ (Jina Reader = default for JS-rendered content, saves tokens)
 │
-├─ "Read this login-gated page" (小红书/B站/微博/飞书/Twitter/知乎/公众号)
+├─ "Read this login-gated page" (Xiaohongshu/Bilibili/Weibo/Feishu/Twitter/Zhihu/WeChat)
 │ └─→ 1. Verify CDP ready (curl http://localhost:9222/json/version)
 │ 2. Bash: python3 script with playwright.connect_over_cdp()
 │ 3. Extract content → feed to Jina Reader for clean Markdown
@@ -188,13 +208,13 @@ User intent
 | Hacker News, Reddit | L1 WebFetch | Public content |
 | Medium, Dev.to | L2 Jina Reader | JS-rendered, member gates |
 | Twitter/X | L3 CDP (or L2 Jina with `x.com`) | Login required for full thread |
-| 小红书 (xiaohongshu.com) | L3 CDP | 强制登录 |
-| B站 (bilibili.com) | L3 CDP | 视频描述/评论需登录 |
-| 微博 (weibo.com) | L3 CDP | 长微博需登录 |
-| 知乎 (zhihu.com) | L3 CDP | 长文+评论需登录 |
-| 飞书文档 (feishu.cn) | L3 CDP | 必须登录 |
-| 公众号 (mp.weixin.qq.com) | L2 Jina Reader | 通常公开,Jina 处理更干净 |
-| LinkedIn | L3 CDP | 登录墙 |
+| Xiaohongshu (xiaohongshu.com) | L3 CDP | Login required |
+| Bilibili (bilibili.com) | L3 CDP | Login needed for video desc/comments |
+| Weibo (weibo.com) | L3 CDP | Long posts require login |
+| Zhihu (zhihu.com) | L3 CDP | Long articles + comments require login |
+| Feishu Docs (feishu.cn) | L3 CDP | Login required |
+| WeChat Official Accounts (mp.weixin.qq.com) | L2 Jina Reader | Usually public, Jina cleans better |
+| LinkedIn | L3 CDP | Login wall |
 
 ---
 
@@ -284,7 +304,7 @@ PY
 ```
 
 See [references/cdp-browser.md](references/cdp-browser.md) for:
-- Per-site selectors (小红书/B站/微博/知乎/飞书)
+- Per-site selectors (Xiaohongshu / Bilibili / Weibo / Zhihu / Feishu)
 - Scrolling & lazy-load patterns
 - Screenshot & form-fill recipes
 - Troubleshooting connection issues
@@ -294,12 +314,12 @@ See [references/cdp-browser.md](references/cdp-browser.md) for:
 ## Common Workflows
 
 Read [references/workflows.md](references/workflows.md) for detailed templates:
-- 技术文档查询 (Tech docs lookup)
-- 竞品对比研究 (Competitor research)
-- 新闻聚合与时间线 (News aggregation)
-- API/库版本调查 (Library version investigation)
+- Tech docs lookup
+- Competitor research
+- News aggregation & timelines
+- API/library version investigation
 
-Read [references/cdp-browser.md](references/cdp-browser.md) for login-gated site recipes (小红书/B站/微博/知乎/飞书).
+Read [references/cdp-browser.md](references/cdp-browser.md) for login-gated site recipes (Xiaohongshu / Bilibili / Weibo / Zhihu / Feishu).
 
 Read [references/jina-reader.md](references/jina-reader.md) for Jina Reader positioning, rate limits, and advanced endpoints.
 
@@ -321,7 +341,7 @@ Read [references/jina-reader.md](references/jina-reader.md) for Jina Reader posi
 
 ## Anti-Patterns (Avoid)
 
-- ❌ **Using WebFetch on obviously heavy sites** — Medium, Twitter, 小红书 will waste tokens or fail. Jump straight to L2/L3.
+- ❌ **Using WebFetch on obviously heavy sites** — Medium, Twitter, Xiaohongshu will waste tokens or fail. Jump straight to L2/L3.
 - ❌ **Launching headless Chrome instead of CDP attach** — loses user's login state, triggers anti-bot, slow cold start. Always use `connect_over_cdp()` to attach to the user's existing session.
 - ❌ **Fetching one URL at a time when you need 5** — batch in a single message.
 - ❌ **Trusting a single source** — cross-check ≥ 2 sources for non-trivial claims.
@@ -336,20 +356,20 @@ Read [references/jina-reader.md](references/jina-reader.md) for Jina Reader posi
 
 ## Example Interaction
 
-**User**: "帮我抓一下这条小红书笔记的内容:https://www.xiaohongshu.com/explore/abc123"
+**User**: "Grab the contents of this Xiaohongshu note for me: https://www.xiaohongshu.com/explore/abc123"
 
 **Agent workflow**:
 ```
-1. 识别 → 小红书是 L3 登录态站点
-2. 检查 CDP:curl -s http://localhost:9222/json/version
-├─ 失败 → 提示用户启动 Chrome 调试模式,终止
-└─ 成功 → 继续
-3. Bash: python3 connect_over_cdp 脚本 → page.goto(url) → page.content()
-4. BeautifulSoup 提取 h1 title、.note-content、.comments
-5. 返回给用户时:
-- 引用原 URL
-- 若内容很长,用 Jina 清洗一遍节省 token
-6. 告知用户:「已通过你的登录态抓取,原链接:[xhs](url)」
+1. Recognize → Xiaohongshu is an L3 logged-in site
+2. Check CDP: curl -s http://localhost:9222/json/version
+├─ Failure → prompt the user to launch Chrome in debug mode, abort
+└─ Success → continue
+3. Bash: python3 connect_over_cdp script → page.goto(url) → page.content()
+4. BeautifulSoup extract h1 title, .note-content, .comments
+5. When returning to the user:
+- Cite the original URL
+- If content is long, run it through Jina to save tokens
+6. Tell the user: "Fetched via your logged-in session, original link: [xhs](url)"
 ```
 
 ---
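The site table in the diff above assigns each host to an access layer. The following minimal Python routing sketch assumes hostname-suffix matching and an L1 default for unlisted public pages. The host spellings for Hacker News (`news.ycombinator.com`) and Twitter (`twitter.com`, `x.com`) are assumptions. Twitter is routed to L3 even though the table also allows L2 Jina via `x.com`. `pick_layer` and `jina_reader_url` are hypothetical names. The `https://r.jina.ai/<original-url>` prefix is the one used in the skill's decision tree.

```python
from urllib.parse import urlparse

# Host -> layer, transcribed from the site table; some host spellings are assumptions.
LAYER_BY_DOMAIN: dict[str, str] = {
    "news.ycombinator.com": "L1",
    "reddit.com": "L1",
    "medium.com": "L2",
    "dev.to": "L2",
    "mp.weixin.qq.com": "L2",
    "twitter.com": "L3",
    "x.com": "L3",
    "xiaohongshu.com": "L3",
    "bilibili.com": "L3",
    "weibo.com": "L3",
    "zhihu.com": "L3",
    "feishu.cn": "L3",
    "linkedin.com": "L3",
}


def pick_layer(url: str) -> str:
    """Return "L1", "L2" or "L3"; unlisted hosts default to L1 (public page)."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # Try the full host, then drop leading labels, so www.zhihu.com matches zhihu.com.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in LAYER_BY_DOMAIN:
            return LAYER_BY_DOMAIN[candidate]
    return "L1"


def jina_reader_url(url: str) -> str:
    """L2: prefix the original URL with the Jina Reader endpoint."""
    return f"https://r.jina.ai/{url}"


print(pick_layer("https://www.xiaohongshu.com/explore/abc123"))  # L3
print(jina_reader_url("https://medium.com/p/1"))  # https://r.jina.ai/https://medium.com/p/1
```

Suffix matching keeps the table short. For example, `mp.weixin.qq.com` is matched exactly before any shorter parent domain is considered.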
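The skill's CDP readiness check pipes `curl` into a `python3 -c` one-liner. The following standalone sketch does the same with only the standard library. It assumes the default endpoint `http://localhost:9222/json/version` from the Prerequisites section. The function name `cdp_browser` and the 2-second timeout are hypothetical choices.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

# Default debug endpoint from the skill's Prerequisites section.
CDP_VERSION_URL = "http://localhost:9222/json/version"


def cdp_browser(url: str = CDP_VERSION_URL, timeout: float = 2.0) -> Optional[str]:
    """Return the "Browser" field from /json/version, or None if CDP is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON body all mean "not ready".
        return None
    return data.get("Browser") if isinstance(data, dict) else None


if __name__ == "__main__":
    browser = cdp_browser()
    if browser:
        print("CDP ready:", browser)
    else:
        print("Please launch Chrome with the remote debugging port enabled "
              "(see the Prerequisites section of the web-access skill).")
```

Returning None rather than raising keeps the caller's branch identical to the skill's instruction: on failure, tell the user to start Chrome in debug mode and stop.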