feat: skills i18n overhaul (schemaVersion 1.1, no backward compatibility) (#1)

* feat: skills i18n overhaul: schemaVersion 1.1, no backward compatibility

Migrates all 21 skills, 1 agent, and manifest/categories to the schemaVersion 1.1 i18n structure, together with a CI AI translation pipeline (GitHub Models) and a local toolchain.

## Key changes

### Data structure (breaking, schemaVersion 1.0 → 1.1)

- SKILL.md: top-level name becomes an ASCII slug (== directory name, per the agentskills.io spec); the Chinese display name, short_desc, and description all move into metadata.i18n.<locale>
- agents/<id>/agent.json: shortDesc/fullDesc/tags/persona.{role,traits} move into i18n.<locale>; changelog[].changes becomes a { <locale>: string[] } object
- categories.json: each category's label/description moves into i18n.<locale>; only color/icon remain at the top level
- manifest.json: adds supportedLocales / defaultLocale; top-level description moves into i18n.<locale>

### Body file structure

- Root SKILL.md = frontmatter + default_locale (en-US) body
- SKILL.<locale>.md = Markdown body for each locale (first line `<!-- locale: <locale> -->` serves as a self-check)

### Toolchain (scripts/i18n/)

- glossary.json: zh→en glossary + do_not_translate allowlist
- schema/skill-frontmatter.schema.json: JSON Schema for the i18n frontmatter
- validate-i18n.py: 8 validation rules (name compliance, locale completeness, hash consistency, etc.)
- translate.py: dual backend (GitHub Models / Anthropic), sha256-based incremental translation
- migrate.py: one-off migration script (old format → i18n structure)

### CI (.github/workflows/)

- i18n-validate.yml: runs validate + translate --check on PRs
- i18n-translate.yml: on PRs, translates missing locales with GitHub Models (default openai/gpt-5-mini) and automatically appends a commit; can switch to Claude via ANTHROPIC_API_KEY

### Docs

- docs/I18N.md: contributor guide for authors (schema overview / submission workflow / FAQ)
- README.md: adds a multilingual section

## Verification

- uv run scripts/i18n/validate-i18n.py: OK, 49 files, 0 errors
- uv run scripts/i18n/translate.py --check: 0 stale locales
- Heading counts for all 21 skills match exactly between zh-CN and en-US (largest: 66 = 66)
- skills-ref spec validation: all pass (top-level name ASCII slug + single description field)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(i18n): address 6 issues from the PR #1 review

- schema: translated_by regex relaxed to ^(human|ai:[A-Za-z0-9._:/-]+)$, accepting backend:model forms such as 'ai:github:openai/gpt-5-mini' (the CI translation output format)
- README + docs/I18N.md: correct the misleading "CI uses the Claude API" description; the default is GitHub Models (openai/gpt-5-mini) + GITHUB_TOKEN, with an optional switch to Anthropic
- skills/minimax-tts/SKILL.md & SKILL.zh-CN.md: remove a stray closing ``` that broke subsequent Markdown rendering
- skills/docx/SKILL.md: restore the • Unicode escape example that was lost in translation, aligning it with the zh-CN version

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
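Two of the checks named in the commit message, the relaxed translated_by pattern and the sha256-based staleness check, can be illustrated with a minimal Python sketch. The regex is quoted from the fix commit. Everything else is an assumption: the function names are hypothetical, and the hash format (`sha256:` plus the first 16 hex digits of the UTF-8 body digest) is only inferred from the `source_hash` values visible in the diff, not from validate-i18n.py or translate.py, whose code is not shown here.

```python
import hashlib
import re

# Pattern quoted verbatim from the fix commit message.
TRANSLATED_BY_RE = re.compile(r"^(human|ai:[A-Za-z0-9._:/-]+)$")


def is_valid_translated_by(value: str) -> bool:
    """Return True if value is 'human' or an 'ai:<backend-or-model>' tag."""
    return TRANSLATED_BY_RE.fullmatch(value) is not None


def source_hash(body: str, hex_chars: int = 16) -> str:
    """ASSUMED format: 'sha256:' + first 16 hex digits of the body digest.

    What the real scripts hash (raw file, normalized body, ...) is unknown.
    """
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return f"sha256:{digest[:hex_chars]}"


def is_stale(recorded_hash: str, current_source_body: str) -> bool:
    """A locale is stale when its recorded hash no longer matches the source."""
    return recorded_hash != source_hash(current_source_body)
```

Under these assumptions, a locale entry would need retranslation only when `is_stale` returns True, which is one plausible way to get the incremental behavior the commit describes.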
2026-07-23 04:44:36 +08:00 · 2026-05-05 00:26:33 +08:00
parent 1c107a9344
commit 1f7c8b9673
59 changed files with 10533 additions and 2014 deletions
--- a/skills/web-access/SKILL.md
+++ b/skills/web-access/SKILL.md
@@ -1,5 +1,5 @@
 ---
-name: 联网访问
+name: web-access
 description: >-
  Use this skill whenever the user needs to access information from the internet
  — searching for current information, fetching public web pages, browsing
@@ -29,7 +29,28 @@ tags:
  - playwright
 metadata:
  author: desirecore
-  updated_at: '2026-04-13'
+  updated_at: '2026-05-03'
+  i18n:
+    default_locale: en-US
+    source_locale: zh-CN
+    locales:
+      - zh-CN
+      - en-US
+    zh-CN:
+      name: 联网访问
+      short_desc: 联网搜索、网页抓取、登录态浏览器访问（CDP）、研究调研工作流
+      description: 三层联网访问工具包——搜索公开页面、Jina 优化抓取、CDP 登录态浏览器访问。
+      body: ./SKILL.zh-CN.md
+      source_hash: sha256:0ba170b3126a0823
+      translated_by: human
+    en-US:
+      name: Web Access
+      short_desc: Web search, page fetching, logged-in browser access via CDP, research workflows
+      description: A three-layer web-access toolkit — search public pages, fetch heavy pages via Jina Reader, and reach logged-in sites via Chrome CDP.
+      body: ./SKILL.md
+      source_hash: sha256:0ba170b3126a0823
+      translated_by: ai:claude-opus-4-7
+      translated_at: '2026-05-03'
 market:
  icon: >-
    <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0
@@ -46,7 +67,6 @@ market:
    stroke="#34C759" stroke-width="1.5" fill="#34C759"
    fill-opacity="0.12"/><path d="M20.5 20.5l2 2" stroke="#34C759"
    stroke-width="1.8" stroke-linecap="round"/></svg>
-  short_desc: 联网搜索、网页抓取、登录态浏览器访问（CDP）、研究调研工作流
  category: research
  maintainer:
    name: DesireCore Official
@@ -54,38 +74,38 @@ market:
  channel: latest
 ---

-# web-access 技能
+# web-access skill

-## L0：一句话摘要
+## L0: One-line Summary

-三层联网访问工具包——搜索公开页面、Jina 优化抓取、CDP 登录态浏览器访问。
+A three-layer web-access toolkit — search public pages, optimize fetches via Jina Reader, and reach login-gated sites via Chrome CDP.

-## L1：概述与使用场景
+## L1: Overview & Use Cases

-### 能力描述
+### Capability

-web-access 是一个**流程型技能（Procedural Skill）**，提供三层互补的联网访问能力：Layer 1（WebSearch + WebFetch）用于公开页面；Layer 2（Jina Reader）用于 JS 渲染的重页面，默认节省 Token；Layer 3（Chrome CDP）用于需要登录态的站点（小红书/B站/微博/飞书/Twitter）。
+web-access is a **procedural skill** that provides three complementary layers of web access: Layer 1 (WebSearch + WebFetch) for public pages; Layer 2 (Jina Reader) for JS-rendered heavy pages, saving tokens by default; Layer 3 (Chrome CDP) for sites requiring a logged-in session (Xiaohongshu / Bilibili / Weibo / Feishu / Twitter).

-### 使用场景
+### Use Cases

- 用户需要搜索当前信息或研究特定主题
- 用户需要抓取公开网页内容或技术文档
- 用户需要访问登录态站点（小红书、B站、微博、飞书、Twitter 等）
- 用户需要对比产品、聚合新闻或调查 API/库版本
+- The user needs to search for current information or research a specific topic
+- The user needs to fetch public web content or technical documentation
+- The user needs to access logged-in sites (Xiaohongshu, Bilibili, Weibo, Feishu, Twitter, etc.)
+- The user needs to compare products, aggregate news, or investigate API/library versions

-### 核心价值
+### Core Value

- **三层递进**：从轻量搜索到重度 JS 渲染到登录态访问，按需选择
- **Token 优化**：Jina Reader 默认减少 50-80% Token 消耗
- **登录态复用**：通过 CDP 连接用户已登录的 Chrome，无需重复登录
+- **Three-layer progression**: from lightweight search to heavy JS rendering to logged-in access — pick on demand
+- **Token optimization**: Jina Reader cuts token usage by 50–80% by default
+- **Logged-in session reuse**: connect to the user's already-logged-in Chrome via CDP — no re-login required

-## L2：详细规范
+## L2: Detailed Specification

 ## Output Rule

 When you complete a research task, you **MUST** cite all source URLs in your response. Distinguish between:
 - **Quoted facts**: directly from a fetched page → cite the URL
- **Inferences**: your synthesis or analysis → mark as "(分析/推断)"
+- **Inferences**: your synthesis or analysis → mark as "(analysis/inference)"

 If any fetch fails, explicitly tell the user which URL failed and which fallback you used.

@@ -93,7 +113,7 @@ If any fetch fails, explicitly tell the user which URL failed and which fallback

 ## Prerequisites: Chrome CDP Setup (for login-gated sites)

-**Only required when accessing sites that need the user's login session** (小红书/B站/微博/飞书/Twitter/知乎/公众号).
+**Only required when accessing sites that need the user's login session** (Xiaohongshu / Bilibili / Weibo / Feishu / Twitter / Zhihu / WeChat Official Accounts).

 ### One-time setup

@@ -121,7 +141,7 @@ google-chrome \
 ```

 After launch:
-1. Manually log in to the sites you need (小红书、B站、微博、飞书 …)
+1. Manually log in to the sites you need (Xiaohongshu, Bilibili, Weibo, Feishu, …)
 2. Leave this Chrome window open in the background
 3. Verify the debug endpoint: `curl -s http://localhost:9222/json/version` should return JSON

@@ -132,7 +152,7 @@ Before any CDP operation, always run:
 curl -s http://localhost:9222/json/version | python3 -c "import sys,json; d=json.load(sys.stdin); print('CDP ready:', d.get('Browser'))"
 ```

-If the command fails, tell the user: "请先启动 Chrome 并开启远程调试端口（见 web-access 技能的 Prerequisites 部分）。"
+If the command fails, tell the user: "Please launch Chrome with the remote debugging port enabled (see the Prerequisites section of the web-access skill)."

 ---

@@ -151,7 +171,7 @@ User intent
  │     └─→ Bash: curl -sL "https://r.jina.ai/<original-url>"
  │          (Jina Reader = default for JS-rendered content, saves tokens)
  │
-  ├─ "Read this login-gated page" (小红书/B站/微博/飞书/Twitter/知乎/公众号)
+  ├─ "Read this login-gated page" (Xiaohongshu/Bilibili/Weibo/Feishu/Twitter/Zhihu/WeChat)
  │     └─→ 1. Verify CDP ready (curl http://localhost:9222/json/version)
  │          2. Bash: python3 script with playwright.connect_over_cdp()
  │          3. Extract content → feed to Jina Reader for clean Markdown
@@ -188,13 +208,13 @@ User intent
 | Hacker News, Reddit | L1 WebFetch | Public content |
 | Medium, Dev.to | L2 Jina Reader | JS-rendered, member gates |
 | Twitter/X | L3 CDP (or L2 Jina with `x.com`) | Login required for full thread |
-| 小红书 (xiaohongshu.com) | L3 CDP | 强制登录 |
-| B站 (bilibili.com) | L3 CDP | 视频描述/评论需登录 |
-| 微博 (weibo.com) | L3 CDP | 长微博需登录 |
-| 知乎 (zhihu.com) | L3 CDP | 长文+评论需登录 |
-| 飞书文档 (feishu.cn) | L3 CDP | 必须登录 |
-| 公众号 (mp.weixin.qq.com) | L2 Jina Reader | 通常公开，Jina 处理更干净 |
-| LinkedIn | L3 CDP | 登录墙 |
+| Xiaohongshu (xiaohongshu.com) | L3 CDP | Login required |
+| Bilibili (bilibili.com) | L3 CDP | Login needed for video desc/comments |
+| Weibo (weibo.com) | L3 CDP | Long posts require login |
+| Zhihu (zhihu.com) | L3 CDP | Long articles + comments require login |
+| Feishu Docs (feishu.cn) | L3 CDP | Login required |
+| WeChat Official Accounts (mp.weixin.qq.com) | L2 Jina Reader | Usually public, Jina cleans better |
+| LinkedIn | L3 CDP | Login wall |

 ---

@@ -284,7 +304,7 @@ PY
 ```

 See [references/cdp-browser.md](references/cdp-browser.md) for:
- Per-site selectors (小红书/B站/微博/知乎/飞书)
+- Per-site selectors (Xiaohongshu / Bilibili / Weibo / Zhihu / Feishu)
 - Scrolling & lazy-load patterns
 - Screenshot & form-fill recipes
 - Troubleshooting connection issues
@@ -294,12 +314,12 @@ See [references/cdp-browser.md](references/cdp-browser.md) for:
 ## Common Workflows

 Read [references/workflows.md](references/workflows.md) for detailed templates:
- 技术文档查询 (Tech docs lookup)
- 竞品对比研究 (Competitor research)
- 新闻聚合与时间线 (News aggregation)
- API/库版本调查 (Library version investigation)
+- Tech docs lookup
+- Competitor research
+- News aggregation & timelines
+- API/library version investigation

-Read [references/cdp-browser.md](references/cdp-browser.md) for login-gated site recipes (小红书/B站/微博/知乎/飞书).
+Read [references/cdp-browser.md](references/cdp-browser.md) for login-gated site recipes (Xiaohongshu / Bilibili / Weibo / Zhihu / Feishu).

 Read [references/jina-reader.md](references/jina-reader.md) for Jina Reader positioning, rate limits, and advanced endpoints.

@@ -321,7 +341,7 @@ Read [references/jina-reader.md](references/jina-reader.md) for Jina Reader posi

 ## Anti-Patterns (Avoid)

- ❌ **Using WebFetch on obviously heavy sites** — Medium, Twitter, 小红书 will waste tokens or fail. Jump straight to L2/L3.
+- ❌ **Using WebFetch on obviously heavy sites** — Medium, Twitter, Xiaohongshu will waste tokens or fail. Jump straight to L2/L3.
 - ❌ **Launching headless Chrome instead of CDP attach** — loses user's login state, triggers anti-bot, slow cold start. Always use `connect_over_cdp()` to attach to the user's existing session.
 - ❌ **Fetching one URL at a time when you need 5** — batch in a single message.
 - ❌ **Trusting a single source** — cross-check ≥ 2 sources for non-trivial claims.
@@ -336,20 +356,20 @@ Read [references/jina-reader.md](references/jina-reader.md) for Jina Reader posi

 ## Example Interaction

-**User**: "帮我抓一下这条小红书笔记的内容：https://www.xiaohongshu.com/explore/abc123"
+**User**: "Grab the contents of this Xiaohongshu note for me: https://www.xiaohongshu.com/explore/abc123"

 **Agent workflow**:
 ```
-1. 识别 → 小红书是 L3 登录态站点
-2. 检查 CDP：curl -s http://localhost:9222/json/version
-   ├─ 失败 → 提示用户启动 Chrome 调试模式，终止
-   └─ 成功 → 继续
-3. Bash: python3 connect_over_cdp 脚本 → page.goto(url) → page.content()
-4. BeautifulSoup 提取 h1 title、.note-content、.comments
-5. 返回给用户时：
-   - 引用原 URL
-   - 若内容很长，用 Jina 清洗一遍节省 token
-6. 告知用户：「已通过你的登录态抓取，原链接：[xhs](url)」
+1. Recognize → Xiaohongshu is an L3 logged-in site
+2. Check CDP: curl -s http://localhost:9222/json/version
+   ├─ Failure → prompt the user to launch Chrome in debug mode, abort
+   └─ Success → continue
+3. Bash: python3 connect_over_cdp script → page.goto(url) → page.content()
+4. BeautifulSoup extract h1 title, .note-content, .comments
+5. When returning to the user:
+   - Cite the original URL
+   - If content is long, run it through Jina to save tokens
+6. Tell the user: "Fetched via your logged-in session, original link: [xhs](url)"
 ```

 ---
--- a/skills/web-access/SKILL.zh-CN.md
+++ b/skills/web-access/SKILL.zh-CN.md
@@ -0,0 +1,312 @@
+<!-- locale: zh-CN -->
+
+# web-access 技能
+
+## L0：一句话摘要
+
+三层联网访问工具包——搜索公开页面、Jina 优化抓取、CDP 登录态浏览器访问。
+
+## L1：概述与使用场景
+
+### 能力描述
+
+web-access 是一个**流程型技能（Procedural Skill）**，提供三层互补的联网访问能力：Layer 1（WebSearch + WebFetch）用于公开页面；Layer 2（Jina Reader）用于 JS 渲染的重页面，默认节省 Token；Layer 3（Chrome CDP）用于需要登录态的站点（小红书/B站/微博/飞书/Twitter）。
+
+### 使用场景
+
+- 用户需要搜索当前信息或研究特定主题
+- 用户需要抓取公开网页内容或技术文档
+- 用户需要访问登录态站点（小红书、B站、微博、飞书、Twitter 等）
+- 用户需要对比产品、聚合新闻或调查 API/库版本
+
+### 核心价值
+
+- **三层递进**：从轻量搜索到重度 JS 渲染到登录态访问，按需选择
+- **Token 优化**：Jina Reader 默认减少 50-80% Token 消耗
+- **登录态复用**：通过 CDP 连接用户已登录的 Chrome，无需重复登录
+
+## L2：详细规范
+
+## Output Rule
+
+When you complete a research task, you **MUST** cite all source URLs in your response. Distinguish between:
+- **Quoted facts**: directly from a fetched page → cite the URL
+- **Inferences**: your synthesis or analysis → mark as "(分析/推断)"
+
+If any fetch fails, explicitly tell the user which URL failed and which fallback you used.
+
+---
+
+## Prerequisites: Chrome CDP Setup (for login-gated sites)
+
+**Only required when accessing sites that need the user's login session** (小红书/B站/微博/飞书/Twitter/知乎/公众号).
+
+### One-time setup
+
+Launch a dedicated Chrome instance with remote debugging enabled:
+
+**macOS**:
+```bash
+/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
+  --remote-debugging-port=9222 \
+  --user-data-dir="$HOME/.desirecore/chrome-profile"
+```
+
+**Linux**:
+```bash
+google-chrome \
+  --remote-debugging-port=9222 \
+  --user-data-dir="$HOME/.desirecore/chrome-profile"
+```
+
+**Windows (PowerShell)**:
+```powershell
+& "C:\Program Files\Google\Chrome\Application\chrome.exe" `
+  --remote-debugging-port=9222 `
+  --user-data-dir="$env:USERPROFILE\.desirecore\chrome-profile"
+```
+
+After launch:
+1. Manually log in to the sites you need (小红书、B站、微博、飞书 …)
+2. Leave this Chrome window open in the background
+3. Verify the debug endpoint: `curl -s http://localhost:9222/json/version` should return JSON
+
+### Verify CDP is ready
+
+Before any CDP operation, always run:
+```bash
+curl -s http://localhost:9222/json/version | python3 -c "import sys,json; d=json.load(sys.stdin); print('CDP ready:', d.get('Browser'))"
+```
+
+If the command fails, tell the user: "请先启动 Chrome 并开启远程调试端口（见 web-access 技能的 Prerequisites 部分）。"
+
+---
+
+## Tool Selection Decision Tree
+
+```
+User intent
+  │
+  ├─ "Search for information about X" (no specific URL)
+  │     └─→ WebSearch → pick top 3-5 results → fetch each (see next branches)
+  │
+  ├─ "Read this public page" (static HTML, docs, news)
+  │     └─→ WebFetch(url) directly
+  │
+  ├─ "Read this heavy-JS page" (SPA, React/Vue sites, Medium, etc.)
+  │     └─→ Bash: curl -sL "https://r.jina.ai/<original-url>"
+  │          (Jina Reader = default for JS-rendered content, saves tokens)
+  │
+  ├─ "Read this login-gated page" (小红书/B站/微博/飞书/Twitter/知乎/公众号)
+  │     └─→ 1. Verify CDP ready (curl http://localhost:9222/json/version)
+  │          2. Bash: python3 script with playwright.connect_over_cdp()
+  │          3. Extract content → feed to Jina Reader for clean Markdown
+  │             (or use BeautifulSoup directly on the raw HTML)
+  │
+  ├─ "API documentation / GitHub / npm package info"
+  │     └─→ Prefer official API endpoints over scraping HTML:
+  │          - GitHub: gh api repos/owner/name
+  │          - npm:    curl https://registry.npmjs.org/<pkg>
+  │          - PyPI:   curl https://pypi.org/pypi/<pkg>/json
+  │
+  └─ "Real-time interactive task" (click, fill form, scroll, screenshot)
+        └─→ CDP + Playwright (see references/cdp-browser.md)
+```
+
+### Three-layer strategy summary
+
+| Layer | Use case | Primary tool | Token cost |
+|-------|----------|--------------|------------|
+| L1 | Public, static | `WebFetch` | Low |
+| L2 | JS-heavy, long articles, token savings | `Bash curl r.jina.ai` | **Lowest** (Markdown pre-cleaned) |
+| L3 | Login-gated, interactive | `Bash + Python Playwright CDP` | Medium (raw HTML, then clean via Jina or BS4) |
+
+**Default priority**: L1 for simple public pages → L2 for anything heavy → L3 only when login is required.
+
+---
+
+## Supported Sites Matrix
+
+| Site | Recommended Layer | Notes |
+|------|-------------------|-------|
+| Wikipedia, MDN, official docs | L1 WebFetch | Static, clean HTML |
+| GitHub README, issues, PRs | `gh api` (best) → L1 WebFetch | Prefer API |
+| Hacker News, Reddit | L1 WebFetch | Public content |
+| Medium, Dev.to | L2 Jina Reader | JS-rendered, member gates |
+| Twitter/X | L3 CDP (or L2 Jina with `x.com`) | Login required for full thread |
+| 小红书 (xiaohongshu.com) | L3 CDP | 强制登录 |
+| B站 (bilibili.com) | L3 CDP | 视频描述/评论需登录 |
+| 微博 (weibo.com) | L3 CDP | 长微博需登录 |
+| 知乎 (zhihu.com) | L3 CDP | 长文+评论需登录 |
+| 飞书文档 (feishu.cn) | L3 CDP | 必须登录 |
+| 公众号 (mp.weixin.qq.com) | L2 Jina Reader | 通常公开，Jina 处理更干净 |
+| LinkedIn | L3 CDP | 登录墙 |
+
+---
+
+## Tool Reference
+
+### Layer 1: WebSearch + WebFetch
+
+**WebSearch** — discover URLs for an unknown topic:
+```
+WebSearch(query="latest typescript 5.5 features 2026", max_results=5)
+```
+
+Tips:
+- Include the year for time-sensitive topics
+- Use `allowed_domains` / `blocked_domains` to constrain
+
+**WebFetch** — extract clean Markdown from a known URL:
+```
+WebFetch(url="https://example.com/article")
+```
+
+Tips:
+- Results cached for 15 min
+- Returns cleaned Markdown with title + URL + body
+- If body < 200 chars or looks garbled → escalate to Layer 2 (Jina) or Layer 3 (CDP)
+
+### Layer 2: Jina Reader (default for heavy pages)
+
+Jina Reader (`r.jina.ai`) is a free public proxy that renders pages server-side and returns clean Markdown. Use it as the **default** for any page where WebFetch produces garbled or truncated output, and as the **preferred** extractor for JS-heavy SPAs.
+
+```bash
+curl -sL "https://r.jina.ai/https://example.com/article"
+```
+
+Why Jina is the default token-saver:
+- Strips nav/footer/ads automatically
+- Handles JS-rendered SPAs
+- Returns 50-80% fewer tokens than raw HTML
+- No API key needed for basic use (~20 req/min)
+
+See [references/jina-reader.md](references/jina-reader.md) for advanced endpoints and rate limits.
+
+### Layer 3: CDP Browser (login-gated access)
+
+Use Python Playwright's `connect_over_cdp()` to attach to the user's running Chrome (which already has login cookies). **No re-login needed.**
+
+**Minimal template**:
+```bash
+python3 << 'PY'
+from playwright.sync_api import sync_playwright
+
+TARGET_URL = "https://www.xiaohongshu.com/explore/..."
+
+with sync_playwright() as p:
+    browser = p.chromium.connect_over_cdp("http://localhost:9222")
+    context = browser.contexts[0]  # reuse user's default context (has cookies)
+    page = context.new_page()
+    page.goto(TARGET_URL, wait_until="domcontentloaded")
+    page.wait_for_timeout(2000)  # let lazy content load
+    html = page.content()
+    page.close()
+
+# Print first 500 chars to verify
+print(html[:500])
+PY
+```
+
+**Extract text via BeautifulSoup** (no Jina round-trip):
+```bash
+python3 << 'PY'
+from playwright.sync_api import sync_playwright
+from bs4 import BeautifulSoup
+
+with sync_playwright() as p:
+    browser = p.chromium.connect_over_cdp("http://localhost:9222")
+    page = browser.contexts[0].new_page()
+    page.goto("https://www.bilibili.com/video/BV...", wait_until="networkidle")
+    html = page.content()
+    page.close()
+
+soup = BeautifulSoup(html, "html.parser")
+title = soup.select_one("h1.video-title")
+desc = soup.select_one(".video-desc")
+print("Title:", title.get_text(strip=True) if title else "N/A")
+print("Desc:",  desc.get_text(strip=True)  if desc  else "N/A")
+PY
+```
+
+See [references/cdp-browser.md](references/cdp-browser.md) for:
+- Per-site selectors (小红书/B站/微博/知乎/飞书)
+- Scrolling & lazy-load patterns
+- Screenshot & form-fill recipes
+- Troubleshooting connection issues
+
+---
+
+## Common Workflows
+
+Read [references/workflows.md](references/workflows.md) for detailed templates:
+- 技术文档查询 (Tech docs lookup)
+- 竞品对比研究 (Competitor research)
+- 新闻聚合与时间线 (News aggregation)
+- API/库版本调查 (Library version investigation)
+
+Read [references/cdp-browser.md](references/cdp-browser.md) for login-gated site recipes (小红书/B站/微博/知乎/飞书).
+
+Read [references/jina-reader.md](references/jina-reader.md) for Jina Reader positioning, rate limits, and advanced endpoints.
+
+---
+
+## Quick Workflow: Multi-Source Research
+
+```
+1. WebSearch(query) → 5 candidate URLs
+2. Skim titles + snippets → pick 3 most relevant
+3. Classify each URL by layer (L1 / L2 / L3)
+4. Fetch all in parallel (single message, multiple tool calls)
+5. If any fetch returns < 200 chars or garbled → retry via next layer
+6. Synthesize: contradictions? consensus? outliers?
+7. Report with inline [source](url) citations + a Sources list at the end
+```
+
+---
+
+## Anti-Patterns (Avoid)
+
+- ❌ **Using WebFetch on obviously heavy sites** — Medium, Twitter, 小红书 will waste tokens or fail. Jump straight to L2/L3.
+- ❌ **Launching headless Chrome instead of CDP attach** — loses user's login state, triggers anti-bot, slow cold start. Always use `connect_over_cdp()` to attach to the user's existing session.
+- ❌ **Fetching one URL at a time when you need 5** — batch in a single message.
+- ❌ **Trusting a single source** — cross-check ≥ 2 sources for non-trivial claims.
+- ❌ **Fetching the search result page itself** — WebSearch already returns snippets; fetch the actual articles.
+- ❌ **Ignoring the cache** — WebFetch caches 15 min, reuse freely.
+- ❌ **Scraping when an API exists** — GitHub, npm, PyPI, Wikipedia all have JSON APIs.
+- ❌ **Forgetting the year in time-sensitive queries** — "best AI models" returns 2023 results; "best AI models 2026" returns current.
+- ❌ **Hardcoding login credentials in scripts** — always rely on the user's pre-logged CDP session.
+- ❌ **Citing only after the fact** — collect URLs as you fetch, not from memory afterwards.
+
+---
+
+## Example Interaction
+
+**User**: "帮我抓一下这条小红书笔记的内容：https://www.xiaohongshu.com/explore/abc123"
+
+**Agent workflow**:
+```
+1. 识别 → 小红书是 L3 登录态站点
+2. 检查 CDP：curl -s http://localhost:9222/json/version
+   ├─ 失败 → 提示用户启动 Chrome 调试模式，终止
+   └─ 成功 → 继续
+3. Bash: python3 connect_over_cdp 脚本 → page.goto(url) → page.content()
+4. BeautifulSoup 提取 h1 title、.note-content、.comments
+5. 返回给用户时：
+   - 引用原 URL
+   - 若内容很长，用 Jina 清洗一遍节省 token
+6. 告知用户：「已通过你的登录态抓取，原链接：[xhs](url)」
+```
+
+---
+
+## Installation Note
+
+CDP features require Python + Playwright installed:
+
+```bash
+pip3 install playwright beautifulsoup4
+python3 -m playwright install chromium  # only needed if user hasn't installed Chrome
+```
+
+If `playwright` is not installed when the user requests a login-gated site, run the install commands in Bash and explain you're setting up the browser automation dependency.
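The escalation rule stated in the skill text (retry via the next layer when a fetch returns fewer than 200 characters) and the Jina Reader URL form (`https://r.jina.ai/<original-url>`) can be sketched in Python as follows. This is not part of the commit. The function names and the layer list are hypothetical, and the "looks garbled" half of the rule is deliberately left out because the skill does not define it.

```python
from typing import Optional
from urllib.parse import urlsplit

JINA_PREFIX = "https://r.jina.ai/"   # prefix form taken from the skill text
MIN_BODY_CHARS = 200                 # threshold taken from the skill text
LAYERS = ["L1", "L2", "L3"]          # hypothetical ordering helper


def jina_url(original_url: str) -> str:
    """Build the Jina Reader URL by prefixing the original http(s) URL."""
    if urlsplit(original_url).scheme not in ("http", "https"):
        raise ValueError(f"expected an http(s) URL, got: {original_url!r}")
    return JINA_PREFIX + original_url


def next_layer(current: str, body: str) -> Optional[str]:
    """Return the layer to retry with, or None.

    None means either the body is long enough to accept, or there is
    no further layer to escalate to. Garbled-text detection is omitted.
    """
    if len(body.strip()) >= MIN_BODY_CHARS:
        return None
    idx = LAYERS.index(current)
    return LAYERS[idx + 1] if idx + 1 < len(LAYERS) else None
```

For example, a near-empty L1 WebFetch result would yield `"L2"`, and the caller could then fetch `jina_url(url)` with curl as shown in the Layer 2 section.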