feat: add two built-in skills, web-access and frontend-design

Per the docs recommendation, this fills in items c) and e) of the five recommended built-in skills:

web-access v1.1.0:
- Three-layer architecture: L1 WebSearch/WebFetch + L2 Jina Reader + L3 CDP Browser
- Added Chrome CDP prerequisites (launch commands for macOS/Linux/Windows)
- Supports logged-in access to 小红书 (Xiaohongshu), B站 (Bilibili), 微博 (Weibo), 知乎 (Zhihu), 飞书 (Feishu), Twitter, and 公众号 (WeChat Official Accounts)
- Jina Reader repositioned as the default token-optimization layer (not a fallback)
- New references/cdp-browser.md (detailed Python Playwright operations manual)
- Expanded trigger words: 小红书, B站, 微博, 飞书, Twitter, 推特, X, 知乎, 公众号

frontend-design v1.0.0:
- Adapted from the official Claude Code frontend-design skill
- Retains the original bold aesthetic design philosophy
- New Project Context Override section: when working inside the DesireCore main repo,
  automatically follow the 3+2 color system (Green/Blue/Purple + Orange/Red)
- Added an Output Rule requiring that the user be told the output file path

builtin-skills.json: 12 → 14 skills
Author: 张馨元
Date: 2026-04-07 15:35:59 +08:00
parent de6cfa0b23
commit 98322aa930
6 changed files with 1015 additions and 0 deletions

# CDP Browser Access — Login-Gated Sites Manual
Detailed recipes for accessing sites that require the user's login session, via Chrome DevTools Protocol (CDP) + Python Playwright.
**Precondition**: Chrome is already running with `--remote-debugging-port=9222` and the user has manually logged in to the target sites. See the main SKILL.md `Prerequisites` section for the launch command.
---
## Why CDP attach, not headless
| Approach | Login state | Anti-bot | Speed | Cost |
|----------|-------------|----------|-------|------|
| Headless Playwright (new context) | ❌ Empty cookies | ❌ Flagged as bot | Slow cold start | Re-login pain |
| `playwright.chromium.launch(headless=False)` | ❌ Fresh profile | ⚠ Sometimes flagged | Slow | Same |
| **CDP attach (`connect_over_cdp`)** | ✅ User's real cookies | ✅ Looks human | Instant | Zero friction |
**Rule**: For any login-gated site, always attach to the user's running Chrome.
---
## Core Template
Every CDP script follows this shape:
```python
from playwright.sync_api import sync_playwright

def fetch_with_cdp(url: str, wait_selector: str | None = None) -> str:
    """Attach to user's Chrome via CDP, fetch URL, return HTML."""
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp("http://localhost:9222")
        # browser.contexts[0] is the user's default context (with cookies)
        context = browser.contexts[0]
        page = context.new_page()
        try:
            page.goto(url, wait_until="domcontentloaded", timeout=30000)
            if wait_selector:
                page.wait_for_selector(wait_selector, timeout=10000)
            else:
                page.wait_for_timeout(2000)  # generic settle
            return page.content()
        finally:
            page.close()
            # DO NOT call browser.close() — that would close the user's Chrome!

if __name__ == "__main__":
    html = fetch_with_cdp("https://example.com")
    print(html[:1000])
```
**Critical**: Never call `browser.close()` when using CDP attach — you'd kill the user's Chrome. Only close the page you opened.
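Before attaching, it also helps to verify the debug endpoint is actually up; Chrome exposes `/json/version` on the debugging port. A minimal pre-flight check (the helper name is my own):

```python
import urllib.request
import urllib.error

def cdp_available(endpoint: str = "http://localhost:9222") -> bool:
    """Return True if a Chrome CDP endpoint answers on /json/version."""
    try:
        with urllib.request.urlopen(endpoint + "/json/version", timeout=3) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Call this before `connect_over_cdp` and, when it returns `False`, surface the launch-command message from the Troubleshooting section instead of a raw traceback.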
---
## Site Recipes
### 小红书 (xiaohongshu.com)
```python
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup

NOTE_URL = "https://www.xiaohongshu.com/explore/XXXXXXXX"

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("http://localhost:9222")
    page = browser.contexts[0].new_page()
    page.goto(NOTE_URL, wait_until="domcontentloaded")
    page.wait_for_selector("#detail-title", timeout=10000)
    page.wait_for_timeout(1500)  # let images/comments load
    html = page.content()
    page.close()

soup = BeautifulSoup(html, "html.parser")
title_el = soup.select_one("#detail-title")
desc_el = soup.select_one("#detail-desc")
author_el = soup.select_one(".author-wrapper .username")
print("Title:", title_el.get_text(strip=True) if title_el else None)
print("Author:", author_el.get_text(strip=True) if author_el else None)
print("Desc:", desc_el.get_text(" ", strip=True) if desc_el else None)
```
**Selectors** (may drift over time — update if they fail):
- Title: `#detail-title`
- Description: `#detail-desc`
- Author: `.author-wrapper .username`
- Images: `.swiper-slide img`
- Comments: `.parent-comment .content`
### B站 (bilibili.com)
```python
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup

VIDEO_URL = "https://www.bilibili.com/video/BVxxxxxxxxx"

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("http://localhost:9222")
    page = browser.contexts[0].new_page()
    page.goto(VIDEO_URL, wait_until="networkidle")
    page.wait_for_timeout(2000)
    html = page.content()
    page.close()

soup = BeautifulSoup(html, "html.parser")
title_el = soup.select_one("h1.video-title")
up_el = soup.select_one(".up-name")
desc_el = soup.select_one(".desc-info-text")
print("Title:", title_el.get_text(strip=True) if title_el else None)
print("UP:", up_el.get_text(strip=True) if up_el else None)
print("Desc:", desc_el.get_text(" ", strip=True) if desc_el else None)
```
**Tip**: For B站 video metadata, the [public API](https://api.bilibili.com/x/web-interface/view?bvid=XXXX) often returns JSON without needing CDP. Try it first:
```bash
curl -s "https://api.bilibili.com/x/web-interface/view?bvid=BVxxxxxxxxx" | python3 -m json.tool
```
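Once you have the JSON, the useful fields can be pulled out with a small parser. A sketch, assuming the response carries the usual `data.title` / `data.owner.name` / `data.desc` shape (the helper name is my own; verify the fields against a live response):

```python
def parse_bili_view(payload: dict) -> dict:
    """Extract title, uploader, and description from a
    /x/web-interface/view API response dict."""
    data = payload.get("data") or {}
    return {
        "title": data.get("title"),
        "up": (data.get("owner") or {}).get("name"),
        "desc": data.get("desc"),
    }
```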
### 微博 (weibo.com)
```python
WEIBO_URL = "https://weibo.com/u/1234567890" # or /detail/xxx
# Same CDP template
# Selectors:
# .Feed_body_3R0rO .detail_wbtext_4CRf9 — post text
# .ALink_default_2ibt1 — user link
# article[aria-label="微博"] — each feed item
```
**Note**: Weibo uses React + heavy obfuscation, and selectors change frequently. If they fail, refetch the URL through Jina Reader for clean Markdown (Jina fetches the URL itself, so this only works for publicly visible posts):
```python
import subprocess

result = subprocess.run(
    ["curl", "-sL", f"https://r.jina.ai/{WEIBO_URL}"],
    capture_output=True, text=True,
)
print(result.stdout)
```
### 知乎 (zhihu.com)
```python
ANSWER_URL = "https://www.zhihu.com/question/123/answer/456"
# Selectors:
# h1.QuestionHeader-title — question title
# .RichContent-inner — answer body
# .AuthorInfo-name — author
```
Zhihu works with CDP but often also renders enough metadata server-side for Jina to work:
```bash
curl -sL "https://r.jina.ai/https://www.zhihu.com/question/123/answer/456"
```
Try Jina first, fall back to CDP if content is truncated.
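That try-then-fall-back rule can be captured in one helper. A sketch with the two fetchers injected as callables (the names and the 500-char truncation threshold are my own assumptions):

```python
from typing import Callable

def fetch_article(url: str,
                  jina_fetch: Callable[[str], str],
                  cdp_fetch: Callable[[str], str],
                  min_len: int = 500) -> str:
    """Try Jina first; escalate to CDP when the result looks truncated."""
    text = ""
    try:
        text = jina_fetch(url)
    except Exception:
        pass  # treat a Jina failure the same as truncation
    if len(text) >= min_len:
        return text
    return cdp_fetch(url)
```

Wire `jina_fetch` to a `curl -sL https://r.jina.ai/<url>` call and `cdp_fetch` to `fetch_with_cdp` from the Core Template above.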
### 飞书文档 (feishu.cn / larksuite.com)
```python
from playwright.sync_api import sync_playwright

DOC_URL = "https://xxx.feishu.cn/docs/xxx"

# Feishu uses heavy virtualization — must scroll to load all content.
with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("http://localhost:9222")
    page = browser.contexts[0].new_page()
    page.goto(DOC_URL, wait_until="domcontentloaded")
    page.wait_for_selector(".docs-render-unit", timeout=15000)
    # Scroll to bottom repeatedly to load lazy content
    last_height = 0
    for _ in range(20):
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        page.wait_for_timeout(800)
        h = page.evaluate("document.body.scrollHeight")
        if h == last_height:
            break
        last_height = h
    # Extract text
    text = page.evaluate("() => document.body.innerText")
    page.close()

print(text)
```
### Twitter / X
```python
TWEET_URL = "https://x.com/username/status/1234567890"
# Selectors:
# article[data-testid="tweet"] — tweet container
# div[data-testid="tweetText"] — tweet text
# div[data-testid="User-Name"] — author
# a[href$="/analytics"] — view count anchor (next sibling has stats)
```
Twitter is aggressive with anti-bot. CDP attach usually works, but set a generous wait:
```python
page.goto(url, wait_until="networkidle", timeout=45000)
page.wait_for_selector('article[data-testid="tweet"]', timeout=15000)
```
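The `data-testid` selectors listed above also work on HTML you have already captured; a BeautifulSoup sketch (the helper name is my own):

```python
from bs4 import BeautifulSoup

def parse_tweet(html: str) -> dict:
    """Pull text and author from tweet HTML using the
    data-testid selectors listed above."""
    soup = BeautifulSoup(html, "html.parser")
    text_el = soup.select_one('div[data-testid="tweetText"]')
    author_el = soup.select_one('div[data-testid="User-Name"]')
    return {
        "text": text_el.get_text(" ", strip=True) if text_el else None,
        "author": author_el.get_text(" ", strip=True) if author_el else None,
    }
```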
---
## Common Patterns
### Pattern 1: Scroll to load lazy content
```python
def scroll_to_bottom(page, max_steps=30, pause_ms=800):
    last = 0
    for _ in range(max_steps):
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        page.wait_for_timeout(pause_ms)
        h = page.evaluate("document.body.scrollHeight")
        if h == last:
            return
        last = h
```
### Pattern 2: Screenshot a specific element
```python
element = page.locator("article").first
element.screenshot(path="/tmp/article.png")
```
### Pattern 3: Extract structured data via JavaScript
```python
data = page.evaluate("""() => {
    const items = document.querySelectorAll('.list-item');
    return Array.from(items).map(el => ({
        title: el.querySelector('.title')?.innerText,
        url: el.querySelector('a')?.href,
    }));
}""")
print(data)
```
### Pattern 4: Fill a form and click
```python
page.fill("input[name=q]", "search query")
page.click("button[type=submit]")
page.wait_for_load_state("networkidle")
```
### Pattern 5: Clean HTML via Jina after extraction
When selectors are unreliable, dump the full page HTML and let Jina do the cleaning:
```python
import subprocess

html = page.content()  # keep for BeautifulSoup parsing if needed
# Jina refetches the original URL itself (it cannot consume your HTML),
# so this only helps when the page is publicly reachable:
clean_md = subprocess.run(
    ["curl", "-sL", f"https://r.jina.ai/{url}"],
    capture_output=True, text=True,
).stdout
print(clean_md)
```
---
## Troubleshooting
### `connect_over_cdp` fails with `ECONNREFUSED`
Chrome is not running with remote debugging. Tell the user:
> "Please launch Chrome first with the command below:
> `/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --user-data-dir="$HOME/.desirecore/chrome-profile"`
> then manually log in to the sites you need fetched, and ask me to continue."
### `browser.contexts[0]` is empty
Chrome was launched but no windows are open. Ask the user to open at least one tab and navigate anywhere.
### Playwright not installed
```bash
pip3 install playwright beautifulsoup4
# No need for `playwright install` — we're attaching to existing Chrome, not downloading a new browser
```
### Site detects automation
Despite CDP attach, some sites (Cloudflare-protected, Instagram) may still detect automation. Options:
1. Use Jina Reader instead (`curl -sL https://r.jina.ai/<url>`) — often succeeds where Playwright fails
2. Ask the user to manually copy the visible content
3. Use the site's public API if available
### Content is truncated
The page uses virtualization or lazy loading. Apply Pattern 1 (scroll to bottom) before calling `page.content()`.
### `page.wait_for_selector` times out
The selector is stale — the site updated its DOM. Dump `page.content()[:5000]` and inspect manually, or fall back to Jina Reader.
---
## Security Notes
- **Never log or print cookies** from `context.cookies()` even during debugging
- **Never extract and store** the user's session tokens to files
- **Never use the CDP session** to perform writes (post, comment, like) unless the user explicitly requested it
- The `~/.desirecore/chrome-profile` directory contains the user's credentials — treat it as sensitive
- If the user asks to "log in automatically", refuse and explain they must log in manually in the Chrome window; the skill only reads already-authenticated sessions
---
## When NOT to use CDP
- **Public static sites** → use L1 `WebFetch`, it's faster
- **Heavy SPAs without login walls** → use L2 Jina Reader, it's cheaper on tokens
- **You need thousands of pages** → CDP is not built for scale; look into proper scrapers
CDP is specifically the "right tool" for: **small number of pages + login required + human-like behavior needed**.

# Jina Reader — Default Token-Optimization Layer
[Jina Reader](https://jina.ai/reader) is a free public service that renders any URL server-side and returns clean Markdown. In this skill's three-layer architecture, **Jina is Layer 2: the default extractor for heavy/JS-rendered pages**, not just a fallback.
---
## Positioning in the three-layer model
```
L1 WebFetch ── simple public static pages (docs, Wikipedia, HN)
│ WebFetch empty/truncated/garbled
L2 Jina Reader ── DEFAULT for JS-heavy SPAs, long articles, Medium, Twitter
│ Strips nav/ads automatically, saves 50-80% tokens
│ Login required, or Jina also fails
L3 CDP Browser ── user's logged-in Chrome (小红书/B站/微博/飞书/Twitter)
```
**Key insight**: Don't wait for WebFetch to fail before trying Jina. For any URL you expect to be JS-heavy (any major SPA, Medium, Dev.to, long-form articles), go straight to Jina for the token savings.
---
## Basic Usage (no API key)
```bash
curl -sL "https://r.jina.ai/https://example.com/article"
```
The original URL goes after `r.jina.ai/`. The response is plain Markdown — pipe to a file or read directly.
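The same call from Python, without shelling out to curl (helper names are my own):

```python
import urllib.request

def jina_url(url: str) -> str:
    """Build the Jina Reader URL: the original URL goes after r.jina.ai/."""
    return "https://r.jina.ai/" + url

def jina_read(url: str, timeout: int = 30) -> str:
    """Fetch a page as clean Markdown via Jina Reader."""
    with urllib.request.urlopen(jina_url(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```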
---
## When to use each layer
| Scenario | Primary choice | Why |
|----------|---------------|-----|
| Wikipedia, MDN, official docs | L1 WebFetch | Static clean HTML, fastest |
| GitHub README (public) | L1 WebFetch | Simple markup |
| Medium articles | **L2 Jina** | Member walls + heavy JS |
| Dev.to, Hashnode | **L2 Jina** | JS-rendered |
| Substack, Ghost blogs | **L2 Jina** | Partial JS rendering |
| News sites with lazy-load | **L2 Jina** | Scroll-triggered content |
| Twitter/X public threads | **L2 Jina** first, L3 CDP if truncated | Sometimes works |
| 公众号 (mp.weixin.qq.com) | **L2 Jina** | Clean Markdown extraction |
| LinkedIn articles | L3 CDP | Hard login wall |
| 小红书, B站, 微博, 飞书 | L3 CDP | Hard login wall |
---
## Token savings example
Raw HTML of a long Medium article: ~150 KB, ~50,000 tokens
Same article via Jina Reader: ~20 KB, ~7,000 tokens
**86% reduction**, with cleaner structure and no ads/nav cruft.
---
## Advanced Endpoints (optional)
If you need more than basic content extraction, Jina also offers:
- **Search**: `https://s.jina.ai/<query>` — returns top 5 results as Markdown
- **Embeddings**: `https://api.jina.ai/v1/embeddings` (requires free API key)
- **Reranker**: `https://api.jina.ai/v1/rerank` (requires free API key)
For DesireCore, prefer the built-in `WebSearch` tool over `s.jina.ai` for consistency.
---
## Rate Limits
- **Free tier**: ~20 requests/minute, no authentication needed
- **With free API key**: higher limits, fewer throttles
```bash
curl -sL "https://r.jina.ai/https://example.com" \
-H "Authorization: Bearer YOUR_KEY"
```
- Get a free key at [jina.ai](https://jina.ai) and store it in the `JINA_API_KEY` env var so scripts can pick it up
---
## Usage tips
### Cache your own results
Jina itself doesn't cache for you. If you call the same URL repeatedly in a session, save the Markdown to a temp file:
```bash
curl -sL "https://r.jina.ai/$URL" > /tmp/jina-cache.md
```
### Handle very long articles
Jina returns the full article in one response. For articles > 50K chars, pipe through `head` or extract specific sections with Python/awk before feeding back to the model context.
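One way to cut a long Jina response down before it hits the model context: split on the Markdown `##` headings Jina emits and keep only the sections you need. A sketch (it assumes the article actually uses `##` headings):

```python
import re

def split_sections(markdown: str) -> dict:
    """Split a Markdown article into {heading: body} on '## ' headings.
    Text before the first heading is stored under ''."""
    sections: dict = {"": []}
    current = ""
    for line in markdown.splitlines():
        m = re.match(r"^##\s+(.*)", line)
        if m:
            current = m.group(1).strip()
            sections[current] = []
        else:
            sections[current].append(line)
    return {k: "\n".join(v).strip() for k, v in sections.items()}
```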
### Combine with CDP
When you use L3 CDP to fetch a login-gated page, you can pipe the resulting HTML through Jina for clean Markdown instead of parsing with BeautifulSoup:
```python
html = fetch_with_cdp(url) # from references/cdp-browser.md
# Now convert via Jina (note: Jina fetches the URL itself, not your HTML)
# So this only works if the content is already visible without login:
import subprocess
md = subprocess.run(["curl", "-sL", f"https://r.jina.ai/{url}"],
capture_output=True, text=True).stdout
```
For truly login-gated content, you must parse the HTML directly (BeautifulSoup) since Jina can't log in on your behalf.
---
## Failure Mode
If Jina Reader returns garbage or error:
1. **Hard login wall** → escalate to L3 CDP browser
2. **Geographically restricted** → tell the user, suggest VPN or manual access
3. **Cloudflare challenge** → try L3 CDP (user's browser passes challenges naturally)
4. **404 / gone** → confirm the URL is correct
In all cases, tell the user explicitly which URL failed and what you tried.

# Common Research Workflows
Reusable templates for multi-step research tasks. Adapt the queries and URLs to the specific topic.
---
## 1. Technical Documentation Lookup
**Goal**: Find the authoritative answer to a "how do I X with library Y" question.
```
Step 1: WebSearch("<library> <feature> documentation site:<official-domain>")
↓ if no results, drop the site: filter
Step 2: WebFetch the top 1-2 official doc pages
Step 3: If example code is incomplete, also fetch the GitHub README or examples folder:
Bash: gh api repos/<owner>/<repo>/contents/examples
Step 4: Synthesize a concise answer with one runnable code block
```
**Tip**: Always check the doc version matches the user's installed version. Look for version selectors in the page.
---
## 2. Competitor / Product Comparison
**Goal**: Build a structured comparison of 2-N similar products.
```
Step 1: WebSearch("<product-A> vs <product-B> comparison <year>")
Step 2: WebSearch("<product-A> features pricing") ─┐ parallel
Step 3: WebSearch("<product-B> features pricing") ─┘
Step 4: WebFetch official pricing/features pages for each (parallel)
Step 5: WebFetch 1 third-party comparison article (parallel)
Step 6: Build markdown table with consistent dimensions:
| Dimension | Product A | Product B |
|-----------|-----------|-----------|
| Pricing | ... | ... |
| Features | ... | ... |
| License | ... | ... |
Step 7: Add a "Recommendation" paragraph based on user's stated needs
```
**Tip**: When dimensions differ between sources, prefer the official source over third-party.
---
## 3. News Aggregation & Timeline
**Goal**: Build a chronological summary of recent events on a topic.
```
Step 1: WebSearch("<topic> news <year>", max_results=10)
Step 2: Skim snippets, group by date
Step 3: WebFetch the 3-5 most substantive articles (parallel)
Step 4: Build timeline:
## YYYY-MM-DD - Event headline
- Key fact 1 [source](url)
- Key fact 2 [source](url)
Step 5: End with a "Current State" paragraph
```
**Tip**: Use `allowed_domains` to constrain to authoritative news sources if needed.
---
## 4. Library Version Investigation
**Goal**: Find the latest version, breaking changes, and migration notes.
```
Step 1: Get the latest version via the registry API (faster than scraping):
        Python:  curl -s https://pypi.org/pypi/<package>/json | jq -r .info.version
        Node:    curl -s https://registry.npmjs.org/<package>/latest | jq -r .version
        Rust:    curl -s https://crates.io/api/v1/crates/<crate> | jq -r .crate.max_version
Step 2: Get changelog:
gh api repos/<owner>/<repo>/releases/latest
Step 3: If migration is needed, search:
WebSearch("<package> migration guide v<old> to v<new>")
WebFetch the official migration doc
Step 4: Summarize: latest version, breaking changes (bullet list), 1-2 code diffs
```
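Step 1's registry lookups are plain JSON GETs; in Python the PyPI case looks like this (helper names are my own; npm and crates.io follow the same pattern with their own field paths):

```python
import json
import urllib.request

def parse_pypi_version(payload: dict) -> str:
    """Pull the latest version string out of a PyPI JSON API response."""
    return payload["info"]["version"]

def pypi_latest(package: str) -> str:
    """Fetch the latest released version of a package from PyPI."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url, timeout=15) as resp:
        return parse_pypi_version(json.load(resp))
```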
---
## 5. API Endpoint Discovery
**Goal**: Find a specific API endpoint and its parameters.
```
Step 1: WebSearch("<service> API <action> reference")
Step 2: WebFetch official API reference page
Step 3: If response includes "Try it" / "Sandbox" link, mention it
Step 4: Extract:
- Endpoint URL
- HTTP method
- Required headers (auth)
- Request body schema
- Response schema
- Example curl command
Step 5: Format as a self-contained code block the user can copy-paste
```
---
## 6. Quick Fact Check
**Goal**: Verify a single specific claim.
```
Step 1: WebSearch("<exact claim phrase>")
Step 2: If 2+ authoritative sources agree → confirmed
Step 3: If sources disagree → report both sides + which is more authoritative
Step 4: If no sources found → say "could not verify" — do NOT guess
```
**Tip**: For numeric facts, find the primary source (official report, paper) rather than secondary citations.
---
## Parallelization Cheat Sheet
When you need multiple independent fetches, **always batch them in a single message with multiple tool calls** rather than sequentially. Examples:
```
✅ Single message with:
- WebFetch(url1)
- WebFetch(url2)
- WebFetch(url3)
❌ Three separate messages, each with one WebFetch
```
This applies equally to WebSearch with different queries, and to mixed Search+Fetch when you have URLs from previous searches.