mirror of
https://git.openapi.site/https://github.com/desirecore/market.git
synced 2026-04-21 13:30:48 +08:00
feat: add two built-in skills: web-access and frontend-design

Per the docs' recommendation, this completes items c) and e) of the five built-in skills:

web-access v1.1.0:
- Three-layer architecture: L1 WebSearch/WebFetch + L2 Jina Reader + L3 CDP Browser
- Added Chrome CDP prerequisites (launch commands for macOS/Linux/Windows)
- Supports logged-in access to 小红书/B站/微博/知乎/飞书/Twitter/公众号
- Jina Reader repositioned as the default token-optimization layer (no longer a fallback)
- New references/cdp-browser.md (detailed Python Playwright operations manual)
- Expanded trigger words: 小红书, B站, 微博, 飞书, Twitter, 推特, X, 知乎, 公众号

frontend-design v1.0.0:
- Adapted from the official Claude Code frontend-design skill
- Retains the original bold-aesthetic design philosophy
- New Project Context Override section: when working inside the DesireCore main repository, automatically follow the 3+2 color system (Green/Blue/Purple + Orange/Red)
- Added an Output Rule requiring that the user be told the output file paths

builtin-skills.json: 12 → 14 skills
@@ -4,6 +4,7 @@
     "delete-agent",
     "discover-agent",
     "docx",
+    "frontend-design",
     "manage-skills",
     "manage-teams",
     "pdf",
@@ -11,6 +12,7 @@
     "s3-storage-operations",
     "skill-creator",
     "update-agent",
+    "web-access",
     "xlsx"
   ]
 }
91  skills/frontend-design/SKILL.md  Normal file
@@ -0,0 +1,91 @@
---
name: 前端设计
description: >-
  Create distinctive, production-grade frontend interfaces with high design
  quality. Use this skill when the user asks to build web components, pages,
  artifacts, posters, or applications (examples include websites, landing pages,
  dashboards, React components, HTML/CSS layouts, or when styling/beautifying
  any web UI). Generates creative, polished code and UI design that avoids
  generic AI aesthetics. Use when 用户提到 前端设计、网页设计、UI 设计、
  界面设计、组件、海报、Landing Page、落地页、React 组件、Vue 组件、
  CSS 样式、美化界面、设计一个、做一个网页、官网、仪表盘、Dashboard。
license: Complete terms in LICENSE.txt
version: 1.0.0
type: procedural
risk_level: low
status: enabled
disable-model-invocation: false
tags:
  - frontend
  - design
  - ui
  - css
  - react
  - html
metadata:
  author: anthropic
  updated_at: '2026-04-07'
market:
  short_desc: 创建有品味、避免 AI 烂大街审美的前端界面与组件
  category: design
  maintainer:
    name: DesireCore Official
    verified: true
  channel: latest
---

This skill guides creation of distinctive, production-grade frontend interfaces that avoid generic "AI slop" aesthetics. Implement real working code with exceptional attention to aesthetic details and creative choices.

The user provides frontend requirements: a component, page, application, or interface to build. They may include context about the purpose, audience, or technical constraints.

## Output Rule

When you create or modify HTML/CSS/JS/React/Vue files, you **MUST** tell the user the absolute path of the output file in your response. Example: "文件已保存到:`/path/to/index.html`"

If you create multiple files (e.g. HTML + CSS + JS), list each path explicitly.

## Design Thinking

Before coding, understand the context and commit to a BOLD aesthetic direction:
- **Purpose**: What problem does this interface solve? Who uses it?
- **Tone**: Pick an extreme: brutally minimal, maximalist chaos, retro-futuristic, organic/natural, luxury/refined, playful/toy-like, editorial/magazine, brutalist/raw, art deco/geometric, soft/pastel, industrial/utilitarian, etc. There are many flavors to choose from; use them for inspiration, then design one that is true to your chosen direction.
- **Constraints**: Technical requirements (framework, performance, accessibility).
- **Differentiation**: What makes this UNFORGETTABLE? What's the one thing someone will remember?

**CRITICAL**: Choose a clear conceptual direction and execute it with precision. Bold maximalism and refined minimalism both work; the key is intentionality, not intensity.

Then implement working code (HTML/CSS/JS, React, Vue, etc.) that is:
- Production-grade and functional
- Visually striking and memorable
- Cohesive with a clear aesthetic point-of-view
- Meticulously refined in every detail

## Frontend Aesthetics Guidelines

Focus on:
- **Typography**: Choose fonts that are beautiful, unique, and interesting. Avoid generic fonts like Arial and Inter; opt instead for distinctive, characterful choices that elevate the frontend's aesthetics. Pair a distinctive display font with a refined body font.
- **Color & Theme**: Commit to a cohesive aesthetic. Use CSS variables for consistency. Dominant colors with sharp accents outperform timid, evenly-distributed palettes.
- **Motion**: Use animations for effects and micro-interactions. Prioritize CSS-only solutions for HTML. Use the Motion library for React when available. Focus on high-impact moments: one well-orchestrated page load with staggered reveals (animation-delay) creates more delight than scattered micro-interactions. Use scroll-triggering and hover states that surprise.
- **Spatial Composition**: Unexpected layouts. Asymmetry. Overlap. Diagonal flow. Grid-breaking elements. Generous negative space OR controlled density.
- **Backgrounds & Visual Details**: Create atmosphere and depth rather than defaulting to solid colors. Add contextual effects and textures that match the overall aesthetic. Apply creative forms like gradient meshes, noise textures, geometric patterns, layered transparencies, dramatic shadows, decorative borders, custom cursors, and grain overlays.

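The CSS-variable advice above can be sketched as a small token set. All names and values here are illustrative, not taken from any particular project:

```css
/* Hypothetical design tokens: one committed direction, referenced everywhere */
:root {
  --font-display: "Fraunces", Georgia, serif;      /* characterful display face */
  --font-body: "Atkinson Hyperlegible", sans-serif;
  --color-ink: #141210;
  --color-paper: #f6f1e7;
  --color-accent: #ff4d00;                         /* one sharp accent, used sparingly */
  --radius: 2px;
}

h1, h2 { font-family: var(--font-display); }
body {
  font-family: var(--font-body);
  color: var(--color-ink);
  background: var(--color-paper);
}
.cta { background: var(--color-accent); border-radius: var(--radius); }
```

Changing the aesthetic then means editing the token block, not hunting hex codes through the stylesheet.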
NEVER use generic AI-generated aesthetics: overused font families (Inter, Roboto, Arial, system fonts), cliched color schemes (particularly purple gradients on white backgrounds), predictable layouts and component patterns, or cookie-cutter design that lacks context-specific character.

Interpret creatively and make unexpected choices that feel genuinely designed for the context. No two designs should be the same. Vary between light and dark themes, different fonts, different aesthetics. NEVER converge on common choices (Space Grotesk, for example) across generations.

**IMPORTANT**: Match implementation complexity to the aesthetic vision. Maximalist designs need elaborate code with extensive animations and effects. Minimalist or refined designs need restraint, precision, and careful attention to spacing, typography, and subtle details. Elegance comes from executing the vision well.

Remember: Claude is capable of extraordinary creative work. Don't hold back; show what can truly be created when thinking outside the box and committing fully to a distinctive vision.

---

## Project Context Override (DesireCore Specific)

> **Note**: When working **inside the DesireCore main repository** (`desirecore-9` or any project that has `app/styles/globals.css` with the DesireCore 3+2 token system), the project's strict design system **OVERRIDES** the bold aesthetic guidance above. In that context:
>
> - Use only the 3 functional colors (Green / Blue / Purple) + 2 status colors (Orange / Red) defined in `globals.css`
> - Reference design tokens via CSS variables (`var(--accent-green)`, etc.) — never hardcoded hex
> - Follow the typography, radius, and spacing tokens already defined
> - The "avoid generic AI aesthetics" principle still applies, but expression happens through layout/composition/motion, not color expansion
>
> For **standalone artifacts, posters, landing pages, or external projects**, the full aesthetic freedom of this skill applies — go bold.

334  skills/web-access/SKILL.md  Normal file
@@ -0,0 +1,334 @@
---
name: 联网访问
description: >-
  Use this skill whenever the user needs to access information from the internet
  — searching for current information, fetching public web pages, browsing
  login-gated sites (微博/小红书/B站/飞书/Twitter), comparing products,
  researching topics, gathering documentation, or summarizing news.
  This skill orchestrates three complementary layers: (1) WebSearch + WebFetch
  for public pages, (2) Jina Reader as the default token-optimization layer for
  heavy/JS-rendered pages, and (3) Chrome DevTools Protocol (CDP) via Python
  Playwright for login-gated sites that require the user's existing browser
  session. Always cite source URLs. Use when 用户提到 联网搜索、上网查、
  查资料、抓取网页、研究、调研、最新资讯、文档查询、对比、竞品、技术文档、
  新闻、网址、URL、找一下、搜一下、查一下、小红书、B站、微博、飞书、Twitter、
  推特、X、知乎、公众号、已登录、登录状态。
license: Complete terms in LICENSE.txt
version: 1.1.0
type: procedural
risk_level: low
status: enabled
disable-model-invocation: false
tags:
  - web
  - search
  - fetch
  - research
  - browsing
  - cdp
  - playwright
metadata:
  author: desirecore
  updated_at: '2026-04-07'
market:
  short_desc: 联网搜索、网页抓取、登录态浏览器访问(CDP)、研究调研工作流
  category: research
  maintainer:
    name: DesireCore Official
    verified: true
  channel: latest
---

# Web Access Skill

Three-layer web access toolkit:

1. **Layer 1 — Search & Fetch**: `WebSearch` + `WebFetch` for public pages
2. **Layer 2 — Jina Reader**: default token-optimized extraction for heavy/JS-rendered pages
3. **Layer 3 — CDP Browser**: Chrome DevTools Protocol for login-gated sites (小红书/B站/微博/飞书/Twitter)

---

## Output Rule

When you complete a research task, you **MUST** cite all source URLs in your response. Distinguish between:
- **Quoted facts**: directly from a fetched page → cite the URL
- **Inferences**: your synthesis or analysis → mark as "(分析/推断)"

If any fetch fails, explicitly tell the user which URL failed and which fallback you used.

## Prerequisites: Chrome CDP Setup (for login-gated sites)

**Only required when accessing sites that need the user's login session** (小红书/B站/微博/飞书/Twitter/知乎/公众号).

### One-time setup

Launch a dedicated Chrome instance with remote debugging enabled:

**macOS**:
```bash
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222 \
  --user-data-dir="$HOME/.desirecore/chrome-profile"
```

**Linux**:
```bash
google-chrome \
  --remote-debugging-port=9222 \
  --user-data-dir="$HOME/.desirecore/chrome-profile"
```

**Windows (PowerShell)**:
```powershell
& "C:\Program Files\Google\Chrome\Application\chrome.exe" `
  --remote-debugging-port=9222 `
  --user-data-dir="$env:USERPROFILE\.desirecore\chrome-profile"
```

After launch:
1. Manually log in to the sites you need (小红书、B站、微博、飞书 …)
2. Leave this Chrome window open in the background
3. Verify the debug endpoint: `curl -s http://localhost:9222/json/version` should return JSON

### Verify CDP is ready

Before any CDP operation, always run:
```bash
curl -s http://localhost:9222/json/version | python3 -c "import sys,json; d=json.load(sys.stdin); print('CDP ready:', d.get('Browser'))"
```

If the command fails, tell the user: "请先启动 Chrome 并开启远程调试端口(见 web-access 技能的 Prerequisites 部分)。"

---

## Tool Selection Decision Tree

```
User intent
│
├─ "Search for information about X" (no specific URL)
│   └─→ WebSearch → pick top 3-5 results → fetch each (see next branches)
│
├─ "Read this public page" (static HTML, docs, news)
│   └─→ WebFetch(url) directly
│
├─ "Read this heavy-JS page" (SPA, React/Vue sites, Medium, etc.)
│   └─→ Bash: curl -sL "https://r.jina.ai/<original-url>"
│       (Jina Reader = default for JS-rendered content, saves tokens)
│
├─ "Read this login-gated page" (小红书/B站/微博/飞书/Twitter/知乎/公众号)
│   └─→ 1. Verify CDP ready (curl http://localhost:9222/json/version)
│       2. Bash: python3 script with playwright.connect_over_cdp()
│       3. Extract content → feed to Jina Reader for clean Markdown
│          (or use BeautifulSoup directly on the raw HTML)
│
├─ "API documentation / GitHub / npm package info"
│   └─→ Prefer official API endpoints over scraping HTML:
│       - GitHub: gh api repos/owner/name
│       - npm: curl https://registry.npmjs.org/<pkg>
│       - PyPI: curl https://pypi.org/pypi/<pkg>/json
│
└─ "Real-time interactive task" (click, fill form, scroll, screenshot)
    └─→ CDP + Playwright (see references/cdp-browser.md)
```

### Three-layer strategy summary

| Layer | Use case | Primary tool | Token cost |
|-------|----------|--------------|------------|
| L1 | Public, static | `WebFetch` | Low |
| L2 | JS-heavy, long articles, token savings | `Bash curl r.jina.ai` | **Lowest** (Markdown pre-cleaned) |
| L3 | Login-gated, interactive | `Bash + Python Playwright CDP` | Medium (raw HTML, then clean via Jina or BS4) |

**Default priority**: L1 for simple public pages → L2 for anything heavy → L3 only when login is required.

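The routing logic in the tree above can be sketched as a small classifier. The domain lists are illustrative samples drawn from the sites named in this document, not an exhaustive policy:

```python
from urllib.parse import urlparse

# Sample domains from the decision tree; extend as needed
LOGIN_GATED = {
    "xiaohongshu.com", "bilibili.com", "weibo.com", "zhihu.com",
    "feishu.cn", "x.com", "twitter.com", "linkedin.com",
}
JS_HEAVY = {"medium.com", "dev.to"}

def pick_layer(url: str) -> str:
    """Map a URL to L1 (WebFetch), L2 (Jina Reader), or L3 (CDP browser)."""
    host = urlparse(url).netloc.lower()
    host = host[4:] if host.startswith("www.") else host
    if any(host == d or host.endswith("." + d) for d in LOGIN_GATED):
        return "L3"  # needs the user's logged-in Chrome session
    if any(host == d or host.endswith("." + d) for d in JS_HEAVY):
        return "L2"  # Jina Reader saves tokens on JS-rendered pages
    return "L1"      # plain WebFetch for public static pages

print(pick_layer("https://www.xiaohongshu.com/explore/abc"))  # L3
print(pick_layer("https://medium.com/@a/post"))               # L2
print(pick_layer("https://developer.mozilla.org/docs"))       # L1
```

In practice the escalation rule still applies on top: if an L1 fetch comes back short or garbled, retry the same URL at L2.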
---

## Supported Sites Matrix

| Site | Recommended Layer | Notes |
|------|-------------------|-------|
| Wikipedia, MDN, official docs | L1 WebFetch | Static, clean HTML |
| GitHub README, issues, PRs | `gh api` (best) → L1 WebFetch | Prefer API |
| Hacker News, Reddit | L1 WebFetch | Public content |
| Medium, Dev.to | L2 Jina Reader | JS-rendered, member gates |
| Twitter/X | L3 CDP (or L2 Jina with `x.com`) | Login required for full thread |
| 小红书 (xiaohongshu.com) | L3 CDP | 强制登录 |
| B站 (bilibili.com) | L3 CDP | 视频描述/评论需登录 |
| 微博 (weibo.com) | L3 CDP | 长微博需登录 |
| 知乎 (zhihu.com) | L3 CDP | 长文+评论需登录 |
| 飞书文档 (feishu.cn) | L3 CDP | 必须登录 |
| 公众号 (mp.weixin.qq.com) | L2 Jina Reader | 通常公开,Jina 处理更干净 |
| LinkedIn | L3 CDP | 登录墙 |

---

## Tool Reference

### Layer 1: WebSearch + WebFetch

**WebSearch** — discover URLs for an unknown topic:
```
WebSearch(query="latest typescript 5.5 features 2026", max_results=5)
```

Tips:
- Include the year for time-sensitive topics
- Use `allowed_domains` / `blocked_domains` to constrain results

**WebFetch** — extract clean Markdown from a known URL:
```
WebFetch(url="https://example.com/article")
```

Tips:
- Results are cached for 15 min
- Returns cleaned Markdown with title + URL + body
- If the body is < 200 chars or looks garbled → escalate to Layer 2 (Jina) or Layer 3 (CDP)

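The "< 200 chars or garbled" escalation rule above can be written as a quick heuristic. The thresholds are illustrative, not part of any official API:

```python
def needs_escalation(markdown: str) -> bool:
    """True if a fetched result looks too thin or too garbled to trust."""
    if len(markdown) < 200:
        return True  # thin body: likely a JS shell or a block page
    printable = sum(1 for ch in markdown if ch.isprintable() or ch.isspace())
    return printable / len(markdown) < 0.9  # mostly control bytes: garbled

print(needs_escalation("Access denied."))         # True, too short
print(needs_escalation("A real article. " * 50))  # False
```

On True, retry the same URL via Layer 2 (Jina) and, failing that, Layer 3 (CDP).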
### Layer 2: Jina Reader (default for heavy pages)

Jina Reader (`r.jina.ai`) is a free public proxy that renders pages server-side and returns clean Markdown. Use it as the **default** for any page where WebFetch produces garbled or truncated output, and as the **preferred** extractor for JS-heavy SPAs.

```bash
curl -sL "https://r.jina.ai/https://example.com/article"
```

Why Jina is the default token-saver:
- Strips nav/footer/ads automatically
- Handles JS-rendered SPAs
- Returns 50-80% fewer tokens than raw HTML
- No API key needed for basic use (~20 req/min)

See [references/jina-reader.md](references/jina-reader.md) for advanced endpoints and rate limits.

### Layer 3: CDP Browser (login-gated access)

Use Python Playwright's `connect_over_cdp()` to attach to the user's running Chrome (which already has login cookies). **No re-login needed.**

**Minimal template**:
```bash
python3 << 'PY'
from playwright.sync_api import sync_playwright

TARGET_URL = "https://www.xiaohongshu.com/explore/..."

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("http://localhost:9222")
    context = browser.contexts[0]  # reuse user's default context (has cookies)
    page = context.new_page()
    page.goto(TARGET_URL, wait_until="domcontentloaded")
    page.wait_for_timeout(2000)  # let lazy content load
    html = page.content()
    page.close()

# Print first 500 chars to verify
print(html[:500])
PY
```

**Extract text via BeautifulSoup** (no Jina round-trip):
```bash
python3 << 'PY'
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("http://localhost:9222")
    page = browser.contexts[0].new_page()
    page.goto("https://www.bilibili.com/video/BV...", wait_until="networkidle")
    html = page.content()
    page.close()

soup = BeautifulSoup(html, "html.parser")
title = soup.select_one("h1.video-title")
desc = soup.select_one(".video-desc")
print("Title:", title.get_text(strip=True) if title else "N/A")
print("Desc:", desc.get_text(strip=True) if desc else "N/A")
PY
```

See [references/cdp-browser.md](references/cdp-browser.md) for:
- Per-site selectors (小红书/B站/微博/知乎/飞书)
- Scrolling & lazy-load patterns
- Screenshot & form-fill recipes
- Troubleshooting connection issues

---

## Common Workflows

Read [references/workflows.md](references/workflows.md) for detailed templates:
- 技术文档查询 (Tech docs lookup)
- 竞品对比研究 (Competitor research)
- 新闻聚合与时间线 (News aggregation)
- API/库版本调查 (Library version investigation)

Read [references/cdp-browser.md](references/cdp-browser.md) for login-gated site recipes (小红书/B站/微博/知乎/飞书).

Read [references/jina-reader.md](references/jina-reader.md) for Jina Reader positioning, rate limits, and advanced endpoints.

---

## Quick Workflow: Multi-Source Research

```
1. WebSearch(query) → 5 candidate URLs
2. Skim titles + snippets → pick the 3 most relevant
3. Classify each URL by layer (L1 / L2 / L3)
4. Fetch all in parallel (single message, multiple tool calls)
5. If any fetch returns < 200 chars or garbled output → retry via the next layer
6. Synthesize: contradictions? consensus? outliers?
7. Report with inline [source](url) citations + a Sources list at the end
```

---

## Anti-Patterns (Avoid)

- ❌ **Using WebFetch on obviously heavy sites** — Medium, Twitter, 小红书 will waste tokens or fail. Jump straight to L2/L3.
- ❌ **Launching headless Chrome instead of attaching via CDP** — this loses the user's login state, triggers anti-bot checks, and has a slow cold start. Always use `connect_over_cdp()` to attach to the user's existing session.
- ❌ **Fetching one URL at a time when you need 5** — batch them in a single message.
- ❌ **Trusting a single source** — cross-check ≥ 2 sources for non-trivial claims.
- ❌ **Fetching the search result page itself** — WebSearch already returns snippets; fetch the actual articles.
- ❌ **Ignoring the cache** — WebFetch caches for 15 min; reuse freely.
- ❌ **Scraping when an API exists** — GitHub, npm, PyPI, and Wikipedia all have JSON APIs.
- ❌ **Forgetting the year in time-sensitive queries** — "best AI models" returns 2023 results; "best AI models 2026" returns current ones.
- ❌ **Hardcoding login credentials in scripts** — always rely on the user's pre-logged-in CDP session.
- ❌ **Citing only after the fact** — collect URLs as you fetch, not from memory afterwards.

---

## Example Interaction

**User**: "帮我抓一下这条小红书笔记的内容:https://www.xiaohongshu.com/explore/abc123"

**Agent workflow**:
```
1. Recognize → 小红书 is an L3 login-gated site
2. Check CDP: curl -s http://localhost:9222/json/version
   ├─ fails → ask the user to start Chrome in debug mode, then stop
   └─ succeeds → continue
3. Bash: python3 connect_over_cdp script → page.goto(url) → page.content()
4. BeautifulSoup extracts the h1 title, .note-content, .comments
5. When replying to the user:
   - cite the original URL
   - if the content is long, clean it through Jina to save tokens
6. Tell the user: 「已通过你的登录态抓取,原链接:[xhs](url)」
```

---

## Installation Note

CDP features require Python + Playwright installed:

```bash
pip3 install playwright beautifulsoup4
python3 -m playwright install chromium  # optional: not needed when attaching to the user's existing Chrome
```

If `playwright` is not installed when the user requests a login-gated site, run the install commands in Bash and explain that you are setting up the browser-automation dependency.
330  skills/web-access/references/cdp-browser.md  Normal file
@@ -0,0 +1,330 @@
# CDP Browser Access — Login-Gated Sites Manual

Detailed recipes for accessing sites that require the user's login session, via Chrome DevTools Protocol (CDP) + Python Playwright.

**Precondition**: Chrome is already running with `--remote-debugging-port=9222` and the user has manually logged in to the target sites. See the main SKILL.md `Prerequisites` section for the launch command.

---

## Why CDP attach, not headless

| Approach | Login state | Anti-bot | Speed | Cost |
|----------|-------------|----------|-------|------|
| Headless Playwright (new context) | ❌ Empty cookies | ❌ Flagged as bot | Slow cold start | Re-login pain |
| `playwright.chromium.launch(headless=False)` | ❌ Fresh profile | ⚠ Sometimes flagged | Slow | Same |
| **CDP attach (`connect_over_cdp`)** | ✅ User's real cookies | ✅ Looks human | Instant | Zero friction |

**Rule**: For any login-gated site, always attach to the user's running Chrome.

---

## Core Template

Every CDP script follows this shape:

```python
from playwright.sync_api import sync_playwright

def fetch_with_cdp(url: str, wait_selector: str | None = None) -> str:
    """Attach to user's Chrome via CDP, fetch URL, return HTML."""
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp("http://localhost:9222")
        # browser.contexts[0] is the user's default context (with cookies)
        context = browser.contexts[0]
        page = context.new_page()
        try:
            page.goto(url, wait_until="domcontentloaded", timeout=30000)
            if wait_selector:
                page.wait_for_selector(wait_selector, timeout=10000)
            else:
                page.wait_for_timeout(2000)  # generic settle
            return page.content()
        finally:
            page.close()
            # DO NOT call browser.close() — that would close the user's Chrome!

if __name__ == "__main__":
    html = fetch_with_cdp("https://example.com")
    print(html[:1000])
```

**Critical**: Never call `browser.close()` when using CDP attach — you'd kill the user's Chrome. Only close the page you opened.

---

## Site Recipes

### 小红书 (xiaohongshu.com)

```python
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup

NOTE_URL = "https://www.xiaohongshu.com/explore/XXXXXXXX"

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("http://localhost:9222")
    page = browser.contexts[0].new_page()
    page.goto(NOTE_URL, wait_until="domcontentloaded")
    page.wait_for_selector("#detail-title", timeout=10000)
    page.wait_for_timeout(1500)  # let images/comments load
    html = page.content()
    page.close()

soup = BeautifulSoup(html, "html.parser")
title_el = soup.select_one("#detail-title")
desc_el = soup.select_one("#detail-desc")
author = soup.select_one(".author-wrapper .username")
print("Title:", title_el.get_text(strip=True) if title_el else None)
print("Author:", author.get_text(strip=True) if author else None)
print("Desc:", desc_el.get_text(" ", strip=True) if desc_el else None)
```

**Selectors** (may drift over time — update if they fail):
- Title: `#detail-title`
- Description: `#detail-desc`
- Author: `.author-wrapper .username`
- Images: `.swiper-slide img`
- Comments: `.parent-comment .content`

### B站 (bilibili.com)

```python
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup

VIDEO_URL = "https://www.bilibili.com/video/BVxxxxxxxxx"

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("http://localhost:9222")
    page = browser.contexts[0].new_page()
    page.goto(VIDEO_URL, wait_until="networkidle")
    page.wait_for_timeout(2000)
    html = page.content()
    page.close()

soup = BeautifulSoup(html, "html.parser")
title = soup.select_one("h1.video-title")
up = soup.select_one(".up-name")
desc = soup.select_one(".desc-info-text")
print("Title:", title.get_text(strip=True) if title else None)
print("UP:", up.get_text(strip=True) if up else None)
print("Desc:", desc.get_text(" ", strip=True) if desc else None)
```

**Tip**: For B站 video metadata, the [public API](https://api.bilibili.com/x/web-interface/view?bvid=XXXX) often returns JSON without needing CDP. Try it first:

```bash
curl -s "https://api.bilibili.com/x/web-interface/view?bvid=BVxxxxxxxxx" | python3 -m json.tool
```

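If the API route works, parsing its JSON is trivial. A sketch against a mocked response (the `code`/`data.title`/`data.owner.name`/`data.desc` field names reflect what this endpoint commonly returns, but verify against a live response):

```python
import json

# Mocked shape of the view API response, for illustration only
raw = '{"code": 0, "data": {"title": "demo video", "owner": {"name": "some-up"}, "desc": "hello"}}'

resp = json.loads(raw)
if resp.get("code") == 0:  # 0 means success for this API
    data = resp["data"]
    print("Title:", data["title"])
    print("UP:", data["owner"]["name"])
    print("Desc:", data["desc"])
```

This avoids both CDP and HTML parsing entirely whenever the video metadata is public.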
### 微博 (weibo.com)

```python
WEIBO_URL = "https://weibo.com/u/1234567890"  # or /detail/xxx

# Same CDP template.
# Selectors:
#   .Feed_body_3R0rO .detail_wbtext_4CRf9 — post text
#   .ALink_default_2ibt1 — user link
#   article[aria-label="微博"] — each feed item
```

**Note**: Weibo uses React + heavy obfuscation, so selectors change frequently. If they fail, fall back to Jina Reader for clean Markdown. Jina re-fetches the URL anonymously, so this only works for content visible without login:

```python
import subprocess

# Hand the original URL to Jina instead of parsing the fragile DOM
result = subprocess.run(
    ["curl", "-sL", f"https://r.jina.ai/{WEIBO_URL}"],
    capture_output=True, text=True
)
print(result.stdout)
```

### 知乎 (zhihu.com)

```python
ANSWER_URL = "https://www.zhihu.com/question/123/answer/456"

# Selectors:
#   h1.QuestionHeader-title — question title
#   .RichContent-inner — answer body
#   .AuthorInfo-name — author
```

Zhihu works with CDP, but it often also renders enough metadata server-side for Jina to work:

```bash
curl -sL "https://r.jina.ai/https://www.zhihu.com/question/123/answer/456"
```

Try Jina first; fall back to CDP if the content is truncated.

### 飞书文档 (feishu.cn / larksuite.com)

Feishu uses heavy virtualization — you must scroll to load all content:

```python
from playwright.sync_api import sync_playwright

DOC_URL = "https://xxx.feishu.cn/docs/xxx"

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("http://localhost:9222")
    page = browser.contexts[0].new_page()
    page.goto(DOC_URL, wait_until="domcontentloaded")
    page.wait_for_selector(".docs-render-unit", timeout=15000)

    # Scroll to bottom repeatedly to load lazy content
    last_height = 0
    for _ in range(20):
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        page.wait_for_timeout(800)
        h = page.evaluate("document.body.scrollHeight")
        if h == last_height:
            break
        last_height = h

    # Extract text
    text = page.evaluate("() => document.body.innerText")
    page.close()

print(text)
```

### Twitter / X

```python
TWEET_URL = "https://x.com/username/status/1234567890"

# Selectors:
#   article[data-testid="tweet"] — tweet container
#   div[data-testid="tweetText"] — tweet text
#   div[data-testid="User-Name"] — author
#   a[href$="/analytics"] — view count anchor (next sibling has stats)
```

Twitter is aggressive with anti-bot measures. CDP attach usually works, but set generous waits:

```python
page.goto(url, wait_until="networkidle", timeout=45000)
page.wait_for_selector('article[data-testid="tweet"]', timeout=15000)
```

---

## Common Patterns

### Pattern 1: Scroll to load lazy content

```python
def scroll_to_bottom(page, max_steps=30, pause_ms=800):
    last = 0
    for _ in range(max_steps):
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        page.wait_for_timeout(pause_ms)
        h = page.evaluate("document.body.scrollHeight")
        if h == last:
            return
        last = h
```

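Pattern 1's stop condition can be rehearsed without a browser using a stub page object (heights and step sizes here are made up for illustration):

```python
class StubPage:
    """Mimics a feed that grows 1000px per scroll until it tops out at 3500px."""
    def __init__(self):
        self.height = 1000
        self.scrolls = 0
    def evaluate(self, js: str):
        if "scrollTo" in js:
            self.scrolls += 1
            self.height = min(self.height + 1000, 3500)
            return None
        return self.height  # document.body.scrollHeight
    def wait_for_timeout(self, ms: int):
        pass  # no-op outside a real browser

def scroll_to_bottom(page, max_steps=30, pause_ms=800):
    last = 0
    for _ in range(max_steps):
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        page.wait_for_timeout(pause_ms)
        h = page.evaluate("document.body.scrollHeight")
        if h == last:
            return
        last = h

page = StubPage()
scroll_to_bottom(page)
print(page.height, page.scrolls)  # 3500 4: one extra scroll confirms the height settled
```

The loop always spends one extra round-trip confirming the height stopped changing, which is why `pause_ms` matters more than `max_steps` in practice.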
### Pattern 2: Screenshot a specific element

```python
element = page.locator("article").first
element.screenshot(path="/tmp/article.png")
```

### Pattern 3: Extract structured data via JavaScript

```python
data = page.evaluate("""() => {
    const items = document.querySelectorAll('.list-item');
    return Array.from(items).map(el => ({
        title: el.querySelector('.title')?.innerText,
        url: el.querySelector('a')?.href,
    }));
}""")
print(data)
```

### Pattern 4: Fill a form and click

```python
page.fill("input[name=q]", "search query")
page.click("button[type=submit]")
page.wait_for_load_state("networkidle")
```

### Pattern 5: Clean content via Jina after extraction

When selectors are unreliable, skip DOM parsing and have Jina re-fetch the original URL as clean Markdown. Jina fetches anonymously, so this only works for content visible without the user's login:

```python
import subprocess

url = "https://example.com/article"  # the page just visited via CDP
clean_md = subprocess.run(
    ["curl", "-sL", f"https://r.jina.ai/{url}"],
    capture_output=True, text=True
).stdout
print(clean_md)
```

---

## Troubleshooting
|
||||
|
||||
### `connect_over_cdp` fails with `ECONNREFUSED`
|
||||
|
||||
Chrome is not running with remote debugging. Tell the user:
|
||||
> "请先用下面的命令启动 Chrome:
|
||||
> `/Applications/Google\\ Chrome.app/Contents/MacOS/Google\\ Chrome --remote-debugging-port=9222 --user-data-dir=\"$HOME/.desirecore/chrome-profile\"`
|
||||
> 然后手动登录需要抓取的网站,再让我继续。"
|
||||
|
||||
### `browser.contexts[0]` is empty
|
||||
|
||||
Chrome was launched but no windows are open. Ask the user to open at least one tab and navigate anywhere.
### Playwright not installed

```bash
pip3 install playwright beautifulsoup4
# No need for `playwright install` — we're attaching to existing Chrome, not downloading a new browser
```

### Site detects automation

Despite the CDP attach, some sites (Cloudflare-protected, Instagram) may still detect automation. Options:

1. Use Jina Reader instead (`curl -sL https://r.jina.ai/<url>`) — often succeeds where Playwright fails
2. Ask the user to manually copy the visible content
3. Use the site's public API if available

### Content is truncated

The page uses virtualization or lazy loading. Apply Pattern 1 (scroll to bottom) before calling `page.content()`.

### `page.wait_for_selector` times out

The selector is stale — the site updated its DOM. Dump `page.content()[:5000]` and inspect manually, or fall back to Jina Reader.

---

## Security Notes

- **Never log or print cookies** from `context.cookies()` even during debugging
- **Never extract and store** the user's session tokens to files
- **Never use the CDP session** to perform writes (post, comment, like) unless the user explicitly requested it
- The `~/.desirecore/chrome-profile` directory contains the user's credentials — treat it as sensitive
- If the user asks to "log in automatically", refuse and explain they must log in manually in the Chrome window; the skill only reads already-authenticated sessions

---

## When NOT to use CDP

- **Public static sites** → use L1 `WebFetch`; it's faster
- **Heavy SPAs without login walls** → use L2 Jina Reader; it's cheaper on tokens
- **You need thousands of pages** → CDP is not built for scale; look into proper scrapers

CDP is specifically the "right tool" for: **small number of pages + login required + human-like behavior needed**.

122 skills/web-access/references/jina-reader.md (Normal file)
@@ -0,0 +1,122 @@

# Jina Reader — Default Token-Optimization Layer

[Jina Reader](https://jina.ai/reader) is a free public service that renders any URL server-side and returns clean Markdown. In this skill's three-layer architecture, **Jina is Layer 2: the default extractor for heavy/JS-rendered pages**, not just a fallback.

---

## Positioning in the three-layer model

```
L1 WebFetch ── simple public static pages (docs, Wikipedia, HN)
      │
      │ WebFetch empty/truncated/garbled
      ▼
L2 Jina Reader ── DEFAULT for JS-heavy SPAs, long articles, Medium, Twitter
      │ Strips nav/ads automatically, saves 50-80% tokens
      │
      │ Login required, or Jina also fails
      ▼
L3 CDP Browser ── user's logged-in Chrome (小红书/B站/微博/飞书/Twitter)
```

**Key insight**: Don't wait for WebFetch to fail before trying Jina. For any URL you expect to be JS-heavy (any major SPA, Medium, Dev.to, long-form articles), go straight to Jina for the token savings.

---

## Basic Usage (no API key)

```bash
curl -sL "https://r.jina.ai/https://example.com/article"
```

The original URL goes after `r.jina.ai/`. The response is plain Markdown — pipe it to a file or read it directly.

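The same call in Python takes only the standard library. This is an illustrative sketch (the function names are ours); it also honors the optional `JINA_API_KEY` described under Rate Limits:

```python
import os
import urllib.request

def build_request(url: str) -> urllib.request.Request:
    """Prefix the target URL with the Jina Reader endpoint."""
    req = urllib.request.Request("https://r.jina.ai/" + url)
    key = os.environ.get("JINA_API_KEY")  # optional: raises the rate limit
    if key:
        req.add_header("Authorization", f"Bearer {key}")
    return req

def fetch_markdown(url: str, timeout: float = 30.0) -> str:
    """Fetch `url` through Jina Reader and return the Markdown body."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```
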
---

## When to use each layer

| Scenario | Primary choice | Why |
|----------|---------------|-----|
| Wikipedia, MDN, official docs | L1 WebFetch | Static clean HTML, fastest |
| GitHub README (public) | L1 WebFetch | Simple markup |
| Medium articles | **L2 Jina** | Member walls + heavy JS |
| Dev.to, Hashnode | **L2 Jina** | JS-rendered |
| Substack, Ghost blogs | **L2 Jina** | Partial JS rendering |
| News sites with lazy-load | **L2 Jina** | Scroll-triggered content |
| Twitter/X public threads | **L2 Jina** first, L3 CDP if truncated | Sometimes works |
| 公众号 (mp.weixin.qq.com) | **L2 Jina** | Clean Markdown extraction |
| LinkedIn articles | L3 CDP | Hard login wall |
| 小红书, B站, 微博, 飞书 | L3 CDP | Login required |

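The table collapses into a tiny routing heuristic. The sketch below is illustrative; the domain lists are assumptions drawn from the rows above, not an exhaustive policy:

```python
from urllib.parse import urlparse

# Illustrative domain lists taken from the table (not exhaustive)
CDP_DOMAINS = {"xiaohongshu.com", "bilibili.com", "weibo.com",
               "feishu.cn", "linkedin.com"}
JINA_DOMAINS = {"medium.com", "dev.to", "hashnode.com",
                "substack.com", "mp.weixin.qq.com"}

def pick_layer(url: str) -> str:
    """Return which layer to try first for a given URL."""
    host = urlparse(url).netloc.lower().removeprefix("www.")
    if any(host == d or host.endswith("." + d) for d in CDP_DOMAINS):
        return "L3 CDP"
    if any(host == d or host.endswith("." + d) for d in JINA_DOMAINS):
        return "L2 Jina"
    return "L1 WebFetch"  # default; escalate to L2 if the fetch comes back empty
```
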
---

## Token savings example

Raw HTML of a long Medium article: ~150 KB, ~50,000 tokens
Same article via Jina Reader: ~20 KB, ~7,000 tokens

**86% reduction**, with cleaner structure and no ads/nav cruft.

---

## Advanced Endpoints (optional)

If you need more than basic content extraction, Jina also offers:

- **Search**: `https://s.jina.ai/<query>` — returns top 5 results as Markdown
- **Embeddings**: `https://api.jina.ai/v1/embeddings` (requires free API key)
- **Reranker**: `https://api.jina.ai/v1/rerank` (requires free API key)

For DesireCore, prefer the built-in `WebSearch` tool over `s.jina.ai` for consistency.

---

## Rate Limits

- **Free tier**: ~20 requests/minute, no authentication needed
- **With free API key**: higher limits, fewer throttles

  ```bash
  curl -sL "https://r.jina.ai/https://example.com" \
    -H "Authorization: Bearer YOUR_KEY"
  ```

- Get a free key at [jina.ai](https://jina.ai) — stored in env var `JINA_API_KEY` if available

---

## Usage tips

### Cache your own results

Jina itself doesn't cache for you. If you call the same URL repeatedly in a session, save the Markdown to a temp file:

```bash
curl -sL "https://r.jina.ai/$URL" > /tmp/jina-cache.md
```

### Handle very long articles

Jina returns the full article in one response. For articles > 50K chars, pipe through `head` or extract specific sections with Python/awk before feeding back to the model context.

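A concrete sketch of both techniques; the file normally comes from the cache step above, and the `## Methods`/`## Results` section names are placeholders for the article's real headings:

```shell
MD=/tmp/jina-cache.md
# Stand-in content; in practice this file comes from
# `curl -sL "https://r.jina.ai/$URL" > /tmp/jina-cache.md`
printf '%s\n' "intro text" "## Methods" "method details" "## Results" > "$MD"

head -c 50000 "$MD" > /tmp/first-50k.md                    # hard cap on size
awk '/^## Methods/,/^## Results/' "$MD" > /tmp/section.md  # one section only
```
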
### Combine with CDP

When you use L3 CDP to fetch a login-gated page, you can pipe the resulting HTML through Jina for clean Markdown instead of parsing with BeautifulSoup:

```python
html = fetch_with_cdp(url)  # from references/cdp-browser.md
# Now convert via Jina (note: Jina fetches the URL itself, not your HTML)
# So this only works if the content is already visible without login:
import subprocess
md = subprocess.run(["curl", "-sL", f"https://r.jina.ai/{url}"],
                    capture_output=True, text=True).stdout
```

For truly login-gated content, you must parse the HTML directly (BeautifulSoup) since Jina can't log in on your behalf.
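If `beautifulsoup4` is not installed, a crude dependency-free stand-in can be built on the stdlib `html.parser`; this is an illustrative sketch (class and function names are ours), not a replacement for a real parser:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Crude visible-text extraction for HTML fetched over CDP."""
    SKIP = {"script", "style", "nav", "header", "footer"}
    VOID = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self._stack = []   # open tags, so we know what we're inside of
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self._stack:        # tolerate sloppy markup
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip() and not (set(self._stack) & self.SKIP):
            self.chunks.append(data.strip())

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)
```
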
---

## Failure Mode

If Jina Reader returns garbage or an error:

1. **Hard login wall** → escalate to L3 CDP browser
2. **Geographically restricted** → tell the user, suggest VPN or manual access
3. **Cloudflare challenge** → try L3 CDP (user's browser passes challenges naturally)
4. **404 / gone** → confirm the URL is correct

In all cases, tell the user explicitly which URL failed and what you tried.

136 skills/web-access/references/workflows.md (Normal file)
@@ -0,0 +1,136 @@

# Common Research Workflows

Reusable templates for multi-step research tasks. Adapt the queries and URLs to the specific topic.

---

## 1. Technical Documentation Lookup

**Goal**: Find the authoritative answer to a "how do I X with library Y" question.

```
Step 1: WebSearch("<library> <feature> documentation site:<official-domain>")
        ↓ if no results, drop the site: filter
Step 2: WebFetch the top 1-2 official doc pages
Step 3: If example code is incomplete, also fetch the GitHub README or examples folder:
        Bash: gh api repos/<owner>/<repo>/contents/examples
Step 4: Synthesize a concise answer with one runnable code block
```

**Tip**: Always check the doc version matches the user's installed version. Look for version selectors in the page.

---

## 2. Competitor / Product Comparison

**Goal**: Build a structured comparison of 2-N similar products.

```
Step 1: WebSearch("<product-A> vs <product-B> comparison <year>")
Step 2: WebSearch("<product-A> features pricing") ─┐ parallel
Step 3: WebSearch("<product-B> features pricing") ─┘
Step 4: WebFetch official pricing/features pages for each (parallel)
Step 5: WebFetch 1 third-party comparison article (parallel)
Step 6: Build markdown table with consistent dimensions:
        | Dimension | Product A | Product B |
        |-----------|-----------|-----------|
        | Pricing   | ...       | ...       |
        | Features  | ...       | ...       |
        | License   | ...       | ...       |
Step 7: Add a "Recommendation" paragraph based on user's stated needs
```

**Tip**: When dimensions differ between sources, prefer the official source over third-party.

---

## 3. News Aggregation & Timeline

**Goal**: Build a chronological summary of recent events on a topic.

```
Step 1: WebSearch("<topic> news <year>", max_results=10)
Step 2: Skim snippets, group by date
Step 3: WebFetch the 3-5 most substantive articles (parallel)
Step 4: Build timeline:
        ## YYYY-MM-DD - Event headline
        - Key fact 1 [source](url)
        - Key fact 2 [source](url)
Step 5: End with a "Current State" paragraph
```

**Tip**: Use `allowed_domains` to constrain to authoritative news sources if needed.

---

## 4. Library Version Investigation

**Goal**: Find the latest version, breaking changes, and migration notes.

```
Step 1: Get latest version via API (faster than scraping):
        Python: curl https://pypi.org/pypi/<package>/json | jq .info.version
        Node:   curl https://registry.npmjs.org/<package>/latest | jq .version
        Rust:   curl https://crates.io/api/v1/crates/<crate> | jq .crate.max_version

Step 2: Get changelog:
        gh api repos/<owner>/<repo>/releases/latest

Step 3: If migration is needed, search:
        WebSearch("<package> migration guide v<old> to v<new>")
        WebFetch the official migration doc

Step 4: Summarize: latest version, breaking changes (bullet list), 1-2 code diffs
```

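The Python row of Step 1 needs neither `curl` nor `jq`; here is a stdlib sketch (function names are ours), with the JSON parsing split from the network call so the parsing is testable offline:

```python
import json
import urllib.request

PYPI_URL = "https://pypi.org/pypi/{package}/json"

def parse_version(payload: dict) -> str:
    """Pull the latest release out of PyPI's JSON payload."""
    return payload["info"]["version"]

def latest_version(package: str, timeout: float = 15.0) -> str:
    """Query PyPI's JSON API for the newest release of a package."""
    url = PYPI_URL.format(package=package)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_version(json.load(resp))
```
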
---

## 5. API Endpoint Discovery

**Goal**: Find a specific API endpoint and its parameters.

```
Step 1: WebSearch("<service> API <action> reference")
Step 2: WebFetch official API reference page
Step 3: If response includes "Try it" / "Sandbox" link, mention it
Step 4: Extract:
        - Endpoint URL
        - HTTP method
        - Required headers (auth)
        - Request body schema
        - Response schema
        - Example curl command
Step 5: Format as a self-contained code block the user can copy-paste
```

---

## 6. Quick Fact Check

**Goal**: Verify a single specific claim.

```
Step 1: WebSearch("<exact claim phrase>")
Step 2: If 2+ authoritative sources agree → confirmed
Step 3: If sources disagree → report both sides + which is more authoritative
Step 4: If no sources found → say "could not verify" — do NOT guess
```

**Tip**: For numeric facts, find the primary source (official report, paper) rather than secondary citations.

---

## Parallelization Cheat Sheet

When you need multiple independent fetches, **always batch them in a single message with multiple tool calls** rather than sequentially. Examples:

```
✅ Single message with:
   - WebFetch(url1)
   - WebFetch(url2)
   - WebFetch(url3)

❌ Three separate messages, each with one WebFetch
```

This applies equally to WebSearch with different queries, and to mixed Search+Fetch when you have URLs from previous searches.