Jina Reader — Default Token-Optimization Layer
Jina Reader is a free public service that renders any URL server-side and returns clean Markdown. In this skill's three-layer architecture, Jina is Layer 2: the default extractor for heavy/JS-rendered pages, not just a fallback.
Positioning in the three-layer model
L1 WebFetch ── simple public static pages (docs, Wikipedia, HN)
│
│ WebFetch empty/truncated/garbled
▼
L2 Jina Reader ── DEFAULT for JS-heavy SPAs, long articles, Medium, Twitter
│ Strips nav/ads automatically, saves 50-80% tokens
│
│ Login required, or Jina also fails
▼
L3 CDP Browser ── user's logged-in Chrome (小红书/B站/微博/飞书/Twitter)
Key insight: Don't wait for WebFetch to fail before trying Jina. For any URL you expect to be JS-heavy (any major SPA, Medium, Dev.to, long-form articles), go straight to Jina for the token savings.
Basic Usage (no API key)
curl -sL "https://r.jina.ai/https://example.com/article"
The original URL goes after r.jina.ai/. The response is plain Markdown — pipe to a file or read directly.
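The same call can be sketched in Python using only the standard library. The helper names `build_reader_url` and `fetch_markdown` and the 30-second timeout are illustrative assumptions, not part of this skill:

```python
from urllib.request import urlopen

JINA_READER_PREFIX = "https://r.jina.ai/"


def build_reader_url(target_url: str) -> str:
    """Put the original URL after the r.jina.ai/ prefix."""
    return JINA_READER_PREFIX + target_url


def fetch_markdown(target_url: str, timeout: float = 30.0) -> str:
    """Fetch a page through Jina Reader and return the Markdown body.

    urlopen follows redirects by default, similar to curl -L.
    The timeout value is an assumption; tune it for your environment.
    """
    with urlopen(build_reader_url(target_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_markdown("https://example.com/article")` performs a network request, so wrap it in whatever error handling the surrounding tool already uses.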
When to use each layer
| Scenario | Primary choice | Why |
|---|---|---|
| Wikipedia, MDN, official docs | L1 WebFetch | Static clean HTML, fastest |
| GitHub README (public) | L1 WebFetch | Simple markup |
| Medium articles | L2 Jina | Member walls + heavy JS |
| Dev.to, Hashnode | L2 Jina | JS-rendered |
| Substack, Ghost blogs | L2 Jina | Partial JS rendering |
| News sites with lazy-load | L2 Jina | Scroll-triggered content |
| Twitter/X public threads | L2 Jina first, L3 CDP if truncated | Jina works only intermittently |
| 公众号 (mp.weixin.qq.com) | L2 Jina | Clean Markdown extraction |
| LinkedIn articles | L3 CDP | Hard login wall |
| 小红书, B站, 微博, 飞书 | L3 CDP | Login required |
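The table above can be approximated with a small hostname lookup. This is a minimal sketch: the domain lists are illustrative assumptions (the table names sites, not hostnames), and `choose_layer` is a hypothetical helper rather than something this skill ships.

```python
from urllib.parse import urlparse

# Hostname lists are illustrative assumptions, not an exhaustive registry.
L3_HOSTS = ("xiaohongshu.com", "bilibili.com", "weibo.com", "feishu.cn", "linkedin.com")
L2_HOSTS = ("medium.com", "dev.to", "hashnode.dev", "substack.com",
            "mp.weixin.qq.com", "twitter.com", "x.com")


def _matches(host: str, domains: tuple) -> bool:
    """True if host equals a listed domain or is a subdomain of it."""
    return any(host == d or host.endswith("." + d) for d in domains)


def choose_layer(url: str) -> str:
    """Pick the primary layer for a URL; unknown hosts default to L1."""
    host = (urlparse(url).hostname or "").lower()
    if _matches(host, L3_HOSTS):
        return "L3 CDP"
    if _matches(host, L2_HOSTS):
        return "L2 Jina"
    return "L1 WebFetch"
```

Unknown hosts fall through to L1 here. A caller that expects a JS-heavy page can still override the result and go straight to L2.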
Token savings example
Raw HTML of a long Medium article: ~150 KB, ~50,000 tokens
Same article via Jina Reader: ~20 KB, ~7,000 tokens
86% reduction, with cleaner structure and no ads/nav cruft.
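The 86% figure follows directly from the two token counts above. The helper name `reduction_percent` is hypothetical and only illustrates the arithmetic:

```python
def reduction_percent(before_tokens: int, after_tokens: int) -> float:
    """Percentage of tokens saved going from before_tokens to after_tokens."""
    return (1 - after_tokens / before_tokens) * 100


# 7,000 / 50,000 = 0.14 of the original remains, so 86% is saved.
print(round(reduction_percent(50_000, 7_000)))  # 86
```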
Advanced Endpoints (optional)
If you need more than basic content extraction, Jina also offers:
- Search: https://s.jina.ai/<query> (returns top 5 results as Markdown)
- Embeddings: https://api.jina.ai/v1/embeddings (requires free API key)
- Reranker: https://api.jina.ai/v1/rerank (requires free API key)
For DesireCore, prefer the built-in WebSearch tool over s.jina.ai for consistency.
Rate Limits
- Free tier: ~20 requests/minute, no authentication needed
- With free API key: higher limits, fewer throttles
  curl -sL "https://r.jina.ai/https://example.com" -H "Authorization: Bearer YOUR_KEY"
- Get a free key at jina.ai (stored in env var JINA_API_KEY if available)
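One way to make the key optional is to attach the Authorization header only when `JINA_API_KEY` is set. This is a sketch: `build_headers` is a hypothetical helper, and passing the environment as a parameter is a choice made here so the function is easy to test.

```python
import os
from typing import Mapping


def build_headers(env: Mapping[str, str] = os.environ) -> dict:
    """Return an Authorization header if JINA_API_KEY is set, else no headers.

    With no key, requests fall back to the unauthenticated free tier.
    """
    key = env.get("JINA_API_KEY", "").strip()
    return {"Authorization": f"Bearer {key}"} if key else {}
```

The returned dict can be passed as the `headers` argument of `urllib.request.Request`, or translated to `-H` flags for curl.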
Usage tips
Cache your own results
Jina itself doesn't cache for you. If you call the same URL repeatedly in a session, save the Markdown to a temp file:
curl -sL "https://r.jina.ai/$URL" > /tmp/jina-cache.md
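A single fixed file is overwritten whenever a second URL is fetched. A per-URL cache can be sketched as below, keying the file name on a hash of the URL. `cache_path` and `cached_fetch` are hypothetical helpers, and the `fetch` argument stands in for whatever function actually calls Jina:

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Optional


def cache_path(url: str, cache_dir: Path) -> Path:
    """Map a URL to a stable cache file name via a short SHA-256 digest."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return cache_dir / f"jina-{digest}.md"


def cached_fetch(url: str, fetch: Callable[[str], str],
                 cache_dir: Optional[Path] = None) -> str:
    """Return cached Markdown for url, calling fetch(url) only on a miss."""
    cache_dir = cache_dir or Path(tempfile.gettempdir())
    path = cache_path(url, cache_dir)
    if path.exists():
        return path.read_text(encoding="utf-8")
    markdown = fetch(url)
    path.write_text(markdown, encoding="utf-8")
    return markdown
```

This sketch never expires entries, which is acceptable for a single session but would need a cleanup step for anything longer-lived.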
Handle very long articles
Jina returns the full article in one response. For articles > 50K chars, pipe through head or extract specific sections with Python/awk before feeding back to the model context.
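Both options can be sketched in a few lines of Python. `extract_section` and `truncate` are hypothetical helpers. The heading detection is deliberately naive: it treats any line starting with `#` as a heading, so a `#` comment inside a code block would be misread.

```python
def extract_section(markdown: str, heading: str) -> str:
    """Return the first section whose heading contains `heading` (case-insensitive).

    The section runs until the next heading of the same or a higher level.
    Returns an empty string when no heading matches.
    """
    lines = markdown.splitlines()
    start, level = None, 0
    for i, line in enumerate(lines):
        if not line.startswith("#"):
            continue
        current = len(line) - len(line.lstrip("#"))
        if start is None:
            if heading.lower() in line.lower():
                start, level = i, current
        elif current <= level:
            return "\n".join(lines[start:i])
    return "\n".join(lines[start:]) if start is not None else ""


def truncate(markdown: str, max_chars: int = 50_000) -> str:
    """Cut at the last paragraph break before max_chars, if there is one."""
    if len(markdown) <= max_chars:
        return markdown
    cut = markdown.rfind("\n\n", 0, max_chars)
    return markdown[: cut if cut > 0 else max_chars]
```

Cutting at a paragraph break avoids handing the model half a sentence. The 50,000 default mirrors the threshold mentioned above.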
Combine with CDP
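After fetching a page with L3 CDP, you may prefer clean Markdown from Jina over parsing the HTML with BeautifulSoup. Jina cannot accept HTML you already hold; it re-fetches the URL itself, so this only works when the content is also visible without login: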
import subprocess

# fetch_with_cdp is defined in references/cdp-browser.md
html = fetch_with_cdp(url)

# Jina fetches the URL itself and never sees `html`,
# so this only works if the content is already visible without login.
result = subprocess.run(
    ["curl", "-sL", f"https://r.jina.ai/{url}"],
    capture_output=True, text=True,
)
md = result.stdout
For truly login-gated content, you must parse the HTML directly (BeautifulSoup) since Jina can't log in on your behalf.
Failure Mode
If Jina Reader returns garbage or an error:
- Hard login wall → escalate to L3 CDP browser
- Geographically restricted → tell the user, suggest VPN or manual access
- Cloudflare challenge → try L3 CDP (user's browser passes challenges naturally)
- 404 / gone → confirm the URL is correct
In all cases, tell the user explicitly which URL failed and what you tried.
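The escalation and reporting rule can be sketched as a loop that tries each layer in order and records every attempt. Everything here is an assumption made for illustration: the helper names, the `(name, fetch)` layer tuples, and the 200-character threshold used to detect empty or truncated results.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Layer = Tuple[str, Callable[[str], str]]


def fetch_with_escalation(url: str, layers: Sequence[Layer],
                          min_chars: int = 200) -> Tuple[Optional[str], List[str]]:
    """Try each layer in order; return (content, attempts).

    content is None if every layer failed. min_chars is an assumed
    heuristic for "empty or truncated" and may need tuning.
    """
    attempts: List[str] = []
    for name, fetch in layers:
        try:
            content = fetch(url)
        except Exception as exc:  # any layer error triggers escalation
            attempts.append(f"{name}: error ({exc})")
            continue
        if len(content.strip()) >= min_chars:
            attempts.append(f"{name}: ok")
            return content, attempts
        attempts.append(f"{name}: empty or truncated ({len(content)} chars)")
    return None, attempts


def failure_report(url: str, attempts: Sequence[str]) -> str:
    """Build the message that tells the user which URL failed and what was tried."""
    return f"Could not fetch {url}. Tried: " + "; ".join(attempts)
```

Keeping the attempt log separate from the content makes it straightforward to surface the report to the user only when the final result is `None`.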