Files
market/skills/web-access/references/jina-reader.md
张馨元 98322aa930 feat: 新增 web-access 和 frontend-design 两个内置技能
根据 docs 推荐补齐 5 个内置技能中的 c) 和 e):

web-access v1.1.0:
- 三层架构:L1 WebSearch/WebFetch + L2 Jina Reader + L3 CDP Browser
- 添加 Chrome CDP 前置条件(macOS/Linux/Windows 启动命令)
- 支持登录态访问 小红书/B站/微博/知乎/飞书/Twitter/公众号
- Jina Reader 重新定位为默认 token 优化层(非兜底)
- 新增 references/cdp-browser.md(Python Playwright 详细操作手册)
- 触发词扩充:小红书、B站、微博、飞书、Twitter、推特、X、知乎、公众号

frontend-design v1.0.0:
- 从 Claude Code 官方 frontend-design 技能适配
- 保留原版 bold aesthetic 设计理念
- 新增 Project Context Override 章节:在 DesireCore 主仓库内工作时
  自动遵循 3+2 色彩体系(Green/Blue/Purple + Orange/Red)
- 添加 Output Rule 要求告知用户文件路径

builtin-skills.json: 12 → 14 skills
2026-04-07 15:35:59 +08:00

4.4 KiB

Jina Reader — Default Token-Optimization Layer

Jina Reader is a free public service that renders any URL server-side and returns clean Markdown. In this skill's three-layer architecture, Jina is Layer 2: the default extractor for heavy/JS-rendered pages, not just a fallback.


Positioning in the three-layer model

L1 WebFetch            ── simple public static pages (docs, Wikipedia, HN)
    │
    │ WebFetch empty/truncated/garbled
    ▼
L2 Jina Reader         ── DEFAULT for JS-heavy SPAs, long articles, Medium, Twitter
    │                     Strips nav/ads automatically, saves 50-80% tokens
    │
    │ Login required, or Jina also fails
    ▼
L3 CDP Browser         ── user's logged-in Chrome (小红书/B站/微博/飞书/Twitter)

Key insight: Don't wait for WebFetch to fail before trying Jina. For any URL you expect to be JS-heavy (any major SPA, Medium, Dev.to, long-form articles), go straight to Jina for the token savings.


Basic Usage (no API key)

curl -sL "https://r.jina.ai/https://example.com/article"

The original URL goes after r.jina.ai/. The response is plain Markdown — pipe to a file or read directly.


When to use each layer

Scenario Primary choice Why
Wikipedia, MDN, official docs L1 WebFetch Static clean HTML, fastest
GitHub README (public) L1 WebFetch Simple markup
Medium articles L2 Jina Member walls + heavy JS
Dev.to, Hashnode L2 Jina JS-rendered
Substack, Ghost blogs L2 Jina Partial JS rendering
News sites with lazy-load L2 Jina Scroll-triggered content
Twitter/X public threads L2 Jina first, L3 CDP if truncated Sometimes works
公众号 (mp.weixin.qq.com) L2 Jina Clean Markdown extraction
LinkedIn articles L3 CDP Hard login wall
小红书, B站, 微博, 飞书 L3 CDP 登录强制

Token savings example

Raw HTML of a long Medium article: ~150 KB, ~50,000 tokens Same article via Jina Reader: ~20 KB, ~7,000 tokens

86% reduction, with cleaner structure and no ads/nav cruft.


Advanced Endpoints (optional)

If you need more than basic content extraction, Jina also offers:

  • Search: https://s.jina.ai/<query> — returns top 5 results as Markdown
  • Embeddings: https://api.jina.ai/v1/embeddings (requires free API key)
  • Reranker: https://api.jina.ai/v1/rerank (requires free API key)

For DesireCore, prefer the built-in WebSearch tool over s.jina.ai for consistency.


Rate Limits

  • Free tier: ~20 requests/minute, no authentication needed
  • With free API key: higher limits, fewer throttles
    curl -sL "https://r.jina.ai/https://example.com" \
         -H "Authorization: Bearer YOUR_KEY"
    
  • Get a free key at jina.ai — stored in env var JINA_API_KEY if available

Usage tips

Cache your own results

Jina itself doesn't cache for you. If you call the same URL repeatedly in a session, save the Markdown to a temp file:

curl -sL "https://r.jina.ai/$URL" > /tmp/jina-cache.md

Handle very long articles

Jina returns the full article in one response. For articles > 50K chars, pipe through head or extract specific sections with Python/awk before feeding back to the model context.

Combine with CDP

When you use L3 CDP to fetch a login-gated page, you can pipe the resulting HTML through Jina for clean Markdown instead of parsing with BeautifulSoup:

html = fetch_with_cdp(url)  # from references/cdp-browser.md
# Now convert via Jina (note: Jina fetches the URL itself, not your HTML)
# So this only works if the content is already visible without login:
import subprocess
md = subprocess.run(["curl", "-sL", f"https://r.jina.ai/{url}"],
                    capture_output=True, text=True).stdout

For truly login-gated content, you must parse the HTML directly (BeautifulSoup) since Jina can't log in on your behalf.


Failure Mode

If Jina Reader returns garbage or error:

  1. Hard login wall → escalate to L3 CDP browser
  2. Geographically restricted → tell the user, suggest VPN or manual access
  3. Cloudflare challenge → try L3 CDP (user's browser passes challenges naturally)
  4. 404 / gone → confirm the URL is correct

In all cases, tell the user explicitly which URL failed and what you tried.