feat: skills i18n overhaul (schemaVersion 1.1, no backward compatibility) (#1)

* feat: skills i18n overhaul (schemaVersion 1.1, no backward compatibility)

Migrate all 21 skills, 1 agent, and manifest/categories to the schemaVersion 1.1
i18n structure, with a supporting CI AI translation pipeline (GitHub Models) and a local toolchain.

## Key changes

### Data structure (breaking, schemaVersion 1.0 → 1.1)
- SKILL.md: top-level name becomes an ASCII slug (== directory name, per the agentskills.io spec);
  the Chinese display name, short_desc, and description all move into metadata.i18n.<locale>
- agents/<id>/agent.json: shortDesc/fullDesc/tags/persona.{role,traits} move into
  i18n.<locale>; changelog[].changes becomes a { <locale>: string[] } object
- categories.json: each category's label/description moves into i18n.<locale>; only
  color/icon remain at the top level
- manifest.json: adds supportedLocales / defaultLocale; top-level description moves into
  i18n.<locale>
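
A minimal sketch of how a consumer might read a localized field under this layout. The `resolve_i18n` helper, the sample category entry, and the fallback order (requested locale, then default locale) are illustrative assumptions, not code from this commit.

```python
from typing import Any, Dict


def resolve_i18n(entry: Dict[str, Any], field: str, locale: str,
                 default_locale: str = "en-US") -> Any:
    """Return entry["i18n"][locale][field], falling back to default_locale."""
    i18n = entry.get("i18n", {})
    for candidate in (locale, default_locale):
        value = i18n.get(candidate, {}).get(field)
        if value is not None:
            return value
    raise KeyError(f"{field!r} missing for {locale!r} and {default_locale!r}")


# Hypothetical categories.json entry: only color/icon stay at the top level.
category = {
    "color": "#FF3B30",
    "icon": "file",
    "i18n": {
        "zh-CN": {"label": "效率", "description": "效率工具"},
        "en-US": {"label": "Productivity"},
    },
}

print(resolve_i18n(category, "label", "zh-CN"))  # 效率
print(resolve_i18n(category, "label", "fr-FR"))  # Productivity (fallback)
```

Raising on a missing field, rather than returning an empty string, is one possible choice; it surfaces incomplete locales early.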

### Body file structure
- Root SKILL.md = frontmatter + default_locale (en-US) body
- SKILL.<locale>.md = the markdown body for each locale (first line <!-- locale: xx --> acts as a self-check)
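
The first-line marker can be checked against the filename. The sketch below is an assumption about how such a self-check could work; the helper names and the exact marker regex are hypothetical and are not taken from validate-i18n.py.

```python
import re
from typing import Optional

# Matches a first line such as: <!-- locale: zh-CN -->
LOCALE_MARKER = re.compile(r"^<!--\s*locale:\s*([A-Za-z0-9-]+)\s*-->\s*$")


def expected_locale(filename: str) -> Optional[str]:
    """Extract the locale from 'SKILL.<locale>.md'; None for the root SKILL.md."""
    match = re.fullmatch(r"SKILL\.(.+)\.md", filename)
    return match.group(1) if match else None


def marker_matches(filename: str, text: str) -> bool:
    """True when the first line's locale marker agrees with the filename."""
    locale = expected_locale(filename)
    if locale is None:
        return True  # root SKILL.md starts with frontmatter, no marker expected
    first_line = text.splitlines()[0] if text else ""
    found = LOCALE_MARKER.match(first_line)
    return found is not None and found.group(1) == locale


print(marker_matches("SKILL.zh-CN.md", "<!-- locale: zh-CN -->\n# pdf 技能"))
print(marker_matches("SKILL.zh-CN.md", "<!-- locale: en-US -->\n# pdf skill"))
```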

### Toolchain (scripts/i18n/)
- glossary.json: zh→en glossary + do_not_translate allowlist
- schema/skill-frontmatter.schema.json: JSON Schema for the i18n frontmatter
- validate-i18n.py: 8 validation rules (name compliance / locale completeness / hash consistency, etc.)
- translate.py: dual backend (GitHub Models / Anthropic), sha256-based incremental translation
- migrate.py: one-off migration script (old format → i18n structure)
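
A rough sketch of what sha256-based incremental translation could look like: hash the source-locale body and flag locales whose recorded source_hash no longer matches. The function names and the truncation to 16 hex characters (inferred from values such as `sha256:15805c1921ac2c1e` in the frontmatter) are assumptions; translate.py itself is not shown in this commit view.

```python
import hashlib
from typing import Dict, List


def source_hash(source_text: str, length: int = 16) -> str:
    """Hash the source-locale body; the 16-char truncation is an assumption."""
    digest = hashlib.sha256(source_text.encode("utf-8")).hexdigest()
    return f"sha256:{digest[:length]}"


def stale_locales(source_text: str, recorded: Dict[str, str]) -> List[str]:
    """Return locales whose recorded source_hash no longer matches the source."""
    current = source_hash(source_text)
    return sorted(loc for loc, value in recorded.items() if value != current)


body_v1 = "# pdf 技能\n"
recorded = {"zh-CN": source_hash(body_v1), "en-US": source_hash(body_v1)}
print(stale_locales(body_v1, recorded))  # []

# The source body changes and only the source locale's hash is refreshed.
body_v2 = body_v1 + "新增一行\n"
recorded["zh-CN"] = source_hash(body_v2)
print(stale_locales(body_v2, recorded))  # ['en-US']
```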

### CI (.github/workflows/)
- i18n-validate.yml: runs validate + translate --check on PRs
- i18n-translate.yml: on PRs, translates missing locales with GitHub Models (default openai/gpt-5-mini)
  and appends a commit automatically; can switch to Claude via ANTHROPIC_API_KEY

### Docs
- docs/I18N.md: contributor guide for authors (schema notes / submission flow / FAQ)
- README.md: adds a multilingual section

## Verification

- uv run scripts/i18n/validate-i18n.py: OK, 49 files, 0 errors
- uv run scripts/i18n/translate.py --check: 0 stale locales
- Heading counts for all 21 skills match exactly between zh-CN and en-US (largest: 66 = 66)
- skills-ref spec validation: all pass (top-level name as ASCII slug + single description field)
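
One way the heading-count alignment between locales could be checked is sketched below. The commit does not show how the counts were produced, so `count_headings` and the sample bodies are hypothetical; the sketch counts ATX headings and skips lines inside fenced code blocks so shell comments are not miscounted.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+\S")
FENCE = re.compile(r"^\s*(`{3,}|~{3,})")


def count_headings(markdown: str) -> int:
    """Count ATX headings, ignoring anything inside fenced code blocks."""
    in_fence = False
    count = 0
    for line in markdown.splitlines():
        if FENCE.match(line):
            in_fence = not in_fence
            continue
        if not in_fence and HEADING.match(line):
            count += 1
    return count


fence = "`" * 3  # built at runtime so this example nests cleanly in Markdown
zh_body = f"# pdf 技能\n## L0:一句话摘要\n{fence}bash\n# not a heading\n{fence}\n"
en_body = f"# pdf skill\n## L0: One-Sentence Summary\n{fence}bash\n# not a heading\n{fence}\n"

print(count_headings(zh_body), count_headings(en_body))  # 2 2
```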

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

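* fix(i18n): address 6 issues from the PR #1 review

- schema: relax the translated_by regex to ^(human|ai:[A-Za-z0-9._:/-]+)$ so it accepts
  backend:model forms such as 'ai:github:openai/gpt-5-mini' (the CI translation output format)
- README + docs/I18N.md: correct the misleading "CI uses the Claude API" description; the default is
  GitHub Models (openai/gpt-5-mini) + GITHUB_TOKEN, with an optional switch to Anthropic
- skills/minimax-tts/SKILL.md & SKILL.zh-CN.md: remove a stray closing ``` that broke
  subsequent Markdown rendering
- skills/docx/SKILL.md: restore the • Unicode escape example lost during translation,
  aligning it with the zh-CN version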
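
For illustration, the relaxed translated_by pattern can be exercised with Python's `re` module. The pattern is copied from the first bullet above; the `is_valid_translated_by` helper and the sample values are assumptions added for the demonstration.

```python
import re

# Pattern quoted from the commit message.
TRANSLATED_BY = re.compile(r"^(human|ai:[A-Za-z0-9._:/-]+)$")


def is_valid_translated_by(value: str) -> bool:
    """True when value is 'human' or 'ai:' followed by a backend/model id."""
    return TRANSLATED_BY.match(value) is not None


samples = [
    "human",
    "ai:claude-opus-4-7",
    "ai:github:openai/gpt-5-mini",
    "ai:",       # rejected: nothing after the prefix
    "robot",     # rejected: unknown prefix
    "ai:gpt 5",  # rejected: spaces are not allowed
]
for value in samples:
    print(f"{value!r}: {is_valid_translated_by(value)}")
```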

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 00:26:33 +08:00
committed by GitHub
parent 1c107a9344
commit 1f7c8b9673
59 changed files with 10533 additions and 2014 deletions


@@ -1,5 +1,5 @@
---
name: PDF 文档处理
name: pdf
description: >-
Use this skill whenever the user wants to do anything with PDF files. This
includes reading or extracting text/tables from PDFs, combining or merging
@@ -22,6 +22,29 @@ tags:
metadata:
author: anthropic
updated_at: '2026-04-13'
i18n:
default_locale: en-US
source_locale: zh-CN
locales:
- zh-CN
- en-US
zh-CN:
name: PDF 文档处理
short_desc: 读取、创建、合并、拆分和填写 PDF 文档
description: >-
Use this skill whenever the user wants to do anything with PDF files. This includes reading or extracting text/tables from PDFs, combining or merging multiple PDFs into one, splitting PDFs apart, rotating pages, adding watermarks, creating new PDFs, filling PDF forms, encrypting/decrypting PDFs, extracting images, and OCR on scanned PDFs to make them searchable. If the user mentions a .pdf file or asks to produce one, use this skill. Use when 用户提到 PDF、读取PDF、合并PDF、拆分PDF、填写表单、加水印、提取文字、 扫描识别。
body: ./SKILL.zh-CN.md
source_hash: sha256:15805c1921ac2c1e
translated_by: human
en-US:
name: PDF Document Processing
short_desc: Read, create, merge, split, and fill PDF documents
description: >-
Use this skill whenever the user wants to do anything with PDF files. This includes reading or extracting text/tables from PDFs, combining or merging multiple PDFs into one, splitting PDFs apart, rotating pages, adding watermarks, creating new PDFs, filling PDF forms, encrypting/decrypting PDFs, extracting images, and OCR on scanned PDFs to make them searchable. If the user mentions a .pdf file or asks to produce one, use this skill. Use when the user mentions PDF, reading PDFs, merging PDFs, splitting PDFs, filling forms, adding watermarks, extracting text, or OCR.
body: ./SKILL.md
source_hash: sha256:15805c1921ac2c1e
translated_by: ai:claude-opus-4-7
translated_at: '2026-05-03'
market:
icon: >-
<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0
@@ -35,7 +58,6 @@ market:
stroke="url(#pd-a)" stroke-width="1.3" stroke-linecap="round"/><path
d="M17 11v6l2-1.5 2 1.5v-6z" fill="#FF3B30"
fill-opacity="0.8"/></svg>
short_desc: 读取、创建、合并、拆分和填写 PDF 文档
category: productivity
maintainer:
name: DesireCore Official
@@ -43,67 +65,67 @@ market:
channel: latest
---
# pdf 技能
# pdf skill
## L0:一句话摘要
## L0: One-Sentence Summary
读取、创建、合并、拆分和填写 PDF 文档,支持 OCR 识别和命令行工具。
Read, create, merge, split, and fill PDF documents, with OCR support and command-line tools.
## L1:概述与使用场景
## L1: Overview and Use Cases
### 能力描述
### Capability Description
pdf 是一个**流程型技能(Procedural Skill)**,提供 PDF 文档的完整处理能力。基于 Python 库(pypdf、pdfplumber、reportlab)和命令行工具(qpdf、pdftotext、pdftk),支持文本提取、表格提取、合并拆分、旋转、水印、加密、表单填写和 OCR 识别。
pdf is a **Procedural Skill** that provides full PDF document processing capabilities. Built on Python libraries (pypdf, pdfplumber, reportlab) and command-line tools (qpdf, pdftotext, pdftk), it supports text extraction, table extraction, merging/splitting, rotation, watermarking, encryption, form filling, and OCR.
### 使用场景
### Use Cases
- 用户需要从 PDF 中提取文本或表格数据
- 用户需要合并多个 PDF 或拆分页面
- 用户需要创建新的 PDF 文档
- 用户需要填写 PDF 表单、添加水印或加密
- The user needs to extract text or table data from a PDF
- The user needs to merge multiple PDFs or split pages
- The user needs to create a new PDF document
- The user needs to fill PDF forms, add watermarks, or encrypt PDFs
## L2:详细规范
## L2: Detailed Specification
## Prerequisites
### Python 3(必需)
### Python 3 (required)
在执行任何 Python 操作之前,先检测 Python 是否可用:
Before performing any Python operation, check that Python is available:
```bash
python3 --version 2>/dev/null || python --version 2>/dev/null
```
如果命令失败(Python 不可用),**必须停止并告知用户安装 Python 3**:
If the command fails (Python is not available), **you must stop and tell the user to install Python 3**:
- **macOS**: `brew install python3` 或从 https://www.python.org/downloads/ 下载
- **Windows**: `winget install Python.Python.3` 或从 python.org 下载(安装时勾选 "Add Python to PATH")
- **macOS**: `brew install python3`, or download from https://www.python.org/downloads/
- **Windows**: `winget install Python.Python.3`, or download from python.org (check "Add Python to PATH" during installation)
- **Linux (Debian/Ubuntu)**: `sudo apt install python3 python3-pip`
- **Linux (Fedora/RHEL)**: `sudo dnf install python3 python3-pip`
如需更详细的环境配置帮助:Python 相关问题加载 `python-runtime` 技能;
其他(系统工具如 poppler / tesseract、容器 / WSL)加载 `dev-environment-setup` 技能。
For more detailed environment setup help: load the `python-runtime` skill for Python issues;
load the `dev-environment-setup` skill for everything else (system tools like poppler / tesseract, containers / WSL).
### Python 包依赖
### Python Package Dependencies
本技能依赖以下 Python 包(按需检测):
This skill depends on the following Python packages (checked on demand):
- `pypdf` — PDF 基础操作(读取、合并、拆分、旋转)
- `pdfplumber` — 表格提取、带布局的文本提取
- `Pillow` — 图片处理(水印、验证图等)
- `reportlab` — PDF 创建(可选,按需安装)
- `pdf2image` — PDF 转图片(可选,需要 poppler)
- `pypdf` — Basic PDF operations (read, merge, split, rotate)
- `pdfplumber` — Table extraction, layout-aware text extraction
- `Pillow` — Image processing (watermarks, verification images, etc.)
- `reportlab` — PDF creation (optional, install on demand)
- `pdf2image` — PDF-to-image conversion (optional, requires poppler)
核心包检测:
Core package check:
```bash
python3 -c "import pypdf; import pdfplumber; import PIL" 2>/dev/null || echo "MISSING"
```
缺失时告知用户安装:`pip install pypdf pdfplumber Pillow`
If missing, tell the user to install: `pip install pypdf pdfplumber Pillow`
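The same check can be sketched in Python so that the result names the missing packages instead of printing a single MISSING flag. This is an illustrative sketch, not part of the skill; `CORE_PACKAGES` and `missing_packages` are hypothetical names, and the import-name to pip-name mapping reflects that `PIL` is provided by the `Pillow` distribution.

```python
import importlib.util
from typing import Dict, List

# Import name -> pip distribution name (PIL is provided by Pillow).
CORE_PACKAGES = {"pypdf": "pypdf", "pdfplumber": "pdfplumber", "PIL": "Pillow"}


def missing_packages(packages: Dict[str, str]) -> List[str]:
    """Return pip names for top-level modules not found on this interpreter."""
    return [pip_name for module, pip_name in packages.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages(CORE_PACKAGES)
if missing:
    print("MISSING: pip install " + " ".join(missing))
else:
    print("OK")
```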
## Output Rule
When you create or modify a .pdf file, you **MUST** tell the user the absolute path of the output file in your response. Example: "文件已保存到:`/path/to/output.pdf`"
When you create or modify a .pdf file, you **MUST** tell the user the absolute path of the output file in your response. Example: "File saved to: `/path/to/output.pdf`"
## Overview

skills/pdf/SKILL.zh-CN.md Normal file

@@ -0,0 +1,370 @@
<!-- locale: zh-CN -->
# pdf 技能
## L0:一句话摘要
读取、创建、合并、拆分和填写 PDF 文档,支持 OCR 识别和命令行工具。
## L1:概述与使用场景
### 能力描述
pdf 是一个**流程型技能(Procedural Skill)**,提供 PDF 文档的完整处理能力。基于 Python 库(pypdf、pdfplumber、reportlab)和命令行工具(qpdf、pdftotext、pdftk),支持文本提取、表格提取、合并拆分、旋转、水印、加密、表单填写和 OCR 识别。
### 使用场景
- 用户需要从 PDF 中提取文本或表格数据
- 用户需要合并多个 PDF 或拆分页面
- 用户需要创建新的 PDF 文档
- 用户需要填写 PDF 表单、添加水印或加密
## L2:详细规范
## Prerequisites
### Python 3(必需)
在执行任何 Python 操作之前,先检测 Python 是否可用:
```bash
python3 --version 2>/dev/null || python --version 2>/dev/null
```
如果命令失败(Python 不可用),**必须停止并告知用户安装 Python 3**:
- **macOS**: `brew install python3` 或从 https://www.python.org/downloads/ 下载
- **Windows**: `winget install Python.Python.3` 或从 python.org 下载(安装时勾选 "Add Python to PATH")
- **Linux (Debian/Ubuntu)**: `sudo apt install python3 python3-pip`
- **Linux (Fedora/RHEL)**: `sudo dnf install python3 python3-pip`
如需更详细的环境配置帮助:Python 相关问题加载 `python-runtime` 技能;
其他(系统工具如 poppler / tesseract、容器 / WSL)加载 `dev-environment-setup` 技能。
### Python 包依赖
本技能依赖以下 Python 包(按需检测):
- `pypdf` — PDF 基础操作(读取、合并、拆分、旋转)
- `pdfplumber` — 表格提取、带布局的文本提取
- `Pillow` — 图片处理(水印、验证图等)
- `reportlab` — PDF 创建(可选,按需安装)
- `pdf2image` — PDF 转图片(可选,需要 poppler)
核心包检测:
```bash
python3 -c "import pypdf; import pdfplumber; import PIL" 2>/dev/null || echo "MISSING"
```
缺失时告知用户安装:`pip install pypdf pdfplumber Pillow`
## Output Rule
When you create or modify a .pdf file, you **MUST** tell the user the absolute path of the output file in your response. Example: "文件已保存到:`/path/to/output.pdf`"
## Overview
This guide covers essential PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see REFERENCE.md. If you need to fill out a PDF form, read FORMS.md and follow its instructions.
## Quick Start
```python
from pypdf import PdfReader, PdfWriter

# Read a PDF
reader = PdfReader("document.pdf")
print(f"Pages: {len(reader.pages)}")

# Extract text
text = ""
for page in reader.pages:
    text += page.extract_text()
```
## Python Libraries
### pypdf - Basic Operations
#### Merge PDFs
```python
from pypdf import PdfWriter, PdfReader

writer = PdfWriter()
for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
    reader = PdfReader(pdf_file)
    for page in reader.pages:
        writer.add_page(page)

with open("merged.pdf", "wb") as output:
    writer.write(output)
```
#### Split PDF
```python
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
for i, page in enumerate(reader.pages):
    # One single-page writer per output file
    writer = PdfWriter()
    writer.add_page(page)
    with open(f"page_{i+1}.pdf", "wb") as output:
        writer.write(output)
```
#### Extract Metadata
```python
from pypdf import PdfReader

reader = PdfReader("document.pdf")
meta = reader.metadata
print(f"Title: {meta.title}")
print(f"Author: {meta.author}")
print(f"Subject: {meta.subject}")
print(f"Creator: {meta.creator}")
```
#### Rotate Pages
```python
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

page = reader.pages[0]
page.rotate(90)  # Rotate 90 degrees clockwise
writer.add_page(page)

with open("rotated.pdf", "wb") as output:
    writer.write(output)
```
### pdfplumber - Text and Table Extraction
#### Extract Text with Layout
```python
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        print(text)
```
#### Extract Tables
```python
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    for i, page in enumerate(pdf.pages):
        tables = page.extract_tables()
        for j, table in enumerate(tables):
            print(f"Table {j+1} on page {i+1}:")
            for row in table:
                print(row)
```
#### Advanced Table Extraction
```python
import pandas as pd
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    all_tables = []
    for page in pdf.pages:
        tables = page.extract_tables()
        for table in tables:
            if table:  # Check if table is not empty
                df = pd.DataFrame(table[1:], columns=table[0])
                all_tables.append(df)

# Combine all tables
if all_tables:
    combined_df = pd.concat(all_tables, ignore_index=True)
    combined_df.to_excel("extracted_tables.xlsx", index=False)
```
### reportlab - Create PDFs
#### Basic PDF Creation
```python
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
c = canvas.Canvas("hello.pdf", pagesize=letter)
width, height = letter
# Add text
c.drawString(100, height - 100, "Hello World!")
c.drawString(100, height - 120, "This is a PDF created with reportlab")
# Add a line
c.line(100, height - 140, 400, height - 140)
# Save
c.save()
```
#### Create PDF with Multiple Pages
```python
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak
from reportlab.lib.styles import getSampleStyleSheet
doc = SimpleDocTemplate("report.pdf", pagesize=letter)
styles = getSampleStyleSheet()
story = []
# Add content
title = Paragraph("Report Title", styles['Title'])
story.append(title)
story.append(Spacer(1, 12))
body = Paragraph("This is the body of the report. " * 20, styles['Normal'])
story.append(body)
story.append(PageBreak())
# Page 2
story.append(Paragraph("Page 2", styles['Heading1']))
story.append(Paragraph("Content for page 2", styles['Normal']))
# Build PDF
doc.build(story)
```
#### Subscripts and Superscripts
**IMPORTANT**: Never use Unicode subscript/superscript characters (₀₁₂₃₄₅₆₇₈₉, ⁰¹²³⁴⁵⁶⁷⁸⁹) in ReportLab PDFs. The built-in fonts do not include these glyphs, causing them to render as solid black boxes.
Instead, use ReportLab's XML markup tags in Paragraph objects:
```python
from reportlab.platypus import Paragraph
from reportlab.lib.styles import getSampleStyleSheet
styles = getSampleStyleSheet()
# Subscripts: use <sub> tag
chemical = Paragraph("H<sub>2</sub>O", styles['Normal'])
# Superscripts: use <super> tag
squared = Paragraph("x<super>2</super> + y<super>2</super>", styles['Normal'])
```
For canvas-drawn text (not Paragraph objects), manually adjust the font size and position rather than using Unicode subscripts/superscripts.
## Command-Line Tools
### pdftotext (poppler-utils)
```bash
# Extract text
pdftotext input.pdf output.txt
# Extract text preserving layout
pdftotext -layout input.pdf output.txt
# Extract specific pages
pdftotext -f 1 -l 5 input.pdf output.txt # Pages 1-5
```
### qpdf
```bash
# Merge PDFs
qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf
# Split pages
qpdf input.pdf --pages . 1-5 -- pages1-5.pdf
qpdf input.pdf --pages . 6-10 -- pages6-10.pdf
# Rotate pages
qpdf input.pdf output.pdf --rotate=+90:1 # Rotate page 1 by 90 degrees
# Remove password
qpdf --password=mypassword --decrypt encrypted.pdf decrypted.pdf
```
### pdftk (if available)
```bash
# Merge
pdftk file1.pdf file2.pdf cat output merged.pdf
# Split
pdftk input.pdf burst
# Rotate
pdftk input.pdf rotate 1east output rotated.pdf
```
## Common Tasks
### Extract Text from Scanned PDFs
```python
# Requires: pip install pytesseract pdf2image
import pytesseract
from pdf2image import convert_from_path

# Convert PDF to images
images = convert_from_path('scanned.pdf')

# OCR each page
text = ""
for i, image in enumerate(images):
    text += f"Page {i+1}:\n"
    text += pytesseract.image_to_string(image)
    text += "\n\n"

print(text)
```
### Add Watermark
```python
from pypdf import PdfReader, PdfWriter

# Create watermark (or load existing)
watermark = PdfReader("watermark.pdf").pages[0]

# Apply to all pages
reader = PdfReader("document.pdf")
writer = PdfWriter()
for page in reader.pages:
    page.merge_page(watermark)
    writer.add_page(page)

with open("watermarked.pdf", "wb") as output:
    writer.write(output)
```
### Extract Images
```bash
# Using pdfimages (poppler-utils)
pdfimages -j input.pdf output_prefix
# This extracts all images as output_prefix-000.jpg, output_prefix-001.jpg, etc.
```
### Password Protection
```python
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()
for page in reader.pages:
    writer.add_page(page)

# Add password
writer.encrypt("userpassword", "ownerpassword")

with open("encrypted.pdf", "wb") as output:
    writer.write(output)
```
## Quick Reference
| Task | Best Tool | Command/Code |
|------|-----------|--------------|
| Merge PDFs | pypdf | `writer.add_page(page)` |
| Split PDFs | pypdf | One page per file |
| Extract text | pdfplumber | `page.extract_text()` |
| Extract tables | pdfplumber | `page.extract_tables()` |
| Create PDFs | reportlab | Canvas or Platypus |
| Command line merge | qpdf | `qpdf --empty --pages ...` |
| OCR scanned PDFs | pytesseract | Convert to image first |
| Fill PDF forms | pdf-lib or pypdf (see FORMS.md) | See FORMS.md |
## Next Steps
- For advanced pypdfium2 usage, see REFERENCE.md
- For JavaScript libraries (pdf-lib), see REFERENCE.md
- If you need to fill out a PDF form, follow the instructions in FORMS.md
- For troubleshooting guides, see REFERENCE.md