--- name: pdf description: >- Use this skill whenever the user wants to do anything with PDF files. This includes reading or extracting text/tables from PDFs, combining or merging multiple PDFs into one, splitting PDFs apart, rotating pages, adding watermarks, creating new PDFs, filling PDF forms, encrypting/decrypting PDFs, extracting images, and OCR on scanned PDFs to make them searchable. If the user mentions a .pdf file or asks to produce one, use this skill. Use when 用户提到 PDF、读取PDF、合并PDF、拆分PDF、填写表单、加水印、提取文字、 扫描识别。 version: 1.0.2 type: procedural risk_level: low status: enabled disable-model-invocation: true tags: - pdf - document - form - ocr metadata: author: anthropic updated_at: '2026-04-13' i18n: default_locale: en-US source_locale: zh-CN locales: - zh-CN - en-US zh-CN: name: PDF 文档处理 short_desc: 读取、创建、合并、拆分和填写 PDF 文档 description: >- Use this skill whenever the user wants to do anything with PDF files. This includes reading or extracting text/tables from PDFs, combining or merging multiple PDFs into one, splitting PDFs apart, rotating pages, adding watermarks, creating new PDFs, filling PDF forms, encrypting/decrypting PDFs, extracting images, and OCR on scanned PDFs to make them searchable. If the user mentions a .pdf file or asks to produce one, use this skill. Use when 用户提到 PDF、读取PDF、合并PDF、拆分PDF、填写表单、加水印、提取文字、 扫描识别。 body: ./SKILL.zh-CN.md source_hash: sha256:15805c1921ac2c1e translated_by: human en-US: name: PDF Document Processing short_desc: Read, create, merge, split, and fill PDF documents description: >- Use this skill whenever the user wants to do anything with PDF files. This includes reading or extracting text/tables from PDFs, combining or merging multiple PDFs into one, splitting PDFs apart, rotating pages, adding watermarks, creating new PDFs, filling PDF forms, encrypting/decrypting PDFs, extracting images, and OCR on scanned PDFs to make them searchable. If the user mentions a .pdf file or asks to produce one, use this skill. Use when the user mentions PDF, reading PDFs, merging PDFs, splitting PDFs, filling forms, adding watermarks, extracting text, or OCR. body: ./SKILL.md source_hash: sha256:15805c1921ac2c1e translated_by: ai:claude-opus-4-7 translated_at: '2026-05-03' market: icon: >- category: productivity maintainer: name: DesireCore Official verified: true channel: latest --- # pdf skill ## L0: One-Sentence Summary Read, create, merge, split, and fill PDF documents, with OCR support and command-line tools. ## L1: Overview and Use Cases ### Capability Description pdf is a **Procedural Skill** that provides full PDF document processing capabilities. Built on Python libraries (pypdf, pdfplumber, reportlab) and command-line tools (qpdf, pdftotext, pdftk), it supports text extraction, table extraction, merging/splitting, rotation, watermarking, encryption, form filling, and OCR. ### Use Cases - The user needs to extract text or table data from a PDF - The user needs to merge multiple PDFs or split pages - The user needs to create a new PDF document - The user needs to fill PDF forms, add watermarks, or encrypt PDFs ## L2: Detailed Specification ## Prerequisites ### Python 3 (required) Before performing any Python operation, check that Python is available: ```bash python3 --version 2>/dev/null || python --version 2>/dev/null ``` If the command fails (Python is not available), **you must stop and tell the user to install Python 3**: - **macOS**: `brew install python3`, or download from https://www.python.org/downloads/ - **Windows**: `winget install Python.Python.3`, or download from python.org (check "Add Python to PATH" during installation) - **Linux (Debian/Ubuntu)**: `sudo apt install python3 python3-pip` - **Linux (Fedora/RHEL)**: `sudo dnf install python3 python3-pip` For more detailed environment setup help: load the `python-runtime` skill for Python issues; load the `dev-environment-setup` skill for everything else (system tools like poppler / tesseract, containers / WSL). ### Python Package Dependencies This skill depends on the following Python packages (checked on demand): - `pypdf` — Basic PDF operations (read, merge, split, rotate) - `pdfplumber` — Table extraction, layout-aware text extraction - `Pillow` — Image processing (watermarks, verification images, etc.) - `reportlab` — PDF creation (optional, install on demand) - `pdf2image` — PDF-to-image conversion (optional, requires poppler) Core package check: ```bash python3 -c "import pypdf; import pdfplumber; import PIL" 2>/dev/null || echo "MISSING" ``` If missing, tell the user to install: `pip install pypdf pdfplumber Pillow` ## Output Rule When you create or modify a .pdf file, you **MUST** tell the user the absolute path of the output file in your response. Example: "File saved to: `/path/to/output.pdf`" ## Overview This guide covers essential PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see REFERENCE.md. If you need to fill out a PDF form, read FORMS.md and follow its instructions. ## Quick Start ```python from pypdf import PdfReader, PdfWriter # Read a PDF reader = PdfReader("document.pdf") print(f"Pages: {len(reader.pages)}") # Extract text text = "" for page in reader.pages: text += page.extract_text() ``` ## Python Libraries ### pypdf - Basic Operations #### Merge PDFs ```python from pypdf import PdfWriter, PdfReader writer = PdfWriter() for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]: reader = PdfReader(pdf_file) for page in reader.pages: writer.add_page(page) with open("merged.pdf", "wb") as output: writer.write(output) ``` #### Split PDF ```python reader = PdfReader("input.pdf") for i, page in enumerate(reader.pages): writer = PdfWriter() writer.add_page(page) with open(f"page_{i+1}.pdf", "wb") as output: writer.write(output) ``` #### Extract Metadata ```python reader = PdfReader("document.pdf") meta = reader.metadata print(f"Title: {meta.title}") print(f"Author: {meta.author}") print(f"Subject: {meta.subject}") print(f"Creator: {meta.creator}") ``` #### Rotate Pages ```python reader = PdfReader("input.pdf") writer = PdfWriter() page = reader.pages[0] page.rotate(90) # Rotate 90 degrees clockwise writer.add_page(page) with open("rotated.pdf", "wb") as output: writer.write(output) ``` ### pdfplumber - Text and Table Extraction #### Extract Text with Layout ```python import pdfplumber with pdfplumber.open("document.pdf") as pdf: for page in pdf.pages: text = page.extract_text() print(text) ``` #### Extract Tables ```python with pdfplumber.open("document.pdf") as pdf: for i, page in enumerate(pdf.pages): tables = page.extract_tables() for j, table in enumerate(tables): print(f"Table {j+1} on page {i+1}:") for row in table: print(row) ``` #### Advanced Table Extraction ```python import pandas as pd with pdfplumber.open("document.pdf") as pdf: all_tables = [] for page in pdf.pages: tables = page.extract_tables() for table in tables: if table: # Check if table is not empty df = pd.DataFrame(table[1:], columns=table[0]) all_tables.append(df) # Combine all tables if all_tables: combined_df = pd.concat(all_tables, ignore_index=True) combined_df.to_excel("extracted_tables.xlsx", index=False) ``` ### reportlab - Create PDFs #### Basic PDF Creation ```python from reportlab.lib.pagesizes import letter from reportlab.pdfgen import canvas c = canvas.Canvas("hello.pdf", pagesize=letter) width, height = letter # Add text c.drawString(100, height - 100, "Hello World!") c.drawString(100, height - 120, "This is a PDF created with reportlab") # Add a line c.line(100, height - 140, 400, height - 140) # Save c.save() ``` #### Create PDF with Multiple Pages ```python from reportlab.lib.pagesizes import letter from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak from reportlab.lib.styles import getSampleStyleSheet doc = SimpleDocTemplate("report.pdf", pagesize=letter) styles = getSampleStyleSheet() story = [] # Add content title = Paragraph("Report Title", styles['Title']) story.append(title) story.append(Spacer(1, 12)) body = Paragraph("This is the body of the report. " * 20, styles['Normal']) story.append(body) story.append(PageBreak()) # Page 2 story.append(Paragraph("Page 2", styles['Heading1'])) story.append(Paragraph("Content for page 2", styles['Normal'])) # Build PDF doc.build(story) ``` #### Subscripts and Superscripts **IMPORTANT**: Never use Unicode subscript/superscript characters (₀₁₂₃₄₅₆₇₈₉, ⁰¹²³⁴⁵⁶⁷⁸⁹) in ReportLab PDFs. The built-in fonts do not include these glyphs, causing them to render as solid black boxes. Instead, use ReportLab's XML markup tags in Paragraph objects: ```python from reportlab.platypus import Paragraph from reportlab.lib.styles import getSampleStyleSheet styles = getSampleStyleSheet() # Subscripts: use tag chemical = Paragraph("H2O", styles['Normal']) # Superscripts: use tag squared = Paragraph("x2 + y2", styles['Normal']) ``` For canvas-drawn text (not Paragraph objects), manually adjust font the size and position rather than using Unicode subscripts/superscripts. ## Command-Line Tools ### pdftotext (poppler-utils) ```bash # Extract text pdftotext input.pdf output.txt # Extract text preserving layout pdftotext -layout input.pdf output.txt # Extract specific pages pdftotext -f 1 -l 5 input.pdf output.txt # Pages 1-5 ``` ### qpdf ```bash # Merge PDFs qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf # Split pages qpdf input.pdf --pages . 1-5 -- pages1-5.pdf qpdf input.pdf --pages . 6-10 -- pages6-10.pdf # Rotate pages qpdf input.pdf output.pdf --rotate=+90:1 # Rotate page 1 by 90 degrees # Remove password qpdf --password=mypassword --decrypt encrypted.pdf decrypted.pdf ``` ### pdftk (if available) ```bash # Merge pdftk file1.pdf file2.pdf cat output merged.pdf # Split pdftk input.pdf burst # Rotate pdftk input.pdf rotate 1east output rotated.pdf ``` ## Common Tasks ### Extract Text from Scanned PDFs ```python # Requires: pip install pytesseract pdf2image import pytesseract from pdf2image import convert_from_path # Convert PDF to images images = convert_from_path('scanned.pdf') # OCR each page text = "" for i, image in enumerate(images): text += f"Page {i+1}:\n" text += pytesseract.image_to_string(image) text += "\n\n" print(text) ``` ### Add Watermark ```python from pypdf import PdfReader, PdfWriter # Create watermark (or load existing) watermark = PdfReader("watermark.pdf").pages[0] # Apply to all pages reader = PdfReader("document.pdf") writer = PdfWriter() for page in reader.pages: page.merge_page(watermark) writer.add_page(page) with open("watermarked.pdf", "wb") as output: writer.write(output) ``` ### Extract Images ```bash # Using pdfimages (poppler-utils) pdfimages -j input.pdf output_prefix # This extracts all images as output_prefix-000.jpg, output_prefix-001.jpg, etc. ``` ### Password Protection ```python from pypdf import PdfReader, PdfWriter reader = PdfReader("input.pdf") writer = PdfWriter() for page in reader.pages: writer.add_page(page) # Add password writer.encrypt("userpassword", "ownerpassword") with open("encrypted.pdf", "wb") as output: writer.write(output) ``` ## Quick Reference | Task | Best Tool | Command/Code | |------|-----------|--------------| | Merge PDFs | pypdf | `writer.add_page(page)` | | Split PDFs | pypdf | One page per file | | Extract text | pdfplumber | `page.extract_text()` | | Extract tables | pdfplumber | `page.extract_tables()` | | Create PDFs | reportlab | Canvas or Platypus | | Command line merge | qpdf | `qpdf --empty --pages ...` | | OCR scanned PDFs | pytesseract | Convert to image first | | Fill PDF forms | pdf-lib or pypdf (see FORMS.md) | See FORMS.md | ## Next Steps - For advanced pypdfium2 usage, see REFERENCE.md - For JavaScript libraries (pdf-lib), see REFERENCE.md - If you need to fill out a PDF form, follow the instructions in FORMS.md - For troubleshooting guides, see REFERENCE.md