mirror of
https://git.openapi.site/https://github.com/desirecore/market.git
synced 2026-06-06 08:30:42 +08:00
feat: skills i18n overhaul (schemaVersion 1.1, no backward compatibility) (#1)
* feat: skills i18n overhaul: schemaVersion 1.1, no backward compatibility
Migrates all 21 skills, 1 agent, and manifest/categories to the schemaVersion 1.1
i18n structure, together with a CI AI translation pipeline (GitHub Models) and local tooling.
## Key changes
### Data structures (breaking, schemaVersion 1.0 → 1.1)
- SKILL.md: the top-level name becomes an ASCII slug (== directory name, per the agentskills.io spec);
the Chinese display name/short_desc/description all move into metadata.i18n.<locale>
- agents/<id>/agent.json: shortDesc/fullDesc/tags/persona.{role,traits} move into
i18n.<locale>; changelog[].changes becomes a { <locale>: string[] } object
- categories.json: each category's label/description moves into i18n.<locale>; only
color/icon remain at the top level
- manifest.json: adds supportedLocales / defaultLocale; the top-level description moves into
i18n.<locale>
### Body file layout
- Root SKILL.md = frontmatter + default_locale (en-US) body
- SKILL.<locale>.md = the markdown body for each locale (first line <!-- locale: xx --> for self-checking)
### Tooling (scripts/i18n/)
- glossary.json: zh→en glossary + do_not_translate allowlist
- schema/skill-frontmatter.schema.json: JSON Schema for the i18n frontmatter
- validate-i18n.py: 8 validation rules (name compliance / locale completeness / hash consistency, etc.)
- translate.py: dual GitHub Models / Anthropic backends, sha256-based incremental translation
- migrate.py: one-off migration script (old format → i18n structure)
### CI (.github/workflows/)
- i18n-validate.yml: runs validate + translate --check on PRs
- i18n-translate.yml: on PRs, uses GitHub Models (default openai/gpt-5-mini) to translate missing
locales and automatically appends a commit; can switch to Claude via ANTHROPIC_API_KEY
### Docs
- docs/I18N.md: contributor guide for authors (schema description / submission workflow / FAQ)
- README.md: adds a multilingual section
## Verification
- uv run scripts/i18n/validate-i18n.py: OK, 49 files, 0 errors
- uv run scripts/i18n/translate.py --check: 0 stale locales
- Heading counts for all 21 skills match exactly between zh-CN and en-US (max 66=66)
- skills-ref spec check: all pass (top-level name is an ASCII slug + single description field)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
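* fix(i18n): fix 6 issues raised in the PR #1 review
- schema: relax the translated_by regex to ^(human|ai:[A-Za-z0-9._:/-]+)$ so it accepts
backend:model forms such as 'ai:github:openai/gpt-5-mini' (the CI translation output format)
- README + docs/I18N.md: correct the misleading "CI uses the Claude API" description; the default is
GitHub Models (openai/gpt-5-mini) + GITHUB_TOKEN, with Anthropic as an option
- skills/minimax-tts/SKILL.md & SKILL.zh-CN.md: remove an extra ``` closing fence that broke
rendering of the Markdown that followed it
- skills/docx/SKILL.md: restore the • Unicode escape example lost during translation,
aligned with the zh-CN version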
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
534
skills/docx/SKILL.zh-CN.md
Normal file
@@ -0,0 +1,534 @@

<!-- locale: zh-CN -->

# docx Skill

## L0: One-line Summary

Create, edit, and process Word documents (.docx). Covers the full workflow: creating new documents, modifying XML, and validating the format.
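
## L1: Overview and Use Cases

### Capabilities

docx is a **Procedural Skill** that provides complete Word document handling. It can:

- create new documents with docx-js (Node.js)
- edit existing documents by unpacking and modifying their XML
- validate the format and convert documents to PDF

### Use Cases

- The user needs to create a new Word document (report, memo, contract, letter, etc.)
- The user needs to edit an existing .docx file (modify content, add comments, track changes)
- The user needs to extract text or table data from a .docx file
- The user needs to convert between document formats (.doc → .docx, .docx → PDF)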

## L2: Detailed Specification

## Prerequisites

### Python 3 (required)

Before running any Python script, check whether Python is available:

```bash
python3 --version 2>/dev/null || python --version 2>/dev/null
```

If the command fails (Python is not available), you **must stop and tell the user to install Python 3**:

- **macOS**: `brew install python3` or download from https://www.python.org/downloads/
- **Windows**: `winget install Python.Python.3` or download from python.org (tick "Add Python to PATH" during installation)
- **Linux (Debian/Ubuntu)**: `sudo apt install python3 python3-pip`
- **Linux (Fedora/RHEL)**: `sudo dnf install python3 python3-pip`

For more detailed environment setup help:

- load the `python-runtime` skill for Python-related issues;
- load the `dev-environment-setup` skill for everything else (containers / WSL / system tools).

### Python Package Dependencies

This skill's Python scripts depend on the packages below. Check for them on demand, and only when the relevant script is actually invoked:

- `lxml`: XML schema validation (validate.py)
- `defusedxml`: safe XML parsing (unpack.py)

How to check:

```bash
python3 -c "import lxml; import defusedxml" 2>/dev/null || echo "MISSING"
```

If anything is missing, tell the user to install it: `pip install lxml defusedxml`
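
The shell one-liner above reports only "MISSING", not which package is absent. If you want the specific names, a small Python sketch can look them up with `importlib.util.find_spec`, which locates a top-level module without importing it. `SCRIPT_PACKAGES`, `missing_packages`, and `install_hint` are illustrative names; they are not part of this skill's scripts.

```python
import importlib.util

# Packages required by this skill's scripts, per the list above
SCRIPT_PACKAGES = {
    "lxml": "validate.py (XML schema validation)",
    "defusedxml": "unpack.py (safe XML parsing)",
}


def missing_packages(packages: dict[str, str] = SCRIPT_PACKAGES) -> list[str]:
    """Return the packages that cannot be located, without importing them."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


def install_hint(missing: list[str]) -> str:
    """Build the pip command to show the user, or "" if nothing is missing."""
    return f"pip install {' '.join(missing)}" if missing else ""
```

`install_hint(missing_packages())` returns `pip install lxml defusedxml` when both packages are absent. `find_spec` does not execute the package, so it can report a package as present even if that package fails at import time. The shell one-liner performs a real import and would catch that case.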

## Output Rule

When you create or modify a .docx file, you **MUST** tell the user the absolute path of the output file in your response. Example: "File saved to: `/path/to/output.docx`"

## Overview

A .docx file is a ZIP archive containing XML files; the main document body lives in `word/document.xml`.
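
You can see this structure without pandoc by using Python's standard-library `zipfile` and `xml.etree.ElementTree`. The sketch below assumes the body is at the standard `word/document.xml` location and joins only the `<w:t>` text of each paragraph. It skips `<w:delText>`, fields, headers, and footers. `list_parts`, `paragraph_texts`, and `make_toy_docx` are hypothetical helpers, not scripts shipped with this skill.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the "w:" prefix in document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_parts(docx_bytes: bytes) -> list[str]:
    """Names of all parts (files) stored in the .docx ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        return zf.namelist()


def paragraph_texts(docx_bytes: bytes) -> list[str]:
    """Concatenate the <w:t> text of each <w:p> in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        for p in root.iter(f"{{{W_NS}}}p")
    ]


def make_toy_docx() -> bytes:
    """A tiny in-memory archive for experimenting. It is NOT a complete .docx:
    Word also expects [Content_Types].xml, _rels/.rels, and more."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r>"
        '<w:r><w:t xml:space="preserve"> world</w:t></w:r></w:p>'
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", body)
    return buf.getvalue()
```

With a real file, call `paragraph_texts(open("document.docx", "rb").read())`.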

## Quick Reference

| Task | Approach |
|------|----------|
| Read/analyze content | `pandoc` or unpack for raw XML |
| Create new document | Use `docx-js` - see Creating New Documents below |
| Edit existing document | Unpack → edit XML → repack - see Editing Existing Documents below |

### Converting .doc to .docx

Legacy `.doc` files must be converted before editing:

```bash
python scripts/office/soffice.py --headless --convert-to docx document.doc
```

### Reading Content

```bash
# Text extraction with tracked changes
pandoc --track-changes=all document.docx -o output.md

# Raw XML access
python scripts/office/unpack.py document.docx unpacked/
```

### Converting to Images

```bash
# Step 1: convert the DOCX to PDF via LibreOffice
python scripts/office/soffice.py --headless --convert-to pdf document.docx
# Step 2: render each PDF page to a JPEG (page-*.jpg) at 150 DPI
pdftoppm -jpeg -r 150 document.pdf page
```

### Accepting Tracked Changes

To produce a clean document with all tracked changes accepted (requires LibreOffice):

```bash
python scripts/accept_changes.py input.docx output.docx
```

---

## Creating New Documents

Generate .docx files with JavaScript, then validate. Install: `npm install -g docx`

### Setup

```javascript
const fs = require('fs'); // needed for fs.writeFileSync below
const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
        Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
        TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
        VerticalAlign, PageNumber, PageBreak } = require('docx');

const doc = new Document({ sections: [{ children: [/* content */] }] });
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));
```

### Validation

After creating the file, validate it. If validation fails, unpack, fix the XML, and repack.

```bash
python scripts/office/validate.py doc.docx
```

### Page Size

```javascript
// CRITICAL: docx-js defaults to A4, not US Letter
// Always set page size explicitly for consistent results
sections: [{
  properties: {
    page: {
      size: {
        width: 12240,   // 8.5 inches in DXA
        height: 15840   // 11 inches in DXA
      },
      margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } // 1 inch margins
    }
  },
  children: [/* content */]
}]
```

**Common page sizes (DXA units, 1440 DXA = 1 inch):**

| Paper | Width | Height | Content Width (1" margins) |
|-------|-------|--------|---------------------------|
| US Letter | 12,240 | 15,840 | 9,360 |
| A4 (default) | 11,906 | 16,838 | 9,026 |

**Landscape orientation:** docx-js swaps width/height internally, so pass portrait dimensions and let it handle the swap:

```javascript
size: {
  width: 12240,   // Pass SHORT edge as width
  height: 15840,  // Pass LONG edge as height
  orientation: PageOrientation.LANDSCAPE  // docx-js swaps them in the XML
},
// Content width = 15840 - left margin - right margin (uses the long edge)
```
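
The arithmetic behind the page-size table and the landscape note fits in a few lines. This is a sketch with hypothetical names (`PAGE_SIZES`, `inches_to_dxa`, `content_width`). The page dimensions are the portrait values from the table, and margins default to 1 inch (1440 DXA) per side.

```python
DXA_PER_INCH = 1440  # 1 DXA = 1/20 pt = 1/1440 inch

# Portrait page sizes in DXA as (short edge, long edge)
PAGE_SIZES = {
    "letter": (12240, 15840),
    "a4": (11906, 16838),
}


def inches_to_dxa(inches: float) -> int:
    """Convert inches to DXA, rounding to the nearest whole unit."""
    return round(inches * DXA_PER_INCH)


def content_width(paper: str, left: int = 1440, right: int = 1440,
                  landscape: bool = False) -> int:
    """Usable text width in DXA: page width minus left and right margins.

    In landscape the long edge becomes the horizontal width, matching how
    docx-js swaps width/height for PageOrientation.LANDSCAPE.
    """
    short, long_ = PAGE_SIZES[paper]
    page_width = long_ if landscape else short
    return page_width - left - right
```

`content_width("letter")` gives the 9,360 DXA used for full-width tables. `content_width("letter", landscape=True)` gives 12,960.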

### Styles (Override Built-in Headings)

Use Arial as the default font (universally supported). Keep titles black for readability.

```javascript
const doc = new Document({
  styles: {
    default: { document: { run: { font: "Arial", size: 24 } } }, // 12pt default (size is in half-points)
    paragraphStyles: [
      // IMPORTANT: Use exact IDs to override built-in styles
      { id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true,
        run: { size: 32, bold: true, font: "Arial" }, // 16pt
        paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 } }, // outlineLevel required for TOC
      { id: "Heading2", name: "Heading 2", basedOn: "Normal", next: "Normal", quickFormat: true,
        run: { size: 28, bold: true, font: "Arial" }, // 14pt
        paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 } },
    ]
  },
  sections: [{
    children: [
      new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("Title")] }),
    ]
  }]
});
```

### Lists (NEVER use unicode bullets)

```javascript
// ❌ WRONG - never manually insert bullet characters
new Paragraph({ children: [new TextRun("• Item")] })       // BAD
new Paragraph({ children: [new TextRun("\u2022 Item")] })  // BAD

// ✅ CORRECT - use numbering config with LevelFormat.BULLET
const doc = new Document({
  numbering: {
    config: [
      { reference: "bullets",
        levels: [{ level: 0, format: LevelFormat.BULLET, text: "•", alignment: AlignmentType.LEFT,
          style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
      { reference: "numbers",
        levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1.", alignment: AlignmentType.LEFT,
          style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
    ]
  },
  sections: [{
    children: [
      new Paragraph({ numbering: { reference: "bullets", level: 0 },
        children: [new TextRun("Bullet item")] }),
      new Paragraph({ numbering: { reference: "numbers", level: 0 },
        children: [new TextRun("Numbered item")] }),
    ]
  }]
});

// ⚠️ Each reference creates INDEPENDENT numbering
// Same reference = continues (1,2,3 then 4,5,6)
// Different reference = restarts (1,2,3 then 1,2,3)
```

### Tables

**CRITICAL: Tables need dual widths** - set both `columnWidths` on the table AND `width` on each cell. Without both, tables render incorrectly on some platforms.

```javascript
// CRITICAL: Always set table width for consistent rendering
// CRITICAL: Use ShadingType.CLEAR (not SOLID) to prevent black backgrounds
const border = { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" };
const borders = { top: border, bottom: border, left: border, right: border };

new Table({
  width: { size: 9360, type: WidthType.DXA }, // Always use DXA (percentages break in Google Docs)
  columnWidths: [4680, 4680], // Must sum to table width (DXA: 1440 = 1 inch)
  rows: [
    new TableRow({
      children: [
        new TableCell({
          borders,
          width: { size: 4680, type: WidthType.DXA }, // Also set on each cell
          shading: { fill: "D5E8F0", type: ShadingType.CLEAR }, // CLEAR not SOLID
          margins: { top: 80, bottom: 80, left: 120, right: 120 }, // Cell padding (internal, not added to width)
          children: [new Paragraph({ children: [new TextRun("Cell")] })]
        })
      ]
    })
  ]
})
```

**Table width calculation:**

Always use `WidthType.DXA`; `WidthType.PERCENTAGE` breaks in Google Docs.

```javascript
// Table width = sum of columnWidths = content width
// US Letter with 1" margins: 12240 - 2880 = 9360 DXA
width: { size: 9360, type: WidthType.DXA },
columnWidths: [7000, 2360] // Must sum to table width
```

**Width rules:**
- **Always use `WidthType.DXA`** - never `WidthType.PERCENTAGE` (incompatible with Google Docs)
- Table width must equal the sum of `columnWidths`
- Each cell's `width` must match the corresponding `columnWidths` entry
- Cell `margins` are internal padding - they shrink the content area rather than adding to the cell width
- For full-width tables: use the content width (page width minus left and right margins)
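
Splitting a content width into columns by naive rounding can leave the sum off by a DXA or two, which breaks the "must sum exactly" rule. A minimal sketch, assuming you think in relative column weights, uses integer math and the largest-remainder method. `split_columns` is a hypothetical helper, not part of docx-js.

```python
def split_columns(total_dxa: int, weights: list[int]) -> list[int]:
    """Split a table width (DXA) into integer column widths proportional to
    `weights`, guaranteeing that the result sums exactly to total_dxa.

    Largest-remainder method: floor every share, then give the few leftover
    DXA to the columns with the largest remainders.
    """
    if total_dxa <= 0 or not weights or any(w <= 0 for w in weights):
        raise ValueError("need a positive total and positive weights")
    weight_sum = sum(weights)
    products = [total_dxa * w for w in weights]  # integer math avoids float drift
    widths = [p // weight_sum for p in products]
    leftover = total_dxa - sum(widths)  # always smaller than len(weights)
    # sorted() is stable even with reverse=True, so ties keep left-to-right order
    by_remainder = sorted(range(len(weights)),
                          key=lambda i: products[i] % weight_sum, reverse=True)
    for i in by_remainder[:leftover]:
        widths[i] += 1
    return widths
```

For example, `split_columns(9360, [3, 1])` returns `[7020, 2340]`. Pass the same list to `columnWidths` and use each entry as the matching cell's `width` to satisfy the dual-width rule.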

### Images

```javascript
// CRITICAL: type parameter is REQUIRED
new Paragraph({
  children: [new ImageRun({
    type: "png", // Required: png, jpg, jpeg, gif, bmp, svg
    data: fs.readFileSync("image.png"),
    transformation: { width: 200, height: 150 },
    altText: { title: "Title", description: "Desc", name: "Name" } // All three required
  })]
})
```

### Page Breaks

```javascript
// CRITICAL: PageBreak must be inside a Paragraph
new Paragraph({ children: [new PageBreak()] })

// Or use pageBreakBefore
new Paragraph({ pageBreakBefore: true, children: [new TextRun("New page")] })
```

### Table of Contents

```javascript
// CRITICAL: Headings must use HeadingLevel ONLY - no custom styles
new TableOfContents("Table of Contents", { hyperlink: true, headingStyleRange: "1-3" })
```

### Headers/Footers

```javascript
sections: [{
  properties: {
    page: { margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } } // 1440 = 1 inch
  },
  headers: {
    default: new Header({ children: [new Paragraph({ children: [new TextRun("Header")] })] })
  },
  footers: {
    default: new Footer({ children: [new Paragraph({
      children: [new TextRun("Page "), new TextRun({ children: [PageNumber.CURRENT] })]
    })] })
  },
  children: [/* content */]
}]
```

### Critical Rules for docx-js

- **Set page size explicitly** - docx-js defaults to A4; use US Letter (12240 x 15840 DXA) for US documents
- **Landscape: pass portrait dimensions** - docx-js swaps width/height internally; pass the short edge as `width`, the long edge as `height`, and set `orientation: PageOrientation.LANDSCAPE`
- **Never use `\n`** - use separate Paragraph elements
- **Never use unicode bullets** - use `LevelFormat.BULLET` with a numbering config
- **PageBreak must be in a Paragraph** - a standalone PageBreak creates invalid XML
- **ImageRun requires `type`** - always specify png/jpg/etc.
- **Always set table `width` with DXA** - never use `WidthType.PERCENTAGE` (breaks in Google Docs)
- **Tables need dual widths** - `columnWidths` array AND cell `width`, and both must match
- **Table width = sum of columnWidths** - for DXA, ensure they add up exactly
- **Always add cell margins** - use `margins: { top: 80, bottom: 80, left: 120, right: 120 }` for readable padding
- **Use `ShadingType.CLEAR`** - never SOLID for table shading
- **TOC requires HeadingLevel only** - no custom styles on heading paragraphs
- **Override built-in styles** - use exact IDs: "Heading1", "Heading2", etc.
- **Include `outlineLevel`** - required for TOC (0 for H1, 1 for H2, etc.)

---

## Editing Existing Documents

**Follow all 3 steps in order.**

### Step 1: Unpack

```bash
python scripts/office/unpack.py document.docx unpacked/
```

Extracts XML, pretty-prints, merges adjacent runs, and converts smart quotes to XML entities (`&#x201C;` etc.) so they survive editing. Use `--merge-runs false` to skip run merging.

### Step 2: Edit XML

Edit files in `unpacked/word/`. See XML Reference below for patterns.

**Use "Claude" as the author** for tracked changes and comments, unless the user explicitly requests a different name.

**Use the Edit tool directly for string replacement. Do not write Python scripts.** Scripts introduce unnecessary complexity. The Edit tool shows exactly what is being replaced.

**CRITICAL: Use smart quotes for new content.** When adding text with apostrophes or quotes, use XML entities to produce smart quotes:

```xml
<!-- Use these entities for professional typography -->
<w:t>Here&#x2019;s a quote: &#x201C;Hello&#x201D;</w:t>
```

| Entity | Character |
|--------|-----------|
| `&#x2018;` | ‘ (left single) |
| `&#x2019;` | ’ (right single / apostrophe) |
| `&#x201C;` | “ (left double) |
| `&#x201D;` | ” (right double) |

**Adding comments:** Use `comment.py` to handle the boilerplate across multiple XML files (the text must be pre-escaped XML):

```bash
python scripts/comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
python scripts/comment.py unpacked/ 1 "Reply text" --parent 0  # reply to comment 0
python scripts/comment.py unpacked/ 0 "Text" --author "Custom Author"  # custom author name
```

Then add markers to document.xml (see Comments in XML Reference).
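
"Pre-escaped XML" means that `&`, `<`, and `>` are already entity-encoded, and that smart quotes appear as the numeric entities from the table. The sketch below only illustrates that transformation; in practice the workflow above still passes the result to `comment.py` or the Edit tool. `to_preescaped_xml` is a hypothetical helper. It does not turn straight ASCII quotes into smart quotes, because choosing left or right depends on context.

```python
from xml.sax.saxutils import escape

# Numeric character references for typographic quotes (see the entity table)
SMART_QUOTE_ENTITIES = {
    "\u2018": "&#x2018;",  # left single quote
    "\u2019": "&#x2019;",  # right single quote / apostrophe
    "\u201C": "&#x201C;",  # left double quote
    "\u201D": "&#x201D;",  # right double quote
}


def to_preescaped_xml(text: str) -> str:
    """Escape &, <, > for XML, then encode smart quotes as numeric entities.

    The order matters: escaping first means the & inside each inserted
    entity is not escaped a second time.
    """
    escaped = escape(text)  # & -> &amp;, < -> &lt;, > -> &gt;
    for char, entity in SMART_QUOTE_ENTITIES.items():
        escaped = escaped.replace(char, entity)
    return escaped
```

`to_preescaped_xml("Comment text with & and \u2019")` produces exactly the argument used in the first `comment.py` example.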

### Step 3: Pack

```bash
python scripts/office/pack.py unpacked/ output.docx --original document.docx
```

Validates with auto-repair, condenses XML, and creates the DOCX. Use `--validate false` to skip validation.

**Auto-repair will fix:**
- `durableId` >= 0x7FFFFFFF (regenerates a valid ID)
- Missing `xml:space="preserve"` on `<w:t>` with whitespace

**Auto-repair won't fix:**
- Malformed XML, invalid element nesting, missing relationships, schema violations

### Common Pitfalls

- **Replace entire `<w:r>` elements**: When adding tracked changes, replace the whole `<w:r>...</w:r>` block with `<w:del>...<w:ins>...` as siblings. Don't inject tracked change tags inside a run.
- **Preserve `<w:rPr>` formatting**: Copy the original run's `<w:rPr>` block into your tracked change runs to maintain bold, font size, etc.

---

## XML Reference

### Schema Compliance

- **Element order in `<w:pPr>`**: `<w:pStyle>`, `<w:numPr>`, `<w:spacing>`, `<w:ind>`, `<w:jc>`, `<w:rPr>` last
- **Whitespace**: Add `xml:space="preserve"` to `<w:t>` with leading/trailing spaces
- **RSIDs**: Must be 8-digit hex (e.g., `00AB1234`)
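
Two of these rules can be checked mechanically before packing: `<w:t>` whitespace and RSID format. The sketch below uses only `xml.etree.ElementTree`. It assumes every RSID attribute is a `w:`-namespaced attribute whose local name starts with `rsid` (as `w:rsidR` and `w:rsidRPr` do), and it does not check `<w:pPr>` element order. `lint_document_xml` is a hypothetical helper, not one of the bundled scripts.

```python
import re
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"  # the xml:space attribute
RSID_RE = re.compile(r"^[0-9A-Fa-f]{8}$")  # exactly 8 hex digits


def lint_document_xml(xml_text: str) -> list[str]:
    """Report <w:t> whitespace and RSID problems in a document.xml string."""
    root = ET.fromstring(xml_text)
    problems = []
    for t in root.iter(f"{{{W_NS}}}t"):
        text = t.text or ""
        # Leading/trailing whitespace is dropped unless xml:space="preserve"
        if text != text.strip() and t.get(XML_SPACE) != "preserve":
            problems.append(f'<w:t> needs xml:space="preserve": {text!r}')
    for el in root.iter():
        for name, value in el.attrib.items():
            if name.startswith(f"{{{W_NS}}}rsid") and not RSID_RE.match(value):
                problems.append(f"bad RSID {value!r} on {name}")
    return problems
```

An empty list means neither rule was violated. Auto-repair in `pack.py` already adds missing `xml:space`, so this check matters most for RSIDs.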

### Tracked Changes

**Insertion:**

```xml
<w:ins w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:t>inserted text</w:t></w:r>
</w:ins>
```

**Deletion:**

```xml
<w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
```

**Inside `<w:del>`**: Use `<w:delText>` instead of `<w:t>`, and `<w:delInstrText>` instead of `<w:instrText>`.

**Minimal edits** - only mark what changes:

```xml
<!-- Change "30 days" to "60 days" -->
<w:r><w:t xml:space="preserve">The term is </w:t></w:r>
<w:del w:id="1" w:author="Claude" w:date="...">
  <w:r><w:delText>30</w:delText></w:r>
</w:del>
<w:ins w:id="2" w:author="Claude" w:date="...">
  <w:r><w:t>60</w:t></w:r>
</w:ins>
<w:r><w:t xml:space="preserve"> days.</w:t></w:r>
```
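
The "only mark what changes" rule amounts to factoring out the longest common prefix and suffix. The sketch below works at word granularity: tokens are alternating runs of whitespace and non-whitespace. That granularity is an assumption, chosen so that "30" → "60" is marked as a whole word as in the example, not as the single digit "3" → "6". `split_minimal_edit`, `run`, and `tracked_change_xml` are hypothetical helpers that illustrate the structure. They assume a single plain run without `<w:rPr>` (the Common Pitfalls say to copy it), and the workflow still applies the fragment with the Edit tool.

```python
import re
from xml.sax.saxutils import escape

TOKEN = re.compile(r"\s+|\S+")  # alternating runs of whitespace / non-whitespace


def split_minimal_edit(old: str, new: str) -> tuple[str, str, str, str]:
    """Return (prefix, deleted, inserted, suffix), factoring out the longest
    common prefix and suffix at word granularity."""
    a, b = TOKEN.findall(old), TOKEN.findall(new)
    p = 0
    while p < min(len(a), len(b)) and a[p] == b[p]:
        p += 1
    s = 0
    # The suffix may not reuse tokens already claimed by the prefix
    while s < min(len(a), len(b)) - p and a[len(a) - 1 - s] == b[len(b) - 1 - s]:
        s += 1
    return ("".join(a[:p]), "".join(a[p:len(a) - s]),
            "".join(b[p:len(b) - s]), "".join(a[len(a) - s:]))


def run(text: str, tag: str = "w:t") -> str:
    """One plain run; xml:space="preserve" keeps edge whitespace intact."""
    return f'<w:r><{tag} xml:space="preserve">{escape(text)}</{tag}></w:r>'


def tracked_change_xml(old: str, new: str, author: str, date: str, first_id: int) -> str:
    """Unchanged runs plus sibling <w:del>/<w:ins> elements for one edit.
    Assumes author/date need no attribute escaping."""
    prefix, deleted, inserted, suffix = split_minimal_edit(old, new)
    parts = [run(prefix)] if prefix else []
    if deleted:
        parts.append(f'<w:del w:id="{first_id}" w:author="{author}" w:date="{date}">'
                     f'{run(deleted, "w:delText")}</w:del>')
    if inserted:
        parts.append(f'<w:ins w:id="{first_id + 1}" w:author="{author}" w:date="{date}">'
                     f'{run(inserted)}</w:ins>')
    if suffix:
        parts.append(run(suffix))
    return "".join(parts)
```

For the example above, `split_minimal_edit("The term is 30 days.", "The term is 60 days.")` yields `("The term is ", "30", "60", " days.")`.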

**Deleting entire paragraphs/list items** - when removing ALL content from a paragraph, also mark the paragraph mark as deleted so it merges with the next paragraph. Add `<w:del/>` inside `<w:pPr><w:rPr>`:

```xml
<w:p>
  <w:pPr>
    <w:numPr>...</w:numPr>  <!-- list numbering if present -->
    <w:rPr>
      <w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z"/>
    </w:rPr>
  </w:pPr>
  <w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
    <w:r><w:delText>Entire paragraph content being deleted...</w:delText></w:r>
  </w:del>
</w:p>
```

Without the `<w:del/>` in `<w:pPr><w:rPr>`, accepting changes leaves an empty paragraph/list item.

**Rejecting another author's insertion** - nest a deletion inside their insertion:

```xml
<w:ins w:author="Jane" w:id="5">
  <w:del w:author="Claude" w:id="10">
    <w:r><w:delText>their inserted text</w:delText></w:r>
  </w:del>
</w:ins>
```

**Restoring another author's deletion** - add an insertion after it (don't modify their deletion):

```xml
<w:del w:author="Jane" w:id="5">
  <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
<w:ins w:author="Claude" w:id="10">
  <w:r><w:t>deleted text</w:t></w:r>
</w:ins>
```

### Comments

After running `comment.py` (see Step 2), add markers to document.xml. For replies, use the `--parent` flag and nest the reply's markers inside the parent's.

**CRITICAL: `<w:commentRangeStart>` and `<w:commentRangeEnd>` are siblings of `<w:r>`, never inside `<w:r>`.**

```xml
<!-- Comment markers are direct children of w:p, never inside w:r -->
<w:commentRangeStart w:id="0"/>
<w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:delText>deleted</w:delText></w:r>
</w:del>
<w:r><w:t xml:space="preserve"> more text</w:t></w:r>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>

<!-- Comment 0 with reply 1 nested inside -->
<w:commentRangeStart w:id="0"/>
<w:commentRangeStart w:id="1"/>
<w:r><w:t>text</w:t></w:r>
<w:commentRangeEnd w:id="1"/>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="1"/></w:r>
```
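
The marker layout above follows a fixed pattern:

- range starts open parent-first;
- range ends close in reverse order, so replies nest inside the parent;
- one reference run per comment follows the last end marker.

A minimal sketch of that pattern follows. It assumes `comment.py` has already created entries with these ids in `comments.xml`. `wrap_with_comments` and `REF_RUN` are hypothetical names used for illustration only.

```python
# Run that anchors a comment's balloon; mirrors the XML example above
REF_RUN = ('<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr>'
           '<w:commentReference w:id="{}"/></w:r>')


def wrap_with_comments(runs_xml: str, ids: list[int]) -> str:
    """Wrap serialized runs in comment range markers.

    `ids` lists the parent comment first, then its replies. All markers are
    emitted as siblings of the runs (children of <w:p>), never inside a <w:r>.
    """
    if not ids:
        raise ValueError("at least one comment id is required")
    starts = "".join(f'<w:commentRangeStart w:id="{i}"/>' for i in ids)
    ends = "".join(f'<w:commentRangeEnd w:id="{i}"/>' for i in reversed(ids))
    refs = "".join(REF_RUN.format(i) for i in ids)
    return starts + runs_xml + ends + refs
```

`wrap_with_comments('<w:r><w:t>text</w:t></w:r>', [0, 1])` reproduces the "Comment 0 with reply 1" example, with the same marker order.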

### Images

1. Add the image file to `word/media/`
2. Add a relationship to `word/_rels/document.xml.rels`:
```xml
<Relationship Id="rId5" Type=".../image" Target="media/image1.png"/>
```
3. Add the content type to `[Content_Types].xml`:
```xml
<Default Extension="png" ContentType="image/png"/>
```
4. Reference it in document.xml. The skeleton below is abbreviated: a complete `<wp:inline>` also needs `<wp:docPr>`, and `<pic:pic>` also needs `<pic:nvPicPr>` and `<pic:spPr>`:
```xml
<w:drawing>
  <wp:inline>
    <wp:extent cx="914400" cy="914400"/>  <!-- EMUs: 914400 = 1 inch -->
    <a:graphic>
      <a:graphicData uri=".../picture">
        <pic:pic>
          <pic:blipFill><a:blip r:embed="rId5"/></pic:blipFill>
        </pic:pic>
      </a:graphicData>
    </a:graphic>
  </wp:inline>
</w:drawing>
```
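
`<wp:extent>` takes sizes in EMUs (914,400 per inch), while image dimensions usually come in pixels. The sketch below converts between the two and keeps the aspect ratio when scaling to a target width. The 96 DPI default is an assumption, since the real DPI depends on the image. The function names are hypothetical.

```python
EMU_PER_INCH = 914400                     # as in the <wp:extent> comment above
EMU_PER_PIXEL_96DPI = EMU_PER_INCH // 96  # 9525 EMU per pixel at 96 DPI


def inches_to_emu(inches: float) -> int:
    """Convert inches to whole EMUs."""
    return round(inches * EMU_PER_INCH)


def pixels_to_emu(pixels: int, dpi: int = 96) -> int:
    """Convert a pixel length to EMUs at the given DPI (96 is an assumption)."""
    return round(pixels * EMU_PER_INCH / dpi)


def extent_for_width(px_w: int, px_h: int, target_inches: float) -> tuple[int, int]:
    """(cx, cy) for <wp:extent> that scales an image to target_inches wide
    while preserving its pixel aspect ratio."""
    cx = inches_to_emu(target_inches)
    cy = round(cx * px_h / px_w)
    return cx, cy
```

For example, an 800×600 image scaled to the 6.5-inch content width of US Letter with 1-inch margins gets `cx="5943600" cy="4457700"`.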

---

## Dependencies

- **pandoc**: Text extraction
- **docx**: `npm install -g docx` (new documents)
- **LibreOffice**: PDF conversion (auto-configured for sandboxed environments via `scripts/office/soffice.py`)
- **Poppler**: `pdftoppm` for images
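
To find out up front which external tools are unavailable, a sketch based on `shutil.which` can check the executables behind this list. The executable names are assumptions. LibreOffice is assumed to be reachable as `soffice` on PATH, which is often not the case on macOS, and `node` stands in for docx-js because the npm package itself is not checked. `REQUIRED_TOOLS`, `missing_tools`, and `report` are hypothetical names.

```python
import shutil

# Executables behind the dependencies above (names are assumptions, see lead-in)
REQUIRED_TOOLS = {
    "pandoc": "text extraction",
    "soffice": "LibreOffice: PDF and .doc conversion",
    "pdftoppm": "Poppler: PDF pages to images",
    "node": "runtime for docx-js",
}


def missing_tools(tools: dict[str, str] = REQUIRED_TOOLS) -> dict[str, str]:
    """Return {executable: purpose} for every tool not found on PATH."""
    return {name: why for name, why in tools.items() if shutil.which(name) is None}


def report(missing: dict[str, str]) -> str:
    """Human-readable summary to relay to the user."""
    if not missing:
        return "All external tools found."
    return "\n".join(f"MISSING: {name} ({why})" for name, why in missing.items())
```

`print(report(missing_tools()))` lists each absent tool together with what it is needed for.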