feat(docx): replace the bash wrapper with cross-platform launchers, reusing preinstalled deps instead of installing on every run (#21)

## Summary

Switch how the docx skill reuses client-preinstalled runtime dependencies
from a **bash wrapper script** to **cross-platform runtime launchers**, so
that it behaves identically on Windows, macOS, and Linux without Git Bash.
Also fix several Windows crashes caused by hardcoded POSIX assumptions.

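## Changes

- **Add** `scripts/preload-deps.cjs` (Node preloader that injects `NODE_PATH`) and
`scripts/with-deps.py` (Python launcher that switches to the bundled Python with lxml
when needed); **remove** the bash version, `with-deps.sh`.
- Generation runs through `node -r preload-deps.cjs` and office scripts run through
`python with-deps.py`, reusing the preinstalled docx-js / defusedxml / lxml offline,
with no per-run `npm`/`pip install` and **no dependency on bash**.
- `comment.py`: add a defusedxml `sys.path` shim; `validate.py`: fix a temp-directory
leak (cleanup via atexit).
- `accept_changes.py`: remove the hardcoded `/tmp` (`tempfile.gettempdir` +
`Path.as_uri`); `soffice.py`: enable the AF_UNIX shim only on Linux, avoiding a crash on Windows.
- `SKILL.md` / `SKILL.zh-CN.md`: sync the command forms, add an ESM warning and
cross-platform install instructions for the external tools (pandoc/LibreOffice/poppler), and recompute `source_hash`.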
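
The `/tmp` removal and the temp-directory cleanup listed above can be illustrated roughly as follows. This is a minimal sketch, not the code in `accept_changes.py` or `validate.py`; the helper names `make_scratch_dir` and `to_file_uri` are hypothetical and only the standard-library calls are taken from the change list.

```python
import atexit
import shutil
import tempfile
from pathlib import Path


def make_scratch_dir(prefix: str = "docx-") -> Path:
    """Create a scratch directory under the platform temp dir and remove it at exit."""
    # tempfile.gettempdir() resolves to the platform's temp location instead of a fixed /tmp.
    scratch = Path(tempfile.mkdtemp(prefix=prefix, dir=tempfile.gettempdir()))
    # ignore_errors=True keeps interpreter shutdown quiet if the directory is already gone.
    atexit.register(shutil.rmtree, scratch, ignore_errors=True)
    return scratch


def to_file_uri(path: Path) -> str:
    """Build a file:// URI without assuming POSIX separators."""
    # as_uri() requires an absolute path, so resolve first.
    return path.resolve().as_uri()
```

Building the URI with `Path.as_uri` rather than string concatenation is what lets the same code handle drive letters and backslashes on Windows.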

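## Testing

- End-to-end in a real dev root directory: docx generation (no install), full XSD validation (with lxml), and an unpack/pack round trip all pass.
- The repository's `validate-i18n.py` check passes; `py_compile` on all py scripts and
`node --check` on `preload-deps.cjs` pass.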
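
A `py_compile` pass over every script can be reproduced with a few lines of Python. The sketch below is an assumption about how such a check might be scripted, not the command actually used for this commit; `compile_all` is a hypothetical helper.

```python
import py_compile
from pathlib import Path


def compile_all(scripts_dir: Path) -> list[Path]:
    """Byte-compile every .py file under scripts_dir and return the ones that fail."""
    failures = []
    for source in sorted(scripts_dir.rglob("*.py")):
        try:
            # doraise=True turns syntax errors into PyCompileError instead of printing them.
            py_compile.compile(str(source), doraise=True)
        except py_compile.PyCompileError:
            failures.append(source)
    return failures
```

An empty return value means every script at least parses on the interpreter running the check.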

---

- [x] I have read and agree to the CLA

Co-authored-by: 张馨元 <zhangxy@iynss.com>
Co-authored-by: Yige <a@wyr.me>
Zxy-y
2026-06-04 11:14:36 +08:00
committed by GitHub
parent b15fce19bf
commit 17fe79ab49
10 changed files with 276 additions and 77 deletions


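@@ -25,14 +25,16 @@ docx is a **procedural skill (Procedural Skill)**, providing Word document
The Python scripts bundled with this skill live inside the skill's installation directory. Commands **must use full paths**; relative paths are not allowed.
The skill directory is given by the `<skill-dir>` tag in the context. Every command beginning with `scripts/` should be expanded to:
The skill directory is given by the `<skill-dir>` tag in the context. **All office Python scripts must be run through the cross-platform launcher** `scripts/with-deps.py`, so that they reuse the libraries preinstalled by the runtime (`defusedxml`, plus `lxml` for full validation) without `pip install`. The launcher is pure Python and **behaves identically on macOS / Linux / Windows, with no dependency on `bash`**: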
```bash
python "<skill-dir>/scripts/office/unpack.py" document.docx unpacked/
python "<skill-dir>/scripts/office/pack.py" unpacked/ output.docx
python "<skill-dir>/scripts/with-deps.py" office/unpack.py document.docx unpacked/
python "<skill-dir>/scripts/with-deps.py" office/pack.py unpacked/ output.docx
```
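**Do not** run `python scripts/office/unpack.py` directly: that relative path does not exist under the user's working directory.
The launcher runs the target script under the runtime's bundled Python, which has `lxml`/`defusedxml` preinstalled (if that bundled Python is unavailable it falls back to the system `python3`, in which case full XSD validation is skipped gracefully). Script paths are **relative to `scripts/`** (e.g. `office/unpack.py`, `comment.py`).
**Do not** run office scripts via a bare relative path (e.g. `python scripts/office/unpack.py`): that path does not exist under the user's working directory, and it bypasses the preinstalled libraries. Always run them through `<skill-dir>/scripts/with-deps.py`.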
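
The launcher behaviour described above (resolve the target relative to `scripts/`, prefer the bundled interpreter, otherwise fall back) might look roughly like this. It is a sketch under stated assumptions, not the contents of `with-deps.py`: `build_command` and its `bundled_python` parameter are hypothetical, and falling back to `sys.executable` stands in for the system `python3` lookup.

```python
import sys
from pathlib import Path


def build_command(scripts_dir, bundled_python, argv):
    """Return the argument list a launcher could hand to subprocess.run().

    scripts_dir: directory containing the launcher (the skill's scripts/ folder).
    bundled_python: path to the runtime's own interpreter, or None if unknown.
    argv: target script (relative to scripts_dir) followed by its arguments.
    """
    if not argv:
        raise SystemExit("usage: with-deps.py <script> [args...]")
    target = Path(scripts_dir) / argv[0]
    # Prefer the bundled interpreter; otherwise reuse the Python that is running us
    # (an assumption standing in for the system python3 fallback).
    if bundled_python is not None and Path(bundled_python).is_file():
        interpreter = str(bundled_python)
    else:
        interpreter = sys.executable
    return [interpreter, str(target), *argv[1:]]
```

Returning a list rather than a shell string keeps the launcher free of any shell, which is the property that removes the `bash` dependency.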
## Prerequisites
@@ -56,17 +58,9 @@ python3 --version 2>/dev/null || python --version 2>/dev/null
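### Python package dependencies
The Python scripts in this skill depend on the following packages (checked on demand, only when the relevant script is actually invoked):
The Python scripts in this skill depend on `defusedxml` (XML parsing) and `lxml` (XSD validation). **Both are preinstalled by the runtime**: when a script is run through `scripts/with-deps.py`, the target runs under the runtime's bundled Python, which already has both installed, so **no `pip install` and no network access are needed** (works offline, on every platform, without `bash`).
- `lxml`: XML schema validation (validate.py)
- `defusedxml`: safe XML parsing (unpack.py)
How to check: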
```bash
python3 -c "import lxml; import defusedxml" 2>/dev/null || echo "MISSING"
```
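If either is missing, tell the user to install them: `pip install lxml defusedxml`
Fallback: if the runtime's bundled Python is unavailable (older client, or not baked into the build), the launcher falls back to the system `python3`. In that case `defusedxml` can still be resolved (pure Python, preinstalled separately), but `lxml` may be missing, so full XSD validation is **skipped gracefully** (editing and packing still succeed). To enable full validation in this fallback case: `pip install lxml`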
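
Skipping XSD validation gracefully when `lxml` is absent can be sketched as below. This is an illustration of the pattern only, not the logic in `validate.py`; `xsd_validation_available` and `validate` are hypothetical names, and the `find_spec` parameter exists purely so the check can be exercised without uninstalling anything.

```python
import importlib.util


def xsd_validation_available(find_spec=importlib.util.find_spec) -> bool:
    """Report whether lxml can be imported, without importing it."""
    return find_spec("lxml") is not None


def validate(path: str, find_spec=importlib.util.find_spec) -> str:
    """Return "validated" or "skipped" instead of failing when lxml is missing."""
    if not xsd_validation_available(find_spec):
        # Fallback case: editing and packing should still succeed without lxml.
        return "skipped"
    # A real validator would load the XSD schemas with lxml here.
    return "validated"
```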
## Output Rule
@@ -89,7 +83,7 @@ A .docx file is a ZIP archive containing XML files.
Legacy `.doc` files must be converted before editing:
```bash
python scripts/office/soffice.py --headless --convert-to docx document.doc
python "<skill-dir>/scripts/with-deps.py" office/soffice.py --headless --convert-to docx document.doc
```
### Reading Content
@@ -99,13 +93,13 @@ python scripts/office/soffice.py --headless --convert-to docx document.doc
pandoc --track-changes=all document.docx -o output.md
# Raw XML access
python scripts/office/unpack.py document.docx unpacked/
python "<skill-dir>/scripts/with-deps.py" office/unpack.py document.docx unpacked/
```
### Converting to Images
```bash
python scripts/office/soffice.py --headless --convert-to pdf document.docx
python "<skill-dir>/scripts/with-deps.py" office/soffice.py --headless --convert-to pdf document.docx
pdftoppm -jpeg -r 150 document.pdf page
```
@@ -114,17 +108,28 @@ pdftoppm -jpeg -r 150 document.pdf page
To produce a clean document with all tracked changes accepted (requires LibreOffice):
```bash
python scripts/accept_changes.py input.docx output.docx
python "<skill-dir>/scripts/with-deps.py" accept_changes.py input.docx output.docx
```
---
## Creating New Documents
Generate .docx files with JavaScript, then validate. Install: `npm install -g docx`
Generate .docx files with JavaScript, then validate. The `docx` (docx-js) library is **pre-installed by the runtime** — no `npm install` needed. You MUST run the generator through the Node preloader (`scripts/preload-deps.cjs`, see Run below) so `require('docx')` resolves the pre-installed library.
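### Writing the script (critical: avoids failures caused by quoting/escaping)
**You must create `generate.js` with the `Write` tool, writing the file out directly.**
**Do not** assemble the script through the shell: do not use a `bash` heredoc (`cat <<EOF`), `echo`, or `python3 -c "...open(...).write(...)"` to generate JavaScript. Document content (manuals and reports especially) contains many `"` quotes, apostrophes, and Chinese punctuation; passing it through the shell makes three layers of quoting collide (shell quotes × JS string quotes × heredoc delimiter), which corrupts the script and leads to repeated retries and command timeouts.
- `Write` tool → quotes in the content are written verbatim, with no shell escaping at all.
- For long documents, put the content in the file as ordinary JS strings/arrays and split it into multiple `Paragraph`s; when a single document is very large, write it in **several smaller `Write`/`Edit` calls** rather than cramming it into one huge command.
### Setup
Use the **Write tool** to write the generation script to a file (for example `generate.js`):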
```javascript
const fs = require('fs');
const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
@@ -134,10 +139,17 @@ const doc = new Document({ sections: [{ children: [/* content */] }] });
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));
```
### Run
Run it with the cross-platform Node preloader (it injects `NODE_PATH` for the pre-installed `docx`; pure Node, works on **macOS / Linux / Windows**, no `bash` needed):
```bash
node -r "<skill-dir>/scripts/preload-deps.cjs" generate.js
```
**Use CommonJS `require('docx')`, NOT ESM `import`** — the pre-installed library is resolved via `NODE_PATH`, which Node **ignores for ESM**. Do not put a `"type": "module"` `package.json` in the working directory either, as it would force `.js` files to be treated as ESM and break `require`. If `require('docx')` still fails (e.g. the runtime pre-install is unavailable on an older client), fall back to `npm install -g docx` and re-run.
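
For comparison, a wrapper written in Python could get a similar effect to the preloader by setting `NODE_PATH` in the child environment before spawning `node`. The sketch below is hypothetical (the helper `node_env` and the example directory are not part of this skill) and is only meant to show how the variable is composed; like the preloader, it affects CommonJS `require` only.

```python
import os


def node_env(node_modules_dir: str, base_env=None) -> dict:
    """Return a copy of the environment with node_modules_dir prepended to NODE_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("NODE_PATH", "")
    # NODE_PATH uses the platform path delimiter (":" on POSIX, ";" on Windows).
    env["NODE_PATH"] = os.pathsep.join(p for p in (node_modules_dir, existing) if p)
    return env
```

The resulting dictionary could be passed as `env=` to `subprocess.run(["node", "generate.js"], ...)`.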
### Validation
After creating the file, validate it. If validation fails, unpack, fix the XML, and repack.
```bash
python scripts/office/validate.py doc.docx
python "<skill-dir>/scripts/with-deps.py" office/validate.py doc.docx
```
### Page Size
@@ -358,7 +370,7 @@ sections: [{
### Step 1: Unpack
```bash
python scripts/office/unpack.py document.docx unpacked/
python "<skill-dir>/scripts/with-deps.py" office/unpack.py document.docx unpacked/
```
Extracts XML, pretty-prints, merges adjacent runs, and converts smart quotes to XML entities (`&#x201C;` etc.) so they survive editing. Use `--merge-runs false` to skip run merging.
@@ -384,15 +396,15 @@ Edit files in `unpacked/word/`. See XML Reference below for patterns.
**Adding comments:** Use `comment.py` to handle boilerplate across multiple XML files (text must be pre-escaped XML):
```bash
python scripts/comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
python scripts/comment.py unpacked/ 1 "Reply text" --parent 0 # reply to comment 0
python scripts/comment.py unpacked/ 0 "Text" --author "Custom Author" # custom author name
python "<skill-dir>/scripts/with-deps.py" comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
python "<skill-dir>/scripts/with-deps.py" comment.py unpacked/ 1 "Reply text" --parent 0 # reply to comment 0
python "<skill-dir>/scripts/with-deps.py" comment.py unpacked/ 0 "Text" --author "Custom Author" # custom author name
```
Then add markers to document.xml (see Comments in XML Reference).
### Step 3: Pack
```bash
python scripts/office/pack.py unpacked/ output.docx --original document.docx
python "<skill-dir>/scripts/with-deps.py" office/pack.py unpacked/ output.docx --original document.docx
```
Validates with auto-repair, condenses XML, and creates DOCX. Use `--validate false` to skip.
@@ -541,7 +553,13 @@ After running `comment.py` (see Step 2), add markers to document.xml. For replie
## Dependencies
- **pandoc**: Text extraction
- **docx**: `npm install -g docx` (new documents)
- **LibreOffice**: PDF conversion (auto-configured for sandboxed environments via `scripts/office/soffice.py`)
- **Poppler**: `pdftoppm` for images
Preinstalled by the runtime (no install, offline, cross-platform; none of the launchers below need `bash`):
- **docx** (docx-js): new documents. **Preinstalled by the runtime** (`runtime-deps/node_modules`); run the generator with `node -r "<skill-dir>/scripts/preload-deps.cjs" generate.js`, no `npm install` needed.
- **defusedxml** + **lxml**: XML parsing and full XSD validation. **Preinstalled by the runtime** in the bundled Python (`runtime-deps/python-runtime`); scripts work offline when run as `python "<skill-dir>/scripts/with-deps.py" <script> ...`. If the bundled Python or `lxml` is unavailable, the launcher falls back to the system `python3` and skips XSD validation gracefully.
External system tools (needed only for reading/converting, **not for generation**; install per platform if you use them):
- **pandoc**: text extraction. macOS `brew install pandoc` · Windows `winget install --id JohnMacFarlane.Pandoc` · Linux `sudo apt install pandoc`
- **LibreOffice** (`soffice`): `.doc` → `.docx`, PDF conversion, accepting tracked changes. macOS `brew install --cask libreoffice` · Windows `winget install --id TheDocumentFoundation.LibreOffice` (make sure `soffice` is on `PATH`) · Linux `sudo apt install libreoffice`. The Linux-sandbox `AF_UNIX` shim in `scripts/office/soffice.py` is skipped automatically on macOS/Windows.
- **Poppler** (`pdftoppm`, pages → images): macOS `brew install poppler` · Windows `winget install --id oschwartz10612.Poppler` or `choco install poppler` · Linux `sudo apt install poppler-utils`
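
Before invoking any of the external tools above, their presence on `PATH` can be checked portably with `shutil.which`. This is a minimal sketch; `missing_tools` is a hypothetical helper and the default tool names are simply the executables mentioned in the list.

```python
import shutil


def missing_tools(tools=("pandoc", "soffice", "pdftoppm"), which=shutil.which):
    """Return the external tools that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]
```

A non-empty result is the cue to point the user at the per-platform install command for that tool.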