mirror of
https://git.openapi.site/https://github.com/desirecore/market.git
synced 2026-06-06 09:30:42 +08:00
feat(docx): replace the bash wrapper with cross-platform launchers and reuse preinstalled dependencies instead of installing on every run (#21)
## Summary

Switch the docx skill's reuse of client-preinstalled runtime deps from a **bash wrapper** to **cross-platform runtime launchers**, so it behaves identically on Win/macOS/Linux without Git Bash, and fix several POSIX-hardcoded crashes on Windows.

## Changes

- **Add** `scripts/preload-deps.cjs` (Node preloader that injects `NODE_PATH`) and `scripts/with-deps.py` (Python launcher that switches, when needed, to the bundled Python that includes lxml); **delete** the bash version, `with-deps.sh`.
- Generation goes through `node -r preload-deps.cjs` and office scripts go through `python with-deps.py`. Both reuse the preinstalled docx-js / defusedxml / lxml offline, with no `npm`/`pip install` on every run and **no dependency on bash**.
- `comment.py` gains a defusedxml sys.path shim; `validate.py` fixes a temporary-directory leak (atexit cleanup).
- `accept_changes.py` drops the hardcoded `/tmp` (`tempfile.gettempdir` + `Path.as_uri`); `soffice.py` enables the AF_UNIX shim on Linux only, avoiding a crash on Windows.
- `SKILL.md` / `SKILL.zh-CN.md` are updated with the new command forms, an ESM warning, and cross-platform install instructions for the external tools (pandoc/LibreOffice/poppler); `source_hash` is recomputed.

## Testing

- End-to-end run on a real dev root: docx generation (no install) + full XSD validation (with lxml) + unpack/pack round trip all pass.
- The repository's `validate-i18n.py` check passes; all py scripts pass `py_compile`, and `preload-deps.cjs` passes `node --check`.

---

- [x] I have read and agree to the CLA

Co-authored-by: 张馨元 <zhangxy@iynss.com>
Co-authored-by: Yige <a@wyr.me>
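The commit message names `tempfile.gettempdir`, `Path.as_uri`, and an atexit cleanup as the replacements for hardcoded `/tmp` paths. The sketch below shows how those three pieces can fit together. It is only an illustration under assumptions: the helper names `make_scratch_dir` and `to_file_uri` and the `docx-skill-` prefix are hypothetical and are not taken from `accept_changes.py` or `validate.py`.

```python
import atexit
import shutil
import tempfile
from pathlib import Path


def make_scratch_dir(prefix: str = "docx-skill-") -> Path:
    """Create a directory under the platform temp root and remove it at exit.

    Hypothetical helper: the real scripts may structure this differently.
    """
    # gettempdir() resolves to the platform's temp root instead of a literal /tmp.
    scratch = Path(tempfile.mkdtemp(prefix=prefix, dir=tempfile.gettempdir()))
    # ignore_errors=True keeps shutdown quiet if the directory is already gone.
    atexit.register(shutil.rmtree, scratch, ignore_errors=True)
    return scratch


def to_file_uri(path: Path) -> str:
    """Build a file:// URI; as_uri() requires an absolute path, hence resolve()."""
    return path.resolve().as_uri()


scratch = make_scratch_dir()
profile_uri = to_file_uri(scratch / "profile")
print(profile_uri)
```

Building the URI with `as_uri()` rather than string concatenation matters mostly on Windows, where drive letters and backslashes would otherwise produce an invalid `file://` URL.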
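@@ -25,14 +25,16 @@ docx is a **Procedural Skill** that provides Word document

 The Python scripts bundled with this skill live in the skill's installation directory. When running them you **must use the full path**, not a relative path.

-The skill directory is provided by the `<skill-dir>` tag in the context. Every command that starts with `scripts/` should be assembled as:
+The skill directory is provided by the `<skill-dir>` tag in the context. **All office Python scripts must be run through the cross-platform launcher** `scripts/with-deps.py`, which reuses the libraries preinstalled by the runtime (`defusedxml`, plus `lxml` for full validation) so that no `pip install` is needed. The launcher is pure Python and behaves identically on **macOS / Linux / Windows, with no dependency on `bash`**:

 ```bash
-python "<skill-dir>/scripts/office/unpack.py" document.docx unpacked/
-python "<skill-dir>/scripts/office/pack.py" unpacked/ output.docx
+python "<skill-dir>/scripts/with-deps.py" office/unpack.py document.docx unpacked/
+python "<skill-dir>/scripts/with-deps.py" office/pack.py unpacked/ output.docx
 ```

-**Do not** run `python scripts/office/unpack.py` directly: that relative path does not exist in the user's working directory.
+The launcher runs the target script under the runtime's bundled Python, which has `lxml`/`defusedxml` preinstalled (if that bundled Python is unavailable it falls back to the system `python3`, in which case full XSD validation is skipped gracefully). Script paths are **relative to `scripts/`** (for example `office/unpack.py`, `comment.py`).
+
+**Do not** run office scripts via a bare relative path (such as `python scripts/office/unpack.py`): that path does not exist in the user's working directory, and it bypasses the preinstalled libraries. Always go through `<skill-dir>/scripts/with-deps.py`.

 ## Prerequisites
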
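The launcher behaviour described in this hunk (resolve the script relative to `scripts/`, prefer a bundled interpreter, fall back otherwise) can be sketched as follows. This is a minimal illustration, not the contents of `scripts/with-deps.py`: the function names, the `bundled` parameter, and the choice of `sys.executable` as the fallback are all assumptions made for the example.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def pick_interpreter(bundled: Optional[Path]) -> str:
    """Prefer the bundled interpreter when it exists, else the current one."""
    if bundled is not None and bundled.is_file():
        return str(bundled)
    # Assumed fallback for this sketch: the Python running the launcher itself.
    return sys.executable


def build_command(scripts_dir: Path, rel_script: str, args: List[str],
                  bundled: Optional[Path] = None) -> List[str]:
    """Resolve rel_script against scripts_dir and return the argv to execute."""
    target = (scripts_dir / rel_script).resolve()
    if not target.is_file():
        raise FileNotFoundError(f"no such script under scripts/: {rel_script}")
    return [pick_interpreter(bundled), str(target), *args]


def run(scripts_dir: Path, rel_script: str, args: List[str],
        bundled: Optional[Path] = None) -> int:
    """Run the target and return its exit code.

    Passing an argv list (no shell=True) avoids any dependence on bash or
    cmd.exe quoting rules, which is what makes the approach portable.
    """
    return subprocess.call(build_command(scripts_dir, rel_script, args, bundled))
```

A call such as `run(Path("<skill-dir>/scripts"), "office/unpack.py", ["document.docx", "unpacked/"])` would mirror the command form shown in the hunk above.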
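@@ -56,17 +58,9 @@ python3 --version 2>/dev/null || python --version 2>/dev/null

 ### Python package dependencies

-This skill's Python scripts depend on the following packages (detected on demand, checked only when the relevant script is actually invoked):
+This skill's Python scripts depend on `defusedxml` (XML parsing) and `lxml` (XSD validation). **Both are preinstalled by the runtime**: when a script is run through `scripts/with-deps.py`, the target runs under the runtime's bundled Python, which already has both installed, so **no `pip install` and no network access are needed** (works offline, on every platform, without `bash`).

-- `lxml`: XML schema validation (validate.py)
-- `defusedxml`: safe XML parsing (unpack.py)
-
-Detection:
-```bash
-python3 -c "import lxml; import defusedxml" 2>/dev/null || echo "MISSING"
-```
-
-If they are missing, tell the user to install them: `pip install lxml defusedxml`
+Fallback: if the runtime's bundled Python is unavailable (older client, or not baked into the build), the launcher falls back to the system `python3`. In that case `defusedxml` can still be imported (it is pure Python and preinstalled separately), but `lxml` may be missing, so full XSD validation is **skipped gracefully** (editing and packing still succeed). To enable full validation in this fallback case: `pip install lxml`.

 ## Output Rule
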
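The "skipped gracefully" fallback described in this hunk can be pictured with an optional-import guard. The sketch below is an assumption about the general pattern, not the code in `validate.py`: `has_module` and `validate_schema` are hypothetical names, and returning `None` to signal a skipped check is a choice made for this example.

```python
import importlib.util
from typing import Optional


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` is importable, without importing it."""
    return importlib.util.find_spec(name) is not None


def validate_schema(xml_path: str, xsd_path: str) -> Optional[bool]:
    """Return True/False for a schema check, or None when lxml is missing."""
    if not has_module("lxml"):
        # Report the skip instead of failing, so editing and packing can continue.
        print("lxml not available; skipping XSD validation")
        return None
    from lxml import etree  # imported lazily so this module loads without lxml

    schema = etree.XMLSchema(etree.parse(xsd_path))
    return schema.validate(etree.parse(xml_path))
```

Keeping three outcomes (valid, invalid, skipped) distinct lets a caller tell "the document failed validation" apart from "validation could not run".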
@@ -89,7 +83,7 @@ A .docx file is a ZIP archive containing XML files.
 Legacy `.doc` files must be converted before editing:

 ```bash
-python scripts/office/soffice.py --headless --convert-to docx document.doc
+python "<skill-dir>/scripts/with-deps.py" office/soffice.py --headless --convert-to docx document.doc
 ```

 ### Reading Content
@@ -99,13 +93,13 @@ python scripts/office/soffice.py --headless --convert-to docx document.doc
 pandoc --track-changes=all document.docx -o output.md

 # Raw XML access
-python scripts/office/unpack.py document.docx unpacked/
+python "<skill-dir>/scripts/with-deps.py" office/unpack.py document.docx unpacked/
 ```

 ### Converting to Images

 ```bash
-python scripts/office/soffice.py --headless --convert-to pdf document.docx
+python "<skill-dir>/scripts/with-deps.py" office/soffice.py --headless --convert-to pdf document.docx
 pdftoppm -jpeg -r 150 document.pdf page
 ```

@@ -114,17 +108,28 @@ pdftoppm -jpeg -r 150 document.pdf page
 To produce a clean document with all tracked changes accepted (requires LibreOffice):

 ```bash
-python scripts/accept_changes.py input.docx output.docx
+python "<skill-dir>/scripts/with-deps.py" accept_changes.py input.docx output.docx
 ```

 ---

 ## Creating New Documents

-Generate .docx files with JavaScript, then validate. Install: `npm install -g docx`
+Generate .docx files with JavaScript, then validate. The `docx` (docx-js) library is **pre-installed by the runtime** — no `npm install` needed. You MUST run the generator through the Node preloader (`scripts/preload-deps.cjs`, see Run below) so `require('docx')` resolves the pre-installed library.
+
+### Writing the script (critical: avoid failures caused by quoting/escaping)
+
+**You must create `generate.js` with the `Write` tool, writing the file out directly.**
+
+**Do not** assemble the script through the shell: do not use a `bash` heredoc (`cat <<EOF`), `echo`, or `python3 -c "...open(...).write(...)"` to generate the JavaScript. Document content (manuals and reports especially) is full of `"` quotes, apostrophes, and Chinese punctuation; passing it through the shell makes three layers of quoting collide (shell quotes × JS string quotes × heredoc delimiter), which corrupts the script and leads to repeated retries and command timeouts.
+
+- With the `Write` tool, quotes in the content are written verbatim, with no shell escaping needed at all.
+- For long documents, put the content into the file as ordinary JS strings/arrays and split it into multiple `Paragraph`s; when a single document is very large, write it in **several smaller `Write`/`Edit` calls** rather than cramming it into one huge command.
+
 ### Setup
+Use the **Write tool** to write the generator script to a file (for example `generate.js`):
 ```javascript
 const fs = require('fs');
 const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
         Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
         TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
@@ -134,10 +139,17 @@ const doc = new Document({ sections: [{ children: [/* content */] }] });
 Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));
 ```

+### Run
+Run it with the cross-platform Node preloader (it injects `NODE_PATH` for the pre-installed `docx`; pure Node, works on **macOS / Linux / Windows**, no `bash` needed):
+```bash
+node -r "<skill-dir>/scripts/preload-deps.cjs" generate.js
+```
+**Use CommonJS `require('docx')`, NOT ESM `import`** — the pre-installed library is resolved via `NODE_PATH`, which Node **ignores for ESM**. Do not put a `"type": "module"` `package.json` in the working directory either, as it would force `.js` files to be treated as ESM and break `require`. If `require('docx')` still fails (e.g. the runtime pre-install is unavailable on an older client), fall back to `npm install -g docx` and re-run.
+
 ### Validation
 After creating the file, validate it. If validation fails, unpack, fix the XML, and repack.
 ```bash
-python scripts/office/validate.py doc.docx
+python "<skill-dir>/scripts/with-deps.py" office/validate.py doc.docx
 ```

 ### Page Size
@@ -358,7 +370,7 @@ sections: [{

 ### Step 1: Unpack
 ```bash
-python scripts/office/unpack.py document.docx unpacked/
+python "<skill-dir>/scripts/with-deps.py" office/unpack.py document.docx unpacked/
 ```
 Extracts XML, pretty-prints, merges adjacent runs, and converts smart quotes to XML entities (`&#x201C;` etc.) so they survive editing. Use `--merge-runs false` to skip run merging.

@@ -384,15 +396,15 @@ Edit files in `unpacked/word/`. See XML Reference below for patterns.

 **Adding comments:** Use `comment.py` to handle boilerplate across multiple XML files (text must be pre-escaped XML):
 ```bash
-python scripts/comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
-python scripts/comment.py unpacked/ 1 "Reply text" --parent 0  # reply to comment 0
-python scripts/comment.py unpacked/ 0 "Text" --author "Custom Author"  # custom author name
+python "<skill-dir>/scripts/with-deps.py" comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
+python "<skill-dir>/scripts/with-deps.py" comment.py unpacked/ 1 "Reply text" --parent 0  # reply to comment 0
+python "<skill-dir>/scripts/with-deps.py" comment.py unpacked/ 0 "Text" --author "Custom Author"  # custom author name
 ```
 Then add markers to document.xml (see Comments in XML Reference).

 ### Step 3: Pack
 ```bash
-python scripts/office/pack.py unpacked/ output.docx --original document.docx
+python "<skill-dir>/scripts/with-deps.py" office/pack.py unpacked/ output.docx --original document.docx
 ```
 Validates with auto-repair, condenses XML, and creates DOCX. Use `--validate false` to skip.

@@ -541,7 +553,13 @@ After running `comment.py` (see Step 2), add markers to document.xml. For replie

 ## Dependencies

-- **pandoc**: Text extraction
-- **docx**: `npm install -g docx` (new documents)
-- **LibreOffice**: PDF conversion (auto-configured for sandboxed environments via `scripts/office/soffice.py`)
-- **Poppler**: `pdftoppm` for images
+Preinstalled by the runtime (no installation, offline, cross-platform; none of the launchers below need `bash`):
+
+- **docx** (docx-js): creating new documents. **Preinstalled by the runtime** (`runtime-deps/node_modules`); the generator is run with `node -r "<skill-dir>/scripts/preload-deps.cjs" generate.js`, with no `npm install` needed.
+- **defusedxml** + **lxml**: XML parsing and full XSD validation. **Preinstalled by the runtime** in the bundled Python (`runtime-deps/python-runtime`); scripts work offline when run with `python "<skill-dir>/scripts/with-deps.py" <script> ...`. If the bundled Python or `lxml` is unavailable, the launcher falls back to the system `python3` and skips XSD validation gracefully.
+
+External system tools (needed only for reading/conversion, **not for generation**; install per platform if you use them):
+
+- **pandoc**: text extraction. macOS `brew install pandoc` · Windows `winget install --id JohnMacFarlane.Pandoc` · Linux `sudo apt install pandoc`
+- **LibreOffice** (`soffice`): `.doc`→`.docx`, PDF conversion, accepting tracked changes. macOS `brew install --cask libreoffice` · Windows `winget install --id TheDocumentFoundation.LibreOffice` (make sure `soffice` is on `PATH`) · Linux `sudo apt install libreoffice`. The Linux sandbox `AF_UNIX` shim in `scripts/office/soffice.py` is skipped automatically on macOS/Windows.
+- **Poppler** (`pdftoppm`, pages to images): macOS `brew install poppler` · Windows `winget install --id oschwartz10612.Poppler` or `choco install poppler` · Linux `sudo apt install poppler-utils`
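Since the external tools listed in this hunk must be found on `PATH`, a quick preflight check can report which ones are missing before a conversion is attempted. The sketch below is illustrative only: `locate_tools` and `needs_af_unix_shim` are hypothetical names, and the platform test merely mirrors the statement that the `AF_UNIX` shim applies on Linux alone; it is not the logic in `scripts/office/soffice.py`.

```python
import shutil
import sys
from typing import Dict, Iterable, Optional

# Executable names as they appear in the dependency list above.
EXTERNAL_TOOLS = ("pandoc", "soffice", "pdftoppm")


def locate_tools(names: Iterable[str] = EXTERNAL_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None when it is not on PATH."""
    # shutil.which honours PATHEXT on Windows, so "soffice" can match soffice.exe.
    return {name: shutil.which(name) for name in names}


def needs_af_unix_shim(platform: str = sys.platform) -> bool:
    """Assumed guard for this sketch: only Linux is a candidate for the shim."""
    return platform.startswith("linux")


for tool, path in locate_tools().items():
    print(f"{tool}: {path or 'not found on PATH'}")
```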