mirror of
https://git.openapi.site/https://github.com/desirecore/market.git
synced 2026-06-06 05:50:41 +08:00
feat(docx): replace the bash wrapper with cross-platform launchers and reuse preinstalled deps instead of installing on every run (#21)
## Summary

Switch the docx skill's reuse of client-preinstalled runtime deps from a **bash wrapper** to **cross-platform runtime launchers**, so it behaves identically on Win/macOS/Linux without Git Bash, and fix several POSIX-hardcoded crashes on Windows.

## Changes

- **Add** `scripts/preload-deps.cjs` (Node preloader that injects `NODE_PATH`) and `scripts/with-deps.py` (Python launcher that switches, when needed, to the bundled Python that ships with lxml); **remove** the bash `with-deps.sh`.
- Generation runs through `node -r preload-deps.cjs` and office scripts run through `python with-deps.py`, reusing the preinstalled docx-js / defusedxml / lxml offline, with no per-run `npm`/`pip install` and **no dependency on bash**.
- `comment.py` gains a defusedxml `sys.path` shim; `validate.py` fixes a temp-directory leak (atexit cleanup).
- `accept_changes.py` drops the hardcoded `/tmp` (`tempfile.gettempdir` + `Path.as_uri`); `soffice.py` enables the AF_UNIX shim on Linux only, avoiding a crash on Windows.
- `SKILL.md` / `SKILL.zh-CN.md` update the command forms, add an ESM warning and cross-platform install guidance for the external tools (pandoc/LibreOffice/poppler), and recompute `source_hash`.

## Testing

- End-to-end against a real dev root: docx generation (no install) + full XSD validation (with lxml) + unpack/pack round trip all pass.
- The repository's `validate-i18n.py` check passes; `py_compile` on all Python scripts and `node --check` on `preload-deps.cjs` pass.

---

- [x] I have read and agree to the CLA

Co-authored-by: 张馨元 <zhangxy@iynss.com>
Co-authored-by: Yige <a@wyr.me>
@@ -13,7 +13,7 @@ description: >-
|
||||
PDFs, spreadsheets, Google Docs, or general coding tasks unrelated to document
|
||||
generation. Use when 用户提到 Word文档、docx、创建文档、编辑文档、报告、
|
||||
备忘录、公文、合同、信函模板。
|
||||
version: 1.0.2
|
||||
version: 1.0.3
|
||||
type: procedural
|
||||
risk_level: low
|
||||
status: enabled
|
||||
@@ -25,7 +25,7 @@ tags:
|
||||
- office
|
||||
metadata:
|
||||
author: anthropic
|
||||
updated_at: '2026-04-13'
|
||||
updated_at: '2026-06-02'
|
||||
i18n:
|
||||
default_locale: en-US
|
||||
source_locale: zh-CN
|
||||
@@ -38,7 +38,7 @@ metadata:
|
||||
description: >-
|
||||
Use this skill whenever the user wants to create, read, edit, or manipulate Word documents (.docx files). Triggers include: any mention of "Word doc", "word document", ".docx", or requests to produce professional documents with formatting like tables of contents, headings, page numbers, or letterheads. Also use when extracting or reorganizing content from .docx files, inserting or replacing images in documents, performing find-and-replace in Word files, working with tracked changes or comments, or converting content into a polished Word document. If the user asks for a "report", "memo", "letter", "template", or similar deliverable as a Word or .docx file, use this skill. Do NOT use for PDFs, spreadsheets, Google Docs, or general coding tasks unrelated to document generation. Use when 用户提到 Word文档、docx、创建文档、编辑文档、报告、 备忘录、公文、合同、信函模板。
|
||||
body: ./SKILL.zh-CN.md
|
||||
source_hash: sha256:58d1aae3a57a1851
|
||||
source_hash: sha256:ceb03c49d0215800
|
||||
translated_by: human
|
||||
en-US:
|
||||
name: Word Document Processing
|
||||
@@ -46,7 +46,7 @@ metadata:
|
||||
description: >-
|
||||
Use this skill whenever the user wants to create, read, edit, or manipulate Word documents (.docx files). Triggers include: any mention of "Word doc", "word document", ".docx", or requests to produce professional documents with formatting like tables of contents, headings, page numbers, or letterheads. Also use when extracting or reorganizing content from .docx files, inserting or replacing images in documents, performing find-and-replace in Word files, working with tracked changes or comments, or converting content into a polished Word document. If the user asks for a "report", "memo", "letter", "template", or similar deliverable as a Word or .docx file, use this skill. Do NOT use for PDFs, spreadsheets, Google Docs, or general coding tasks unrelated to document generation. Use when the user mentions Word documents, docx, creating documents, editing documents, reports, memos, official documents, contracts, or letter templates.
|
||||
body: ./SKILL.md
|
||||
source_hash: sha256:58d1aae3a57a1851
|
||||
source_hash: sha256:ceb03c49d0215800
|
||||
translated_by: human
|
||||
market:
|
||||
icon: >-
|
||||
@@ -93,14 +93,16 @@ docx is a **Procedural Skill** that provides full processing capabilities for Wo
|
||||
|
||||
The Python scripts bundled with this skill live inside the skill installation directory. You **MUST use full paths** when invoking them — never use bare relative paths.
|
||||
|
||||
The skill directory is provided by the `<skill-dir>` tag in the context. Prefix all `scripts/` commands accordingly:
|
||||
The skill directory is provided by the `<skill-dir>` tag in the context. **Always run the office Python scripts through the cross-platform launcher** `scripts/with-deps.py`, so they reuse the runtime-preinstalled libraries (`defusedxml`, and `lxml` for full validation) with no `pip install`. The launcher is pure Python and behaves identically on **macOS / Linux / Windows — it does NOT require `bash`**:
|
||||
|
||||
```bash
|
||||
python "<skill-dir>/scripts/office/unpack.py" document.docx unpacked/
|
||||
python "<skill-dir>/scripts/office/pack.py" unpacked/ output.docx
|
||||
python "<skill-dir>/scripts/with-deps.py" office/unpack.py document.docx unpacked/
|
||||
python "<skill-dir>/scripts/with-deps.py" office/pack.py unpacked/ output.docx
|
||||
```
|
||||
|
||||
**NEVER** run `python scripts/office/unpack.py` directly — that relative path does not exist in the user's working directory.
|
||||
The launcher runs the target script under a runtime-bundled Python that has `lxml`/`defusedxml` preinstalled, and falls back to the system `python3` if that bundled Python is unavailable (in which case full XSD validation is skipped gracefully). The script path is **relative to `scripts/`** (e.g. `office/unpack.py`, `comment.py`).
|
||||
|
||||
**NEVER** invoke an office script with a bare relative path like `python scripts/office/unpack.py` — that path does not exist in the user's working directory, and it bypasses the preinstalled libraries. Always go through `<skill-dir>/scripts/with-deps.py`.
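The launcher behaviour described above can be pictured with a minimal sketch. This is not the actual `with-deps.py`: the function name `build_launch`, the interpreter location inside `runtime-deps/python-runtime`, and the use of `sys.executable` as a stand-in for the system `python3` fallback are all assumptions made for illustration.

```python
import os
import sys
from pathlib import Path


def build_launch(scripts_dir, root, target, args):
    """Return (argv, env) for running `target`, a path relative to scripts/."""
    scripts_dir, root = Path(scripts_dir), Path(root)
    script = scripts_dir / target
    if not script.is_file():
        raise FileNotFoundError(f"no such script under scripts/: {target}")

    # ASSUMPTION: where the interpreter sits inside python-runtime is a guess.
    runtime = root / "runtime-deps" / "python-runtime"
    bundled = runtime / ("python.exe" if os.name == "nt" else "bin/python3")
    # Fall back to the interpreter running this sketch when no bundled one exists.
    python = str(bundled) if bundled.is_file() else sys.executable

    env = dict(os.environ)
    libs = root / "runtime-deps" / "python-libs"
    if libs.is_dir():
        # Prepend so the preinstalled pure-Python packages are found first.
        old = env.get("PYTHONPATH")
        env["PYTHONPATH"] = str(libs) + (os.pathsep + old if old else "")
    return [python, str(script), *args], env
```

A caller would then hand the result to `subprocess.run(argv, env=env)`. Because everything is built from `pathlib` and `os.pathsep`, the same code path serves Windows and POSIX hosts.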
|
||||
|
||||
## Prerequisites
|
||||
|
||||
@@ -124,17 +126,9 @@ for everything else (containers / WSL / system tools), load the `dev-environment
|
||||
|
||||
### Python Package Dependencies
|
||||
|
||||
The Python scripts in this Skill depend on the following packages (checked on demand, only when the relevant scripts are actually invoked):
|
||||
The Python scripts in this Skill depend on `defusedxml` (XML parsing) and `lxml` (XSD validation). **Both are pre-installed by the runtime** — when you run the scripts through `scripts/with-deps.py`, the target runs under a runtime-bundled Python that has both, so **no `pip install` and no network are needed** (works offline, on every platform, without `bash`).
|
||||
|
||||
- `lxml` — XML schema validation (validate.py)
|
||||
- `defusedxml` — safe XML parsing (unpack.py)
|
||||
|
||||
Detection method:
|
||||
```bash
|
||||
python3 -c "import lxml; import defusedxml" 2>/dev/null || echo "MISSING"
|
||||
```
|
||||
|
||||
If missing, instruct the user to install: `pip install lxml defusedxml`
|
||||
Fallback: if the runtime-bundled Python is unavailable (older client / build without it), the wrapper uses system `python3`. In that case `defusedxml` still resolves (pure-Python, bundled separately) but `lxml` may be missing — full XSD validation is then **skipped gracefully** (editing/packing still succeed). To enable full validation in that fallback case: `pip install lxml`.
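When diagnosing the fallback case, a shell-independent check for the two packages can be handy. The sketch below is one possible approach using only the standard library; the helper name `missing_modules` is hypothetical and not part of the skill.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["defusedxml", "lxml"])
    print("MISSING: " + ", ".join(missing) if missing else "all present")
```

Unlike a `2>/dev/null` redirect, this runs the same way in `cmd.exe`, PowerShell, and POSIX shells, and it reports which package is absent rather than a single flag.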
|
||||
|
||||
## Output Rule
|
||||
|
||||
@@ -157,7 +151,7 @@ A .docx file is a ZIP archive containing XML files.
|
||||
Legacy `.doc` files must be converted before editing:
|
||||
|
||||
```bash
|
||||
python scripts/office/soffice.py --headless --convert-to docx document.doc
|
||||
python "<skill-dir>/scripts/with-deps.py" office/soffice.py --headless --convert-to docx document.doc
|
||||
```
|
||||
|
||||
### Reading Content
|
||||
@@ -167,13 +161,13 @@ python scripts/office/soffice.py --headless --convert-to docx document.doc
|
||||
pandoc --track-changes=all document.docx -o output.md
|
||||
|
||||
# Raw XML access
|
||||
python scripts/office/unpack.py document.docx unpacked/
|
||||
python "<skill-dir>/scripts/with-deps.py" office/unpack.py document.docx unpacked/
|
||||
```
|
||||
|
||||
### Converting to Images
|
||||
|
||||
```bash
|
||||
python scripts/office/soffice.py --headless --convert-to pdf document.docx
|
||||
python "<skill-dir>/scripts/with-deps.py" office/soffice.py --headless --convert-to pdf document.docx
|
||||
pdftoppm -jpeg -r 150 document.pdf page
|
||||
```
|
||||
|
||||
@@ -182,17 +176,28 @@ pdftoppm -jpeg -r 150 document.pdf page
|
||||
To produce a clean document with all tracked changes accepted (requires LibreOffice):
|
||||
|
||||
```bash
|
||||
python scripts/accept_changes.py input.docx output.docx
|
||||
python "<skill-dir>/scripts/with-deps.py" accept_changes.py input.docx output.docx
|
||||
```
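LibreOffice takes its profile directory as a `file://` URI, which is why this commit's `accept_changes.py` change builds it with `tempfile.gettempdir` and `Path.as_uri` instead of concatenating `file://` with a hardcoded `/tmp` path. The sketch below illustrates the difference; the helper name `user_installation_arg` is hypothetical.

```python
import tempfile
from pathlib import Path, PurePosixPath, PureWindowsPath


def user_installation_arg(profile_dir):
    """Build the -env:UserInstallation argument from an absolute directory path."""
    return "-env:UserInstallation=" + Path(profile_dir).as_uri()


# Pure paths allow both flavours to be inspected on any host OS.
posix_uri = PurePosixPath("/tmp/libreoffice_docx_profile").as_uri()
windows_uri = PureWindowsPath("C:/Temp/libreoffice_docx_profile").as_uri()

# The real profile location follows the platform's temp directory.
profile = Path(tempfile.gettempdir()) / "libreoffice_docx_profile"
arg = user_installation_arg(profile)
```

Here `posix_uri` is `file:///tmp/libreoffice_docx_profile` and `windows_uri` is `file:///C:/Temp/libreoffice_docx_profile`. Plain string concatenation on a Windows path would instead yield something like `file://C:\Temp\...`, which is not a valid file URI.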
|
||||
|
||||
---
|
||||
|
||||
## Creating New Documents
|
||||
|
||||
Generate .docx files with JavaScript, then validate. Install: `npm install -g docx`
|
||||
Generate .docx files with JavaScript, then validate. The `docx` (docx-js) library is **pre-installed by the runtime** — no `npm install` needed. You MUST run the generator through the Node preloader (`scripts/preload-deps.cjs`, see Run below) so `require('docx')` resolves the pre-installed library.
|
||||
|
||||
### Authoring the script (CRITICAL — avoids quote/escaping failures)
|
||||
|
||||
**You MUST create `generate.js` with the `Write` tool — write the file directly.**
|
||||
|
||||
**NEVER** build the script through the shell: do NOT use `bash` heredocs (`cat <<EOF`), `echo`, or `python3 -c "...open(...).write(...)"` to emit JavaScript. Document content (especially manuals/reports) contains many `"` quotes, apostrophes, and CJK punctuation; routing it through the shell causes three layers of quoting to collide (shell quotes × JS string quotes × heredoc delimiter), which corrupts the script and triggers endless retries and command timeouts.
|
||||
|
||||
- Use the `Write` tool → quotes in content are written verbatim, no shell escaping at all.
|
||||
- For long documents, keep content as plain JS strings/arrays inside the file; split into many `Paragraph`s. If a single document is very large, write it in **multiple smaller `Write`/`Edit` steps** rather than one giant command.
|
||||
|
||||
### Setup
|
||||
Write the generator to a file (e.g. `generate.js`) **using the Write tool**:
|
||||
```javascript
|
||||
const fs = require('fs');
|
||||
const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
|
||||
Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
|
||||
TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
|
||||
@@ -202,10 +207,17 @@ const doc = new Document({ sections: [{ children: [/* content */] }] });
|
||||
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));
|
||||
```
|
||||
|
||||
### Run
|
||||
Run it with the cross-platform Node preloader (it injects `NODE_PATH` for the pre-installed `docx`; pure Node, works on **macOS / Linux / Windows**, no `bash` needed):
|
||||
```bash
|
||||
node -r "<skill-dir>/scripts/preload-deps.cjs" generate.js
|
||||
```
|
||||
**Use CommonJS `require('docx')`, NOT ESM `import`** — the pre-installed library is resolved via `NODE_PATH`, which Node **ignores for ESM**. Do not put a `"type": "module"` `package.json` in the working directory either, as it would force `.js` files to be treated as ESM and break `require`. If `require('docx')` still fails (e.g. the runtime pre-install is unavailable on an older client), fall back to `npm install -g docx` and re-run.
|
||||
|
||||
### Validation
|
||||
After creating the file, validate it. If validation fails, unpack, fix the XML, and repack.
|
||||
```bash
|
||||
python scripts/office/validate.py doc.docx
|
||||
python "<skill-dir>/scripts/with-deps.py" office/validate.py doc.docx
|
||||
```
|
||||
|
||||
### Page Size
|
||||
@@ -426,7 +438,7 @@ sections: [{
|
||||
|
||||
### Step 1: Unpack
|
||||
```bash
|
||||
python scripts/office/unpack.py document.docx unpacked/
|
||||
python "<skill-dir>/scripts/with-deps.py" office/unpack.py document.docx unpacked/
|
||||
```
|
||||
Extracts XML, pretty-prints, merges adjacent runs, and converts smart quotes to XML entities (`&#x201C;` etc.) so they survive editing. Use `--merge-runs false` to skip run merging.
|
||||
|
||||
@@ -452,15 +464,15 @@ Edit files in `unpacked/word/`. See XML Reference below for patterns.
|
||||
|
||||
**Adding comments:** Use `comment.py` to handle boilerplate across multiple XML files (text must be pre-escaped XML):
|
||||
```bash
|
||||
python scripts/comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
|
||||
python scripts/comment.py unpacked/ 1 "Reply text" --parent 0 # reply to comment 0
|
||||
python scripts/comment.py unpacked/ 0 "Text" --author "Custom Author" # custom author name
|
||||
python "<skill-dir>/scripts/with-deps.py" comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
|
||||
python "<skill-dir>/scripts/with-deps.py" comment.py unpacked/ 1 "Reply text" --parent 0 # reply to comment 0
|
||||
python "<skill-dir>/scripts/with-deps.py" comment.py unpacked/ 0 "Text" --author "Custom Author" # custom author name
|
||||
```
|
||||
Then add markers to document.xml (see Comments in XML Reference).
|
||||
|
||||
### Step 3: Pack
|
||||
```bash
|
||||
python scripts/office/pack.py unpacked/ output.docx --original document.docx
|
||||
python "<skill-dir>/scripts/with-deps.py" office/pack.py unpacked/ output.docx --original document.docx
|
||||
```
|
||||
Validates with auto-repair, condenses XML, and creates DOCX. Use `--validate false` to skip.
|
||||
|
||||
@@ -609,7 +621,13 @@ After running `comment.py` (see Step 2), add markers to document.xml. For replie
|
||||
|
||||
## Dependencies
|
||||
|
||||
- **pandoc**: Text extraction
|
||||
- **docx**: `npm install -g docx` (new documents)
|
||||
- **LibreOffice**: PDF conversion (auto-configured for sandboxed environments via `scripts/office/soffice.py`)
|
||||
- **Poppler**: `pdftoppm` for images
|
||||
Runtime-preinstalled (no install, offline, cross-platform — the launchers below need no `bash`):
|
||||
|
||||
- **docx** (docx-js): new documents — **pre-installed by the runtime** (`runtime-deps/node_modules`); run generators via `node -r "<skill-dir>/scripts/preload-deps.cjs" generate.js`. No `npm install` needed.
|
||||
- **defusedxml** + **lxml**: XML parsing & full XSD validation — **pre-installed by the runtime** in a bundled Python (`runtime-deps/python-runtime`); run scripts via `python "<skill-dir>/scripts/with-deps.py" <script> ...` to use them offline. Falls back to the system `python3` and skips XSD validation gracefully if the bundled Python / `lxml` is unavailable.
|
||||
|
||||
External system tools (only needed for reading/conversion, **not** for generation; install per OS if used):
|
||||
|
||||
- **pandoc**: Text extraction. macOS `brew install pandoc` · Windows `winget install --id JohnMacFarlane.Pandoc` · Linux `sudo apt install pandoc`
|
||||
- **LibreOffice** (`soffice`): `.doc`→`.docx` and PDF conversion, accept-changes. macOS `brew install --cask libreoffice` · Windows `winget install --id TheDocumentFoundation.LibreOffice` (ensure `soffice` is on `PATH`) · Linux `sudo apt install libreoffice`. The Linux-sandbox `AF_UNIX` shim in `scripts/office/soffice.py` is auto-skipped on macOS/Windows.
|
||||
- **Poppler** (`pdftoppm`, page→image): macOS `brew install poppler` · Windows `winget install --id oschwartz10612.Poppler` or `choco install poppler` · Linux `sudo apt install poppler-utils`
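Before calling any of the external tools listed above, it can help to confirm they are actually on `PATH`. A minimal sketch, assuming the executable names `pandoc`, `soffice`, and `pdftoppm` from the list; the helper name `find_tools` is hypothetical.

```python
import shutil


def find_tools(names=("pandoc", "soffice", "pdftoppm")):
    """Map each tool name to its resolved path, or None when it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, location in find_tools().items():
        print(f"{name}: {location or 'not found on PATH'}")
```

`shutil.which` honours `PATHEXT` on Windows, so `soffice` also matches `soffice.exe` or `soffice.com` there without platform-specific code.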
|
||||
|
||||
@@ -25,14 +25,16 @@ docx 是一个**流程型技能(Procedural Skill)**,提供 Word 文档的
|
||||
|
||||
本技能自带的 Python 脚本位于技能安装目录内。执行时**必须使用完整路径**,不能使用相对路径。
|
||||
|
||||
技能目录由上下文中的 `<skill-dir>` 标签提供。所有 `scripts/` 开头的命令都应拼接为:
|
||||
技能目录由上下文中的 `<skill-dir>` 标签提供。**所有 office Python 脚本都必须通过跨平台启动器** `scripts/with-deps.py` 运行,从而复用运行时预装的库(`defusedxml`,以及完整校验用的 `lxml`),无需 `pip install`。该启动器是纯 Python,在 **macOS / Linux / Windows 上行为一致——不依赖 `bash`**:
|
||||
|
||||
```bash
|
||||
python "<skill-dir>/scripts/office/unpack.py" document.docx unpacked/
|
||||
python "<skill-dir>/scripts/office/pack.py" unpacked/ output.docx
|
||||
python "<skill-dir>/scripts/with-deps.py" office/unpack.py document.docx unpacked/
|
||||
python "<skill-dir>/scripts/with-deps.py" office/pack.py unpacked/ output.docx
|
||||
```
|
||||
|
||||
**禁止**直接执行 `python scripts/office/unpack.py`——该相对路径在用户工作目录下不存在。
|
||||
启动器会把目标脚本放到运行时自带、预装了 `lxml`/`defusedxml` 的 Python 下运行(若该自带 Python 不可用则回退系统 `python3`,此时完整 XSD 校验会被优雅跳过)。脚本路径**相对 `scripts/`**(如 `office/unpack.py`、`comment.py`)。
|
||||
|
||||
**禁止**用裸相对路径执行 office 脚本(如 `python scripts/office/unpack.py`)——该路径在用户工作目录下不存在,且会绕过预装的库。一律通过 `<skill-dir>/scripts/with-deps.py` 运行。
|
||||
|
||||
## Prerequisites
|
||||
|
||||
@@ -56,17 +58,9 @@ python3 --version 2>/dev/null || python --version 2>/dev/null
|
||||
|
||||
### Python 包依赖
|
||||
|
||||
本技能的 Python 脚本依赖以下包(按需检测,仅在实际调用相关脚本时检查):
|
||||
本技能的 Python 脚本依赖 `defusedxml`(XML 解析)和 `lxml`(XSD 校验)。**两者均由运行时预装** —— 通过 `scripts/with-deps.py` 运行脚本时,目标会在运行时自带、已装好两者的 Python 下运行,因此**无需 `pip install`、无需联网**(离线可用,全平台、不依赖 `bash`)。
|
||||
|
||||
- `lxml` — XML schema 验证(validate.py)
|
||||
- `defusedxml` — 安全 XML 解析(unpack.py)
|
||||
|
||||
检测方法:
|
||||
```bash
|
||||
python3 -c "import lxml; import defusedxml" 2>/dev/null || echo "MISSING"
|
||||
```
|
||||
|
||||
缺失时告知用户安装:`pip install lxml defusedxml`
|
||||
回退:若运行时自带 Python 不可用(老客户端 / 构建未烤入),启动器回退系统 `python3`。此时 `defusedxml` 仍可解析(纯 Python、单独预装),但 `lxml` 可能缺失 —— 完整 XSD 校验会被**优雅跳过**(编辑/打包仍成功)。要在该回退情形下启用完整校验:`pip install lxml`。
|
||||
|
||||
## Output Rule
|
||||
|
||||
@@ -89,7 +83,7 @@ A .docx file is a ZIP archive containing XML files.
|
||||
Legacy `.doc` files must be converted before editing:
|
||||
|
||||
```bash
|
||||
python scripts/office/soffice.py --headless --convert-to docx document.doc
|
||||
python "<skill-dir>/scripts/with-deps.py" office/soffice.py --headless --convert-to docx document.doc
|
||||
```
|
||||
|
||||
### Reading Content
|
||||
@@ -99,13 +93,13 @@ python scripts/office/soffice.py --headless --convert-to docx document.doc
|
||||
pandoc --track-changes=all document.docx -o output.md
|
||||
|
||||
# Raw XML access
|
||||
python scripts/office/unpack.py document.docx unpacked/
|
||||
python "<skill-dir>/scripts/with-deps.py" office/unpack.py document.docx unpacked/
|
||||
```
|
||||
|
||||
### Converting to Images
|
||||
|
||||
```bash
|
||||
python scripts/office/soffice.py --headless --convert-to pdf document.docx
|
||||
python "<skill-dir>/scripts/with-deps.py" office/soffice.py --headless --convert-to pdf document.docx
|
||||
pdftoppm -jpeg -r 150 document.pdf page
|
||||
```
|
||||
|
||||
@@ -114,17 +108,28 @@ pdftoppm -jpeg -r 150 document.pdf page
|
||||
To produce a clean document with all tracked changes accepted (requires LibreOffice):
|
||||
|
||||
```bash
|
||||
python scripts/accept_changes.py input.docx output.docx
|
||||
python "<skill-dir>/scripts/with-deps.py" accept_changes.py input.docx output.docx
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Creating New Documents
|
||||
|
||||
Generate .docx files with JavaScript, then validate. Install: `npm install -g docx`
|
||||
Generate .docx files with JavaScript, then validate. The `docx` (docx-js) library is **pre-installed by the runtime** — no `npm install` needed. You MUST run the generator through the Node preloader (`scripts/preload-deps.cjs`, see Run below) so `require('docx')` resolves the pre-installed library.
|
||||
|
||||
### 编写脚本(关键 —— 避免引号/转义导致失败)
|
||||
|
||||
**必须用 `Write` 工具创建 `generate.js`——直接把文件写出来。**
|
||||
|
||||
**禁止**通过 shell 拼脚本:不要用 `bash` heredoc(`cat <<EOF`)、`echo` 或 `python3 -c "...open(...).write(...)"` 来生成 JavaScript。文档内容(尤其手册/报告)含大量 `"` 引号、撇号、中文标点;经 shell 会导致三层引号互相冲突(shell 引号 × JS 字符串引号 × heredoc 分隔符),脚本被破坏,进而陷入反复重试和命令超时。
|
||||
|
||||
- 用 `Write` 工具 → 内容里的引号原样写入,完全无需 shell 转义。
|
||||
- 长文档把内容作为普通 JS 字符串/数组放进文件,拆成多个 `Paragraph`;单个文档很大时,分**多次较小的 `Write`/`Edit`** 写入,不要塞进一条巨大命令。
|
||||
|
||||
### Setup
|
||||
用 **Write 工具**把生成脚本写成文件(例如 `generate.js`):
|
||||
```javascript
|
||||
const fs = require('fs');
|
||||
const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
|
||||
Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
|
||||
TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
|
||||
@@ -134,10 +139,17 @@ const doc = new Document({ sections: [{ children: [/* content */] }] });
|
||||
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));
|
||||
```
|
||||
|
||||
### Run
|
||||
Run it with the cross-platform Node preloader (it injects `NODE_PATH` for the pre-installed `docx`; pure Node, works on **macOS / Linux / Windows**, no `bash` needed):
|
||||
```bash
|
||||
node -r "<skill-dir>/scripts/preload-deps.cjs" generate.js
|
||||
```
|
||||
**Use CommonJS `require('docx')`, NOT ESM `import`** — the pre-installed library is resolved via `NODE_PATH`, which Node **ignores for ESM**. Do not put a `"type": "module"` `package.json` in the working directory either, as it would force `.js` files to be treated as ESM and break `require`. If `require('docx')` still fails (e.g. the runtime pre-install is unavailable on an older client), fall back to `npm install -g docx` and re-run.
|
||||
|
||||
### Validation
|
||||
After creating the file, validate it. If validation fails, unpack, fix the XML, and repack.
|
||||
```bash
|
||||
python scripts/office/validate.py doc.docx
|
||||
python "<skill-dir>/scripts/with-deps.py" office/validate.py doc.docx
|
||||
```
|
||||
|
||||
### Page Size
|
||||
@@ -358,7 +370,7 @@ sections: [{
|
||||
|
||||
### Step 1: Unpack
|
||||
```bash
|
||||
python scripts/office/unpack.py document.docx unpacked/
|
||||
python "<skill-dir>/scripts/with-deps.py" office/unpack.py document.docx unpacked/
|
||||
```
|
||||
Extracts XML, pretty-prints, merges adjacent runs, and converts smart quotes to XML entities (`&#x201C;` etc.) so they survive editing. Use `--merge-runs false` to skip run merging.
|
||||
|
||||
@@ -384,15 +396,15 @@ Edit files in `unpacked/word/`. See XML Reference below for patterns.
|
||||
|
||||
**Adding comments:** Use `comment.py` to handle boilerplate across multiple XML files (text must be pre-escaped XML):
|
||||
```bash
|
||||
python scripts/comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
|
||||
python scripts/comment.py unpacked/ 1 "Reply text" --parent 0 # reply to comment 0
|
||||
python scripts/comment.py unpacked/ 0 "Text" --author "Custom Author" # custom author name
|
||||
python "<skill-dir>/scripts/with-deps.py" comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
|
||||
python "<skill-dir>/scripts/with-deps.py" comment.py unpacked/ 1 "Reply text" --parent 0 # reply to comment 0
|
||||
python "<skill-dir>/scripts/with-deps.py" comment.py unpacked/ 0 "Text" --author "Custom Author" # custom author name
|
||||
```
|
||||
Then add markers to document.xml (see Comments in XML Reference).
|
||||
|
||||
### Step 3: Pack
|
||||
```bash
|
||||
python scripts/office/pack.py unpacked/ output.docx --original document.docx
|
||||
python "<skill-dir>/scripts/with-deps.py" office/pack.py unpacked/ output.docx --original document.docx
|
||||
```
|
||||
Validates with auto-repair, condenses XML, and creates DOCX. Use `--validate false` to skip.
|
||||
|
||||
@@ -541,7 +553,13 @@ After running `comment.py` (see Step 2), add markers to document.xml. For replie
|
||||
|
||||
## Dependencies
|
||||
|
||||
- **pandoc**: Text extraction
|
||||
- **docx**: `npm install -g docx` (new documents)
|
||||
- **LibreOffice**: PDF conversion (auto-configured for sandboxed environments via `scripts/office/soffice.py`)
|
||||
- **Poppler**: `pdftoppm` for images
|
||||
运行时预装(免安装、离线、跨平台 —— 下列启动器都不需要 `bash`):
|
||||
|
||||
- **docx** (docx-js): 新建文档 —— **由运行时预装**(`runtime-deps/node_modules`);生成器通过 `node -r "<skill-dir>/scripts/preload-deps.cjs" generate.js` 运行,无需 `npm install`。
|
||||
- **defusedxml** + **lxml**: XML 解析与完整 XSD 校验 —— **由运行时预装**在自带 Python 中(`runtime-deps/python-runtime`);脚本通过 `python "<skill-dir>/scripts/with-deps.py" <脚本> ...` 运行即可离线使用。自带 Python / `lxml` 不可用时回退系统 `python3`,并优雅跳过 XSD 校验。
|
||||
|
||||
外部系统工具(仅读取/转换需要,**生成不需要**;如使用请按平台安装):
|
||||
|
||||
- **pandoc**: 文本提取。macOS `brew install pandoc` · Windows `winget install --id JohnMacFarlane.Pandoc` · Linux `sudo apt install pandoc`
|
||||
- **LibreOffice**(`soffice`): `.doc`→`.docx`、PDF 转换、接受修订。macOS `brew install --cask libreoffice` · Windows `winget install --id TheDocumentFoundation.LibreOffice`(确保 `soffice` 在 `PATH`)· Linux `sudo apt install libreoffice`。`scripts/office/soffice.py` 里的 Linux 沙箱 `AF_UNIX` shim 在 macOS/Windows 上会自动跳过。
|
||||
- **Poppler**(`pdftoppm`,页→图片): macOS `brew install poppler` · Windows `winget install --id oschwartz10612.Poppler` 或 `choco install poppler` · Linux `sudo apt install poppler-utils`
|
||||
|
||||
@@ -5,16 +5,21 @@ Requires LibreOffice (soffice) to be installed.
|
||||
|
||||
import argparse
|
||||
import logging
|
||||
import os
|
||||
import shutil
|
||||
import subprocess
|
||||
import tempfile
|
||||
from pathlib import Path
|
||||
|
||||
from office.soffice import get_soffice_env
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
LIBREOFFICE_PROFILE = "/tmp/libreoffice_docx_profile"
|
||||
MACRO_DIR = f"{LIBREOFFICE_PROFILE}/user/basic/Standard"
|
||||
# Cross-platform temp directory (Windows has no /tmp); LibreOffice's UserInstallation needs a file:// URI,
# and Path.as_uri() yields file:///tmp/... on POSIX and file:///C:/... on Windows.
|
||||
LIBREOFFICE_PROFILE = os.path.join(tempfile.gettempdir(), "libreoffice_docx_profile")
|
||||
LIBREOFFICE_PROFILE_URI = Path(LIBREOFFICE_PROFILE).as_uri()
|
||||
MACRO_DIR = os.path.join(LIBREOFFICE_PROFILE, "user", "basic", "Standard")
|
||||
|
||||
ACCEPT_CHANGES_MACRO = """<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE script:module PUBLIC "-//OpenOffice.org//DTD OfficeDocument 1.0//EN" "module.dtd">
|
||||
@@ -58,7 +63,7 @@ def accept_changes(
|
||||
cmd = [
|
||||
"soffice",
|
||||
"--headless",
|
||||
f"-env:UserInstallation=file://{LIBREOFFICE_PROFILE}",
|
||||
f"-env:UserInstallation={LIBREOFFICE_PROFILE_URI}",
|
||||
"--norestore",
|
||||
"vnd.sun.star.script:Standard.Module1.AcceptAllTrackedChanges?language=Basic&location=application",
|
||||
str(output_path.absolute()),
|
||||
@@ -100,7 +105,7 @@ def _setup_libreoffice_macro() -> bool:
|
||||
[
|
||||
"soffice",
|
||||
"--headless",
|
||||
f"-env:UserInstallation=file://{LIBREOFFICE_PROFILE}",
|
||||
f"-env:UserInstallation={LIBREOFFICE_PROFILE_URI}",
|
||||
"--terminate_after_init",
|
||||
],
|
||||
capture_output=True,
|
||||
|
||||
@@ -14,12 +14,18 @@ After running, add markers to document.xml:
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import os
|
||||
import random
|
||||
import shutil
|
||||
import sys
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
|
||||
# Reuse the shared Python dependencies preinstalled by the client (defusedxml etc.): scripts → docx → skills → <ROOT>
|
||||
_deps = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "..", "..", "runtime-deps", "python-libs")
|
||||
if os.path.isdir(_deps):
|
||||
sys.path.insert(0, _deps)
|
||||
|
||||
import defusedxml.minidom
|
||||
|
||||
TEMPLATE_DIR = Path(__file__).parent / "templates"
|
||||
|
||||
@@ -11,15 +11,28 @@ Examples:
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import os
|
||||
import sys
|
||||
import shutil
|
||||
import tempfile
|
||||
import zipfile
|
||||
from pathlib import Path
|
||||
|
||||
# Reuse the shared Python dependencies preinstalled by the client (defusedxml etc.): office → scripts → docx → skills → <ROOT>
|
||||
_deps = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "..", "..", "..", "runtime-deps", "python-libs")
|
||||
if os.path.isdir(_deps):
|
||||
sys.path.insert(0, _deps)
|
||||
|
||||
import defusedxml.minidom
|
||||
|
||||
from validators import DOCXSchemaValidator, PPTXSchemaValidator, RedliningValidator
|
||||
try:
|
||||
from validators import DOCXSchemaValidator, PPTXSchemaValidator, RedliningValidator
|
||||
_VALIDATORS_AVAILABLE = True
|
||||
except ImportError as _e:
|
||||
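# validators depends on lxml (a compiled extension that is not preinstalled with the client). Degrade gracefully when it is missing:
# skip full XSD validation instead of crashing, since packing and editing do not need lxml.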
|
||||
_VALIDATORS_AVAILABLE = False
|
||||
_VALIDATORS_IMPORT_ERROR = _e
|
||||
|
||||
def pack(
|
||||
input_directory: str,
|
||||
@@ -39,6 +52,13 @@ def pack(
|
||||
return None, f"Error: {output_file} must be a .docx, .pptx, or .xlsx file"
|
||||
|
||||
if validate and original_file:
|
||||
if not _VALIDATORS_AVAILABLE:
|
||||
print(
|
||||
"Warning: lxml 未安装,已跳过 XSD 完整校验(文件仍正常打包)。"
|
||||
"如需完整校验请安装 lxml:pip install lxml",
|
||||
file=sys.stderr,
|
||||
)
|
||||
else:
|
||||
original_path = Path(original_file)
|
||||
if original_path.exists():
|
||||
success, output = _run_validation(
|
||||
|
||||
@@ -17,6 +17,7 @@ Usage:
|
||||
import os
|
||||
import socket
|
||||
import subprocess
|
||||
import sys
|
||||
import tempfile
|
||||
from pathlib import Path
|
||||
|
||||
@@ -42,11 +43,16 @@ _SHIM_SO = Path(tempfile.gettempdir()) / "lo_socket_shim.so"
|
||||
|
||||
|
||||
def _needs_shim() -> bool:
|
||||
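# The LD_PRELOAD + gcc (.so) fallback for blocked AF_UNIX sockets only makes sense in a Linux sandbox;
# macOS/Windows have neither the LD_PRELOAD mechanism nor this restriction, and socket.AF_UNIX does not
# exist on some Windows Python builds (AttributeError), so report directly that no shim is needed.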
|
||||
if sys.platform != "linux":
|
||||
return False
|
||||
try:
|
||||
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
|
||||
s.close()
|
||||
return False
|
||||
except OSError:
|
||||
except (OSError, AttributeError):
|
||||
return True
|
||||
|
||||
|
||||
|
||||
@@ -14,10 +14,16 @@ Examples:
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import os
|
||||
import sys
|
||||
import zipfile
|
||||
from pathlib import Path
|
||||
|
||||
# Reuse the shared Python dependencies preinstalled by the client (defusedxml etc.): office → scripts → docx → skills → <ROOT>
|
||||
_deps = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "..", "..", "..", "runtime-deps", "python-libs")
|
||||
if os.path.isdir(_deps):
|
||||
sys.path.insert(0, _deps)
|
||||
|
||||
import defusedxml.minidom
|
||||
|
||||
from helpers.merge_runs import merge_runs as do_merge_runs
|
||||
|
||||
@@ -14,15 +14,36 @@ Auto-repair fixes:
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import atexit
|
||||
import os
|
||||
import shutil
|
||||
import sys
|
||||
import tempfile
|
||||
import zipfile
|
||||
from pathlib import Path
|
||||
|
||||
from validators import DOCXSchemaValidator, PPTXSchemaValidator, RedliningValidator
|
||||
# Reuse the shared Python dependencies preinstalled by the client (defusedxml etc.): office → scripts → docx → skills → <ROOT>
|
||||
_deps = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "..", "..", "..", "runtime-deps", "python-libs")
|
||||
if os.path.isdir(_deps):
|
||||
sys.path.insert(0, _deps)
|
||||
|
||||
try:
|
||||
from validators import DOCXSchemaValidator, PPTXSchemaValidator, RedliningValidator
|
||||
_VALIDATORS_AVAILABLE = True
|
||||
except ImportError:
|
||||
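# validators depends on lxml (a compiled extension that is not preinstalled with the client). Degrade gracefully when it is missing:
# print an install hint and exit successfully instead of crashing, so upstream flows that depend on this script are not blocked.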
|
||||
_VALIDATORS_AVAILABLE = False
|
||||
|
||||
|
||||
def main():
|
||||
if not _VALIDATORS_AVAILABLE:
|
||||
print(
|
||||
"Warning: lxml 未安装,已跳过 XSD 完整校验。如需完整校验请安装 lxml:pip install lxml",
|
||||
file=sys.stderr,
|
||||
)
|
||||
return 0
|
||||
|
||||
parser = argparse.ArgumentParser(description="Validate Office document XML files")
|
||||
parser.add_argument(
|
||||
"path",
|
||||
@@ -70,6 +91,8 @@ def main():

     if path.is_file() and path.suffix.lower() in [".docx", ".pptx", ".xlsx"]:
         temp_dir = tempfile.mkdtemp()
+        # Remove the unpacked directory when the process exits (including via sys.exit) so each validation run does not leak a temporary directory
+        atexit.register(shutil.rmtree, temp_dir, ignore_errors=True)
         with zipfile.ZipFile(path, "r") as zf:
             zf.extractall(temp_dir)
         unpacked_dir = Path(temp_dir)
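The `atexit` cleanup added to the validation script above can be checked in isolation. The following is a minimal sketch, not the script's own code: the `CHILD` snippet and the `run_child` helper are hypothetical names introduced here. It spawns a child interpreter that creates a temporary directory, registers the same `shutil.rmtree` handler, and leaves through `sys.exit`, then checks from the parent that the directory is gone.

```python
import subprocess
import sys
import textwrap
from pathlib import Path
from typing import Tuple

# Hypothetical child program: same pattern as the diff (mkdtemp + atexit.register).
CHILD = textwrap.dedent(
    """
    import atexit, shutil, sys, tempfile
    temp_dir = tempfile.mkdtemp()
    atexit.register(shutil.rmtree, temp_dir, ignore_errors=True)
    print(temp_dir)
    sys.exit(3)  # atexit handlers still run when leaving via SystemExit
    """
)


def run_child() -> Tuple[int, Path]:
    """Run the child and return (exit code, path of the temp dir it created)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD], capture_output=True, text=True
    )
    return proc.returncode, Path(proc.stdout.strip())


if __name__ == "__main__":
    code, temp_dir = run_child()
    # The exit code is preserved and the directory should no longer exist.
    print(code, temp_dir.exists())
```

Note that `atexit` handlers do not run if the process is killed by a signal or calls `os._exit`, so this pattern covers normal exits and `sys.exit` only.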
23
skills/docx/scripts/preload-deps.cjs
Normal file
@@ -0,0 +1,23 @@
+// preload-deps.cjs: cross-platform Node preload (no bash required) that lets docx generation reuse the client's preinstalled dependencies
+//
+// On startup, the client preinstalls docx-js into <DESIRECORE_ROOT>/runtime-deps/node_modules/.
+// This file is preloaded via `node -r` and injects that directory into the module resolution paths, so that
+// require('docx') in the generation script resolves to the preinstalled library without a networked `npm install`. It is pure Node and
+// runs with the same command on Windows / macOS / Linux, with no dependency on bash / Git Bash:
+//
+//   node -r "<skill-dir>/scripts/preload-deps.cjs" generate.js
+//
+// If the preinstall directory does not exist (older client / missing offline seed), this file does nothing and the generation script
+// falls back on its own (require fails → prompt to run npm install -g docx). The env change applies to this process only.
+'use strict'
+const path = require('path')
+const fs = require('fs')
+const Module = require('module')
+
+// scripts → docx → skills → <ROOT>; the preinstalled Node dependencies live in <ROOT>/runtime-deps/node_modules
+const depsDir = path.resolve(__dirname, '..', '..', '..', 'runtime-deps', 'node_modules')
+if (fs.existsSync(depsDir)) {
+  // path.delimiter is ';' on Windows and ':' on POSIX
+  process.env.NODE_PATH = depsDir + (process.env.NODE_PATH ? path.delimiter + process.env.NODE_PATH : '')
+  Module._initPaths() // so that require('docx') in the generate.js that runs next resolves to the preinstalled library
+}
74
skills/docx/scripts/with-deps.py
Normal file
@@ -0,0 +1,74 @@
+#!/usr/bin/env python3
+"""with-deps.py: cross-platform Python launcher (no bash required; replaces the deprecated with-deps.sh)
+
+Lets the office scripts reuse the shared dependencies preinstalled by the client, avoiding a runtime pip install:
+  - defusedxml: pure Python, injected via PYTHONPATH (<ROOT>/runtime-deps/python-libs)
+  - lxml: a compiled extension tied to a specific interpreter → if a bundled Python exists (<ROOT>/runtime-deps/
+    python-runtime, with lxml installed), run the target script with it, which enables full XSD validation offline;
+    if the bundled Python is missing or cannot execute (e.g. blocked by macOS notarization) → fall back automatically to the current Python
+    (lxml is then missing, so validation degrades gracefully and is skipped rather than crashing).
+
+Pure Python; runs with the same command on Windows / macOS / Linux, with no dependency on bash:
+
+    python "<skill-dir>/scripts/with-deps.py" office/unpack.py document.docx unpacked/
+    python "<skill-dir>/scripts/with-deps.py" office/validate.py doc.docx
+
+The target script is launched directly as [interpreter, target, *args], which is equivalent to `python <target>`, so Python
+adds the script directory to sys.path automatically, __name__ == "__main__", and argv is identical to a direct run.
+"""
+
+import os
+import subprocess
+import sys
+
+_HERE = os.path.dirname(os.path.abspath(__file__))  # .../skills/docx/scripts
+# scripts → docx → skills → <ROOT>
+_ROOT = os.path.abspath(os.path.join(_HERE, "..", "..", ".."))
+_DEPS = os.path.join(_ROOT, "runtime-deps")
+_PYLIBS = os.path.join(_DEPS, "python-libs")
+_BUNDLED = os.path.join(
+    _DEPS,
+    "python-runtime",
+    "python.exe" if os.name == "nt" else os.path.join("bin", "python3"),
+)
+
+
+def main() -> int:
+    if len(sys.argv) < 2:
+        sys.stderr.write("usage: with-deps.py <script.py> [args...]\n")
+        return 2
+
+    # Resolve the target script relative to scripts/ (e.g. office/validate.py); absolute paths are also supported
+    arg = sys.argv[1]
+    target = arg if os.path.isabs(arg) else os.path.join(_HERE, arg)
+    if not os.path.isfile(target):
+        sys.stderr.write(f"with-deps.py: target not found: {target}\n")
+        return 2
+
+    # Pick the interpreter: if a bundled Python (with lxml) exists and is not the current one → use it; otherwise use the current/system Python
+    interp = sys.executable
+    if os.path.isfile(_BUNDLED) and os.path.realpath(_BUNDLED) != os.path.realpath(sys.executable):
+        interp = _BUNDLED
+
+    # Inject defusedxml (os.pathsep is ';' on Windows and ':' on POSIX)
+    env = dict(os.environ)
+    if os.path.isdir(_PYLIBS):
+        existing = env.get("PYTHONPATH")
+        env["PYTHONPATH"] = _PYLIBS + (os.pathsep + existing if existing else "")
+
+    cmd = [interp, target, *sys.argv[2:]]
+    try:
+        rc = subprocess.run(cmd, env=env).returncode
+    except OSError:
+        rc = 126  # the bundled Python could not be started
+
+    # The bundled Python could not run (rc<0 = killed by a signal, e.g. macOS Gatekeeper; 126/127 = cannot execute)
+    # → fall back to the system Python so the script degrades gracefully without lxml instead of failing the whole command
+    if interp != sys.executable and (rc < 0 or rc in (126, 127)):
+        rc = subprocess.run([sys.executable, target, *sys.argv[2:]], env=env).returncode
+
+    return rc
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
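The launcher's two core moves, prepending a directory to a path-list variable and retrying under a fallback interpreter, can be exercised on their own. The sketch below is a simplified illustration under stated assumptions, not the launcher itself: `prepend_path`, `run_with_fallback`, and the `/hypothetical/python-libs` directory are names invented for this example, and the preferred interpreter path is deliberately bogus to force the fallback branch.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def prepend_path(new_dir: str, existing: Optional[str]) -> str:
    """Put new_dir in front of an existing os.pathsep-separated list."""
    return new_dir + (os.pathsep + existing if existing else "")


def run_with_fallback(
    preferred: str, fallback: str, argv: List[str], env: Dict[str, str]
) -> int:
    """Run argv under `preferred`; retry under `fallback` if it cannot run."""
    try:
        rc = subprocess.run([preferred, *argv], env=env).returncode
    except OSError:
        rc = 126  # interpreter missing or not executable
    # rc < 0 means the child was killed by a signal; 126/127 mean "cannot execute".
    if preferred != fallback and (rc < 0 or rc in (126, 127)):
        rc = subprocess.run([fallback, *argv], env=env).returncode
    return rc


if __name__ == "__main__":
    env = dict(os.environ)
    # Hypothetical library directory, used only to show the prepend step.
    env["PYTHONPATH"] = prepend_path("/hypothetical/python-libs", env.get("PYTHONPATH"))
    rc = run_with_fallback(
        os.path.join("no-such-dir", "python3"),  # bogus on purpose
        sys.executable,
        ["-c", "import sys; sys.exit(0)"],
        env,
    )
    print(rc)
```

One consequence of this design is that a target script which itself exits with 126 or 127 under the preferred interpreter is indistinguishable from a launch failure and will be run a second time under the fallback.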