Releases: harumiWeb/exstruct
v0.3.2
v0.3.2 Release Notes
This release adds merged cell range extraction and output controls, with
pipeline integration and tests to cover the new feature (PR #35).
This feature helps LLMs understand the relationships between cell values more accurately.
Highlights
- `MergedCell` model and `SheetData.merged_cells` added to output.
- Merged cell ranges extracted via openpyxl in standard/verbose modes.
- Output can exclude merged cells via `OutputOptions.filters.include_merged_cells`.
- Pipeline/Backend/Modeling integrations updated with coverage for the new flow.
Compatibility Notes
- Standard/verbose outputs now include `merged_cells` by default.
- Set `StructOptions.include_merged_cells=False` or `OutputOptions.filters.include_merged_cells=False` to suppress the field.
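For consumers that post-process already-exported JSON instead of setting the options above, a minimal sketch of dropping the new field might look like the following. The helper name `drop_merged_cells` and the shape of each `merged_cells` entry are assumptions made for illustration; only the `sheets` mapping and the `merged_cells` key come from these notes.

```python
import copy
import json


def drop_merged_cells(payload: dict) -> dict:
    """Return a copy of an ExStruct-style payload without per-sheet merged_cells."""
    result = copy.deepcopy(payload)  # leave the caller's payload untouched
    for sheet in result.get("sheets", {}).values():
        sheet.pop("merged_cells", None)  # no-op when the key is absent
    return result


# Hypothetical payload: the entry format inside merged_cells is an assumption.
payload = {
    "book_name": "sample.xlsx",
    "sheets": {"Sheet1": {"merged_cells": [{"range": "A1:C1"}], "shapes": []}},
}
cleaned = drop_merged_cells(payload)
print(json.dumps(cleaned, ensure_ascii=False))
```

Suppressing the field at extraction time through the documented options is likely preferable where possible; this sketch only covers payloads that were already written.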
v0.3.1
v0.3.1 Release Notes
This release adds SmartArt extraction and updates shape modeling to better
separate standard shapes, arrows, and SmartArt structures.
Highlights
- SmartArt extraction via Excel COM, including layout name and nested nodes.
- Shape modeling split into `Shape`, `Arrow`, and `SmartArt` for clearer semantics.
- SmartArt node payload optimized for compact output (`layout`, `nodes`, `kids`).
Compatibility Notes
- `Shape.type` now exists only on `Shape`; it is omitted for `Arrow` and `SmartArt`.
- SmartArt output uses `layout` and `nodes` with `kids`; previous fields like `layout_name`, `roots`, or `children` are no longer present.
Sample
- ExStruct JSON
{
  "book_name": "sample_smartart.xlsx",
  "sheets": {
    "Sheet1": {
      "shapes": [
        {
          "id": 1, "l": 0, "t": 28, "kind": "smartart", "layout": "基本の循環",
          "nodes": [
            { "text": "1", "kids": [{ "text": "要件定義" }] },
            { "text": "2", "kids": [{ "text": "報連相" }, { "text": "開発" }] },
            { "text": "3", "kids": [{ "text": "実装確認" }, { "text": "動作確認" }] },
            { "text": "4", "kids": [{ "text": "対策" }] },
            { "text": "5", "kids": [{ "text": "最終確認" }] }
          ]
        },
        {
          "id": 2, "l": 388, "t": 32, "kind": "smartart", "layout": "開始点強調型プロセス",
          "nodes": [
            { "text": "企画" },
            { "text": "執筆" },
            { "text": "編集" },
            { "text": "制作" },
            { "text": "校正" }
          ]
        },
        {
          "id": 3, "l": 46, "t": 325, "kind": "smartart", "layout": "組織図",
          "nodes": [
            {
              "text": "取締役会",
              "kids": [
                {
                  "text": "社長",
                  "kids": [
                    { "text": "企画管理部" },
                    { "text": "営業部", "kids": [{ "text": "第1営業課" }, { "text": "第2営業課" }, { "text": "第3営業課" }, { "text": "海外営業課" }] },
                    { "text": "開発部", "kids": [{ "text": "第1開発課" }, { "text": "第2開発課" }] },
                    { "text": "技術部", "kids": [{ "text": "第1技術課" }, { "text": "第2技術課" }] },
                    { "text": "生産部", "kids": [{ "text": "愛知工場" }, { "text": "山形工場" }, { "text": "高知工場" }] },
                    { "text": "総務部", "kids": [{ "text": "総務課" }, { "text": "人事課" }, { "text": "経理課" }] }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  }
}
- LLM inference results
# 📘 sample_smartart.xlsx

## 1. 基本の循環(SmartArt)

- **1**
  - 要件定義
- **2**
  - 報連相
  - 開発
- **3**
  - 実装確認
  - 動作確認
- **4**
  - 対策
- **5**
  - 最終確認

---

## 2. 開始点強調型プロセス(SmartArt)

1. 企画
2. 執筆
3. 編集
4. 制作
5. 校正

```mermaid
flowchart LR
  B1["企画"] --> B2["執筆"] --> B3["編集"] --> B4["制作"] --> B5["校正"]
```

---

## 3. 組織図(SmartArt)

- **取締役会**
  - **社長**
    - 企画管理部
    - 営業部
      - 第 1 営業課
      - 第 2 営業課
      - 第 3 営業課
      - 海外営業課
    - 開発部
      - 第 1 開発課
      - 第 2 開発課
    - 技術部
      - 第 1 技術課
      - 第 2 技術課
    - 生産部
      - 愛知工場
      - 山形工場
      - 高知工場
    - 総務部
      - 総務課
      - 人事課
      - 経理課

```mermaid
flowchart TB
  T["取締役会"]
  P["社長"]
  T --> P
  P --> K1["企画管理部"]
  P --> E["営業部"]
  E --> E1["第1営業課"]
  E --> E2["第2営業課"]
  E --> E3["第3営業課"]
  E --> E4["海外営業課"]
  P --> D["開発部"]
  D --> D1["第1開発課"]
  D --> D2["第2開発課"]
  P --> G["技術部"]
  G --> G1["第1技術課"]
  G --> G2["第2技術課"]
  P --> S["生産部"]
  S --> S1["愛知工場"]
  S --> S2["山形工場"]
  S --> S3["高知工場"]
  P --> A["総務部"]
  A --> A1["総務課"]
  A --> A2["人事課"]
  A --> A3["経理課"]
```
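The compact `nodes`/`kids` payload lends itself to a short recursive walk. The sketch below renders a SmartArt entry as an indented outline; the helper name `outline` is hypothetical, and the input is a trimmed copy of the first shape in the sample JSON.

```python
def outline(nodes: list, depth: int = 0) -> list:
    """Flatten a nodes/kids tree into indented '- text' lines."""
    lines = []
    for node in nodes:
        lines.append("  " * depth + "- " + node["text"])
        # Leaf nodes omit "kids" entirely in the compact payload.
        lines.extend(outline(node.get("kids", []), depth + 1))
    return lines


smartart = {
    "kind": "smartart",
    "layout": "基本の循環",
    "nodes": [
        {"text": "1", "kids": [{"text": "要件定義"}]},
        {"text": "2", "kids": [{"text": "報連相"}, {"text": "開発"}]},
    ],
}
rendered = outline(smartart["nodes"])
print("\n".join(rendered))
```

Treating a missing `kids` key as an empty list keeps the walk uniform for flat layouts such as the process diagram in the sample.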
v0.3.0
v0.3.0 Release Notes
This release delivers a large internal refactor (PR #23) to improve
maintainability and prepare the codebase for future features. There are no
intended user-facing API changes in this release.
Highlights
- Internal processing pipeline refactored for clearer responsibilities and easier extension.
- Code organization and structure improved to support ongoing maintenance.
Compatibility Notes
- No expected behavioral or API changes compared to v0.2.90.
v0.2.90 Release
v0.2.90 Release Notes
This release adds cell background color extraction via colors_map, including
conditional formatting support when running with Excel COM.
Highlights
- `colors_map`: SheetData now includes background color locations keyed by hex color strings.
- COM display colors: in COM environments, extraction reads `DisplayFormat.Interior` to capture conditional formatting colors.
- ColorsOptions: configure color extraction with `include_default_background` and `ignore_colors` (normalized hex keys).
- Verbose mode: `mode="verbose"` now enables `colors_map` extraction by default.
- Tests: added coverage for default background exclusion and ignore lists.
Compatibility Notes
- Non-COM environments still extract colors via openpyxl, without conditional formatting evaluation.
- `ignore_colors` is applied during extraction, reducing payload size early.
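As a rough illustration of how an ignore list with normalized hex keys could behave, the sketch below filters a `colors_map`-like dictionary. The normalization rule (strip `#`, uppercase), the helper names, and the list-of-coordinates value shape are all assumptions. The library applies `ignore_colors` during extraction rather than afterwards.

```python
def normalize_hex(color: str) -> str:
    """Assumed normalization: drop a leading '#' and uppercase the digits."""
    return color.strip().lstrip("#").upper()


def filter_colors(colors_map: dict, ignore_colors: set) -> dict:
    """Keep only entries whose normalized key is not in the ignore list."""
    ignored = {normalize_hex(c) for c in ignore_colors}
    return {k: v for k, v in colors_map.items() if normalize_hex(k) not in ignored}


# Hypothetical data: hex color -> list of [row, col] locations.
colors_map = {"FFFFFF": [[1, 1], [1, 2]], "FFCC00": [[3, 2]]}
kept = filter_colors(colors_map, {"#ffffff"})
print(kept)
```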
Gantt chart inference sample
# プロジェクトスケジュール
## 基本情報
| 項目 | 内容 |
| -------------------- | ------------------------------------------------------------------------------------------------------ |
| プロジェクトタイトル | (未入力) |
| 会社名 | (未入力) |
| プロジェクト主任 | (未入力) |
| プロジェクト開始日 | 2025-12-24 |
| テンプレート | [シンプル ガント チャート (Vertex42)](https://www.vertex42.com/ExcelTemplates/simple-gantt-chart.html) |
---
## スケジュール概要
- 表示期間:**2025-12-22 〜 2026-02-15**
- 週単位で 8 週間表示
- 各タスクは開始日・終了日に基づき自動でガント表示
- 進捗率はセル内バーおよび網掛けで表現
---
## フェーズ別タスク一覧
---
### フェーズ 1:企画・要件
| タスク | 担当 | 進捗 | 開始 | 終了 | 日数 |
| ------------------------------ | ---- | ---: | ---------- | ---------- | ---: |
| 目的と成功指標の整理 | 名前 | 50% | 2025-12-24 | 2025-12-27 | 4 |
| ステークホルダー要件ヒアリング | | 60% | 2025-12-27 | 2025-12-29 | 3 |
| 現場業務フロー整理 | | 50% | 2025-12-29 | 2026-01-02 | 5 |
| スコープ定義と除外事項確定 | | 25% | 2026-01-02 | 2026-01-07 | 6 |
| 要件レビューと合意形成 | | | 2025-12-28 | 2025-12-30 | 3 |
---
### フェーズ 2:設計・準備
| タスク | 進捗 | 開始 | 終了 | 日数 |
| ---------------------- | ---: | ---------- | ---------- | ---: |
| 画面/帳票の基本設計 | 50% | 2025-12-29 | 2026-01-02 | 5 |
| データ項目・マスタ設計 | 50% | 2025-12-31 | 2026-01-05 | 6 |
| インタフェース仕様策定 | | 2026-01-05 | 2026-01-08 | 4 |
| リスク洗い出しと対策案 | | 2026-01-05 | 2026-01-07 | 3 |
| 実装計画・見積確定 | | 2026-01-05 | 2026-01-08 | 4 |
---
### フェーズ 3:実装・検証
| タスク | 開始 | 終了 | 日数 |
| ---------------------- | ---------- | ---------- | ---: |
| コア機能実装 | 2026-01-08 | 2026-01-13 | 6 |
| 例外/エッジケース対応 | 2026-01-14 | 2026-01-18 | 5 |
| 単体テスト作成 | 2026-01-19 | 2026-01-24 | 6 |
| 結合テスト準備 | 2026-01-25 | 2026-01-29 | 5 |
| ユーザー受け入れテスト | 2026-01-19 | 2026-01-23 | 5 |
---
### フェーズ 4:リリース・運用
| タスク | 開始 | 終了 |
| ------------------ | ---- | ---- |
| リリース手順書作成 | 未定 | 未定 |
| 本番データ移行計画 | 未定 | 未定 |
| 監視・ログ設計 | 未定 | 未定 |
| 運用手順 / 教育 | 未定 | 未定 |
| リリース後レビュー | 未定 | 未定 |
---
## ガントチャート(Mermaid)
```mermaid
gantt
title プロジェクト ガントチャート
dateFormat YYYY-MM-DD
section フェーズ1 企画・要件
目的と成功指標の整理 :2025-12-24, 4d
ステークホルダー要件ヒアリング :2025-12-27, 3d
現場業務フロー整理 :2025-12-29, 5d
スコープ定義 :2026-01-02, 6d
要件レビュー :2025-12-28, 3d
section フェーズ2 設計・準備
画面/帳票設計 :2025-12-29, 5d
データ・マスタ設計 :2025-12-31, 6d
IF仕様策定 :2026-01-05, 4d
リスク対策 :2026-01-05, 3d
実装計画・見積 :2026-01-05, 4d
section フェーズ3 実装・検証
コア機能実装 :2026-01-08, 6d
例外対応 :2026-01-14, 5d
単体テスト :2026-01-19, 6d
結合テスト準備 :2026-01-25, 5d
UAT :2026-01-19, 5d
```
v0.2.80
v0.2.80 Release Notes
This release improves shape extraction to better trace flowchart connections by
assigning per-sheet shape ids and linking connector endpoints.
Highlights
- Shape ids: non-connector shapes now receive sequential `id` values per sheet to enable stable references.
- Connector linking: connector shapes capture `begin_id` and `end_id` resolved from connected shapes (via COM ConnectorFormat).
- Connector metadata: arrow styles, direction, and rotation are recorded for arrow/line connectors to enrich flow analysis.
- Schema updates: JSON schemas and the Shape model include the new connector fields.
- Samples and tests: added connector sample artifacts and expanded tests for connector extraction.
Compatibility Notes
- Non-COM environments continue to omit connector details as before.
- Shape extraction now includes additional fields in the default output; existing
fields and semantics remain unchanged.
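With `id` on regular shapes and `begin_id`/`end_id` on connectors, a flowchart can be reassembled as a directed graph. The following is a minimal sketch under assumptions: the `text` field and the helper name `build_edges` are hypothetical, and only `id`, `begin_id`, and `end_id` come from these notes.

```python
from collections import defaultdict


def build_edges(shapes: list) -> tuple:
    """Return (labels by shape id, adjacency list from connector endpoints)."""
    labels = {s["id"]: s.get("text", "") for s in shapes if "id" in s}
    graph = defaultdict(list)
    for s in shapes:
        begin, end = s.get("begin_id"), s.get("end_id")
        # Skip non-connectors and connectors with an unattached endpoint.
        if begin is not None and end is not None:
            graph[begin].append(end)
    return labels, dict(graph)


# Hypothetical per-sheet shapes: two boxes joined by one connector.
shapes = [
    {"id": 1, "text": "Start"},
    {"id": 2, "text": "Check"},
    {"begin_id": 1, "end_id": 2},
]
labels, graph = build_edges(shapes)
print(labels, graph)
```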
v0.2.71
v0.2.71 Release Notes
This release adds CLI support for auto page-break exports with environment
gating, plus documentation updates.
Highlights
- CLI auto page-break export: new `--auto-page-breaks-dir` option writes one file per auto page-break view when Excel COM is available.
- Environment gating: the CLI now detects COM availability and only registers COM-specific flags when Excel is usable, preventing unsupported options on non-COM platforms.
- Parser tests: added coverage to ensure the option is visible only when COM is available.
- Documentation updates: CLI usage and API docs now describe the COM-gated option.
Compatibility Notes
- No breaking API changes.
- On non-COM environments, the `--auto-page-breaks-dir` flag is hidden and cannot be used.
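Environment gating of this kind can be sketched with `argparse` by registering the flag conditionally. This is an illustration only, not the project's parser: `build_parser`, the `com_available` parameter, and the `exstruct-sketch` program name are hypothetical, and real COM detection is left out.

```python
import argparse


def build_parser(com_available: bool) -> argparse.ArgumentParser:
    """Register the COM-only flag only when COM is reported as available."""
    parser = argparse.ArgumentParser(prog="exstruct-sketch")
    parser.add_argument("input")
    if com_available:
        parser.add_argument("--auto-page-breaks-dir", default=None)
    return parser


with_com = build_parser(True).parse_args(
    ["book.xlsx", "--auto-page-breaks-dir", "out"]
)
without_com = build_parser(False).parse_args(["book.xlsx"])
print(with_com.auto_page_breaks_dir, hasattr(without_com, "auto_page_breaks_dir"))
```

Because the flag is never registered on non-COM platforms, passing it there fails at parse time as an unrecognized argument instead of failing later during export.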
v0.2.70
v0.2.70 Release Notes
In tag v0.2.70, we improved the flexibility of file path specification and revised how standard output is handled during export.
Highlights
- Expanded `str` support for file paths: Added type definitions and normalization so that `str` values can be passed directly to engine inputs/outputs and per-sheet/per-area output destinations. Within `process`, all paths, including those used for PDF/image generation, are normalized to `Path` for consistent handling. (Source: `src/exstruct/engine.py` L96-L240 and L592-L643, `src/exstruct/core/integrate.py` L290-L375, `tests/test_engine.py` L164-L243)
- Changed behavior to avoid writing to standard output when exporting only secondary outputs: When `output_path` is not provided and only `sheets_dir`/`print_areas_dir`/`auto_page_breaks_dir` are set, the export focuses solely on these secondary outputs and no longer writes to standard output. (Source: `src/exstruct/engine.py` L500-L571)
Compatibility and Migration
- Existing `Path` specifications continue to work as-is. Even when `str` values are passed, they are normalized internally to `Path`, resulting in identical behavior for both CLI and application usage. (Source: `src/exstruct/engine.py` L96-L240 and L592-L643, `src/exstruct/core/integrate.py` L290-L375)
- If you only need secondary outputs, leaving `output_path=None` is sufficient. If you also want output sent to standard output, explicitly specify `output_path` or `stream` as before. (Source: `src/exstruct/engine.py` L500-L571)
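The `str`-to-`Path` normalization described here can be pictured with a small helper. This is a sketch under assumptions, not the engine's code: `to_path` and the `PathLike` alias are hypothetical names.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def to_path(value: Optional[PathLike]) -> Optional[Path]:
    """Normalize str or Path to Path; pass None through unchanged."""
    return None if value is None else Path(value)


from_str = to_path("out/book.json")
from_path = to_path(Path("out/book.json"))
print(from_str == from_path, to_path(None))
```

Normalizing once at the boundary means downstream code only ever handles `Path` (or `None`), which is what makes `str` and `Path` inputs behave identically.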
v0.2.61
v0.2.61 Release Notes
We introduced a dedicated exception hierarchy to streamline error handling and make failures in output and rendering easier to identify. In addition, the test requirements specification has been reorganized to explicitly define coverage for error handling, CLI behavior, and related areas.
Highlights
- Dedicated exception hierarchy: Added a common base `ExstructError` along with purpose-specific exceptions (`ConfigError`, `ExtractionError`, `SerializationError`, `MissingDependencyError`, `RenderError`, `OutputError`, and `PrintAreaError`) to ensure consistent exception types across processing flows. (Source: `src/exstruct/errors.py` L3-L35)
- Improved error guidance: Unsupported formats now raise `SerializationError`. Missing dependencies (YAML/TOON/pypdfium2) consistently raise `MissingDependencyError` with installation guidance. When automatic page-break information is unavailable, `export_auto_page_breaks` explicitly raises `PrintAreaError`. Failures during output writing or PDF/image generation are wrapped as `OutputError`/`RenderError`. (Source: `src/exstruct/__init__.py` L189-L219, `src/exstruct/io/__init__.py` L284-L437 and L524-L540, `src/exstruct/render/__init__.py` L17-L100)
- Test requirements reorganization: Updated the test requirements specification to v0.2, enumerating coverage categories and concrete expectations for extraction, output, CLI, and error handling. This includes requirements for `MissingDependencyError` handling and print area/`export_auto_page_breaks`, clarifying the checklist to be satisfied prior to release. (Source: `docs/agents/TEST_REQUIREMENTS.md` L1-L116)
Compatibility and Migration
- `PrintAreaError` also inherits from `ValueError`, so existing `ValueError` handlers will continue to catch it; however, handling the new exception type directly makes intent clearer. (Source: `src/exstruct/errors.py` L30-L35)
- Since `export_auto_page_breaks` now raises an exception when automatic page-break information is unavailable, code that relies on this output should either catch it via try/except or verify in advance that `auto_print_areas` can be obtained. (Source: `src/exstruct/__init__.py` L189-L219)
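The backward-compatibility point about `ValueError` can be demonstrated with local stand-ins. The classes below only mirror the names in these notes and are not the library's source; the base-class order and the stub function `export_auto_page_breaks_stub` are assumptions for illustration.

```python
class ExstructError(Exception):
    """Local stand-in for the common base exception."""


class PrintAreaError(ExstructError, ValueError):
    """Stand-in that also inherits from ValueError, as the notes describe."""


def export_auto_page_breaks_stub(auto_print_areas: list) -> list:
    """Hypothetical stub: raise when no auto page-break data is available."""
    if not auto_print_areas:
        raise PrintAreaError("no automatic page-break information available")
    return list(auto_print_areas)


try:
    export_auto_page_breaks_stub([])
except ValueError as exc:  # a legacy handler still catches the new type
    caught = exc

print(type(caught).__name__)
```

Catching `PrintAreaError` (or `ExstructError`) directly states the intent more precisely than a broad `ValueError` handler, while the legacy handler keeps working during migration.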
v0.2.60
v0.2.60 Release Notes
This is the first official tag/release of ExStruct. We are publishing the core functionality for extracting Excel workbooks into structured data in JSON/YAML/TOON formats, making it available through both the CLI and the Python API.
Highlights
- Excel → Structured data: Collects cells, table candidates, shapes, charts, print areas, automatic page breaks, and hyperlinks on a per-sheet basis, with JSON as the primary output format.
- Selectable output modes: When a COM environment is not available, `light` mode extracts cells, table candidates, and print areas. In environments where Excel COM is available, `standard`/`verbose` modes additionally output shapes, charts, size information, and hyperlinks.
- Flexible output formats: In addition to the default JSON output, YAML and TOON formats are supported as options. The `--pretty` flag enables generation of human-readable JSON.
- Rendering from the CLI: In Excel-enabled environments, rendering features to generate PDFs and per-sheet PNG images are available via the CLI.
- Robust fallback behavior: In environments where Excel COM cannot be used, the system falls back to cell and table-candidate extraction to prevent processing from failing due to exceptions.
Compatibility and Installation
- Installable from PyPI via `pip install exstruct`.
- YAML/TOON/rendering features can be enabled by adding optional dependencies (`exstruct[yaml]`, `exstruct[toon]`, `exstruct[render]`, or `exstruct[yaml,toon,render]`).
- Rich extraction including shapes and charts targets Windows + Excel (COM via xlwings). In other environments, `mode=light` is recommended.
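The mode guidance above can be condensed into a small decision helper. This is only a sketch: `pick_mode` and its parameters are hypothetical, and the caller is assumed to know whether Excel COM is usable. Only the mode names `light`, `standard`, and `verbose` come from these notes.

```python
def pick_mode(excel_com_available: bool, want_verbose: bool = False) -> str:
    """Choose an extraction mode name based on the environment."""
    if not excel_com_available:
        return "light"  # recommended outside Windows + Excel
    return "verbose" if want_verbose else "standard"


print(pick_mode(False), pick_mode(True), pick_mode(True, want_verbose=True))
```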
Other
- There are no known breaking changes (initial release).
- Bug reports and feature requests are welcome via Issues.
