Releases: harumiWeb/exstruct

v0.3.2

05 Jan 03:49
a08f4ba

v0.3.2 Release Notes

This release adds merged cell range extraction and output controls, with
pipeline integration and tests covering the new feature (PR #35).

Exposing merged ranges helps an LLM understand the relationships between cell values more accurately.

Highlights

  • MergedCell model and SheetData.merged_cells added to output.
  • Merged cell ranges extracted via openpyxl in standard/verbose modes.
  • Output can exclude merged cells via OutputOptions.filters.include_merged_cells.
  • Pipeline/Backend/Modeling integrations updated with coverage for the new flow.

Compatibility Notes

  • Standard/verbose outputs now include merged_cells by default.
  • Set StructOptions.include_merged_cells=False or
    OutputOptions.filters.include_merged_cells=False to suppress the field.
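
For JSON that has already been exported with merged_cells included, the field can also be removed after the fact. The following is a minimal sketch, not part of ExStruct: it assumes the exported document has the book_name / sheets layout shown in the v0.3.1 sample, and the content of each merged_cells entry (here a "range" key) is purely hypothetical.

```python
from copy import deepcopy
from typing import Any


def drop_merged_cells(workbook: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an exported workbook dict without merged_cells fields."""
    result = deepcopy(workbook)
    for sheet in result.get("sheets", {}).values():
        # pop() with a default is a no-op for sheets that lack the field.
        sheet.pop("merged_cells", None)
    return result


# Hypothetical payload: the shape of a merged_cells entry is an assumption.
exported = {
    "book_name": "sample.xlsx",
    "sheets": {
        "Sheet1": {
            "shapes": [],
            "merged_cells": [{"range": "A1:B2"}],
        }
    },
}

slim = drop_merged_cells(exported)
print("merged_cells" in slim["sheets"]["Sheet1"])      # False
print("merged_cells" in exported["sheets"]["Sheet1"])  # True (input is untouched)
```

Setting the options named above avoids the extra pass and is the better choice when you control the extraction call.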

v0.3.1

28 Dec 10:39
ea60a35

v0.3.1 Release Notes

This release adds SmartArt extraction and updates shape modeling to better
separate standard shapes, arrows, and SmartArt structures.

Highlights

  • SmartArt extraction via Excel COM, including layout name and nested nodes.
  • Shape modeling split into Shape, Arrow, and SmartArt for clearer semantics.
  • SmartArt node payload optimized for compact output (layout, nodes, kids).

Compatibility Notes

  • Shape.type now exists only on Shape; it is omitted for Arrow and SmartArt.
  • SmartArt output uses layout and nodes with kids; previous fields like
    layout_name, roots, or children are no longer present.

Sample

  • material
    (Screenshot 2025-12-28 192940)

  • ExStruct JSON

    {
      "book_name": "sample_smartart.xlsx",
      "sheets": {
        "Sheet1": {
          "shapes": [
            {
              "id": 1,
              "l": 0,
              "t": 28,
              "kind": "smartart",
              "layout": "基本の循環",
              "nodes": [
                { "text": "1", "kids": [{ "text": "要件定義" }] },
                { "text": "2", "kids": [{ "text": "報連相" }, { "text": "開発" }] },
                {
                  "text": "3",
                  "kids": [{ "text": "実装確認" }, { "text": "動作確認" }]
                },
                { "text": "4", "kids": [{ "text": "対策" }] },
                { "text": "5", "kids": [{ "text": "最終確認" }] }
              ]
            },
            {
              "id": 2,
              "l": 388,
              "t": 32,
              "kind": "smartart",
              "layout": "開始点強調型プロセス",
              "nodes": [
                { "text": "企画" },
                { "text": "執筆" },
                { "text": "編集" },
                { "text": "制作" },
                { "text": "校正" }
              ]
            },
            {
              "id": 3,
              "l": 46,
              "t": 325,
              "kind": "smartart",
              "layout": "組織図",
              "nodes": [
                {
                  "text": "取締役会",
                  "kids": [
                    {
                      "text": "社長",
                      "kids": [
                        { "text": "企画管理部" },
                        {
                          "text": "営業部",
                          "kids": [
                            { "text": "第1営業課" },
                            { "text": "第2営業課" },
                            { "text": "第3営業課" },
                            { "text": "海外営業課" }
                          ]
                        },
                        {
                          "text": "開発部",
                          "kids": [{ "text": "第1開発課" }, { "text": "第2開発課" }]
                        },
                        {
                          "text": "技術部",
                          "kids": [{ "text": "第1技術課" }, { "text": "第2技術課" }]
                        },
                        {
                          "text": "生産部",
                          "kids": [
                            { "text": "愛知工場" },
                            { "text": "山形工場" },
                            { "text": "高知工場" }
                          ]
                        },
                        {
                          "text": "総務部",
                          "kids": [
                            { "text": "総務課" },
                            { "text": "人事課" },
                            { "text": "経理課" }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    }
  • LLM inference results

    # 📘 sample_smartart.xlsx
    
    ## 1. 基本の循環(SmartArt)
    
    - **1**
      - 要件定義
    - **2**
      - 報連相
      - 開発
    - **3**
      - 実装確認
      - 動作確認
    - **4**
      - 対策
    - **5**
      - 最終確認
    
    ---
    
    ## 2. 開始点強調型プロセス(SmartArt)
    
    1. 企画
    2. 執筆
    3. 編集
    4. 制作
    5. 校正
    
    ```mermaid
    flowchart LR
        B1["企画"] --> B2["執筆"] --> B3["編集"] --> B4["制作"] --> B5["校正"]
    ```
    
    ---
    
    ## 3. 組織図(SmartArt)
    
    - **取締役会**
      - **社長**
        - 企画管理部
        - 営業部
          - 第 1 営業課
          - 第 2 営業課
          - 第 3 営業課
          - 海外営業課
        - 開発部
          - 第 1 開発課
          - 第 2 開発課
        - 技術部
          - 第 1 技術課
          - 第 2 技術課
        - 生産部
          - 愛知工場
          - 山形工場
          - 高知工場
        - 総務部
          - 総務課
          - 人事課
          - 経理課
    
    ```mermaid
    flowchart TB
        T["取締役会"]
        P["社長"]
    
        T --> P
    
        P --> K1["企画管理部"]
    
        P --> E["営業部"]
            E --> E1["第1営業課"]
            E --> E2["第2営業課"]
            E --> E3["第3営業課"]
            E --> E4["海外営業課"]
    
        P --> D["開発部"]
            D --> D1["第1開発課"]
            D --> D2["第2開発課"]
    
        P --> G["技術部"]
            G --> G1["第1技術課"]
            G --> G2["第2技術課"]
    
        P --> S["生産部"]
            S --> S1["愛知工場"]
            S --> S2["山形工場"]
            S --> S3["高知工場"]
    
        P --> A["総務部"]
            A --> A1["総務課"]
            A --> A2["人事課"]
            A --> A3["経理課"]
    ```
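
The nested outline in the LLM result above can also be produced directly from the JSON, because every SmartArt node carries a text and an optional kids list. The sketch below is one possible way to walk that structure; the function name outline is hypothetical, and the sample data is a shortened copy of the organization chart from the JSON above.

```python
from typing import Any


def outline(nodes: list[dict[str, Any]], depth: int = 0) -> list[str]:
    """Flatten SmartArt nodes into indented Markdown bullet lines."""
    lines: list[str] = []
    for node in nodes:
        lines.append("  " * depth + "- " + node["text"])
        # "kids" is omitted for leaf nodes, so fall back to an empty list.
        lines.extend(outline(node.get("kids", []), depth + 1))
    return lines


# Shortened excerpt of the "組織図" SmartArt from the sample output.
smartart = {
    "kind": "smartart",
    "layout": "組織図",
    "nodes": [
        {
            "text": "取締役会",
            "kids": [
                {
                    "text": "社長",
                    "kids": [{"text": "開発部"}, {"text": "総務部"}],
                }
            ],
        }
    ],
}

print("\n".join(outline(smartart["nodes"])))
```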

v0.3.0

27 Dec 14:49
9b3478b

v0.3.0 Release Notes

This release delivers a large internal refactor (PR #23) to improve
maintainability and prepare the codebase for future features. There are no
intended user-facing API changes in this release.

Highlights

  • Internal processing pipeline refactored for clearer responsibilities and
    easier extension.
  • Code organization and structure improved to support ongoing maintenance.

Compatibility Notes

  • No expected behavioral or API changes compared to v0.2.90.

v0.2.90 Release

24 Dec 07:16
f608a46

v0.2.90 Release Notes

This release adds cell background color extraction via colors_map, including
conditional formatting support when running with Excel COM.

Highlights

  • colors_map: SheetData now includes background color locations keyed by hex
    color strings.
  • COM display colors: in COM environments, extraction reads
    DisplayFormat.Interior to capture conditional formatting colors.
  • ColorsOptions: configure color extraction with include_default_background
    and ignore_colors (normalized hex keys).
  • Verbose mode: mode="verbose" now enables colors_map extraction by default.
  • Tests: added coverage for default background exclusion and ignore lists.

Compatibility Notes

  • Non-COM environments still extract colors via openpyxl, without conditional
    formatting evaluation.
  • ignore_colors is applied during extraction, reducing payload size early.
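
As an illustration of how an ignore list with normalized hex keys can shrink a color map, here is a minimal standalone sketch. It does not reproduce ExStruct's implementation: the normalization rule (strip "#", upper-case) and the representation of locations as (row, column) tuples are both assumptions made for the example.

```python
def normalize_hex(color: str) -> str:
    """Assumed normalization: drop a leading '#' and upper-case the digits."""
    return color.strip().lstrip("#").upper()


def filter_colors(
    colors_map: dict[str, list[tuple[int, int]]],
    ignore_colors: list[str],
) -> dict[str, list[tuple[int, int]]]:
    """Drop ignored colors and merge keys that normalize to the same value."""
    ignored = {normalize_hex(c) for c in ignore_colors}
    result: dict[str, list[tuple[int, int]]] = {}
    for key, cells in colors_map.items():
        norm = normalize_hex(key)
        if norm in ignored:
            continue
        result.setdefault(norm, []).extend(cells)
    return result


# Hypothetical color map: hex key -> list of (row, column) locations.
colors_map = {
    "#ffffff": [(1, 1)],
    "FF0000": [(2, 3)],
    "ff0000": [(4, 5)],
}

print(filter_colors(colors_map, ["FFFFFF"]))
# {'FF0000': [(2, 3), (4, 5)]}
```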

Gantt chart inference sample

(Screenshot 2025-12-24 144259)

# プロジェクトスケジュール

## 基本情報

| 項目                 | 内容                                                                                                   |
| -------------------- | ------------------------------------------------------------------------------------------------------ |
| プロジェクトタイトル | (未入力)                                                                                             |
| 会社名               | (未入力)                                                                                             |
| プロジェクト主任     | (未入力)                                                                                             |
| プロジェクト開始日   | 2025-12-24                                                                                             |
| テンプレート         | [シンプル ガント チャート (Vertex42)](https://www.vertex42.com/ExcelTemplates/simple-gantt-chart.html) |

---

## スケジュール概要

- 表示期間:**2025-12-22 〜 2026-02-15**
- 週単位で 8 週間表示
- 各タスクは開始日・終了日に基づき自動でガント表示
- 進捗率はセル内バーおよび網掛けで表現

---

## フェーズ別タスク一覧

---

### フェーズ 1:企画・要件

| タスク                         | 担当 | 進捗 | 開始       | 終了       | 日数 |
| ------------------------------ | ---- | ---: | ---------- | ---------- | ---: |
| 目的と成功指標の整理           | 名前 |  50% | 2025-12-24 | 2025-12-27 |    4 |
| ステークホルダー要件ヒアリング |      |  60% | 2025-12-27 | 2025-12-29 |    3 |
| 現場業務フロー整理             |      |  50% | 2025-12-29 | 2026-01-02 |    5 |
| スコープ定義と除外事項確定     |      |  25% | 2026-01-02 | 2026-01-07 |    6 |
| 要件レビューと合意形成         |      |      | 2025-12-28 | 2025-12-30 |    3 |

---

### フェーズ 2:設計・準備

| タスク                 | 進捗 | 開始       | 終了       | 日数 |
| ---------------------- | ---: | ---------- | ---------- | ---: |
| 画面/帳票の基本設計    |  50% | 2025-12-29 | 2026-01-02 |    5 |
| データ項目・マスタ設計 |  50% | 2025-12-31 | 2026-01-05 |    6 |
| インタフェース仕様策定 |      | 2026-01-05 | 2026-01-08 |    4 |
| リスク洗い出しと対策案 |      | 2026-01-05 | 2026-01-07 |    3 |
| 実装計画・見積確定     |      | 2026-01-05 | 2026-01-08 |    4 |

---

### フェーズ 3:実装・検証

| タスク                 | 開始       | 終了       | 日数 |
| ---------------------- | ---------- | ---------- | ---: |
| コア機能実装           | 2026-01-08 | 2026-01-13 |    6 |
| 例外/エッジケース対応  | 2026-01-14 | 2026-01-18 |    5 |
| 単体テスト作成         | 2026-01-19 | 2026-01-24 |    6 |
| 結合テスト準備         | 2026-01-25 | 2026-01-29 |    5 |
| ユーザー受け入れテスト | 2026-01-19 | 2026-01-23 |    5 |

---

### フェーズ 4:リリース・運用

| タスク             | 開始 | 終了 |
| ------------------ | ---- | ---- |
| リリース手順書作成 | 未定 | 未定 |
| 本番データ移行計画 | 未定 | 未定 |
| 監視・ログ設計     | 未定 | 未定 |
| 運用手順 / 教育    | 未定 | 未定 |
| リリース後レビュー | 未定 | 未定 |

---

## ガントチャート(Mermaid)

```mermaid
gantt
    title プロジェクト ガントチャート
    dateFormat  YYYY-MM-DD

    section フェーズ1 企画・要件
    目的と成功指標の整理           :2025-12-24, 4d
    ステークホルダー要件ヒアリング :2025-12-27, 3d
    現場業務フロー整理             :2025-12-29, 5d
    スコープ定義                   :2026-01-02, 6d
    要件レビュー                   :2025-12-28, 3d

    section フェーズ2 設計・準備
    画面/帳票設計                  :2025-12-29, 5d
    データ・マスタ設計             :2025-12-31, 6d
    IF仕様策定                     :2026-01-05, 4d
    リスク対策                     :2026-01-05, 3d
    実装計画・見積                 :2026-01-05, 4d

    section フェーズ3 実装・検証
    コア機能実装                   :2026-01-08, 6d
    例外対応                       :2026-01-14, 5d
    単体テスト                     :2026-01-19, 6d
    結合テスト準備                 :2026-01-25, 5d
    UAT                            :2026-01-19, 5d
```

v0.2.80

21 Dec 12:49
c159b02

v0.2.80 Release Notes

This release improves shape extraction to better trace flowchart connections by
assigning per-sheet shape ids and linking connector endpoints.

Highlights

  • Shape ids: non-connector shapes now receive sequential id values per sheet
    to enable stable references.
  • Connector linking: connector shapes capture begin_id and end_id resolved
    from connected shapes (via COM ConnectorFormat).
  • Connector metadata: arrow styles, direction, and rotation are recorded for
    arrow/line connectors to enrich flow analysis.
  • Schema updates: JSON schemas and the Shape model include the new connector
    fields.
  • Samples and tests: added connector sample artifacts and expanded tests for
    connector extraction.

Compatibility Notes

  • Non-COM environments continue to omit connector details as before.
  • Shape extraction now includes additional fields in the default output; existing
    fields and semantics remain unchanged.

Thanks

  • @moonmile for improving flowchart connectivity tracking (#15)

v0.2.71

17 Dec 13:25

v0.2.71 Release Notes

This release adds CLI support for auto page-break exports with environment
gating, plus documentation updates.

Highlights

  • CLI auto page-break export: new --auto-page-breaks-dir option writes per
    auto page-break view files when Excel COM is available.
  • Environment gating: the CLI now detects COM availability and only registers
    COM-specific flags when Excel is usable, preventing unsupported options on
    non-COM platforms.
  • Parser tests: added coverage to ensure the option is visible only when COM is
    available.
  • Documentation updates: CLI usage and API docs now describe the COM-gated
    option.

Compatibility Notes

  • No breaking API changes.
  • On non-COM environments, the --auto-page-breaks-dir flag is hidden and
    cannot be used.
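
The gating idea can be sketched with the standard argparse module: the COM-only flag is registered only when a capability check succeeds. This is an illustration rather than ExStruct's actual CLI code. The build_parser function and its com_available parameter are hypothetical; only the --auto-page-breaks-dir flag name comes from these notes.

```python
import argparse
from pathlib import Path


def build_parser(com_available: bool) -> argparse.ArgumentParser:
    """Build a parser that exposes COM-only flags only when COM is usable."""
    parser = argparse.ArgumentParser(prog="exstruct")
    parser.add_argument("input", type=Path, help="Excel workbook to process")
    if com_available:
        # Registered conditionally, so the flag never appears in --help
        # on platforms where it could not work.
        parser.add_argument(
            "--auto-page-breaks-dir",
            type=Path,
            default=None,
            help="directory for per auto page-break view files",
        )
    return parser


print("--auto-page-breaks-dir" in build_parser(True).format_help())   # True
print("--auto-page-breaks-dir" in build_parser(False).format_help())  # False
```

Because the flag is never registered on non-COM platforms, passing it there fails as an unrecognized argument instead of being silently accepted.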

v0.2.70

15 Dec 13:21

v0.2.70 Release Notes

This release makes file path specification more flexible and changes how standard output is handled during export.

Highlights

  • Expanded str support for file paths: Added type definitions and normalization so that str values can be passed directly to engine inputs/outputs and per-sheet/per-area output destinations. Within process, all paths, including those used for PDF/image generation, are normalized to Path for consistent handling. (Source: src/exstruct/engine.py L96-L240, L592-L643; src/exstruct/core/integrate.py L290-L375; tests/test_engine.py L164-L243)
  • Changed behavior to avoid writing to standard output when exporting only secondary outputs: When output_path is not provided and only sheets_dir / print_areas_dir / auto_page_breaks_dir are set, the export focuses solely on these secondary outputs and no longer writes to standard output. (Source: src/exstruct/engine.py L500-L571)

Compatibility and Migration

  • Existing Path specifications continue to work as-is. Even when str values are passed, they are normalized internally to Path, resulting in identical behavior for both CLI and application usage. (Source: src/exstruct/engine.py L96-L240, L592-L643; src/exstruct/core/integrate.py L290-L375)
  • If you only need secondary outputs, leaving output_path=None is sufficient. If you also want output sent to standard output, explicitly specify output_path or stream as before. (Source: src/exstruct/engine.py L500-L571)
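
The two changes can be pictured with a small standalone sketch. It is not the engine's code: to_path and primary_goes_to_stdout are hypothetical helpers, and the stdout rule below is only one reading of the behavior described above (standard output is used only when neither output_path nor any secondary directory is given).

```python
from __future__ import annotations

from pathlib import Path


def to_path(value: str | Path | None) -> Path | None:
    """Normalize a str or Path to Path, leaving None untouched."""
    return None if value is None else Path(value)


def primary_goes_to_stdout(
    output_path: str | Path | None,
    *secondary_dirs: str | Path | None,
) -> bool:
    """Assumed rule: stdout is the fallback only when nothing else is requested."""
    if output_path is not None:
        return False  # primary output is written to the given file
    return not any(d is not None for d in secondary_dirs)


print(to_path("out/book.json") == to_path(Path("out/book.json")))  # True
print(primary_goes_to_stdout(None))                 # True
print(primary_goes_to_stdout(None, "sheets/"))      # False
```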

v0.2.61

13 Dec 06:20

v0.2.61 Release Notes

We introduced a dedicated exception hierarchy to streamline error handling and make failures in output and rendering easier to identify. In addition, the test requirements specification has been reorganized to explicitly define coverage for error handling, CLI behavior, and related areas.

Highlights

  • Dedicated exception hierarchy: Added a common base ExstructError along with purpose-specific exceptions (ConfigError, ExtractionError, SerializationError, MissingDependencyError, RenderError, OutputError, and PrintAreaError) to ensure consistent exception types across processing flows. (Source: src/exstruct/errors.py L3-L35)
  • Improved error guidance: Unsupported formats now raise SerializationError. Missing dependencies (YAML/TOON/pypdfium2) consistently raise MissingDependencyError with installation guidance. When automatic page-break information is unavailable, export_auto_page_breaks explicitly raises PrintAreaError. Failures during output writing or PDF/image generation are wrapped as OutputError / RenderError. (Source: src/exstruct/__init__.py L189-L219; src/exstruct/io/__init__.py L284-L437, L524-L540; src/exstruct/render/__init__.py L17-L100)
  • Test requirements reorganization: Updated the test requirements specification to v0.2, enumerating coverage categories and concrete expectations for extraction, output, CLI, and error handling. This includes requirements for MissingDependencyError handling and print area/export_auto_page_breaks, clarifying the checklist to be satisfied prior to release. (Source: docs/agents/TEST_REQUIREMENTS.md L1-L116)

Compatibility and Migration

  • PrintAreaError also inherits from ValueError, so existing ValueError handlers will continue to catch it; however, handling the new exception type directly makes intent clearer. (Source: src/exstruct/errors.py L30-L35)
  • Since export_auto_page_breaks now raises an exception when automatic page-break information is unavailable, code that relies on this output should either catch it via try/except or verify in advance that auto_print_areas can be obtained. (Source: src/exstruct/__init__.py L189-L219)
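
The compatibility point about ValueError can be demonstrated with a standalone model of the hierarchy. The class names below follow the list in the Highlights, but the definitions are a sketch and not the library's source. In particular, it is an assumption that every specific exception derives directly from ExstructError, and export_stub is a hypothetical stand-in for export_auto_page_breaks.

```python
class ExstructError(Exception):
    """Common base for all errors in this sketch."""


class ConfigError(ExstructError): ...
class ExtractionError(ExstructError): ...
class SerializationError(ExstructError): ...
class MissingDependencyError(ExstructError): ...
class RenderError(ExstructError): ...
class OutputError(ExstructError): ...


class PrintAreaError(ExstructError, ValueError):
    """Also a ValueError, so older `except ValueError` handlers still match."""


def export_stub(auto_print_areas: list[str]) -> int:
    """Hypothetical stand-in: fail when no auto page-break data exists."""
    if not auto_print_areas:
        raise PrintAreaError("no automatic page-break information available")
    return len(auto_print_areas)


for handler in (ValueError, PrintAreaError, ExstructError):
    try:
        export_stub([])
    except handler as exc:
        print(f"caught via {handler.__name__}: {exc}")
```

All three handlers catch the same failure, which is why existing ValueError handling keeps working while new code can target PrintAreaError or the common base.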

v0.2.60

13 Dec 02:37

v0.2.60 Release Notes

This is the first official tag/release of ExStruct. We are publishing the core functionality for extracting Excel workbooks into structured data in JSON/YAML/TOON formats, making it available through both the CLI and the Python API.

Highlights

  • Excel → Structured data: Collects cells, table candidates, shapes, charts, print areas, automatic page breaks, and hyperlinks on a per-sheet basis, with JSON as the primary output format.
  • Selectable output modes: When a COM environment is not available, light mode extracts cells, table candidates, and print areas. In environments where Excel COM is available, standard/verbose modes additionally output shapes, charts, size information, and hyperlinks.
  • Flexible output formats: In addition to the default JSON output, YAML and TOON formats are supported as options. The --pretty flag enables generation of human-readable JSON.
  • Rendering from the CLI: In Excel-enabled environments, rendering features to generate PDFs and per-sheet PNG images are available via the CLI.
  • Robust fallback behavior: In environments where Excel COM cannot be used, the system falls back to cell and table-candidate extraction to prevent processing from failing due to exceptions.

Compatibility and Installation

  • Installable from PyPI via pip install exstruct.
  • YAML/TOON/rendering features can be enabled by adding optional dependencies (exstruct[yaml], exstruct[toon], exstruct[render], or exstruct[yaml,toon,render]).
  • Rich extraction including shapes and charts targets Windows + Excel (COM via xlwings). In other environments, mode=light is recommended.
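
A caller that runs on several platforms may want to choose the mode up front. The helper below is a rough heuristic sketch, not an ExStruct API: pick_mode is a hypothetical name, and checking for Windows plus an importable xlwings package does not prove that Excel itself is installed.

```python
from __future__ import annotations

import importlib.util
import sys


def pick_mode(platform: str = sys.platform, has_xlwings: bool | None = None) -> str:
    """Return "standard" when COM extraction looks plausible, else "light"."""
    if has_xlwings is None:
        # find_spec() returns None when the package cannot be imported.
        has_xlwings = importlib.util.find_spec("xlwings") is not None
    return "standard" if platform == "win32" and has_xlwings else "light"


print(pick_mode("linux", True))    # light
print(pick_mode("win32", True))    # standard
print(pick_mode("win32", False))   # light
```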

Other

  • There are no known breaking changes (initial release).
  • Bug reports and feature requests are welcome via Issues.