PoC to showcase text extraction from IGM documents using VLMs

smmehrab/igm-extractor

IGM Extractor

A proof-of-concept that extracts text from IGM (Import General Manifest) documents using Vision Language Models (VLMs).

How It Works

  • Upload an IGM PDF via the web UI.
  • Pages are rendered to images and sent to Gemini for extraction.
  • The app merges per-page results (manifest + containers), applies fixes, and stores data in an embedded H2 database.
  • Review and edit extracted fields, then save.
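A minimal sketch of the fan-out and merge steps, in plain Java with no Spring or Gemini dependencies. It assumes each page yields a flat map of manifest fields plus a list of container numbers, that merging keeps the first non-blank value per field and drops duplicate containers, and that pages are extracted concurrently. `PageResult`, `MergedIgm`, `PageMerger`, and the extractor callback are hypothetical names, not the app's actual classes, and the app's own fix-up rules are not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntFunction;

// Hypothetical per-page extraction result: manifest fields plus container numbers seen on that page.
record PageResult(int page, Map<String, String> manifest, List<String> containers) {}

// Hypothetical merged result for the whole IGM document.
record MergedIgm(Map<String, String> manifest, List<String> containers) {}

final class PageMerger {
    private PageMerger() {}

    // Extracts all pages concurrently on a fixed pool and returns the results in page order.
    static List<PageResult> extractAll(int pageCount, IntFunction<PageResult> extractPage, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<PageResult>> futures = new ArrayList<>();
            for (int i = 0; i < pageCount; i++) {
                final int page = i;
                futures.add(CompletableFuture.supplyAsync(() -> extractPage.apply(page), pool));
            }
            List<PageResult> results = new ArrayList<>();
            for (CompletableFuture<PageResult> future : futures) {
                results.add(future.join()); // join in submission order, not completion order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    // First non-blank value wins for each manifest field; containers keep page order without duplicates.
    static MergedIgm merge(List<PageResult> pages) {
        Map<String, String> manifest = new LinkedHashMap<>();
        LinkedHashSet<String> containers = new LinkedHashSet<>();
        for (PageResult page : pages) {
            page.manifest().forEach((field, value) -> {
                if (value != null && !value.isBlank()) {
                    manifest.putIfAbsent(field, value);
                }
            });
            containers.addAll(page.containers());
        }
        return new MergedIgm(manifest, new ArrayList<>(containers));
    }
}
```

In a real pipeline the extractor callback would wrap the Gemini call for one rendered page. Joining the futures in submission order keeps results aligned with page numbers even when the calls finish out of order.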

Getting Started

Prerequisites

  • Java 17+ and Maven

  • Enable the Gemini API in the Google Cloud Console.

  • Get an API key from Google AI Studio.

Build

mvn clean package -DskipTests

Alternatively (recommended), download the prebuilt .jar from the latest release and run it directly.

Run

Windows (PowerShell):

$env:GEMINI_API_KEY = "your_gemini_api_key"
java -jar target\igm-extractor-0.0.1-SNAPSHOT.jar

Or pass the key as a command-line argument:

java -jar target/igm-extractor-0.0.1-SNAPSHOT.jar --gemini.api.key=your_gemini_api_key

Open http://localhost:8080 to access the app.
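Assuming the app reads the key from the `gemini.api.key` property, both run options end up in the same place: Spring Boot's relaxed binding maps the `GEMINI_API_KEY` environment variable to `gemini.api.key`, and a command-line argument takes precedence over an environment variable. Spring does this automatically; the plain-Java sketch below (hypothetical `ApiKeyResolver`) only illustrates that lookup order and is not code from the app.

```java
import java.util.Map;
import java.util.Optional;

final class ApiKeyResolver {
    private ApiKeyResolver() {}

    private static final String ARG_PREFIX = "--gemini.api.key=";

    // Hypothetical helper: a --gemini.api.key=... argument wins, otherwise fall back to GEMINI_API_KEY.
    static Optional<String> resolve(String[] args, Map<String, String> env) {
        for (String arg : args) {
            if (arg.startsWith(ARG_PREFIX) && arg.length() > ARG_PREFIX.length()) {
                return Optional.of(arg.substring(ARG_PREFIX.length()));
            }
        }
        String fromEnv = env.get("GEMINI_API_KEY");
        return (fromEnv == null || fromEnv.isBlank()) ? Optional.empty() : Optional.of(fromEnv);
    }
}
```

Called as `ApiKeyResolver.resolve(args, System.getenv())`, it returns an empty `Optional` when neither source provides a key.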

Notes

  • The H2 database is stored in ./data and uploaded files in ./uploads.
  • To reset the app state, stop the app and delete ./data.
  • H2 Console: http://localhost:8080/h2-console (JDBC URL: jdbc:h2:file:./data/igm-extractor, user: sa).
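Deleting `./data` works from any file manager or shell. For completeness, here is a JDK-only sketch that removes a directory tree children-first, since `Files.delete` only removes empty directories. `AppStateReset` is a hypothetical name. Run it only while the app is stopped, because the running app keeps the H2 database files open.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

final class AppStateReset {
    private AppStateReset() {}

    // Deletes a directory tree such as ./data; a missing directory is treated as already reset.
    static void deleteTree(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse path order lists deeper entries before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).toList();
        }
        for (Path path : paths) {
            Files.delete(path);
        }
    }
}
```

Usage: `AppStateReset.deleteTree(Path.of("data"));` from the directory the app was started in.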

Technologies Used

Core Framework:

  • Spring Boot 3.2.1 (Java 21)
  • Spring Boot Starter Web (REST APIs, MVC)
  • Spring Boot Starter Data JPA (ORM, database access)
  • Spring Boot Starter Thymeleaf (server-side HTML templates)

AI/ML:

  • Google Gemini API (gemini-2.0-flash-exp model via REST API)
  • Spring RestTemplate (HTTP client for Gemini API calls)
  • Jackson (JSON parsing and data binding)
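The app calls Gemini through `RestTemplate` and parses responses with Jackson. To stay dependency-free, the sketch below uses the JDK's `java.net.http` client instead and only builds the request without sending it. The endpoint and body follow the public Gemini REST `generateContent` format (a text part plus an `inline_data` image part, with the key in the `x-goog-api-key` header). The app's actual prompt, payload, and authentication style are not shown in this README, and `GeminiRequests` and its methods are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Base64;

final class GeminiRequests {
    private GeminiRequests() {}

    static final String MODEL = "gemini-2.0-flash-exp";
    static final String ENDPOINT =
            "https://generativelanguage.googleapis.com/v1beta/models/" + MODEL + ":generateContent";

    // Builds (but does not send) a generateContent request with a text prompt and one PNG page image.
    static HttpRequest pageRequest(String apiKey, String prompt, byte[] pngBytes) {
        String imageBase64 = Base64.getEncoder().encodeToString(pngBytes);
        String body = """
                {"contents": [{"parts": [
                  {"text": %s},
                  {"inline_data": {"mime_type": "image/png", "data": "%s"}}
                ]}]}""".formatted(jsonString(prompt), imageBase64);
        return HttpRequest.newBuilder()
                .uri(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                .header("x-goog-api-key", apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Minimal JSON string literal encoder for the prompt: escapes quotes, backslashes, control chars.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append('"').toString();
    }
}
```

Sending it would be `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; the model's text output is in `candidates[0].content.parts[0].text` of the JSON response.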

Database:

  • H2 Database (embedded file-based SQL database)

Document Processing:

  • Apache PDFBox 2.0.31 (PDF rendering and image extraction)
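With PDFBox 2.x, a page is typically rendered via `new PDFRenderer(document).renderImageWithDPI(pageIndex, 150, ImageType.RGB)`, which returns a `BufferedImage`. The JDK-only sketch below covers the parts around that call: the pixel size a page gets at 150 DPI (PDF units are 1/72 inch) and PNG encoding with `ImageIO`. `PageImages` is a hypothetical helper, and PDFBox's own rounding of the image size may differ slightly from the `Math.round` used here.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

final class PageImages {
    private PageImages() {}

    static final float DPI = 150f;

    // PDF user-space units are 1/72 inch, so a side of `points` renders to points * dpi / 72 pixels.
    static int pixelsFor(float points, float dpi) {
        return Math.round(points * dpi / 72f);
    }

    // Encodes a rendered page (e.g. the BufferedImage returned by PDFBox) as PNG bytes.
    static byte[] toPng(BufferedImage image) {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            if (!ImageIO.write(image, "png", out)) {
                throw new IllegalStateException("No PNG writer available");
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For an A4 page (595 × 842 pt), `pixelsFor` gives 1240 × 1754 pixels at 150 DPI.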

Frontend:

  • Thymeleaf templates
  • Bootstrap 5.3 (responsive UI)
  • Bootstrap Icons

Development Tools:

  • Spring Boot DevTools (hot reload, debugging)

Performance

Processing time for an upload is measured across five steps:

  1. File Save - Time to save the uploaded PDF file to disk and create the IGM record

  2. Render Pages - Time to convert PDF pages to images (PNG format at 150 DPI)

  3. Gemini API Calls - Time for all calls to the Google Gemini API (multiple pages are processed in parallel)

  4. Parse & Transform - Time to parse JSON responses and transform into entity objects

  5. DB Save - Time to save all manifest and container data to the database

| Step | Duration | % of Total |
|---|---:|---:|
| File Save | 84 ms | 0.4% |
| Render Pages | 4,326 ms | 19.1% |
| Gemini API Calls | 18,156 ms | 80.3% |
| Parse & Transform | 10 ms | 0.0% |
| DB Save | 28 ms | 0.1% |
| Total | 22,604 ms | 100% |
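Each percentage is the step's share of the 22,604 ms total (for example, 18,156 / 22,604 ≈ 80.3% for the Gemini calls). A minimal sketch of how such per-step timings could be collected and reported, assuming a simple wall-clock timer around each step; `StepTimings` is a hypothetical class, not the app's actual instrumentation.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

final class StepTimings {
    // Step name -> accumulated milliseconds, kept in insertion order for reporting.
    private final Map<String, Long> millis = new LinkedHashMap<>();

    // Runs one pipeline step and adds its wall-clock duration (ms) under the given name.
    <T> T time(String step, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            record(step, (System.nanoTime() - start) / 1_000_000);
        }
    }

    // Adds an already-measured duration, e.g. when replaying numbers from a log.
    void record(String step, long ms) {
        millis.merge(step, ms, Long::sum);
    }

    long totalMillis() {
        return millis.values().stream().mapToLong(Long::longValue).sum();
    }

    // Share of the total for one step, formatted with one decimal place.
    String percentOf(String step) {
        long total = totalMillis();
        double pct = total == 0 ? 0.0 : 100.0 * millis.getOrDefault(step, 0L) / total;
        return String.format(Locale.ROOT, "%.1f%%", pct);
    }
}
```

Feeding in the five durations from the table reproduces its percentages, including 0.0% for the 10 ms parse step.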