Proof-of-concept project that demonstrates extracting text from IGM (Import General Manifest) documents using Vision Language Models (VLMs).
- Upload an IGM PDF via the web UI.
- Pages are rendered to images and sent to Gemini for extraction.
- The app merges per-page results (manifest + containers), applies fixes, and stores data in an embedded H2 database.
- Review and edit extracted fields, then save.
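The merge step in the list above could look roughly like the following. This is a minimal sketch, not the project's actual code: the `PageResult` record, its field names, and the rule "first non-blank manifest value wins, container numbers are collected without duplicates" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shape of one page's extraction result (not taken from the project).
record PageResult(Map<String, String> manifestFields, List<String> containerNumbers) {}

final class PageMerger {
    // Merge pages in order: keep the first non-blank value seen for each manifest
    // field, and collect container numbers without duplicates.
    static PageResult merge(List<PageResult> pages) {
        Map<String, String> manifest = new LinkedHashMap<>();
        List<String> containers = new ArrayList<>();
        for (PageResult page : pages) {
            page.manifestFields().forEach((key, value) -> {
                if (value != null && !value.isBlank()) {
                    manifest.putIfAbsent(key, value);
                }
            });
            for (String number : page.containerNumbers()) {
                if (!containers.contains(number)) {
                    containers.add(number);
                }
            }
        }
        return new PageResult(manifest, containers);
    }
}
```

Keeping the first non-blank value is only one possible policy; a real implementation might prefer the page where the manifest header actually appears.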
- Java 17+ and Maven
- Enable the Gemini API in the Google Cloud Console.
- Then get your API key from Google AI Studio.
Build from source:

    mvn clean package -DskipTests

Alternatively, download the .jar file from the latest release and run it directly. (Recommended)

Set the API key and run (PowerShell):

    $env:GEMINI_API_KEY = "your_gemini_api_key"
    java -jar target\igm-extractor-0.0.1-SNAPSHOT.jar

Or pass the key on the command line:

    java -jar target/igm-extractor-0.0.1-SNAPSHOT.jar --gemini.api.key=YOUR_API_KEY

Open http://localhost:8080 to access the app.
- H2 database: ./data; uploads: ./uploads.
- Reset app state: stop and delete ./data.
- H2 Console: http://localhost:8080/h2-console (JDBC: jdbc:h2:file:./data/igm-extractor, user: sa).
Core Framework:
- Spring Boot 3.2.1 (Java 21)
- Spring Boot Starter Web (REST APIs, MVC)
- Spring Boot Starter Data JPA (ORM, database access)
- Spring Boot Starter Thymeleaf (server-side HTML templates)
AI/ML:
- Google Gemini API (gemini-2.0-flash-exp model via REST API)
- Spring RestTemplate (HTTP client for Gemini API calls)
- Jackson (JSON parsing and data binding)
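To illustrate what a per-page call to Gemini involves, here is a hedged sketch that builds (but does not send) a `generateContent` request carrying a prompt and one PNG page image. It uses the JDK's `java.net.http` types and hand-built JSON so it runs without dependencies, whereas the project itself uses RestTemplate and Jackson. The class and method names are hypothetical, and the endpoint path and `inline_data` field names follow the public Gemini REST API as I understand it; check them against the current API reference.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class GeminiRequestSketch {
    // Assumed endpoint pattern of the Gemini REST API, with the model named above.
    static final String ENDPOINT =
        "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent";

    // Build the JSON body: one text part (the prompt) and one inline PNG part.
    static String buildBody(String prompt, byte[] pngBytes) {
        String image = Base64.getEncoder().encodeToString(pngBytes);
        return "{\"contents\":[{\"parts\":["
            + "{\"text\":\"" + escape(prompt) + "\"},"
            + "{\"inline_data\":{\"mime_type\":\"image/png\",\"data\":\"" + image + "\"}}"
            + "]}]}";
    }

    // Minimal JSON string escaping for the prompt (backslash, quote, newline only).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    // Build the POST request; the API key is passed as a query parameter.
    static HttpRequest buildRequest(String apiKey, String prompt, byte[] pngBytes) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT + "?key=" + apiKey))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(
                buildBody(prompt, pngBytes), StandardCharsets.UTF_8))
            .build();
    }
}
```

In a real application a JSON library such as Jackson would be a safer way to build the body than string concatenation.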
Database:
- H2 Database (embedded file-based SQL database)
Document Processing:
- Apache PDFBox 2.0.31 (PDF rendering and image extraction)
Frontend:
- Thymeleaf templates
- Bootstrap 5.3 (responsive UI)
- Bootstrap Icons
Development Tools:
- Spring Boot DevTools (hot reload, debugging)
- File Save: time to save the uploaded PDF file to disk and create the IGM record
- Render Pages: time to convert PDF pages to images (PNG format at 150 DPI)
- Gemini API Calls: time for all API calls to Google Gemini (parallel processing for multiple pages)
- Parse & Transform: time to parse JSON responses and transform them into entity objects
- DB Save: time to save all manifest and container data to the database
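The per-step timings and the parallel per-page calls described above can be sketched as follows. This is an illustration under assumptions, not the project's code: `StepTimer` and `callInParallel` are hypothetical names, and the sketch uses `CompletableFuture` on the common pool, which may differ from how the app actually schedules its Gemini calls.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.function.Supplier;

final class StepTimer {
    // Step name -> elapsed milliseconds, in the order the steps ran.
    private final Map<String, Long> timings = new LinkedHashMap<>();

    // Run one step, record how long it took, and return its result.
    <T> T time(String step, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            timings.put(step, (System.nanoTime() - start) / 1_000_000);
        }
    }

    Map<String, Long> timings() {
        return timings;
    }

    // Start one asynchronous call per page, then wait for all of them.
    // Results come back in page order regardless of completion order.
    static <P, R> List<R> callInParallel(List<P> pages, Function<P, R> call) {
        List<CompletableFuture<R>> futures = pages.stream()
            .map(page -> CompletableFuture.supplyAsync(() -> call.apply(page)))
            .toList();
        return futures.stream().map(CompletableFuture::join).toList();
    }
}
```

Wrapping the whole parallel batch in a single `time("Gemini API Calls", ...)` call measures wall-clock time for all pages together, which matches how that step is described above.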
| Step | Duration | % of Total |
|---|---|---|
| File Save | 84 ms | 0.4% |
| Render Pages | 4,326 ms | 19.1% |
| Gemini API Calls | 18,156 ms | 80.3% |
| Parse & Transform | 10 ms | 0.0% |
| DB Save | 28 ms | 0.1% |
| Total | 22,604 ms | 100% |
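The percentage column in the timing figures above is each step's duration divided by the sum of all steps. A small sketch of that arithmetic, with a hypothetical `TimingShare` helper (not part of the project):

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class TimingShare {
    // Convert per-step durations (ms) into "x.x%" shares of the summed total.
    static Map<String, String> shares(Map<String, Long> millisByStep) {
        long total = millisByStep.values().stream().mapToLong(Long::longValue).sum();
        Map<String, String> result = new LinkedHashMap<>();
        millisByStep.forEach((step, ms) ->
            result.put(step, String.format(Locale.ROOT, "%.1f%%", 100.0 * ms / total)));
        return result;
    }
}
```

Fed the five durations reported above (84, 4,326, 18,156, 10, and 28 ms, totalling 22,604 ms), this yields 0.4%, 19.1%, 80.3%, 0.0%, and 0.1%, so the Gemini calls dominate the end-to-end time.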