@wadahiro wadahiro commented Jan 28, 2026

Summary

This PR addresses OutOfMemoryError during CSV export of large datasets and significantly improves export performance.

Background

Before this fix, CSV export of large datasets had several critical issues:

1. OutOfMemoryError

Loading all data into memory caused OOME with large datasets.

2. PostgreSQL IN clause parameter limit

Export of more than 65,535 records was impossible due to PostgreSQL's prepared statement parameter limit:

Caused by: org.postgresql.util.PSQLException: PreparedStatement can have at most 65,535 parameters.
Please consider using arrays, or splitting the query in several ones, or using COPY.
Given query has 91,362 parameters
  at org.postgresql.jdbc.PgPreparedStatement.<init>(PgPreparedStatement.java:107)
  at com.querydsl.sql.AbstractSQLQuery.fetch(AbstractSQLQuery.java:439)
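One common way around this limit (independent of the streaming approach this PR ultimately takes) is to split a large key list into IN-clause chunks that stay under the driver's parameter cap. A minimal sketch of the chunking logic, with a hypothetical `chunk` helper not taken from the midPoint codebase:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a large OID list into IN-clause chunks that
// each stay under PostgreSQL's 65,535-parameter limit, so each chunk can
// be bound to its own prepared statement.
public class InClauseChunking {

    static final int MAX_PARAMS = 65_535;

    // Split `oids` into sublists no larger than `chunkSize`.
    static List<List<String>> chunk(List<String> oids, int chunkSize) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < oids.size(); i += chunkSize) {
            chunks.add(oids.subList(i, Math.min(i + chunkSize, oids.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> oids = new ArrayList<>();
        for (int i = 0; i < 91_362; i++) {
            oids.add("oid-" + i);
        }
        // The failing query above had 91,362 parameters: that would split
        // into two chunks of 65,535 and 25,827.
        System.out.println(chunk(oids, MAX_PARAMS).size());
    }
}
```

Chunking keeps each statement valid but still requires materializing all OIDs up front, which is why the PR prefers cursor-based streaming instead.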

3. AccessCertificationWorkItem export performance issues

Even after resolving the OOME and IN clause issues, AccessCertificationWorkItem export had severe performance problems. Exporting 5,000 WorkItems took over 8 minutes due to multiple N+1 query problems:

Data structure:

Campaign (1)
  └── Case (5000)
        └── WorkItem (500,000 = 5000 cases × 100 reviewers)

N+1 queries per WorkItem:

  1. Campaign fetch - Each WorkItem triggered a separate Campaign query
  2. Case fetch - Each WorkItem triggered a separate Case query
  3. Reference fetch - Each WorkItem triggered a separate reference query
  4. objectRef/targetRef displayName fetch (GUI layer) - Each row triggered loadObject() to resolve display names
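The general shape of the fix for these N+1 patterns is to collect the foreign keys of a whole batch and resolve them with a single lookup. A simplified in-memory sketch of that idea (the `WorkItem`/`Campaign` records and `fetchCampaigns` stand-in are illustrative, not midPoint's actual types):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of batch loading: instead of fetching the Campaign
// for each WorkItem individually (N+1 queries), collect the campaign OIDs
// of a whole batch and resolve them in one lookup.
public class BatchLoadSketch {

    record WorkItem(String campaignOid) {}
    record Campaign(String oid, String name) {}

    // Stand-in for a single IN-clause query returning all requested campaigns.
    static Map<String, Campaign> fetchCampaigns(Set<String> oids) {
        return oids.stream()
                .collect(Collectors.toMap(o -> o, o -> new Campaign(o, "campaign " + o)));
    }

    // One query per batch instead of one query per item.
    static Map<String, Campaign> resolveForBatch(List<WorkItem> batch) {
        Set<String> oids = batch.stream()
                .map(WorkItem::campaignOid)
                .collect(Collectors.toSet());
        return fetchCampaigns(oids);
    }

    public static void main(String[] args) {
        List<WorkItem> batch = List.of(
                new WorkItem("c1"), new WorkItem("c1"), new WorkItem("c2"));
        // 3 work items, but only 2 distinct campaigns to fetch.
        System.out.println(resolveForBatch(batch).size());
    }
}
```

With 500,000 WorkItems in batches of 100, this turns hundreds of thousands of per-item queries into a few thousand batched ones.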

Solution

This fix implements:

  • JDBC cursor-based streaming: Processes rows one by one without loading all OIDs into an IN clause
  • Batch loading with beforeTransformation: Loads Campaign, Case, and references in batches of 100 items using IN clauses
  • displayName caching in ReferenceNameResolver: Caches name/displayName across batches to avoid redundant queries
  • GUI layer optimization: Uses pre-loaded objects from ref.getObject() instead of calling loadObject() for each row
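For reference, cursor-based streaming on the PostgreSQL JDBC driver hinges on two settings: autocommit must be off and a positive fetch size must be set, otherwise the driver buffers the entire result set in memory. A minimal standalone sketch (assumed connection URL and table, not midPoint's actual export code):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Minimal sketch of JDBC cursor-based streaming on PostgreSQL. With
// autocommit disabled and a fetch size set, the driver fetches rows in
// batches behind a server-side cursor instead of materializing the whole
// result set, so memory usage stays constant regardless of row count.
public class StreamingExportSketch {
    public static void main(String[] args) throws SQLException {
        try (Connection conn =
                DriverManager.getConnection("jdbc:postgresql://localhost/midpoint")) {
            conn.setAutoCommit(false);      // required for PostgreSQL cursor use
            try (Statement stmt = conn.createStatement()) {
                stmt.setFetchSize(100);     // stream 100 rows per round trip
                try (ResultSet rs =
                        stmt.executeQuery("SELECT oid, fullobject FROM m_user")) {
                    while (rs.next()) {
                        // write one CSV row per record, then discard it
                    }
                }
            }
        }
    }
}
```

This is the mechanism behind the `iterationPageSize=-1` streaming mode mentioned in the changes below; the sketch requires a live PostgreSQL instance to run.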

Changes

MID-10990: OutOfMemoryError during CSV Export of Large Datasets

  • Fix OutOfMemoryError during CSV export of large datasets: Implement streaming CSV export with IterativeExportSupport and StreamingCsvDataExporter
  • Optimize OperationResult.cleanup() from O(n²) to O(n): Fix performance bottleneck in result cleanup
  • Avoid NoSuchMessageException in LocalizationServiceImpl: Skip unnecessary exception handling for better performance
  • Add JDBC streaming mode support for searchContainersIterative: Enable true JDBC cursor-based streaming with iterationPageSize=-1
  • Add JDBC streaming mode support to searchObjectsIterative: Extend streaming support to Object export
  • Add JDBC streaming support for audit log CSV export: Apply streaming to AuditLogViewer export
  • Use lightweight wrapper for CSV export: Skip expensive child wrapper creation during export
  • Optimize AccessCertificationWorkItem export for large datasets: Implement batch loading with beforeTransformation to eliminate N+1 queries
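The O(n²) → O(n) cleanup item above follows a common pattern worth illustrating: removing matching elements from a list one `remove(element)` call at a time rescans the list for every removal, while a single `removeIf` pass is linear. This is a hypothetical illustration of that pattern, not the actual `OperationResult.cleanup()` code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the O(n^2) -> O(n) cleanup pattern.
public class CleanupSketch {

    // Quadratic: each remove(element) scans the list, and it may be
    // called up to n times. (Iterating a copy avoids
    // ConcurrentModificationException.)
    static void cleanupQuadratic(List<String> results) {
        for (String r : new ArrayList<>(results)) {
            if (r.startsWith("minor")) {
                results.remove(r);
            }
        }
    }

    // Linear: one pass over the list, surviving elements shifted once.
    static void cleanupLinear(List<String> results) {
        results.removeIf(r -> r.startsWith("minor"));
    }

    public static void main(String[] args) {
        List<String> results =
                new ArrayList<>(List.of("minor-1", "major-1", "minor-2"));
        cleanupLinear(results);
        System.out.println(results);
    }
}
```

With hundreds of thousands of subresults accumulated during a large export, the difference between the two shapes is what turns cleanup into a visible bottleneck.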

MID-11047: AccessCertificationWorkItem list unstable display order

  • Add default sort order to WorkItem list: Sort by PK order (ownerOid, accessCertCaseCid, cid) for stable display order

MID-11046: CSV export missing .csv extension

  • Fix CSV export filename missing .csv extension: Ensure .csv extension is appended regardless of user input
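The extension fix boils down to normalizing the user-supplied filename. A minimal sketch, assuming a hypothetical `ensureCsvExtension` helper and default filename (neither is taken from the midPoint sources):

```java
// Hypothetical sketch of the filename fix: append ".csv" unless the
// user-supplied name already ends with it (case-insensitively).
public class CsvFilename {

    static String ensureCsvExtension(String name) {
        if (name == null || name.isBlank()) {
            return "export.csv"; // assumed default name
        }
        return name.toLowerCase().endsWith(".csv") ? name : name + ".csv";
    }

    public static void main(String[] args) {
        System.out.println(ensureCsvExtension("users"));     // users.csv
        System.out.println(ensureCsvExtension("users.CSV")); // users.CSV
    }
}
```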

Performance Results

AccessCertificationWorkItem Export

| Condition | Time |
| --- | --- |
| 5,000 WorkItems (5,000 cases × 100 reviewers, filtered by 1 reviewer) | ~27 sec |

User Export (Large Dataset)

| Records | Time | File Size |
| --- | --- | --- |
| ~100,000 users | ~20 sec | ~7 MB |

@wadahiro force-pushed the fix-10990-export-oome branch from 9e53d2e to a58b7e7 on January 29, 2026 at 00:35