Conversation

@ShourieG
Contributor

@ShourieG ShourieG commented Dec 4, 2025

Type of change

  • Bug

Proposed commit message

mimecast: refactor CEL code to prevent memory issues and OOM failures

The existing CEL implementation in siem_logs data stream was causing
out-of-memory failures during processing. The refactored approach uses
batch processing techniques to reduce memory consumption.

The cloud_integrated_logs data stream received similar optimizations
due to code sharing with siem_logs.

Based on Andrew's POC implementation:
https://github.com/andrewkroh/integrations/commits/mimecast-siem-batch-per-execution/
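As a rough illustration of what batching per execution can look like, the sketch below models a single evaluation that handles only the first pending blob and leaves the rest in the cursor. It is written in Python rather than CEL and is not the code from this PR: `evaluate` and `fetch_blob` are hypothetical names, and the state shape (`cursor.blobs`, `events`, `want_more`) is assumed from the fragments quoted in the review. The `work_list` term of the real `want_more` expression is left out for brevity.

```python
from typing import Any, Callable, Dict, List


def evaluate(state: Dict[str, Any],
             fetch_blob: Callable[[str], List[Any]]) -> Dict[str, Any]:
    """Model one execution: process only the first pending blob.

    Assumption: state["cursor"]["blobs"] is a list of pending blob
    identifiers, and fetch_blob (hypothetical) returns that blob's events.
    """
    blobs = state.get("cursor", {}).get("blobs", [])
    if not blobs:
        # Nothing pending: no events, no further evaluation requested.
        return {"events": [], "cursor": {"blobs": []}, "want_more": False}

    head, rest = blobs[0], blobs[1:]
    # Only one blob's worth of events is held in memory per execution.
    events = fetch_blob(head)
    return {
        "events": events,
        "cursor": {"blobs": rest},
        # Ask for another evaluation only while blobs remain.
        "want_more": len(rest) > 0,
    }
```

A caller would feed the returned state back into `evaluate` while `want_more` is true, publishing each execution's `events` before starting the next one.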

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

Note:

Skipped the httpjson test in 'siem_logs' because a fix for the degraded health status issue did not seem
to be in scope for this PR.

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Screenshots

@ShourieG ShourieG self-assigned this Dec 4, 2025
@ShourieG ShourieG added the Integration:mimecast, bugfix, and Team:Security-Service Integrations labels Dec 4, 2025
@ShourieG ShourieG marked this pull request as ready for review December 4, 2025 11:45
@ShourieG ShourieG requested a review from a team as a code owner December 4, 2025 11:45
@elasticmachine

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@ShourieG ShourieG requested a review from efd6 December 4, 2025 11:47
@elastic-vault-github-plugin-prod

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

),
},
},
"want_more": false,
Member

I am wondering if we are going to get into a non-recoverable state if one of these blobs ages out (Mimecast claims to retain data for 7 days). Assuming the API returns a 404, maybe it should move to the next blob (send an error event, remove the item from cursor.blobs, continue to the next). WDYT?

Contributor Author

@ShourieG ShourieG Dec 5, 2025

I'm just concerned about whether Mimecast strictly returns a 404 in such a scenario. If a temporary issue causes a 404, we would be acting on a false positive.

CEL already retries 5 times by default. What if we increase waitMin and waitMax and then, on a non-200 response, deterministically remove the faulty blob? This way we keep a retry fallback, and if it does not succeed, we permanently remove the blob from the list and emit an error event, but continue to process the rest of the list.

Member

on a non 200, we deterministically remove the faulty blob

I disagree with the non-200. I think that is too wide of a condition. I would limit it to HTTP 404 to begin with because this is the conventional status code for this state, but if we learn more, we can expand the status codes as required (e.g. HTTP 409).
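One way to read the two positions in this thread together is sketched below: retry every failed fetch a bounded number of times, then drop the blob only if the final status is 404, and otherwise stop and keep it for a later run. This is a Python model, not the CEL program from this PR, and it is not necessarily what was merged. `process_blobs`, `fetch`, `max_attempts`, and `wait_seconds` are hypothetical names, and the shape of the error event is an assumption.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


def process_blobs(
    blobs: List[str],
    fetch: Callable[[str], Tuple[int, List[Any]]],
    max_attempts: int = 5,
    wait_seconds: float = 0.0,
) -> Tuple[List[Any], List[Dict[str, Any]], List[str]]:
    """Return (events, error_events, remaining_blobs).

    Assumption: fetch (hypothetical) returns (http_status, events).
    """
    events: List[Any] = []
    errors: List[Dict[str, Any]] = []
    remaining = list(blobs)

    while remaining:
        blob = remaining[0]
        status, body = 0, []
        for attempt in range(max_attempts):
            status, body = fetch(blob)
            if status == 200:
                break
            if attempt < max_attempts - 1:
                time.sleep(wait_seconds)  # back off before the next try

        if status == 200:
            events.extend(body)
            remaining.pop(0)
        elif status == 404:
            # Still 404 after all retries: treat the blob as aged out,
            # report it, drop it, and move on to the next blob.
            errors.append({"error": {"message": f"blob {blob} returned HTTP 404; skipping"}})
            remaining.pop(0)
        else:
            # Any other failure: keep this blob and the rest in the cursor
            # so a later run can retry instead of silently losing data.
            break

    return events, errors, remaining
```

Keeping the drop condition to 404 means a transient 5xx never removes data from the list, while the retry loop addresses the concern about a one-off 404.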

Contributor

@efd6 efd6 left a comment

Rémy suggests the following commit message (with my modification)

mimecast: refactor CEL code to prevent memory issues and OOM failures

The existing CEL implementation in siem_logs data stream was causing
out-of-memory failures during processing. The refactored approach uses
batch processing techniques to reduce memory consumption.

The cloud_integrated_logs data stream received similar optimizations
due to code sharing with siem_logs.

Based on Andrew's POC implementation:
https://github.com/andrewkroh/integrations/commits/mimecast-siem-batch-per-execution/

I've replaced code "coupling" with "sharing" since there is no coupling.

Comment on lines 4 to 9
- description: Refactored the CEL program for siem_logs data stream to improve memory usage.
  type: bugfix
  link: https://github.com/elastic/integrations/pull/16308
- description: Refactored the CEL program for cloud_integrated_logs data stream to improve memory usage.
  type: bugfix
  link: https://github.com/elastic/integrations/pull/16308
Contributor

Suggested change
-  - description: Refactored the CEL program for siem_logs data stream to improve memory usage.
-    type: bugfix
-    link: https://github.com/elastic/integrations/pull/16308
-  - description: Refactored the CEL program for cloud_integrated_logs data stream to improve memory usage.
-    type: bugfix
-    link: https://github.com/elastic/integrations/pull/16308
+  - description: Refactored the CEL program for cloud_integrated_logs and siem_logs data streams to improve memory usage.
+    type: bugfix
+    link: https://github.com/elastic/integrations/pull/16308

"access_token": token.access_token,
"expires": token.expires,
},
"want_more": size(tail(blobs)) != 0 || size(work_list) != 0,
Contributor

Suggested change
- "want_more": size(tail(blobs)) != 0 || size(work_list) != 0,
+ "want_more": size(blobs) > 1 || size(work_list) != 0,

in both files
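The two expressions should agree for every input, which the following Python model checks for small list sizes. It assumes that `tail` returns every element after the first and yields an empty list for an empty input; the `tail` helper here is a stand-in written for this check, not the CEL extension itself.

```python
from itertools import product
from typing import Any, List


def tail(items: List[Any]) -> List[Any]:
    """Stand-in for CEL's tail: all elements after the first (assumed)."""
    return items[1:]


def want_more_original(blobs: List[Any], work_list: List[Any]) -> bool:
    return len(tail(blobs)) != 0 or len(work_list) != 0


def want_more_suggested(blobs: List[Any], work_list: List[Any]) -> bool:
    return len(blobs) > 1 or len(work_list) != 0


def expressions_agree(max_size: int = 4) -> bool:
    """Compare both forms for every pair of list sizes up to max_size."""
    return all(
        want_more_original(list(range(n)), list(range(m)))
        == want_more_suggested(list(range(n)), list(range(m)))
        for n, m in product(range(max_size + 1), repeat=2)
    )
```

Under that assumption the suggested form gives the same result without building the intermediate tail list just to measure it.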

@elasticmachine

💚 Build Succeeded

History

cc @ShourieG

Labels

bugfix, Integration:mimecast, Team:Security-Service Integrations

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[mimecast.siem_logs] Refactor to reduce memory pressure

4 participants