[Mimecast] - Refactored 'siem_logs' & 'cloud_integrated_logs' data streams to improve memory usage #16308
base: main
Conversation
…o andrew's proof of concept
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)
🚀 Benchmarks report: To see the full report, comment with …
),
},
},
"want_more": false,
I am wondering if we are going to get into a non-recoverable state if one of these blobs ages out (Mimecast claims to retain data for 7 days). Assuming the API returns a 404, maybe it should move to the next blob (send an error event, remove the item from cursor.blobs, continue to the next). WDYT?
I'm just concerned about whether a 404 is strictly enforced on Mimecast's end in such a scenario. If a temporary issue causes a 404, we would be acting on a false positive.
CEL already retries 5 times by default; what if we increase waitMin and waitMax to greater values and then, on a non-200, deterministically remove the faulty blob? That way we keep a retry fallback, and if it still does not succeed, we effectively remove the blob from the list permanently and emit an error event, but continue to process the rest of the list.
> on a non-200, we deterministically remove the faulty blob
I disagree with the non-200. I think that is too wide of a condition. I would limit it to HTTP 404 to begin with because this is the conventional status code for this state, but if we learn more, we can expand the status codes as required (e.g. HTTP 409).
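To make the idea concrete, here is a minimal sketch of how the CEL program could skip an aged-out blob. The names `resp` and `blobs`, and the error-event shape, are illustrative assumptions, not the PR's actual code:

```
// Hypothetical sketch: if the API reports the blob as gone (HTTP 404),
// emit an error event, drop the blob from the cursor, and move on.
resp.StatusCode == 404 ?
    state.with({
        "events": [{"error": {"message": "skipping blob that is no longer retained by the API"}}],
        // Carry only the remaining blobs forward in the cursor.
        "cursor": {"blobs": tail(blobs)},
        "want_more": size(blobs) > 1,
    })
:
    state.with({
        // ... normal processing of blobs[0] ...
    })
```

If other status codes turn out to warrant the same handling, the condition can simply be widened, e.g. `resp.StatusCode == 404 || resp.StatusCode == 409`.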
Rémy suggests the following commit message (with my modification):
mimecast: refactor CEL code to prevent memory issues and OOM failures
The existing CEL implementation in siem_logs data stream was causing
out-of-memory failures during processing. The refactored approach uses
batch processing techniques to reduce memory consumption.
The cloud_integrated_logs data stream received similar optimizations
due to code sharing with siem_logs.
Based on Andrew's POC implementation:
https://github.com/andrewkroh/integrations/commits/mimecast-siem-batch-per-execution/
I've replaced code "coupling" with code "sharing", since there is no coupling.
packages/mimecast/changelog.yml (Outdated)
- description: Refactored the CEL program for siem_logs data stream to improve memory usage.
  type: bugfix
  link: https://github.com/elastic/integrations/pull/16308
- description: Refactored the CEL program for cloud_integrated_logs data stream to improve memory usage.
  type: bugfix
  link: https://github.com/elastic/integrations/pull/16308
Suggested change: combine the two entries into one:
- description: Refactored the CEL program for cloud_integrated_logs and siem_logs data streams to improve memory usage.
  type: bugfix
  link: https://github.com/elastic/integrations/pull/16308
| "access_token": token.access_token, | ||
| "expires": token.expires, | ||
| }, | ||
| "want_more": size(tail(blobs)) != 0 || size(work_list) != 0, |
| "want_more": size(tail(blobs)) != 0 || size(work_list) != 0, | |
| "want_more": size(blobs) > 1 || size(work_list) != 0, |
in both files
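As a quick illustration (not from the PR), the two conditions agree for the lists the program will see, while the suggested form avoids building the intermediate `tail(blobs)` list:

```
// For blobs == ["a", "b"]: size(tail(blobs)) != 0 is true,  and size(blobs) > 1 is true.
// For blobs == ["a"]:      size(tail(blobs)) != 0 is false, and size(blobs) > 1 is false.
// This expression evaluates to true:
(size(tail(["a", "b"])) != 0) == (size(["a", "b"]) > 1)
```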
💚 Build Succeeded
cc @ShourieG
Type of change
Proposed commit message
Checklist
changelog.yml file.
Note: Skipped the httpjson test in 'siem_logs', as a fix for the health-status-degraded issue did not seem to fit within this PR.
Author's Checklist
How to test this PR locally
Related issues
Screenshots