@holtvogt holtvogt commented Sep 18, 2025

This PR introduces two new audits to monitor and analyze content fragment 404s on AEM. See SITES-35769.

What's New

  • CDN Content Fragment 404 Audit (cdn-content-fragment-404): Monitors CDN logs hourly to identify content fragment requests that return 404 errors, using Athena queries to aggregate and export the data to S3 for further analysis and reporting.
  • Content Fragment 404 Audit (content-fragment-404): Runs daily on the broken content fragment paths discovered in CDN logs and suggests repair actions through a multi-step workflow that applies strategies such as republishing, locale fallbacks, and similar-path matching.
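
The multi-step suggestion workflow described above could be sketched roughly as an ordered chain of rules, where the first rule that produces a suggestion wins. This is only an illustration under stated assumptions: `publishRule`, `localeFallbackRule`, `similarPathRule`, `suggestFix`, the suggestion types, and the shape of `ctx` are all hypothetical names and not taken from this PR. The lookups are synchronous here for brevity; a real check against AEM Author would presumably be asynchronous.

```javascript
// Hypothetical sketch; none of these names come from the PR.
// ctx.existsOnAuthor(path) -> boolean
// ctx.fallbacks            -> map of locale to ordered fallback locales
// ctx.findSimilarPath(path) -> string | null

// Rule 1: the fragment exists on Author, so it may only need publishing.
const publishRule = (path, ctx) => (
  ctx.existsOnAuthor(path) ? { type: 'PUBLISH', path } : null
);

// Rule 2: swap the locale segment for the first fallback locale that exists.
const localeFallbackRule = (path, ctx) => {
  const match = path.match(/\/([a-z]{2}-[a-z]{2})\//i);
  if (!match) return null;
  const locale = match[1];
  const fallbacks = ctx.fallbacks[locale.toLowerCase()] ?? [];
  for (const fallback of fallbacks) {
    const candidate = path.replace(`/${locale}/`, `/${fallback}/`);
    if (ctx.existsOnAuthor(candidate)) {
      return { type: 'LOCALE_FALLBACK', path, suggestedPath: candidate };
    }
  }
  return null;
};

// Rule 3: delegate to a similarity lookup supplied by the caller.
const similarPathRule = (path, ctx) => {
  const similar = ctx.findSimilarPath(path);
  return similar ? { type: 'SIMILAR_PATH', path, suggestedPath: similar } : null;
};

// Apply the rules in order and return the first suggestion found.
function suggestFix(path, ctx, rules = [publishRule, localeFallbackRule, similarPathRule]) {
  for (const rule of rules) {
    const suggestion = rule(path, ctx);
    if (suggestion) return suggestion;
  }
  return { type: 'NOT_FOUND', path };
}
```

Ordering the rules from cheapest and most certain (republish) to most speculative (similar path) keeps the less reliable suggestions as a last resort.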

Use Case

Health monitoring for Content Fragments on AEM Sites: Automatically detect broken content fragment requests across AEM Sites by monitoring CDN traffic patterns, identifying 404 errors, and providing actionable repair suggestions to maintain content availability and user experience.

@github-actions

This PR will trigger no release when merged.

@holtvogt holtvogt requested a review from Copilot September 25, 2025 06:47

Copilot AI left a comment

Pull Request Overview

This PR adds comprehensive audit functionality for detecting and suggesting fixes for broken content fragments in AEM environments. The implementation provides a robust system that analyzes CDN 404 logs and suggests content fixes through various strategies.

  • CDN 404 analysis audit to process Fastly logs and identify broken content paths
  • Broken content path audit system with multiple rule-based suggestion strategies (publish, locale fallback, similarity detection)
  • Integration with AEM Author instances to verify content availability and suggest fixes

Reviewed Changes

Copilot reviewed 41 out of 41 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| src/index.js | Adds broken-content-path and cdn-404-analysis audit handlers to the main handlers registry |
| src/cdn-404-analysis/handler.js | Main handler for CDN 404 analysis with Athena integration for processing Fastly logs |
| src/broken-content-path/handler.js | Multi-step audit handler orchestrating content path analysis and suggestion generation |
| Multiple SQL files | Database and table creation scripts plus queries for Athena-based log analysis |
| Multiple domain classes | Content path, locale, language tree, path index, and suggestion domain models |
| Client and analysis classes | AEM Author client for content verification and analysis strategy for suggestion generation |
| Rule implementations | Three specialized rules for publish, locale fallback, and path similarity detection |
| Utility classes | Path manipulation, Levenshtein distance calculation, and other helper functions |
| Comprehensive test files | Full test coverage for all new functionality with integration scenarios |
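
For readers unfamiliar with the Levenshtein distance calculation listed among the utility classes, here is a minimal sketch of how it can back a path similarity check. It is not the PR's implementation: `levenshtein`, `closestPath`, and the `maxDistance` threshold of 2 are assumptions made for illustration.

```javascript
// Classic dynamic-programming edit distance, keeping only two rows in memory.
function levenshtein(a, b) {
  // prev[j] is the distance between the processed prefix of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i += 1) {
    const curr = [i];
    for (let j = 1; j <= b.length; j += 1) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: pick the known path closest to the broken one,
// ignoring candidates further away than maxDistance edits.
function closestPath(brokenPath, candidates, maxDistance = 2) {
  let best = null;
  for (const candidate of candidates) {
    const distance = levenshtein(brokenPath, candidate);
    if (distance <= maxDistance && (best === null || distance < best.distance)) {
      best = { path: candidate, distance };
    }
  }
  return best;
}
```

A small threshold keeps suggestions limited to likely typos, such as a missing letter in the last path segment.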
Comments suppressed due to low confidence (1)

src/broken-content-path/utils/path-utils.js:1

  • The `removeDoubleSlashes` method incorrectly removes one slash from protocol URLs. The regex `^([^:]+:\/)\/+` should be `^([^:]+:\/)\/?` to preserve the double slash in protocols while removing extras.
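
One way to get the behavior this comment asks for is to treat the scheme separately from the rest of the string. The sketch below is an assumption about the intended behavior, not the code under review and not the regex proposed above: it keeps exactly two slashes after a scheme and collapses every other run of slashes.

```javascript
// Hypothetical rewrite: preserve "scheme://" and collapse all other slash runs.
function removeDoubleSlashes(input) {
  // A scheme (e.g. "http:") followed by two or more slashes, then the rest.
  const match = input.match(/^([a-z][a-z0-9+.-]*:)\/\/+(.*)$/i);
  if (match) {
    const [, scheme, rest] = match;
    return `${scheme}//${rest.replace(/\/{2,}/g, '/')}`;
  }
  // No scheme: collapse every run of slashes into one.
  return input.replace(/\/{2,}/g, '/');
}
```

With this approach, a URL such as `http://example.com//path` keeps its `http://` prefix while the doubled slash in the path is reduced to one.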

@holtvogt holtvogt requested a review from Copilot September 29, 2025 12:31

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 41 out of 41 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/content-fragment-broken-links/utils/path-utils.js:1

  • The `removeDoubleSlashes` function incorrectly removes one slash from protocol schemes. The expected result should preserve the protocol format `http://example.com/path` rather than converting it to `http:/example.com/path`.

semantic-release-bot and others added 27 commits December 20, 2025 15:32
# [1.268.0](v1.267.22...v1.268.0) (2025-12-10)

### Bug Fixes

* test coverage ([#1787](#1787)) ([681f803](681f803))

### Features

* **geo-brand-presence:** add `-paid` and `-free` for scheduling ([#1785](#1785)) ([be9d5bb](be9d5bb))
# [1.269.0](v1.268.0...v1.269.0) (2025-12-10)

### Features

* protect code coverage on main branch ([#1745](#1745)) ([8182425](8182425))
The prerender content scraper was recently moved to `SCRAPE_CLIENT` in PR
#1772.

With `SCRAPE_CLIENT`, a random `jobId` is now created, whereas
`CONTENT_SCRAPER` assigned the `siteId` as the `jobId`.
As a result, the content scraper wrote all files to a folder named after
the random `jobId`, while the audit worker downloaded files from a path
built from the `siteId`. Because of this mismatch, the audit could not run
the analysis and no suggestions were created.

The audit worker now reads content using `scrapeJobId`.

Please ensure your pull request adheres to the following guidelines:
- [ ] make sure to link the related issues in this description
- [ ] when merging / squashing, make sure the fixed issue references are
visible in the commits, for easy compilation of release notes
- [ ] If data sources for any opportunity have been updated or added,
please update the
[wiki](https://wiki.corp.adobe.com/display/AEMSites/Data+Sources+for+Opportunities)
for the same opportunity.

## Related Issues


Thanks for contributing!
## [1.269.1](v1.269.0...v1.269.1) (2025-12-10)

### Bug Fixes

* prerender audit use scrapejobId for s3 paths ([#1786](#1786)) ([5d37793](5d37793))
Fix: #1746

Added a domain-wide suggestion that uses the regex pattern
`https://example.com/.*` to cover all URLs in the domain (not just the
audited ones).
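
As a rough illustration of such a domain-wide pattern, the sketch below derives it from any URL on the site and tests URLs against it. The helper names `domainWidePattern` and `isCoveredByPattern` are hypothetical and not part of this change; note that the dots in the host are left unescaped, as in the pattern quoted above, so they match any character.

```javascript
// Hypothetical helper: build the "<origin>/.*" pattern for the site of a given URL.
function domainWidePattern(baseUrl) {
  const { origin } = new URL(baseUrl);
  return `${origin}/.*`;
}

// Hypothetical helper: check whether a URL is covered by the pattern.
function isCoveredByPattern(url, pattern) {
  return new RegExp(`^${pattern}$`).test(url);
}
```

Anchoring the pattern with `^` and `$` when matching avoids accidental matches on URLs that merely contain the domain somewhere in a query string.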

Please ensure your pull request adheres to the following guidelines:
- [x] make sure to link the related issues in this description
- [x] when merging / squashing, make sure the fixed issue references are
visible in the commits, for easy compilation of release notes
- [ ] If data sources for any opportunity have been updated or added,
please update the
[wiki](https://wiki.corp.adobe.com/display/AEMSites/Data+Sources+for+Opportunities)
for the same opportunity.

## Related Issues


Thanks for contributing!
# [1.270.0](v1.269.1...v1.270.0) (2025-12-10)

### Features

* Prerender Audit - Domain-Wide Suggestion ([#1747](#1747)) ([fa2c208](fa2c208))
## [1.270.1](v1.270.0...v1.270.1) (2025-12-11)

### Bug Fixes

* improve page types ([#1791](#1791)) ([0399b29](0399b29))
… v2.40.3 (#1788)

This PR contains the following updates:

| Package | Change | [Age](https://docs.renovatebot.com/merge-confidence/) | [Confidence](https://docs.renovatebot.com/merge-confidence/) |
|---|---|---|---|
| [@adobe/spacecat-shared-rum-api-client](https://redirect.github.com/adobe/spacecat-shared) | [`2.40.2` -> `2.40.3`](https://renovatebot.com/diffs/npm/@adobe%2fspacecat-shared-rum-api-client/2.40.2/2.40.3) | ![age](https://developer.mend.io/api/mc/badges/age/npm/@adobe%2fspacecat-shared-rum-api-client/2.40.3?slim=true) | ![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/@adobe%2fspacecat-shared-rum-api-client/2.40.2/2.40.3?slim=true) |

---

### Release Notes

<details>
<summary>adobe/spacecat-shared (@adobe/spacecat-shared-rum-api-client)</summary>

### [`v2.40.3`](https://redirect.github.com/adobe/spacecat-shared/releases/tag/%40adobe/spacecat-shared-rum-api-client-v2.40.3)

[Compare Source](https://redirect.github.com/adobe/spacecat-shared/compare/@adobe/spacecat-shared-rum-api-client-v2.40.2...@adobe/spacecat-shared-rum-api-client-v2.40.3)

##### Bug Fixes

- **deps:** update dependency [@adobe/rum-distiller](https://redirect.github.com/adobe/rum-distiller) to v1.22.1 ([#1234](https://redirect.github.com/adobe/spacecat-shared/issues/1234)) ([6fef066](https://redirect.github.com/adobe/spacecat-shared/commit/6fef066633f737af65819e04eb4f27067c67d19e))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/adobe/spacecat-audit-worker).


Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
## [1.270.2](v1.270.1...v1.270.2) (2025-12-11)

### Bug Fixes

* **deps:** update dependency @adobe/spacecat-shared-rum-api-client to v2.40.3 ([#1788](#1788)) ([be14c4f](be14c4f))
- Add wikipedia-analysis handler to trigger audit and send to Mystique
- Add guidance handler to receive results and create opportunities
- Add opportunity data mapper for Wikipedia analysis
- Register handlers in index.js

# [1.271.0](v1.270.2...v1.271.0) (2025-12-11)

### Features

* add Wikipedia analysis audit handlers ([#1765](#1765)) ([4db8ad9](4db8ad9))
…1794)

This PR contains the following updates:

| Package | Change | [Age](https://docs.renovatebot.com/merge-confidence/) | [Confidence](https://docs.renovatebot.com/merge-confidence/) |
|---|---|---|---|
| [@adobe/spacecat-shared-utils](https://redirect.github.com/adobe/spacecat-shared) | [`1.85.0` -> `1.85.2`](https://renovatebot.com/diffs/npm/@adobe%2fspacecat-shared-utils/1.85.0/1.85.2) | ![age](https://developer.mend.io/api/mc/badges/age/npm/@adobe%2fspacecat-shared-utils/1.85.2?slim=true) | ![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/@adobe%2fspacecat-shared-utils/1.85.0/1.85.2?slim=true) |

---

### Release Notes

<details>
<summary>adobe/spacecat-shared (@adobe/spacecat-shared-utils)</summary>

### [`v1.85.2`](https://redirect.github.com/adobe/spacecat-shared/releases/tag/%40adobe/spacecat-shared-utils-v1.85.2)

[Compare Source](https://redirect.github.com/adobe/spacecat-shared/compare/@adobe/spacecat-shared-utils-v1.85.1...@adobe/spacecat-shared-utils-v1.85.2)

##### Bug Fixes

- Implement Structured (JSON) Logging for Spacecat Audits - rollback ([#1239](https://redirect.github.com/adobe/spacecat-shared/issues/1239)) ([1f174d7](https://redirect.github.com/adobe/spacecat-shared/commit/1f174d7dd188dbdc610b75bf58644992925755b1))

### [`v1.85.1`](https://redirect.github.com/adobe/spacecat-shared/releases/tag/%40adobe/spacecat-shared-utils-v1.85.1)

[Compare Source](https://redirect.github.com/adobe/spacecat-shared/compare/@adobe/spacecat-shared-utils-v1.85.0...@adobe/spacecat-shared-utils-v1.85.1)

##### Bug Fixes

- Structured (JSON) for Logging should be JSON string ([#1237](https://redirect.github.com/adobe/spacecat-shared/issues/1237)) ([cfcee6e](https://redirect.github.com/adobe/spacecat-shared/commit/cfcee6e4315aa518c52e4ca50b99b0cb762f5a61))

</details>


Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
## [1.271.1](v1.271.0...v1.271.1) (2025-12-12)

### Bug Fixes

* **deps:** update dependency @adobe/spacecat-shared-utils to v1.85.2 ([#1794](#1794)) ([a93c2f7](a93c2f7))
feat: enable div text detection (#1795)
# [1.272.0](v1.271.1...v1.272.0) (2025-12-12)

### Features

* enable div text detection ([#1795](#1795)) ([182ec70](182ec70))
…1792)

# [1.273.0](v1.272.0...v1.273.0) (2025-12-12)

### Features

* LLMO-1925 audits support cdn logs coming from 'other' sources ([#1792](#1792)) ([47908ec](47908ec))
## [1.273.1](v1.273.0...v1.273.1) (2025-12-12)

### Bug Fixes

* update scores ([#1796](#1796)) ([fe52b53](fe52b53))
…1798)

This PR contains the following updates:

| Package | Change | [Age](https://docs.renovatebot.com/merge-confidence/) | [Confidence](https://docs.renovatebot.com/merge-confidence/) |
|---|---|---|---|
| [@adobe/spacecat-shared-utils](https://redirect.github.com/adobe/spacecat-shared) | [`1.85.2` -> `1.86.0`](https://renovatebot.com/diffs/npm/@adobe%2fspacecat-shared-utils/1.85.2/1.86.0) | ![age](https://developer.mend.io/api/mc/badges/age/npm/@adobe%2fspacecat-shared-utils/1.86.0?slim=true) | ![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/@adobe%2fspacecat-shared-utils/1.85.2/1.86.0?slim=true) |

---

### Release Notes

<details>
<summary>adobe/spacecat-shared (@adobe/spacecat-shared-utils)</summary>

### [`v1.86.0`](https://redirect.github.com/adobe/spacecat-shared/releases/tag/%40adobe/spacecat-shared-utils-v1.86.0)

[Compare Source](https://redirect.github.com/adobe/spacecat-shared/compare/@adobe/spacecat-shared-utils-v1.85.2...@adobe/spacecat-shared-utils-v1.86.0)

##### Features

- add detection for Akamai, Fastly, and CloudFront ([#1238](https://redirect.github.com/adobe/spacecat-shared/issues/1238)) ([3f7aad9](https://redirect.github.com/adobe/spacecat-shared/commit/3f7aad96fbc823b2e9d59541a71ba3b4e6d315e8))

</details>


Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
## [1.273.2](v1.273.1...v1.273.2) (2025-12-12)

### Bug Fixes

* **deps:** update dependency @adobe/spacecat-shared-utils to v1.86.0 ([#1798](#1798)) ([6ba24e4](6ba24e4))
## [1.273.3](v1.273.2...v1.273.3) (2025-12-19)

### Bug Fixes

* **llmo-customer-analysis:** gracefully fail enabling of imports and audits ([#1808](#1808)) ([2267ce4](2267ce4))
@codecov

codecov bot commented Dec 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
