feat: add audit for content fragment 404s #1282
base: main
This PR will trigger no release when merged.
Pull Request Overview
This PR adds audits that detect broken content fragments in AEM environments and suggest fixes for them. The implementation analyzes CDN 404 logs and proposes content fixes through several rule-based strategies.
- CDN 404 analysis audit to process Fastly logs and identify broken content paths
- Broken content path audit system with multiple rule-based suggestion strategies (publish, locale fallback, similarity detection)
- Integration with AEM Author instances to verify content availability and suggest fixes
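To make the rule-based flow above concrete, here is a minimal sketch of how a chain of suggestion rules might be evaluated in order. It is not the PR's code: `makeRules`, `suggestFix`, the suggestion `type` values, and the `existsOnAuthor` predicate (a stand-in for a lookup against AEM Author, kept synchronous for brevity) are all hypothetical names chosen for illustration.

```javascript
// Hypothetical sketch: none of these names come from the PR.
// `existsOnAuthor` stands in for an AEM Author availability check.
function makeRules(existsOnAuthor) {
  // Rule 1: the fragment exists on Author, so it may only need publishing.
  const publishRule = (path) => (
    existsOnAuthor(path) ? { type: 'PUBLISH', suggestedPath: path } : null
  );

  // Rule 2: treat a segment such as "en-us" as a locale and try the
  // language-only parent ("en") as a fallback.
  const localeFallbackRule = (path) => {
    const fallback = path.replace(/\/([a-z]{2})-[a-z]{2}(?=\/)/i, '/$1');
    if (fallback === path) return null;
    return existsOnAuthor(fallback)
      ? { type: 'LOCALE_FALLBACK', suggestedPath: fallback }
      : null;
  };

  return [publishRule, localeFallbackRule];
}

// Apply the rules in priority order and return the first suggestion found.
function suggestFix(brokenPath, rules) {
  for (const rule of rules) {
    const suggestion = rule(brokenPath);
    if (suggestion) return suggestion;
  }
  return { type: 'NOT_FOUND', suggestedPath: null };
}

// Usage with an in-memory set standing in for Author content.
const known = new Set(['/content/dam/site/en/offer']);
const rules = makeRules((p) => known.has(p));
const result = suggestFix('/content/dam/site/en-us/offer', rules);
console.log(result.type, result.suggestedPath);
```

Ordering the rules from cheapest and most specific to most speculative means a similarity-based rule would only run once the direct checks have failed.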
Reviewed Changes
Copilot reviewed 41 out of 41 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/index.js | Adds broken-content-path and cdn-404-analysis audit handlers to the main handlers registry |
| src/cdn-404-analysis/handler.js | Main handler for CDN 404 analysis with Athena integration for processing Fastly logs |
| src/broken-content-path/handler.js | Multi-step audit handler orchestrating content path analysis and suggestion generation |
| Multiple SQL files | Database and table creation scripts plus queries for Athena-based log analysis |
| Multiple domain classes | Content path, locale, language tree, path index, and suggestion domain models |
| Client and analysis classes | AEM Author client for content verification and analysis strategy for suggestion generation |
| Rule implementations | Three specialized rules for publish, locale fallback, and path similarity detection |
| Utility classes | Path manipulation, Levenshtein distance calculation, and other helper functions |
| Comprehensive test files | Full test coverage for all new functionality with integration scenarios |
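For readers unfamiliar with the Levenshtein distance mentioned in the table, the following is a minimal sketch of the standard dynamic-programming algorithm and of how it could pick the closest known path. The function names `levenshtein` and `closestPath` and the `maxDistance` threshold are assumptions for illustration, not the utilities added in this PR.

```javascript
// Hypothetical sketch of similarity matching; not the PR's utility code.
// Classic two-row dynamic-programming Levenshtein distance.
function levenshtein(a, b) {
  if (a === b) return 0;
  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i += 1) {
    const curr = [i];
    for (let j = 1; j <= b.length; j += 1) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the candidate closest to `target`, or null if none is within maxDistance.
function closestPath(target, candidates, maxDistance) {
  let best = null;
  let bestDistance = Infinity;
  for (const candidate of candidates) {
    const distance = levenshtein(target, candidate);
    if (distance <= maxDistance && distance < bestDistance) {
      best = candidate;
      bestDistance = distance;
    }
  }
  return best;
}

const match = closestPath(
  '/content/dam/offr',
  ['/content/dam/offer', '/content/dam/news'],
  2,
);
console.log(match);
```

A small `maxDistance` keeps the suggestions limited to likely typos rather than unrelated paths.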
Comments suppressed due to low confidence (1)
src/broken-content-path/utils/path-utils.js:1
- The removeDoubleSlashes method incorrectly removes one slash from protocol URLs. The regex `^([^:]+:\/)\/+` should be `^([^:]+:\/)\/?` to preserve the double slash in protocols while removing extras.
Comments suppressed due to low confidence (1)
src/content-fragment-broken-links/utils/path-utils.js:1
- The removeDoubleSlashes function incorrectly removes one slash from protocol schemes. The expected result should preserve the protocol format 'http://example.com/path' rather than converting to 'http:/example.com/path'.
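One way to get the behavior the comment asks for is sketched below. It is an illustration only, not the function in `path-utils.js`: it uses a negative lookbehind so that a slash run starting right after a colon is skipped, which leaves the `//` of a scheme such as `http://` intact while collapsing repeated slashes elsewhere.

```javascript
// Hypothetical sketch; the PR's actual removeDoubleSlashes may differ.
function removeDoubleSlashes(input) {
  // Collapse runs of two or more slashes into a single slash.
  // The lookbehind rejects a run whose first slash directly follows ":",
  // so "http://" keeps its two slashes.
  return input.replace(/(?<!:)\/{2,}/g, '/');
}

const cleanedUrl = removeDoubleSlashes('http://example.com//path');
const cleanedPath = removeDoubleSlashes('//content//dam///fragment');
console.log(cleanedUrl);
console.log(cleanedPath);
```

The lookbehind approach assumes a runtime with ES2018 regular expression support, which current Node.js versions provide.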
# [1.269.0](v1.268.0...v1.269.0) (2025-12-10) ### Features * protect code coverage on main branch ([#1745](#1745)) ([8182425](8182425))
The prerender content scraper was recently moved to `SCRAPE_CLIENT` in PR #1772. Unlike `CONTENT_SCRAPER`, which assigned the `siteId` as the `jobId`, `SCRAPE_CLIENT` creates a random `jobId`. As a result, the content scraper pushed all files to a folder named after the random `jobId`, while the audit worker kept downloading files from a path built with the `siteId`. Because of this mismatch, the audit could not run its analysis and no suggestions were created. The audit worker now reads content using `scrapeJobId`.
## [1.269.1](v1.269.0...v1.269.1) (2025-12-10) ### Bug Fixes * prerender audit use scrapejobId for s3 paths ([#1786](#1786)) ([5d37793](5d37793))
Fix: #1746. Added a domain-wide suggestion that uses the regex pattern `https://example.com/.*` to cover all URLs in the domain, not just the audited ones.
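As an illustration of how such a domain-wide pattern could be derived from any audited URL, here is a small sketch. The helper name `domainWidePattern` is hypothetical, and escaping regex metacharacters in the origin is an added assumption: the pattern quoted in the description leaves the dots unescaped.

```javascript
// Hypothetical sketch; not the code added in #1747.
function domainWidePattern(auditedUrl) {
  // Reduce any URL on the site to its origin, e.g. "https://example.com".
  const { origin } = new URL(auditedUrl);
  // Escape regex metacharacters so "." matches a literal dot (assumption).
  const escaped = origin.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Match every path under the origin.
  return `${escaped}/.*`;
}

const pattern = domainWidePattern('https://example.com/products/a');
const matcher = new RegExp(`^${pattern}$`);
console.log(pattern);
console.log(matcher.test('https://example.com/any/other/page'));
```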
# [1.270.0](v1.269.1...v1.270.0) (2025-12-10) ### Features * Prerender Audit - Domain-Wide Suggestion ([#1747](#1747)) ([fa2c208](fa2c208))
Improve page types
## [1.270.1](v1.270.0...v1.270.1) (2025-12-11) ### Bug Fixes * improve page types ([#1791](#1791)) ([0399b29](0399b29))
… v2.40.3 (#1788)

This PR contains the following updates:

| Package | Change |
|---|---|
| [@adobe/spacecat-shared-rum-api-client](https://redirect.github.com/adobe/spacecat-shared) | [`2.40.2` -> `2.40.3`](https://renovatebot.com/diffs/npm/@adobe%2fspacecat-shared-rum-api-client/2.40.2/2.40.3) |

### Release Notes

adobe/spacecat-shared (@adobe/spacecat-shared-rum-api-client)

### [`v2.40.3`](https://redirect.github.com/adobe/spacecat-shared/releases/tag/%40adobe/spacecat-shared-rum-api-client-v2.40.3)

[Compare Source](https://redirect.github.com/adobe/spacecat-shared/compare/@adobe/spacecat-shared-rum-api-client-v2.40.2...@adobe/spacecat-shared-rum-api-client-v2.40.3)

##### Bug Fixes

- **deps:** update dependency [@adobe/rum-distiller](https://redirect.github.com/adobe/rum-distiller) to v1.22.1 ([#1234](https://redirect.github.com/adobe/spacecat-shared/issues/1234)) ([6fef066](https://redirect.github.com/adobe/spacecat-shared/commit/6fef066633f737af65819e04eb4f27067c67d19e))

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
## [1.270.2](v1.270.1...v1.270.2) (2025-12-11) ### Bug Fixes * **deps:** update dependency @adobe/spacecat-shared-rum-api-client to v2.40.3 ([#1788](#1788)) ([be14c4f](be14c4f))
- Add wikipedia-analysis handler to trigger audit and send to Mystique
- Add guidance handler to receive results and create opportunities
- Add opportunity data mapper for Wikipedia analysis
- Register handlers in index.js
# [1.271.0](v1.270.2...v1.271.0) (2025-12-11) ### Features * add Wikipedia analysis audit handlers ([#1765](#1765)) ([4db8ad9](4db8ad9))
…1794)

This PR contains the following updates:

| Package | Change |
|---|---|
| [@adobe/spacecat-shared-utils](https://redirect.github.com/adobe/spacecat-shared) | [`1.85.0` -> `1.85.2`](https://renovatebot.com/diffs/npm/@adobe%2fspacecat-shared-utils/1.85.0/1.85.2) |

### Release Notes

adobe/spacecat-shared (@adobe/spacecat-shared-utils)

### [`v1.85.2`](https://redirect.github.com/adobe/spacecat-shared/releases/tag/%40adobe/spacecat-shared-utils-v1.85.2)

[Compare Source](https://redirect.github.com/adobe/spacecat-shared/compare/@adobe/spacecat-shared-utils-v1.85.1...@adobe/spacecat-shared-utils-v1.85.2)

##### Bug Fixes

- Implement Structured (JSON) Logging for Spacecat Audits - rollback ([#1239](https://redirect.github.com/adobe/spacecat-shared/issues/1239)) ([1f174d7](https://redirect.github.com/adobe/spacecat-shared/commit/1f174d7dd188dbdc610b75bf58644992925755b1))

### [`v1.85.1`](https://redirect.github.com/adobe/spacecat-shared/releases/tag/%40adobe/spacecat-shared-utils-v1.85.1)

[Compare Source](https://redirect.github.com/adobe/spacecat-shared/compare/@adobe/spacecat-shared-utils-v1.85.0...@adobe/spacecat-shared-utils-v1.85.1)

##### Bug Fixes

- Structured (JSON) for Logging should be JSON string ([#1237](https://redirect.github.com/adobe/spacecat-shared/issues/1237)) ([cfcee6e](https://redirect.github.com/adobe/spacecat-shared/commit/cfcee6e4315aa518c52e4ca50b99b0cb762f5a61))

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
## [1.271.1](v1.271.0...v1.271.1) (2025-12-12) ### Bug Fixes * **deps:** update dependency @adobe/spacecat-shared-utils to v1.85.2 ([#1794](#1794)) ([a93c2f7](a93c2f7))
feat: enable div text detection (#1795)
# [1.272.0](v1.271.1...v1.272.0) (2025-12-12) ### Features * enable div text detection ([#1795](#1795)) ([182ec70](182ec70))
…1792)
# [1.273.0](v1.272.0...v1.273.0) (2025-12-12) ### Features * LLMO-1925 audits support cdn logs coming from 'other' sources ([#1792](#1792)) ([47908ec](47908ec))
update scores
## [1.273.1](v1.273.0...v1.273.1) (2025-12-12) ### Bug Fixes * update scores ([#1796](#1796)) ([fe52b53](fe52b53))
…1798)

This PR contains the following updates:

| Package | Change |
|---|---|
| [@adobe/spacecat-shared-utils](https://redirect.github.com/adobe/spacecat-shared) | [`1.85.2` -> `1.86.0`](https://renovatebot.com/diffs/npm/@adobe%2fspacecat-shared-utils/1.85.2/1.86.0) |

### Release Notes

adobe/spacecat-shared (@adobe/spacecat-shared-utils)

### [`v1.86.0`](https://redirect.github.com/adobe/spacecat-shared/releases/tag/%40adobe/spacecat-shared-utils-v1.86.0)

[Compare Source](https://redirect.github.com/adobe/spacecat-shared/compare/@adobe/spacecat-shared-utils-v1.85.2...@adobe/spacecat-shared-utils-v1.86.0)

##### Features

- add detection for Akamai, Fastly, and CloudFront ([#1238](https://redirect.github.com/adobe/spacecat-shared/issues/1238)) ([3f7aad9](https://redirect.github.com/adobe/spacecat-shared/commit/3f7aad96fbc823b2e9d59541a71ba3b4e6d315e8))

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
## [1.273.2](v1.273.1...v1.273.2) (2025-12-12) ### Bug Fixes * **deps:** update dependency @adobe/spacecat-shared-utils to v1.86.0 ([#1798](#1798)) ([6ba24e4](6ba24e4))
## [1.273.3](v1.273.2...v1.273.3) (2025-12-19) ### Bug Fixes * **llmo-customer-analysis:** gracefully fail enabling of imports and audits ([#1808](#1808)) ([2267ce4](2267ce4))
Codecov Report
✅ All modified and coverable lines are covered by tests.
This PR introduces two new audits to monitor and analyze content fragment 404s on AEM. See SITES-35769.
What's New
- `cdn-content-fragment-404`: Monitors CDN logs hourly to identify content fragment requests that return 404 errors, using Athena queries to aggregate and export the data to S3 for further analysis and reporting.
- `content-fragment-404`: Analyzes broken content fragment paths discovered in CDN logs on a daily basis, and intelligently suggests repair actions through a multi-step workflow that applies various strategies like republishing, locale fallbacks, and similar path matching.

Use Case
Health monitoring for Content Fragments on AEM Sites: Automatically detect broken content fragment requests across AEM Sites by monitoring CDN traffic patterns, identifying 404 errors, and providing actionable repair suggestions to maintain content availability and user experience.