@epipav epipav (Collaborator) commented Dec 5, 2025

Note

Implements a hash‑bucketed activityRelations pipeline (10 buckets) with routing/union queries, adds unfiltered lambda snapshot flow, migrates pipes to bucketed sources, and updates docs, monitoring, and utilities.

  • Architecture & Docs:
    • Introduce 10-bucket hash partitioning for activityRelations; add detailed docs (README.md, bucketing-architecture.md, lambda-architecture.md, dataflow.md).
  • Ingestion/Serving (Unfiltered Lambda):
    • Add activityRelations_enrich_snapshot_MV + daily snapshot merger to activityRelations_enriched_deduplicated_ds; initial snapshot pipe.
  • Bucketing Pipeline:
    • New MVs activityRelations_bucket_MV_{0..9} → raw bucket datasources.
    • Hourly clean/enrich copy pipes → activityRelations_deduplicated_cleaned_bucket_{0..9}_ds.
    • Query layer: activityRelations_bucket_routing (per-bucket) and activityRelations_deduplicated_cleaned_bucket_union (cross-bucket).
  • Pipe Migrations:
    • Switch many analytics pipes (activities, health scores, leaderboards, insights copies, retention, geo, etc.) to read from bucket routing/union.
    • PR analytics updated to consume lambda MV output; add baseline/merge MV variant.
  • Monitoring & Widgets:
    • Add monitoring_entities, monitoring_copy_pipes_spread_info, monitoring_long_running_endpoints pipes.
    • New review_efficiency pipe; new activities_daily_counts.
  • Datasources & Utilities:
    • Add activities_backup_consistency_audit.datasource; minor schema/partition tweaks across several datasources.
    • Add sitemap pipes; enhance format_all.sh with --sequential option.
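
As an illustration of the bucketing idea summarized above, here is a minimal Python sketch; it is not the Tinybird pipes from this PR. It assumes the assignment rule quoted later in this thread, cityHash64(segmentId) % bucket_count, and substitutes a SHA-256 based stand-in for cityHash64, which is not in Python's standard library. The names stable_hash64, bucket_for, new_buckets, write_row, read_routed, and read_union are hypothetical and exist only for this example.

```python
import hashlib

BUCKET_COUNT = 10  # the PR summary describes 10 buckets


def stable_hash64(key: str) -> int:
    """Stand-in for cityHash64 (assumption): any stable 64-bit hash illustrates the idea."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def bucket_for(segment_id: str, bucket_count: int = BUCKET_COUNT) -> int:
    """Map a segment to a bucket index in [0, bucket_count)."""
    return stable_hash64(segment_id) % bucket_count


def new_buckets(bucket_count: int = BUCKET_COUNT) -> list:
    """One in-memory list per bucket, standing in for the per-bucket datasources."""
    return [[] for _ in range(bucket_count)]


def write_row(buckets: list, row: dict) -> None:
    """Ingestion side: each row lands in exactly one bucket, chosen by its segmentId."""
    buckets[bucket_for(row["segmentId"], len(buckets))].append(row)


def read_routed(buckets: list, segment_id: str) -> list:
    """Per-bucket routing: read only the bucket that owns the segment."""
    owner = buckets[bucket_for(segment_id, len(buckets))]
    return [row for row in owner if row["segmentId"] == segment_id]


def read_union(buckets: list) -> list:
    """Cross-bucket union: scan every bucket, for queries that span segments."""
    return [row for bucket in buckets for row in bucket]


if __name__ == "__main__":
    buckets = new_buckets()
    for i in range(20):
        write_row(buckets, {"segmentId": f"seg-{i % 4}", "activity": i})
    print("rows for seg-1:", len(read_routed(buckets, "seg-1")))
    print("rows overall:", len(read_union(buckets)))
```

The routed read touches one bucket out of ten, which is the point of routing by segment; the union read pays for all ten and is only needed when a query cannot be pinned to a single segment.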

Written by Cursor Bugbot for commit 0372eb4.

@github-actions github-actions bot (Contributor) left a comment

Conventional Commits FTW!

@epipav epipav changed the title feat: activityRelations buckets for subset of projects feat: activityRelations buckets for subset of projects Dec 5, 2025
@github-actions github-actions bot (Contributor) commented Dec 5, 2025

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

@epipav epipav requested review from gaspergrom and ulemons December 9, 2025 13:41
@epipav epipav changed the title from "feat: activityRelations buckets for subset of projects" to "feat: activityRelations buckets for subset of projects [IN-871]" Dec 9, 2025
@epipav epipav changed the title from "feat: activityRelations buckets for subset of projects [IN-871]" to "feat: activityRelations buckets for subset of projects (IN-871)" Dec 9, 2025
@epipav epipav requested a review from joanagmaia December 9, 2025 16:22
@ulemons ulemons (Contributor) commented Dec 10, 2025

Great job on this documentation @anilb0stanci 👏, it’s very clear and helpful!
I do have one concern though: what happens when we increase the number of buckets?
Since the bucket assignment depends on cityHash64(segmentId) % bucket_count, changing the bucket count could re-route a segment to a different bucket (e.g. something previously in bucket 3 might now belong in bucket 12).
If the data migration is not handled carefully, couldn’t we end up in a situation where new data is written to the “new” bucket, while historical data is still in the “old” one? In that case, queries that rely on bucket routing might miss part of the data.
Do we have a clear migration strategy to ensure consistency when scaling the bucket count?
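
To make the re-routing concern concrete, the following minimal Python sketch counts how many segments change bucket when the modulus changes. It is only an illustration under stated assumptions: stable_hash64 is a SHA-256 based stand-in for cityHash64, the segment ids are synthetic, and the target counts 12 and 20 are arbitrary examples rather than planned values. The helper names are hypothetical.

```python
import hashlib


def stable_hash64(key: str) -> int:
    """Stand-in for cityHash64 (assumption): a stable 64-bit hash from SHA-256."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def bucket_for(segment_id: str, bucket_count: int) -> int:
    """Bucket index under the hash(segmentId) % bucket_count rule."""
    return stable_hash64(segment_id) % bucket_count


def moved_segments(segment_ids, old_count: int, new_count: int) -> list:
    """Return (segment_id, old_bucket, new_bucket) for every segment whose bucket changes."""
    moved = []
    for segment_id in segment_ids:
        old_bucket = bucket_for(segment_id, old_count)
        new_bucket = bucket_for(segment_id, new_count)
        if old_bucket != new_bucket:
            moved.append((segment_id, old_bucket, new_bucket))
    return moved


if __name__ == "__main__":
    # Synthetic ids, purely for illustration.
    segment_ids = [f"segment-{i}" for i in range(10_000)]
    for new_count in (12, 20):
        moved = moved_segments(segment_ids, 10, new_count)
        share = len(moved) / len(segment_ids)
        print(f"10 -> {new_count} buckets: {share:.1%} of segments change bucket")
```

With a roughly uniform hash, about five in six segments should move when going from 10 to 12 buckets, and about half when doubling to 20, where every moved segment lands in its old bucket number plus 10. In both cases the historical rows of a moved segment stay in the old bucket until they are rewritten, so a routed query using the new count would read an incomplete history for that segment.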

@joanagmaia joanagmaia (Contributor) left a comment

LGTM 👏

@epipav epipav merged commit c181a2f into main Dec 10, 2025
19 checks passed
@epipav epipav deleted the feat/tb-partitioned-activityRelations branch December 10, 2025 11:01