diff --git a/services/libs/tinybird/README.md b/services/libs/tinybird/README.md index ab3fe303d8..b0f8bcc622 100644 --- a/services/libs/tinybird/README.md +++ b/services/libs/tinybird/README.md @@ -1,11 +1,78 @@ -# Journey of Data from CM to Insights -[This image](https://uploads.linear.app/aebec7ad-5649-4758-9bed-061f7228a879/b72d9f55-8f27-4c57-81fe-729807c12ffb/36c116c2-0f88-4735-a932-0c3e6bf8ea45) shows how data flows from CM to Insights. +# Tinybird Documentation -## Activity Preprocessing Pipeline -See LAMBDA_ARCHITECTURE.md for details +## Table of Contents ---- +- [Introduction](#introduction) +- [Journey of Data](#journey-of-data-from-cdp-to-insights) +- [Making Changes to Resources](#making-changes-to-resources) +- [How to Iterate on Data](#how-to-iterate-on-data) +- [Testing Tinybird Pipes Locally](#testing-tinybird-pipes-locally) +- [Creating Backups](#creating-a-backup-datasource-in-tinybird) +- [Glossary](#glossary) + +## Introduction + +This directory contains documentation for the CDP and Tinybird integration. **Tinybird** is a real-time analytics database built on ClickHouse that powers our Insights platform with fast, scalable queries on community activity data. + +### System Role + +Tinybird sits between our Community Data Platform (CDP) backend and the Insights frontend: + +1. **Data Ingestion**: Receives data from Postgres (via Sequin → Kafka → Kafka Connect) +2. **Processing**: Enriches, filters, and aggregates activity data using different architectures (Bucketing & Lambda) +3. **Serving**: Provides fast API endpoints for the Insights dashboard and other consumers + +## Journey of Data from CDP to Insights + +See [dataflow](./dataflow.md) for a visual diagram showing how data flows from the CDP backend through Tinybird to Insights. + +## Architecture Overview + +We use **two parallel architectures** to process activityRelations data: + +### Lambda Architecture +1) Deduplicates activityRelations without any filtering.
Mainly consumed by CDP and monitoring pipes. Output: `activityRelations_enriched_deduplicated_ds` + +2) Used for ingesting pull request event data and merging it with existing events. Output: `pull_requests_analyzed` + +For details, see [lambda-architecture.md](./lambda-architecture.md). + +- Filtering: UNFILTERED (includes bots, all activities) +- Used by: Pull requests, CDP pipes, monitoring +- Details: Lambda architecture pattern for deduplication, enrichment, and pull request processing + + +### Bucketing Architecture +Produces filtered data (10 buckets) for Insights API queries. For details, see [bucketing-architecture.md](./bucketing-architecture.md). + +- Output: `activityRelations_deduplicated_cleaned_bucket_*_ds` (10 buckets) +- Filtering: FILTERED (valid members, enabled repos only) +- Used by: Insights API queries +- Details: Hash-based bucketing architecture for parallel processing + + +### Comparison + +The following table compares the two parallel architectures processing activityRelations data: + +| Aspect | Lambda Architecture | Bucketing Architecture | +|--------|---------------------|------------------------| +| **Primary Use Case** | Pull requests, CDP, monitoring, member management | Insights API queries | +| **Output Datasource** | pull_requests_analyzed, activityRelations_enriched_deduplicated_ds | activityRelations_deduplicated_cleaned_bucket_0-9_ds | +| **Data Filtering** | UNFILTERED (includes bots, all repos) | FILTERED (valid members, enabled repos) | +| **Partitioning Strategy** | Single datasource, snapshot-based | 10 parallel buckets, hash-based | +| **Copy Mode** | Append (creates new snapshots) | Replace (hourly full refresh) | +| **Query Pattern** | Filter by max(snapshotId) | Union all buckets or route to specific bucket | +| **TTL** | 6 hours (keeps ~6 snapshots) | No TTL on buckets (replace mode) | +| **Scalability** | Vertical (single large datasource) | Horizontal (add more buckets) | +| **Dependencies** | Single-table triggers work
well | Multi-table dependencies (members, repos) | + +**Which activityRelations output to use:** + +- **Use Bucketing Architecture output** (`activityRelations_deduplicated_cleaned_bucket_*_ds`) for: Insights API, project-specific analytics, and filtered queries. Each bucket holds all data for a subset of projects, so the main use case is project-specific widgets + +- **Use Lambda Architecture output** (`activityRelations_enriched_deduplicated_ds`) for: CDP operations, monitoring, and any use case that needs complete, unfiltered data and cannot use the buckets ## Making changes to resources 1. Install the **tb client** for classic tinybird @@ -90,7 +157,7 @@ GRANT SELECT ON "tableName" to sequin; Switching between old and new datasources can lead to **temporary downtime**, but only for **endpoint pipes that consume raw datasources directly**. **No Downtime** if the endpoint pipe uses a **copy pipe result**: -- You can safely remove the raw datasource after stopping the copy job +- You can safely remove the raw datasource after stopping the copy pipe - The copy pipe result datasource will continue to serve data - New fields will be included in the **next copy run** @@ -270,3 +337,17 @@ tb sql "SELECT count() FROM activities_backup FINAL" - (3) = (4) → same number of logical records after deduplication If both pairs match, the backup is **logically consistent** with the source dataset.
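The pairwise check above can be automated. Below is a minimal sketch that mirrors one row of the `activities_backup_consistency_audit` datasource added in this PR. It makes several assumptions:

- (1)/(2) are the distinct-id counts and (3)/(4) are the `count() ... FINAL` counts, as the audit columns suggest.
- Diffs are computed as source minus backup.
- Percentages are taken relative to the source value.

The class `BackupAudit` is hypothetical and is not the PR's actual copy pipe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupAudit:
    """Hypothetical mirror of one activities_backup_consistency_audit row."""

    uniq_ids_src: int     # assumed: distinct activity ids in activities
    uniq_ids_bkp: int     # assumed: distinct activity ids in activities_backup
    count_final_src: int  # SELECT count() FROM activities FINAL
    count_final_bkp: int  # SELECT count() FROM activities_backup FINAL

    @property
    def diff_uniq(self) -> int:
        # Assumed sign convention: source minus backup
        return self.uniq_ids_src - self.uniq_ids_bkp

    @property
    def diff_final(self) -> int:
        return self.count_final_src - self.count_final_bkp

    @staticmethod
    def _pct(diff: int, base: int) -> float:
        # Assumed formula: difference as a percentage of the source value
        return 0.0 if base == 0 else 100.0 * diff / base

    @property
    def pct_diff_uniq(self) -> float:
        return self._pct(self.diff_uniq, self.uniq_ids_src)

    @property
    def pct_diff_final(self) -> float:
        return self._pct(self.diff_final, self.count_final_src)

    def is_consistent(self) -> bool:
        # Both pairs must match for the backup to be logically consistent
        return self.diff_uniq == 0 and self.diff_final == 0


ok = BackupAudit(1_000, 1_000, 1_000, 1_000)
lagging = BackupAudit(1_000, 990, 1_000, 990)
print(ok.is_consistent(), lagging.is_consistent())  # True False
print(lagging.pct_diff_final)  # 1.0
```

Storing the percentages alongside the raw diffs makes it easier to alert on a relative threshold instead of an absolute row count.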
+ + +## Glossary + +- **CDP (Community Data Platform)**: Backend for community data operations and management pipelines +- **Tinybird**: Real-time analytics database built on ClickHouse, used for fast query processing +- **Datasource**: A Tinybird table where data is stored (analogous to a database table) +- **Pipe**: A Tinybird SQL query that can be scheduled or materialized +- **MV (Materialized View)**: A pipe that triggers automatically on INSERT to a datasource +- **Copy Pipe**: A pipe (scheduled or on-demand) that copies/transforms data from one datasource to another +- **Sequin**: Database replication tool that streams Postgres changes to Kafka +- **Insights**: The frontend analytics interface for community data +- **segmentId**: Unique identifier for a project/community segment +- **snapshotId**: Timestamp identifier used for deduplication and versioning in the lambda architecture diff --git a/services/libs/tinybird/bucketing-architecture.md b/services/libs/tinybird/bucketing-architecture.md new file mode 100644 index 0000000000..ddfb78494b --- /dev/null +++ b/services/libs/tinybird/bucketing-architecture.md @@ -0,0 +1,421 @@ +# Bucketing Architecture for ActivityRelations + +## Table of Contents + +- [Overview](#overview) +- [Why Bucketing?](#why-bucketing) +- [Complete Data Flow](#complete-data-flow) +- [Query Layer](#6-query-layer) +- [Hash-Based Bucketing Strategy](#hash-based-bucketing-strategy) +- [Bootstrap/Initial Load](#bootstrapinitial-load) +- [Query Patterns](#query-patterns) +- [Monitoring & Maintenance](#monitoring--maintenance) +- [Adding New Buckets](#adding-new-buckets) + +**Related Documentation:** +- [Main README](./README.md) - Overview and getting started +- [Lambda Architecture](./lambda-architecture.md) - Parallel architecture for unfiltered data +- [Data Flow Diagram](./dataflow.md) - Visual system overview + +--- + +## Overview + +The **bucketing architecture** is a distributed data processing pattern implemented for the activityRelations pipeline in Tinybird.
This architecture partitions incoming data into 10 parallel buckets using consistent hash-based routing. This enables parallel copy pipe processing and lets queries be routed to a single, smaller bucket instead of the full dataset. + +## Why Bucketing? + +- **Parallel Processing**: 10 independent buckets process data concurrently +- **Better Resource Utilization**: Copy pipes can run in parallel, and parallelism can be increased by adding more copy workers +- **Scaling to 1B+ activities**: If needed, we can add more buckets to spread the data further + + +## But What's Wrong with Lambda Architecture? + +The bucketing architecture differs from the lambda architecture used for other pipelines (like pull requests). The problem with using the lambda architecture for activityRelations processing is that the result dataset depends on more than one table: + +- activityRelations, for changing activity data +- members, for marking members as bot or non-bot +- segmentRepositories, for enabling/disabling repositories + +Each of these changes can add rows to or remove rows from the result dataset. However, the lambda architecture relies on single-table insert triggers (the initial MV), so it cannot react to changes in all of these tables at the same time. + +That's the main reason we can't rely on append-only copies that build snapshots from new data triggered by materialized views. Instead, we need hourly replace copies that re-check whether members are bots, whether repositories are enabled or disabled, and so on. + + +## Complete Data Flow + +### [1] Source: activityRelations Datasource + +The pipeline starts with the `activityRelations.datasource`, which receives data replicated from PostgreSQL.
+ +**Upstream Data Flow:** +``` +PostgreSQL activityRelations table + ↓ (logical replication slot) +Sequin (replication processor) + ↓ (publishes row changes) +Kafka Topic: activityRelations + ↓ (HTTP sink connector) +Kafka Connect + ↓ (HTTP POST to Events API) +Tinybird Events API + ↓ (ingests JSON) +activityRelations.datasource +``` + +**Datasource Configuration:** +- **File**: `datasources/activityRelations.datasource` +- **Engine**: ReplacingMergeTree +- **Version Column**: ENGINE_VER "updatedAt" +- **Partitioning**: toYear(createdAt) +- **Sorting Key**: segmentId, timestamp, type, platform, channel, sourceId + +### [2] Bucketing Layer: 10 Parallel Materialized Views + +Data flows from the source datasource into 10 parallel materialized views that split records by segment. + +**Materialized View Pipes:** +- `pipes/activityRelations_bucket_MV_0.pipe` through `activityRelations_bucket_MV_9.pipe` + +**Bucketing Logic:** +Each MV filters data using a hash-based partitioning strategy: +```sql +SELECT * FROM activityRelations +WHERE cityHash64(segmentId) % 10 = {bucket_number} +``` + +**Characteristics:** +- **Type**: MATERIALIZED (triggers immediately on INSERT to activityRelations) +- **Purpose**: Distribute incoming data into 10 partitions +- **Distribution**: Each bucket receives ~10% of segments (its share of rows depends on how active those segments are) +- **Consistency**: Same segmentId always routes to same bucket (deterministic hashing) + +**Note on bucket sizes:** It's normal for some buckets to contain more data than others, since different segments generate different volumes of activities. The hash distributes segments, not individual activities, across buckets. This is intentional: project-specific Insights pages always filter by project (segmentId), so all of a project's data lives in a single bucket. + +### [3] Raw Bucket Datasources + +Each materialized view writes to its corresponding raw bucket datasource.
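The routing done by the ten MVs can be simulated to show two properties: every row of a segment lands in the same bucket, and bucket sizes follow segment volume. This is a minimal sketch under one key assumption. ClickHouse's `cityHash64` is not in Python's standard library, so `hashlib.blake2b` stands in as the 64-bit hash. The stand-in is deterministic like `cityHash64`, but its bucket numbers will not match production. The names `stand_in_hash64`, `bucket_for`, and `route`, and the sample data, are hypothetical.

```python
import hashlib

NUM_BUCKETS = 10


def stand_in_hash64(segment_id: str) -> int:
    """Deterministic 64-bit hash of a segmentId.

    Stand-in for ClickHouse's cityHash64: same idea, different values,
    so bucket numbers here will not match production.
    """
    digest = hashlib.blake2b(segment_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def bucket_for(segment_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    # Mirrors the MV filter: cityHash64(segmentId) % 10 = {bucket_number}
    return stand_in_hash64(segment_id) % num_buckets


def route(activities: list[dict]) -> dict[int, list[dict]]:
    """Assign each activity row to a bucket, as the 10 MVs do on INSERT."""
    buckets: dict[int, list[dict]] = {b: [] for b in range(NUM_BUCKETS)}
    for row in activities:
        buckets[bucket_for(row["segmentId"])].append(row)
    return buckets


# Hypothetical data: one busy segment and two quiet ones
activities = (
    [{"segmentId": "seg-large", "activityId": f"a{i}"} for i in range(1000)]
    + [{"segmentId": "seg-small-1", "activityId": f"b{i}"} for i in range(10)]
    + [{"segmentId": "seg-small-2", "activityId": f"c{i}"} for i in range(10)]
)
buckets = route(activities)
# Rows per non-empty bucket: the bucket holding seg-large dominates
print({b: len(rows) for b, rows in buckets.items() if rows})
```

Hashing on `segmentId` rather than on the activity id is what keeps each project in a single bucket. The price is skew: one very active segment can make its bucket much larger than the others.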
+ +**Datasources:** +- `datasources/activityRelations_bucket_MV_ds_0.datasource` through `activityRelations_bucket_MV_ds_9.datasource` + +**Configuration:** +- **Engine**: ReplacingMergeTree +- **Version Column**: ENGINE_VER "updatedAt" +- **Partitioning**: toYear(createdAt) +- **Sorting Key**: segmentId, timestamp, type, platform, channel, sourceId + +These datasources hold the raw, unenriched data for each bucket. + +### [4] Enrichment + Cleaning Layer: 10 Parallel Copy Pipes + +Hourly scheduled copy pipes enrich and filter the data in each bucket. + +**Copy Pipes:** +- `pipes/activityRelations_bucket_clean_enrich_copy_pipe_0.pipe` through `activityRelations_bucket_clean_enrich_copy_pipe_9.pipe` + +**Schedule (Staggered):** +The pipes run on staggered schedules to distribute load: +- **Pipes 0-1**: `10 * * * *` (every hour at minute :10) +- **Pipes 2-3**: `14 * * * *` (every hour at minute :14) +- **Pipes 4-5**: `18 * * * *` (every hour at minute :18) +- **Pipes 6-7**: `22 * * * *` (every hour at minute :22) +- **Pipes 8-9**: `26 * * * *` (every hour at minute :26) + +**Configuration:** +- **COPY_MODE**: replace (overwrites entire bucket each run) +- **COPY_SCHEDULE**: Hourly with 4-minute stagger between groups + +**Operations Performed:** + +These copy pipes perform three distinct operations that transform raw bucket data into production-ready analytics data: + +1. **Enrichment** (Adding computed fields and metadata): + - Calculates `gitChangedLines` (gitInsertions + gitDeletions) + - Categorizes into `gitChangedLinesBucket` (1-9, 10-59, 60-99, 100-499, 500+) + - Adds `organizationCountryCode` via country mapping + - Adds `organizationName` from organizations table + - Generates `snapshotId` (toStartOfInterval(now(), INTERVAL 1 hour)) + +2. 
**Cleaning** (Filtering out invalid/unwanted data): + - Filters by valid members: `memberId IN (SELECT id FROM members_sorted)` (removes bots) + - Filters by valid repositories for git platforms (removes disabled repos) + - Filters by valid segments: `segmentId IN (SELECT segmentId FROM segmentRepositories WHERE excluded = false)` + - This is why bucketing output is "cleaned" - invalid data is removed + +3. **Deduplication** (Ensuring data consistency): + - Uses FINAL modifier on source to deduplicate by ReplacingMergeTree version + - Joins with organizations table using FINAL + - Ensures only the latest version of each activity is included + +### [5] Cleaned Bucket Datasources + +The enrichment copy pipes write to cleaned bucket datasources. + +**Datasources:** +- `datasources/activityRelations_deduplicated_cleaned_bucket_0_ds.datasource` through `activityRelations_deduplicated_cleaned_bucket_9_ds.datasource` + +**Configuration:** +- **Engine**: MergeTree (not ReplacingMergeTree - data is already deduplicated) +- **Partitioning**: toYear(timestamp) +- **Sorting Key**: segmentId, timestamp, type, platform, memberId, organizationId + +These datasources serve as the final, queryable data layer for each bucket. + +### [6] Query Layer + +Two query patterns are provided for accessing bucketed data: + +#### Union Pipe (Cross-Bucket Queries) + +**Pipe**: `pipes/activityRelations_deduplicated_cleaned_bucket_union.pipe` + +**Purpose**: Queries across all buckets when segment is unknown or query spans multiple segments + +**SQL Pattern**: +```sql +SELECT * FROM activityRelations_deduplicated_cleaned_bucket_0_ds +UNION ALL +SELECT * FROM activityRelations_deduplicated_cleaned_bucket_1_ds +UNION ALL +... 
+UNION ALL +SELECT * FROM activityRelations_deduplicated_cleaned_bucket_9_ds +``` + +**Use Cases**: +- Multi-segment analytics +- Global aggregations +- Queries without segment filter + +#### Routing Pipe (Single-Bucket Queries) + +**Pipe**: `pipes/activityRelations_bucket_routing.pipe` + +**Purpose**: Routes the query to a specific bucket for faster single-segment queries + +**Parameters**: +- `bucketId` (Int8): The bucket number (0-9) to query + +**SQL Pattern**: +```sql +SELECT * FROM activityRelations_deduplicated_cleaned_bucket_{{ bucketId }}_ds +``` + +**Use Cases**: +- Single segment queries +- Queries where segment is known +- Performance-critical lookups + +**BucketId Resolution**: + +The bucketId can be obtained in two ways: + +1. **Using the project_buckets pipe**: Query the `project_buckets` pipe with a segmentId to get its bucket assignment: + ```sql + SELECT bucketId FROM project_buckets WHERE segmentId = 'your-segment-id' + ``` + +2. **Insights**: When using the Insights API, bucketId is injected automatically when a `project` parameter is present. No additional bucketId parameter is required; the API handles routing transparently. + +## Hash-Based Bucketing Strategy + +### Algorithm + +The bucketing uses ClickHouse's `cityHash64` function with modulo 10: + +```sql +SELECT cityHash64(segmentId) % 10 AS bucketId FROM activityRelations -- always one of 0..9 +``` + +## Bootstrap/Initial Load + +For initial deployment or bucket recreation, snapshot pipes populate buckets from existing data.
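A bootstrap is correct only if the ten snapshot filters together partition the source. Every row must be copied exactly once, and each bucket's share should be in the expected range. The check can be sketched as follows. It is a minimal simulation that assumes rows are `(segmentId, activityId)` tuples and uses `hashlib.blake2b` as a stand-in for ClickHouse's `cityHash64`, so actual bucket numbers differ from production. The functions `snapshot` and `verify_bootstrap` are hypothetical.

```python
import hashlib


def stand_in_hash64(segment_id: str) -> int:
    # Stand-in for ClickHouse's cityHash64 (deterministic, different values)
    digest = hashlib.blake2b(segment_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def snapshot(rows, bucket_number, num_buckets=10):
    """Rows one snapshot pipe would copy, i.e.
    WHERE cityHash64(segmentId) % 10 = {bucket_number}.
    Each row is a (segmentId, activityId) tuple."""
    return [r for r in rows if stand_in_hash64(r[0]) % num_buckets == bucket_number]


def verify_bootstrap(rows, num_buckets=10):
    """Run every snapshot and check that together they partition the source.

    Returns each bucket's share of rows (0.0 to 1.0)."""
    snapshots = [snapshot(rows, n, num_buckets) for n in range(num_buckets)]
    copied = [r for snap in snapshots for r in snap]
    # Every source row must land in exactly one bucket: no gaps, no duplicates
    if sorted(copied) != sorted(rows):
        raise ValueError("snapshots do not partition the source rows")
    total = len(rows) or 1
    return [len(snap) / total for snap in snapshots]


# Hypothetical source: 50 segments with 100 activities each
rows = [(f"seg-{i % 50}", f"act-{i}") for i in range(5000)]
shares = verify_bootstrap(rows)
for n, share in enumerate(shares):
    print(f"bucket {n}: {share:.1%}")
```

Because routing is per segment, shares are only roughly 10% each. With few segments, a bucket noticeably above or below 10% is expected and is not by itself a bootstrap failure.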
+ +### Snapshot Pipes + +**Pipes**: `pipes/activityRelations_bucket_MV_snapshot_0.pipe` through `activityRelations_bucket_MV_snapshot_9.pipe` + +**Configuration**: +- **Type**: COPY +- **Schedule**: @on-demand (manual execution) +- **COPY_MODE**: append +- **Source**: activityRelations datasource +- **Target**: activityRelations_bucket_MV_ds_0 through ds_9 + +**SQL Pattern**: +```sql +SELECT * FROM activityRelations +WHERE cityHash64(segmentId) % 10 = {bucket_number} +``` + +### Bootstrap Procedure + +1. **Prepare**: Ensure all bucket datasources exist +2. **Execute Snapshots**: Run all 10 snapshot pipes manually +3. **Verify**: Check each bucket has roughly 10% of total records (some skew is expected, since routing is per segment) +4. **Enable Enrichment**: Allow scheduled copy pipes to run +5. **Monitor**: Watch for snapshotId updates in cleaned buckets + +## Query Patterns + +### Single Segment Query (Use Routing) + +```sql +-- Calculate bucket (or look it up in project_buckets) +SELECT cityHash64('segment-123') % 10 AS bucketId; + +-- Query the routing pipe, passing the result as its bucketId parameter +SELECT * FROM activityRelations_bucket_routing +WHERE segmentId = 'segment-123' + AND timestamp >= '2024-01-01' +``` + +**Performance**: ✓ Fast (scans only 1 bucket) + +### Multi-Segment Query (Use Union) + +```sql +SELECT segmentId, COUNT(*) as activity_count +FROM activityRelations_deduplicated_cleaned_bucket_union +WHERE timestamp >= '2024-01-01' + AND segmentId IN ('segment-1', 'segment-2', 'segment-3') +GROUP BY segmentId +``` + +**Performance**: ~ Moderate (the 3 segments live in at most 3 buckets, but the union still reads all 10; the segmentId-first sorting key lets each bucket skip non-matching data) + +### Global Aggregation (Use Union) + +```sql +SELECT toDate(timestamp) as date, COUNT(*) as total_activities +FROM activityRelations_deduplicated_cleaned_bucket_union +WHERE timestamp >= '2024-01-01' +GROUP BY date +``` + +**Performance**: ✗ Slower, but still acceptable (scans all 10 buckets) + +## Monitoring & Maintenance + +### Common Maintenance Tasks + +**Manually Trigger Enrichment**: +If a bucket is stale, manually trigger its copy pipe: +```bash +tb pipe copy run
activityRelations_bucket_clean_enrich_copy_pipe_3 --wait +``` + +**Rebuild Single Bucket**: +1. Truncate cleaned bucket: `tb datasource truncate activityRelations_deduplicated_cleaned_bucket_3_ds` +2. Run enrichment pipe manually + +**Rebuild All Buckets**: +1. Run all snapshot pipes to repopulate raw buckets +2. Run all enrichment pipes to populate cleaned buckets + +### Adding New Buckets +> **Cost Considerations** +> - More buckets = more concurrent copy pipes = higher compute costs +> - Balance between parallelism and resource utilization + +As data volume grows, you can scale from 10 buckets to 20, 50, or 100 buckets by following these steps: + + + + +**1. Plan the New Bucket Count** +- Choose a new modulo divisor (e.g., 20, 50, or 100) +- Prefer a multiple of the current count (10 → 20 → 40, etc.): since `h % 20 = k` implies `h % 10 = k % 10`, every segment in a new bucket comes from exactly one existing bucket, which keeps migration simple +- Consider resource capacity: more buckets = more concurrent copy pipes + +**2. Create New Bucket Resources** + +For each new bucket number (e.g., 10-19 for a 20-bucket system): + +a. **Create MV pipe**: `activityRelations_bucket_MV_{N}.pipe` + ```sql + SELECT * FROM activityRelations + WHERE cityHash64(segmentId) % 20 = {N} + ``` + +b. **Create raw bucket datasource**: `activityRelations_bucket_MV_ds_{N}.datasource` + - Use same schema as existing buckets (0-9) + - ReplacingMergeTree with ENGINE_VER "updatedAt" + +c. **Create enrichment copy pipe**: `activityRelations_bucket_clean_enrich_copy_pipe_{N}.pipe` + - Copy structure from existing enrichment pipes + - Assign staggered schedule (continue the pattern) + - Update to reference new bucket numbers + +d. **Create cleaned bucket datasource**: `activityRelations_deduplicated_cleaned_bucket_{N}_ds.datasource` + - Use same schema as existing cleaned buckets + - MergeTree engine with same sorting key + +**3. Update Existing Buckets** + +Modify all existing bucket MVs (0-9) to use the new modulo: +```sql +-- Old: WHERE cityHash64(segmentId) % 10 = 0 +-- New: WHERE cityHash64(segmentId) % 20 = 0 +``` + +**4.
Update Query Layer** + +a. **Update union pipe**: Add new buckets to the UNION ALL chain: + ```sql + SELECT * FROM activityRelations_deduplicated_cleaned_bucket_0_ds + UNION ALL + ... + UNION ALL + SELECT * FROM activityRelations_deduplicated_cleaned_bucket_19_ds + ``` + +b. **Update routing pipe**: No changes needed (works with any bucketId) + +c. **Update project_buckets pipe**: Update modulo calculation: + ```sql + SELECT segmentId, cityHash64(segmentId) % 20 as bucketId + ``` + +**5. Bootstrap New Buckets** + +Create and run snapshot pipes for new buckets (10-19): +```sql +SELECT * FROM activityRelations +WHERE cityHash64(segmentId) % 20 = {N} +``` + +**6. Migration Strategy** + +**Option A: Clean cutover (requires downtime)** +1. Stop data ingestion temporarily +2. Deploy all new bucket configurations +3. Truncate the raw buckets (0-9), then run all snapshot pipes (0-19) to repopulate +4. Run all enrichment pipes +5. Resume data ingestion + +**Option B: Gradual migration (no downtime)** +1. Deploy new buckets (10-19) alongside existing ones +2. Update MVs to use new modulo (only newly inserted rows are routed with it; rows already in buckets 0-9 keep the old routing) +3. Let MVs accumulate new data naturally +4. Run snapshot pipes for backfill +5. Verify data completeness, then rebuild buckets 0-9 so they no longer contain segments that moved to buckets 10-19 + +**7. Verification** + +After migration, verify bucket distribution: +```sql +SELECT + 0 as bucket, COUNT(*) as count FROM activityRelations_bucket_MV_ds_0 +UNION ALL +SELECT 1, COUNT(*) FROM activityRelations_bucket_MV_ds_1 +-- ... for all buckets +``` + +Each bucket should have approximately `total_rows / bucket_count` records, allowing for skew from uneven segment volumes. + +**8. Update Documentation** + +Update the documentation to reflect the new number of buckets. \ No newline at end of file diff --git a/services/libs/tinybird/dataflow.md b/services/libs/tinybird/dataflow.md new file mode 100644 index 0000000000..c592f290e6 --- /dev/null +++ b/services/libs/tinybird/dataflow.md @@ -0,0 +1,46 @@ +```mermaid +flowchart TD + %% CM Backend section + subgraph CM[CM backend] + connectors["Connectors
(GitHub, Git, Nango, etc.)"] + Postgres[(Postgres)] + sequin[Sequin] + + %% CM internal connections + connectors -->|People and Org data| Postgres + connectors -->|Activity relations data| Postgres + Postgres -->|Real-time| sequin + Kafka1[Kafka with Schema Registry for data contract] + sequin -->|"Real-time"| Kafka1 + end + + %% Tinybird section + subgraph Tinybird[Tinybird] + DS["Datasource
(ReplacingMergeTree)"] + Pipe[Pipe] + API[API
with caching] + + DS --> Pipe --> API + end + + %% Snowflake section + subgraph Snowflake[Snowflake] + Dedup[Deduplication] + Bronze[Bronze] + Silver[Silver] + Gold[Gold / Platinum] + + Dedup --> Bronze --> Silver --> Gold + end + + %% External connections + Kafka1 -->|"People/Org data
Activity relations
Real-time"| DS + Kafka1 -->|"Snowflake Kafka connector"| Dedup + Pipe -->|"Kafka Sink Pipes" | Dedup + connectors -->|"Activities immutable data
via Tinybird Events API"| DS + Gold --> Other[Other products] + + %% Frontend flow + API --> SSR[SSR Nuxt App] + SSR --> CDN["CDN (Cloudflare)"] + CDN --> Browser diff --git a/services/libs/tinybird/datasources/activities_backup_consistency_audit.datasource b/services/libs/tinybird/datasources/activities_backup_consistency_audit.datasource new file mode 100644 index 0000000000..e4a60286a3 --- /dev/null +++ b/services/libs/tinybird/datasources/activities_backup_consistency_audit.datasource @@ -0,0 +1,17 @@ +DESCRIPTION > + Stores periodic consistency metrics between `activities` and `activities_backup`. + +SCHEMA > + `computedAt` DateTime, + `uniq_ids_src` UInt64, + `uniq_ids_bkp` UInt64, + `count_final_src` UInt64, + `count_final_bkp` UInt64, + `diff_uniq` Int64, + `diff_final` Int64, + `pct_diff_uniq` Float64, + `pct_diff_final` Float64 + +ENGINE MergeTree +ENGINE_PARTITION_KEY toYear(computedAt) +ENGINE_SORTING_KEY computedAt diff --git a/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_0.datasource b/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_0.datasource new file mode 100644 index 0000000000..e78e4b46b3 --- /dev/null +++ b/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_0.datasource @@ -0,0 +1,29 @@ +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + `sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String) + +ENGINE ReplacingMergeTree +ENGINE_PARTITION_KEY toYear(createdAt) +ENGINE_SORTING_KEY 
segmentId, timestamp, type, platform, channel, sourceId +ENGINE_VER updatedAt diff --git a/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_1.datasource b/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_1.datasource new file mode 100644 index 0000000000..e78e4b46b3 --- /dev/null +++ b/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_1.datasource @@ -0,0 +1,29 @@ +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + `sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String) + +ENGINE ReplacingMergeTree +ENGINE_PARTITION_KEY toYear(createdAt) +ENGINE_SORTING_KEY segmentId, timestamp, type, platform, channel, sourceId +ENGINE_VER updatedAt diff --git a/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_2.datasource b/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_2.datasource new file mode 100644 index 0000000000..e78e4b46b3 --- /dev/null +++ b/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_2.datasource @@ -0,0 +1,29 @@ +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` 
String, + `sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String) + +ENGINE ReplacingMergeTree +ENGINE_PARTITION_KEY toYear(createdAt) +ENGINE_SORTING_KEY segmentId, timestamp, type, platform, channel, sourceId +ENGINE_VER updatedAt diff --git a/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_3.datasource b/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_3.datasource new file mode 100644 index 0000000000..e78e4b46b3 --- /dev/null +++ b/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_3.datasource @@ -0,0 +1,29 @@ +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + `sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String) + +ENGINE ReplacingMergeTree +ENGINE_PARTITION_KEY toYear(createdAt) +ENGINE_SORTING_KEY segmentId, timestamp, type, platform, channel, sourceId +ENGINE_VER updatedAt diff --git a/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_4.datasource b/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_4.datasource new file mode 100644 index 0000000000..e78e4b46b3 --- /dev/null +++ b/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_4.datasource @@ -0,0 +1,29 @@ +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + 
`objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + `sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String) + +ENGINE ReplacingMergeTree +ENGINE_PARTITION_KEY toYear(createdAt) +ENGINE_SORTING_KEY segmentId, timestamp, type, platform, channel, sourceId +ENGINE_VER updatedAt diff --git a/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_5.datasource b/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_5.datasource new file mode 100644 index 0000000000..e78e4b46b3 --- /dev/null +++ b/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_5.datasource @@ -0,0 +1,29 @@ +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + `sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String) + +ENGINE ReplacingMergeTree +ENGINE_PARTITION_KEY toYear(createdAt) +ENGINE_SORTING_KEY segmentId, timestamp, type, platform, channel, sourceId +ENGINE_VER updatedAt diff --git a/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_6.datasource b/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_6.datasource new file mode 100644 index 0000000000..e78e4b46b3 --- /dev/null +++ 
b/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_6.datasource @@ -0,0 +1,29 @@ +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + `sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String) + +ENGINE ReplacingMergeTree +ENGINE_PARTITION_KEY toYear(createdAt) +ENGINE_SORTING_KEY segmentId, timestamp, type, platform, channel, sourceId +ENGINE_VER updatedAt diff --git a/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_7.datasource b/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_7.datasource new file mode 100644 index 0000000000..e78e4b46b3 --- /dev/null +++ b/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_7.datasource @@ -0,0 +1,29 @@ +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + `sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String) + +ENGINE ReplacingMergeTree +ENGINE_PARTITION_KEY toYear(createdAt) +ENGINE_SORTING_KEY segmentId, timestamp, type, platform, channel, sourceId 
+ENGINE_VER updatedAt diff --git a/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_8.datasource b/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_8.datasource new file mode 100644 index 0000000000..e78e4b46b3 --- /dev/null +++ b/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_8.datasource @@ -0,0 +1,29 @@ +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + `sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String) + +ENGINE ReplacingMergeTree +ENGINE_PARTITION_KEY toYear(createdAt) +ENGINE_SORTING_KEY segmentId, timestamp, type, platform, channel, sourceId +ENGINE_VER updatedAt diff --git a/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_9.datasource b/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_9.datasource new file mode 100644 index 0000000000..e78e4b46b3 --- /dev/null +++ b/services/libs/tinybird/datasources/activityRelations_bucket_MV_ds_9.datasource @@ -0,0 +1,29 @@ +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + `sentimentScore` Int8, + `gitInsertions` UInt32, 
+ `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String) + +ENGINE ReplacingMergeTree +ENGINE_PARTITION_KEY toYear(createdAt) +ENGINE_SORTING_KEY segmentId, timestamp, type, platform, channel, sourceId +ENGINE_VER updatedAt diff --git a/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_0_ds.datasource b/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_0_ds.datasource new file mode 100644 index 0000000000..cc8a56e482 --- /dev/null +++ b/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_0_ds.datasource @@ -0,0 +1,32 @@ +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + `sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String), + `gitChangedLines` UInt64, + `gitChangedLinesBucket` String, + `organizationCountryCode` LowCardinality(String), + `organizationName` String, + `snapshotId` DateTime + +ENGINE MergeTree +ENGINE_SORTING_KEY segmentId, timestamp, type, platform, memberId, organizationId diff --git a/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_1_ds.datasource b/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_1_ds.datasource new file mode 100644 index 0000000000..cc8a56e482 --- /dev/null +++ b/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_1_ds.datasource @@ -0,0 +1,32 @@ +SCHEMA > + `activityId` String, 
+ `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + `sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String), + `gitChangedLines` UInt64, + `gitChangedLinesBucket` String, + `organizationCountryCode` LowCardinality(String), + `organizationName` String, + `snapshotId` DateTime + +ENGINE MergeTree +ENGINE_SORTING_KEY segmentId, timestamp, type, platform, memberId, organizationId diff --git a/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_2_ds.datasource b/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_2_ds.datasource new file mode 100644 index 0000000000..cc8a56e482 --- /dev/null +++ b/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_2_ds.datasource @@ -0,0 +1,32 @@ +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + `sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String), + `gitChangedLines` UInt64, + `gitChangedLinesBucket` String, + `organizationCountryCode` LowCardinality(String), + 
`organizationName` String, + `snapshotId` DateTime + +ENGINE MergeTree +ENGINE_SORTING_KEY segmentId, timestamp, type, platform, memberId, organizationId diff --git a/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_3_ds.datasource b/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_3_ds.datasource new file mode 100644 index 0000000000..cc8a56e482 --- /dev/null +++ b/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_3_ds.datasource @@ -0,0 +1,32 @@ +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + `sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String), + `gitChangedLines` UInt64, + `gitChangedLinesBucket` String, + `organizationCountryCode` LowCardinality(String), + `organizationName` String, + `snapshotId` DateTime + +ENGINE MergeTree +ENGINE_SORTING_KEY segmentId, timestamp, type, platform, memberId, organizationId diff --git a/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_4_ds.datasource b/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_4_ds.datasource new file mode 100644 index 0000000000..cc8a56e482 --- /dev/null +++ b/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_4_ds.datasource @@ -0,0 +1,32 @@ +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` 
String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + `sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String), + `gitChangedLines` UInt64, + `gitChangedLinesBucket` String, + `organizationCountryCode` LowCardinality(String), + `organizationName` String, + `snapshotId` DateTime + +ENGINE MergeTree +ENGINE_SORTING_KEY segmentId, timestamp, type, platform, memberId, organizationId diff --git a/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_5_ds.datasource b/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_5_ds.datasource new file mode 100644 index 0000000000..cc8a56e482 --- /dev/null +++ b/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_5_ds.datasource @@ -0,0 +1,32 @@ +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + `sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String), + `gitChangedLines` UInt64, + `gitChangedLinesBucket` String, + `organizationCountryCode` LowCardinality(String), + `organizationName` String, + `snapshotId` DateTime + +ENGINE MergeTree +ENGINE_SORTING_KEY segmentId, timestamp, type, 
platform, memberId, organizationId diff --git a/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_6_ds.datasource b/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_6_ds.datasource new file mode 100644 index 0000000000..cc8a56e482 --- /dev/null +++ b/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_6_ds.datasource @@ -0,0 +1,32 @@ +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + `sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String), + `gitChangedLines` UInt64, + `gitChangedLinesBucket` String, + `organizationCountryCode` LowCardinality(String), + `organizationName` String, + `snapshotId` DateTime + +ENGINE MergeTree +ENGINE_SORTING_KEY segmentId, timestamp, type, platform, memberId, organizationId diff --git a/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_7_ds.datasource b/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_7_ds.datasource new file mode 100644 index 0000000000..cc8a56e482 --- /dev/null +++ b/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_7_ds.datasource @@ -0,0 +1,32 @@ +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` 
LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + `sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String), + `gitChangedLines` UInt64, + `gitChangedLinesBucket` String, + `organizationCountryCode` LowCardinality(String), + `organizationName` String, + `snapshotId` DateTime + +ENGINE MergeTree +ENGINE_SORTING_KEY segmentId, timestamp, type, platform, memberId, organizationId diff --git a/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_8_ds.datasource b/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_8_ds.datasource new file mode 100644 index 0000000000..cc8a56e482 --- /dev/null +++ b/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_8_ds.datasource @@ -0,0 +1,32 @@ +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + `sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String), + `gitChangedLines` UInt64, + `gitChangedLinesBucket` String, + `organizationCountryCode` LowCardinality(String), + `organizationName` String, + `snapshotId` DateTime + +ENGINE MergeTree +ENGINE_SORTING_KEY segmentId, timestamp, type, platform, memberId, organizationId diff --git 
a/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_9_ds.datasource b/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_9_ds.datasource new file mode 100644 index 0000000000..cc8a56e482 --- /dev/null +++ b/services/libs/tinybird/datasources/activityRelations_deduplicated_cleaned_bucket_9_ds.datasource @@ -0,0 +1,32 @@ +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + `sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String), + `gitChangedLines` UInt64, + `gitChangedLinesBucket` String, + `organizationCountryCode` LowCardinality(String), + `organizationName` String, + `snapshotId` DateTime + +ENGINE MergeTree +ENGINE_SORTING_KEY segmentId, timestamp, type, platform, memberId, organizationId diff --git a/services/libs/tinybird/datasources/activityRelations_enrich_clean_snapshot_MV_ds.datasource b/services/libs/tinybird/datasources/activityRelations_enrich_snapshot_MV_ds.datasource similarity index 100% rename from services/libs/tinybird/datasources/activityRelations_enrich_clean_snapshot_MV_ds.datasource rename to services/libs/tinybird/datasources/activityRelations_enrich_snapshot_MV_ds.datasource diff --git a/services/libs/tinybird/datasources/activityRelations_enrich_snapshot_MV_ds_2.datasource b/services/libs/tinybird/datasources/activityRelations_enrich_snapshot_MV_ds_2.datasource new file mode 100644 index 0000000000..c280f3d1b4 --- /dev/null +++ 
b/services/libs/tinybird/datasources/activityRelations_enrich_snapshot_MV_ds_2.datasource @@ -0,0 +1,35 @@ +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + `sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String), + `gitChangedLines` UInt64, + `gitChangedLinesBucket` String, + `organizationCountryCode` LowCardinality(String), + `organizationName` String, + `snapshotId` DateTime + +ENGINE ReplacingMergeTree +ENGINE_PARTITION_KEY snapshotId +ENGINE_SORTING_KEY segmentId, timestamp, type, platform, channel, sourceId +ENGINE_TTL toDateTime(snapshotId) + toIntervalHour(3) +ENGINE_VER updatedAt diff --git a/services/libs/tinybird/datasources/activityRelations_enriched_deduplicated_ds.datasource b/services/libs/tinybird/datasources/activityRelations_enriched_deduplicated_ds.datasource new file mode 100644 index 0000000000..24547e43ef --- /dev/null +++ b/services/libs/tinybird/datasources/activityRelations_enriched_deduplicated_ds.datasource @@ -0,0 +1,36 @@ +TOKEN "delete_fork_repos_activities_script" READ + +SCHEMA > + `activityId` String, + `conversationId` String, + `createdAt` DateTime64(3), + `updatedAt` DateTime64(3), + `memberId` String, + `objectMemberId` String, + `objectMemberUsername` String, + `organizationId` String, + `parentId` String, + `platform` LowCardinality(String), + `segmentId` String, + `username` String, + `sourceId` String, + `type` LowCardinality(String), + `timestamp` DateTime64(3), + `sourceParentId` String, + `channel` String, + 
`sentimentScore` Int8, + `gitInsertions` UInt32, + `gitDeletions` UInt32, + `score` Int8, + `isContribution` UInt8, + `pullRequestReviewState` LowCardinality(String), + `gitChangedLines` UInt64, + `gitChangedLinesBucket` String, + `organizationCountryCode` LowCardinality(String), + `organizationName` String, + `snapshotId` DateTime + +ENGINE MergeTree +ENGINE_PARTITION_KEY snapshotId +ENGINE_SORTING_KEY segmentId, timestamp, type, platform, memberId, organizationId +ENGINE_TTL toDateTime(snapshotId) + toIntervalDay(2) diff --git a/services/libs/tinybird/datasources/activityTypes.datasource b/services/libs/tinybird/datasources/activityTypes.datasource index 1776f6f55b..9a5942ca2e 100644 --- a/services/libs/tinybird/datasources/activityTypes.datasource +++ b/services/libs/tinybird/datasources/activityTypes.datasource @@ -23,5 +23,6 @@ SCHEMA > `updatedAt` DateTime64(3) `json:$.record.updatedAt` ENGINE ReplacingMergeTree -ENGINE_SORTING_KEY (platform, activityType) +ENGINE_PARTITION_KEY toYear(createdAt) +ENGINE_SORTING_KEY platform, activityType ENGINE_VER updatedAt diff --git a/services/libs/tinybird/datasources/categories.datasource b/services/libs/tinybird/datasources/categories.datasource index aefd591445..12e63e66ac 100644 --- a/services/libs/tinybird/datasources/categories.datasource +++ b/services/libs/tinybird/datasources/categories.datasource @@ -8,7 +8,7 @@ DESCRIPTION > - `categoryGroupId` links to the parent category group this category belongs to (empty string if no group association). - `createdAt` and `updatedAt` are standard timestamp fields for record lifecycle tracking. 
-TAGS "Project categories", "Taxonomy" +TAGS "Project categories" SCHEMA > `id` String `json:$.record.id`, @@ -20,5 +20,6 @@ SCHEMA > `deletedAt` Nullable(DateTime64(3)) `json:$.record.deletedAt` ENGINE ReplacingMergeTree +ENGINE_PARTITION_KEY toYear(createdAt) ENGINE_SORTING_KEY slug ENGINE_VER updatedAt diff --git a/services/libs/tinybird/datasources/categoryGroups.datasource b/services/libs/tinybird/datasources/categoryGroups.datasource index 1495a525bf..efe1500ca2 100644 --- a/services/libs/tinybird/datasources/categoryGroups.datasource +++ b/services/libs/tinybird/datasources/categoryGroups.datasource @@ -8,7 +8,7 @@ DESCRIPTION > - `type` specifies the category group type or classification (empty string if not specified). - `createdAt` and `updatedAt` are standard timestamp fields for record lifecycle tracking. -TAGS "Category management", "Taxonomy" +TAGS "Taxonomy" SCHEMA > `id` String `json:$.record.id`, @@ -20,5 +20,6 @@ SCHEMA > `deletedAt` Nullable(DateTime64(3)) `json:$.record.deletedAt` ENGINE ReplacingMergeTree +ENGINE_PARTITION_KEY toYear(createdAt) ENGINE_SORTING_KEY slug ENGINE_VER updatedAt diff --git a/services/libs/tinybird/datasources/collections.datasource b/services/libs/tinybird/datasources/collections.datasource index a05b7abf7d..f4b7caa2d0 100644 --- a/services/libs/tinybird/datasources/collections.datasource +++ b/services/libs/tinybird/datasources/collections.datasource @@ -10,7 +10,7 @@ DESCRIPTION > - `starred` indicates whether this collection is featured or highlighted (Bool, 0 default for not starred). - `createdAt` and `updatedAt` are standard timestamp fields for record lifecycle tracking. 
-TAGS "Project collections", "Organization" +TAGS "Project collections" SCHEMA > `id` String `json:$.record.id`, diff --git a/services/libs/tinybird/datasources/collectionsInsightsProjects.datasource b/services/libs/tinybird/datasources/collectionsInsightsProjects.datasource index d83b171156..144db3550e 100644 --- a/services/libs/tinybird/datasources/collectionsInsightsProjects.datasource +++ b/services/libs/tinybird/datasources/collectionsInsightsProjects.datasource @@ -8,8 +8,6 @@ DESCRIPTION > - `starred` indicates whether this project is featured/highlighted within the collection (UInt8 boolean, 0 default). - `createdAt` and `updatedAt` are standard timestamp fields for record lifecycle tracking. -TAGS "Collection management", "Project relationships" - SCHEMA > `id` String `json:$.record.id`, `collectionId` String `json:$.record.collectionId`, diff --git a/services/libs/tinybird/datasources/criticalityScores.datasource b/services/libs/tinybird/datasources/criticalityScores.datasource index cf2e30bc3b..de954d82f6 100644 --- a/services/libs/tinybird/datasources/criticalityScores.datasource +++ b/services/libs/tinybird/datasources/criticalityScores.datasource @@ -9,8 +9,6 @@ DESCRIPTION > - `rank` is the relative ranking of this repository among all scored repositories (0 default). - `createdAt` and `updatedAt` are standard timestamp fields for record lifecycle tracking. 
-TAGS "Repository scoring", "Project prioritization" - SCHEMA > `id` String `json:$.record.id`, `name` String `json:$.record.name` DEFAULT '', diff --git a/services/libs/tinybird/datasources/health_score_copy_ds.datasource b/services/libs/tinybird/datasources/health_score_copy_ds.datasource index 70919b1239..bd7539f7a9 100644 --- a/services/libs/tinybird/datasources/health_score_copy_ds.datasource +++ b/services/libs/tinybird/datasources/health_score_copy_ds.datasource @@ -21,8 +21,6 @@ DESCRIPTION > - `securityPercentage`, `contributorPercentage`, `popularityPercentage`, `developmentPercentage` are individual health dimension scores. - `overallScore` is the computed overall health score combining all dimensions. -TAGS "Health metrics", "Project scoring" - SCHEMA > `id` String, `segmentId` String, diff --git a/services/libs/tinybird/datasources/insightsProjects.datasource b/services/libs/tinybird/datasources/insightsProjects.datasource index 3f0c849a29..5e19507b35 100644 --- a/services/libs/tinybird/datasources/insightsProjects.datasource +++ b/services/libs/tinybird/datasources/insightsProjects.datasource @@ -14,7 +14,7 @@ DESCRIPTION > - Social media fields (`github`, `linkedin`, `twitter`, `website`) contain project's external social links. - `keywords` array contains searchable keywords and tags for the project. 
-TAGS "Project metadata", "Insights" +TAGS "Project metadata" SCHEMA > `id` String `json:$.record.id`, diff --git a/services/libs/tinybird/datasources/integrations.datasource b/services/libs/tinybird/datasources/integrations.datasource index a116b244a6..92f9b300e0 100644 --- a/services/libs/tinybird/datasources/integrations.datasource +++ b/services/libs/tinybird/datasources/integrations.datasource @@ -24,5 +24,6 @@ SCHEMA > `deletedAt` Nullable(DateTime64(3)) `json:$.record.deletedAt` ENGINE ReplacingMergeTree +ENGINE_PARTITION_KEY toYear(createdAt) ENGINE_SORTING_KEY segmentId, id ENGINE_VER updatedAt diff --git a/services/libs/tinybird/datasources/maintainersInternal.datasource b/services/libs/tinybird/datasources/maintainersInternal.datasource index 7c76f1b757..457e7edee6 100644 --- a/services/libs/tinybird/datasources/maintainersInternal.datasource +++ b/services/libs/tinybird/datasources/maintainersInternal.datasource @@ -12,8 +12,6 @@ DESCRIPTION > - `endDate` is when the maintainer role expires (empty string if permanent or active). - `createdAt` and `updatedAt` are standard timestamp fields for record lifecycle tracking. -TAGS "Maintainer roles", "Repository governance" - SCHEMA > `id` String `json:$.record.id`, `role` String `json:$.record.role` DEFAULT '', @@ -27,5 +25,6 @@ SCHEMA > `updatedAt` DateTime64(3) `json:$.record.updatedAt` ENGINE ReplacingMergeTree +ENGINE_PARTITION_KEY toYear(startDate) ENGINE_SORTING_KEY identityId, id ENGINE_VER updatedAt diff --git a/services/libs/tinybird/datasources/memberIdentities.datasource b/services/libs/tinybird/datasources/memberIdentities.datasource index 188e9ff1c6..653fef2d5f 100644 --- a/services/libs/tinybird/datasources/memberIdentities.datasource +++ b/services/libs/tinybird/datasources/memberIdentities.datasource @@ -12,8 +12,6 @@ DESCRIPTION > - `verified` indicates whether this identity has been verified as belonging to the member (UInt8 boolean). 
- `createdAt` and `updatedAt` are standard timestamp fields for record lifecycle tracking. -TAGS "Member identities", "Cross-platform linking" - SCHEMA > `id` String `json:$.record.id`, `memberId` String `json:$.record.memberId`, diff --git a/services/libs/tinybird/datasources/members_public_names_ds.datasource b/services/libs/tinybird/datasources/members_public_names_ds.datasource index d10a8e6b8f..4df8c8591c 100644 --- a/services/libs/tinybird/datasources/members_public_names_ds.datasource +++ b/services/libs/tinybird/datasources/members_public_names_ds.datasource @@ -5,8 +5,6 @@ DESCRIPTION > - `memberId` is the unique identifier linking to the member record. - `publicName` is the public display name for the member. -TAGS "Member names", "UI optimization" - SCHEMA > `memberId` String, `publicName` String diff --git a/services/libs/tinybird/datasources/members_sorted.datasource b/services/libs/tinybird/datasources/members_sorted.datasource index de9d9227c9..78a9dcfec0 100644 --- a/services/libs/tinybird/datasources/members_sorted.datasource +++ b/services/libs/tinybird/datasources/members_sorted.datasource @@ -14,8 +14,6 @@ DESCRIPTION > - `score` is the computed activity/engagement score for the member. - `publicName` is the denormalized public display name for efficient queries. -TAGS "Member profiles", "Query optimization" - SCHEMA > `id` String, `attributes` String, diff --git a/services/libs/tinybird/datasources/mentions.datasource b/services/libs/tinybird/datasources/mentions.datasource index 632028a8b9..e60a92382b 100644 --- a/services/libs/tinybird/datasources/mentions.datasource +++ b/services/libs/tinybird/datasources/mentions.datasource @@ -23,8 +23,6 @@ DESCRIPTION > - `projectSlug` identifies which project this mention belongs to. - `createdAt` is the timestamp when the record was created in Tinybird. 
-TAGS "" Octolens integration", Community", "Sentiment analysis" - SCHEMA > `sourceId` String `json:$.sourceId` DEFAULT '', `url` String `json:$.url` DEFAULT '', diff --git a/services/libs/tinybird/datasources/organizations.datasource b/services/libs/tinybird/datasources/organizations.datasource index 980f910d57..4762a05f4e 100644 --- a/services/libs/tinybird/datasources/organizations.datasource +++ b/services/libs/tinybird/datasources/organizations.datasource @@ -16,8 +16,6 @@ DESCRIPTION > - `industry` specifies the primary industry or sector the organization operates in (empty string if not specified). - `founded` is the year the organization was founded (0 default if unknown). -TAGS "Organization profiles", "Company analytics" - SCHEMA > `id` String `json:$.record.id`, `displayName` String `json:$.record.displayName`, diff --git a/services/libs/tinybird/datasources/packageDownloads.datasource b/services/libs/tinybird/datasources/packageDownloads.datasource index b305ad11c3..6a2882600f 100644 --- a/services/libs/tinybird/datasources/packageDownloads.datasource +++ b/services/libs/tinybird/datasources/packageDownloads.datasource @@ -17,8 +17,6 @@ DESCRIPTION > - `downloadsCount` is the cumulative total number of package downloads until the date (0 default). - `createdAt` and `updatedAt` are standard timestamp fields for record lifecycle tracking. 
-TAGS "Package downloads widget" - SCHEMA > `id` UInt64 `json:$.record.id`, `date` Date `json:$.record.date`, @@ -36,5 +34,6 @@ SCHEMA > `updatedAt` DateTime64(3) `json:$.record.updated_at` ENGINE ReplacingMergeTree +ENGINE_PARTITION_KEY toYear(date) ENGINE_SORTING_KEY insightsProjectId, date, ecosystem, repo, name ENGINE_VER updatedAt diff --git a/services/libs/tinybird/datasources/pull_requests_analyzed.datasource b/services/libs/tinybird/datasources/pull_requests_analyzed.datasource index a08f3df72d..e4f06058c5 100644 --- a/services/libs/tinybird/datasources/pull_requests_analyzed.datasource +++ b/services/libs/tinybird/datasources/pull_requests_analyzed.datasource @@ -13,8 +13,6 @@ DESCRIPTION > - `platform` is the source platform (GitHub, GitLab, Gerrit, etc.) inherited from activities. - `numberOfPatchsets` is the count of patchsets for Gerrit changesets (nullable, only applicable to Gerrit). -TAGS "Pull request analytics", "Developer workflow metrics" - SCHEMA > `id` String, `sourceId` String, diff --git a/services/libs/tinybird/datasources/repositoryGroups.datasource b/services/libs/tinybird/datasources/repositoryGroups.datasource index e4009667c7..42a8216251 100644 --- a/services/libs/tinybird/datasources/repositoryGroups.datasource +++ b/services/libs/tinybird/datasources/repositoryGroups.datasource @@ -9,5 +9,6 @@ SCHEMA > `deletedAt` Nullable(DateTime64(3)) `json:$.record.deletedAt` ENGINE ReplacingMergeTree +ENGINE_PARTITION_KEY toYear(createdAt) ENGINE_SORTING_KEY insightsProjectId, id ENGINE_VER updatedAt diff --git a/services/libs/tinybird/datasources/searchVolume.datasource b/services/libs/tinybird/datasources/searchVolume.datasource index 7cd00acba8..64322da484 100644 --- a/services/libs/tinybird/datasources/searchVolume.datasource +++ b/services/libs/tinybird/datasources/searchVolume.datasource @@ -9,8 +9,6 @@ DESCRIPTION > - `volume` is the search volume count for the given time period (UInt64). 
- `updatedAt` is the timestamp when this record was last updated. -TAGS "Search metrics", "Project visibility" - SCHEMA > `id` UInt64 `json:$.record.id`, `insightsProjectId` UUID `json:$.record.insights_project_id`, diff --git a/services/libs/tinybird/datasources/securityInsightsEvaluations.datasource b/services/libs/tinybird/datasources/securityInsightsEvaluations.datasource index 2b33d84e0e..db7145249e 100644 --- a/services/libs/tinybird/datasources/securityInsightsEvaluations.datasource +++ b/services/libs/tinybird/datasources/securityInsightsEvaluations.datasource @@ -15,8 +15,6 @@ DESCRIPTION > - `remediationGuide` provides guidance for addressing security issues (empty string if not available). - `createdAt` and `updatedAt` are standard timestamp fields for record lifecycle tracking. -TAGS "Security insights widget" - SCHEMA > `id` String `json:$.record.id`, `securityInsightsEvaluationSuiteId` String `json:$.record.securityInsightsEvaluationSuiteId` DEFAULT '', @@ -33,5 +31,6 @@ SCHEMA > `updatedAt` DateTime64(3) `json:$.record.updatedAt` ENGINE ReplacingMergeTree +ENGINE_PARTITION_KEY toYear(createdAt) ENGINE_SORTING_KEY insightsProjectSlug, repo, controlId ENGINE_VER updatedAt diff --git a/services/libs/tinybird/datasources/security_deduplicated_merged_ds.datasource b/services/libs/tinybird/datasources/security_deduplicated_merged_ds.datasource index fb92054664..b994b33a72 100644 --- a/services/libs/tinybird/datasources/security_deduplicated_merged_ds.datasource +++ b/services/libs/tinybird/datasources/security_deduplicated_merged_ds.datasource @@ -11,8 +11,6 @@ DESCRIPTION > - `result` contains the evaluation result (pass, fail, warning, etc.). - `assessments` is an array of maps containing detailed assessment information. 
-TAGS "Security analytics", "Consolidated data" - SCHEMA > `evaluationId` String, `insightsProjectSlug` String, diff --git a/services/libs/tinybird/datasources/segments.datasource b/services/libs/tinybird/datasources/segments.datasource index 2cc92b2e12..504ada18e1 100644 --- a/services/libs/tinybird/datasources/segments.datasource +++ b/services/libs/tinybird/datasources/segments.datasource @@ -20,7 +20,7 @@ DESCRIPTION > - `sourceParentId` is the external parent identifier from the source system (empty string if not applicable). - `createdAt` and `updatedAt` are standard timestamp fields for record lifecycle tracking. -TAGS "Project segments", "Hierarchy management" +TAGS "Project segments" SCHEMA > `id` String `json:$.record.id`, diff --git a/services/libs/tinybird/datasources/segments_aggregates_with_ids_datasource.datasource b/services/libs/tinybird/datasources/segments_aggregates_with_ids_datasource.datasource index e94c15f5d1..c96c42a4fc 100644 --- a/services/libs/tinybird/datasources/segments_aggregates_with_ids_datasource.datasource +++ b/services/libs/tinybird/datasources/segments_aggregates_with_ids_datasource.datasource @@ -12,8 +12,6 @@ DESCRIPTION > - `categoryId` links to the category this segment is classified under. - `categoryGroupId` links to the top-level category group for this segment. -TAGS "Segment analytics", "Hierarchical aggregates" - SCHEMA > `segmentId` String, `contributorCount` UInt64, diff --git a/services/libs/tinybird/datasources/softwareValueProjectCosts.datasource b/services/libs/tinybird/datasources/softwareValueProjectCosts.datasource index 95e1020f52..496cbc4de6 100644 --- a/services/libs/tinybird/datasources/softwareValueProjectCosts.datasource +++ b/services/libs/tinybird/datasources/softwareValueProjectCosts.datasource @@ -6,8 +6,6 @@ DESCRIPTION > - `estimatedCost` is the computed development cost estimate in monetary units (UInt64). - `updatedAt` is the timestamp when the cost estimate was last computed. 
-TAGS "Project valuation", "Cost estimation" - SCHEMA > `repoUrl` String `json:$.record.repo_url`, `estimatedCost` UInt64 `json:$.record.estimated_cost`, diff --git a/services/libs/tinybird/LAMBDA_ARCHITECTURE.md b/services/libs/tinybird/lambda-architecture.md similarity index 82% rename from services/libs/tinybird/LAMBDA_ARCHITECTURE.md rename to services/libs/tinybird/lambda-architecture.md index 54b009ecd9..2ed5535a3b 100644 --- a/services/libs/tinybird/LAMBDA_ARCHITECTURE.md +++ b/services/libs/tinybird/lambda-architecture.md @@ -1,5 +1,24 @@ # Lambda Architecture for Tinybird Data Pipelines +## Table of Contents + +- [Overview](#overview) +- [Main Activity Relations Pipeline](#main-activity-relations-pipeline-lambda-architecture---produces-unfiltered-data) +- [Downstream Consumers](#downstream-consumers-of-activityrelations_enriched_deduplicated_ds) +- [Timeline View](#timeline-view-hourly-execution) +- [Snapshot-Based Deduplication](#snapshot-based-deduplication-strategy) +- [Pull Requests Pipeline](#pull-requests-pipeline) +- [Query Patterns](#query-patterns) +- [Initial Snapshot Pipes](#initial-snapshot-pipes-bootstrap) +- [Troubleshooting](#troubleshooting) + +**Related Documentation:** +- [Main README](./README.md) - Overview and getting started +- [Bucketing Architecture](./bucketing-architecture.md) - Parallel architecture for filtered data +- [Data Flow Diagram](./dataflow.md) - Visual system overview + +--- + ## Overview This document explains the **Lambda Architecture** implementation used in our Tinybird data pipelines.
Lambda Architecture is a data processing design pattern that combines: @@ -8,9 +27,11 @@ This document explains the **Lambda Architecture** implementation used in our Ti - **Merge Layer (Scheduled)**: Copy pipes that merge real-time snapshots with historical data on a schedule - **Serving Layer**: Snapshot-based datasources that provide deduplicated views via query-time filtering +> **Note**: This is one of two parallel architectures processing activityRelations data. See the [Main README](./README.md#architecture-overview) for a comparison with the Bucketing Architecture. + --- -## Main Activity Relations Pipeline +## Main Activity Relations Pipeline (Lambda Architecture - Produces Unfiltered Data) ``` ┌─────────────────────────────────────────────────────────────────────────────┐ @@ -61,7 +82,7 @@ This document explains the **Lambda Architecture** implementation used in our Ti [2] Enrichment Layer - Real-time Materialized View ┌────────────────────────────────────────┐ - │ activityRelations_enrich_clean │ + │ activityRelations_enrich │ │ _snapshot_MV │ │ (TYPE: MATERIALIZED) │ └────────────────────────────────────────┘ @@ -69,14 +90,14 @@ This document explains the **Lambda Architecture** implementation used in our Ti What it does: ├─ Enriches: country codes, org names, gitChangedLines, buckets - ├─ Filters: activities from valid members, repos, segments + ├─ Does NOT filter: includes ALL activities (bots, disabled repos, etc.)
├─ Attaches snapshot IDs to rows: toStartOfInterval(updatedAt, 1 hour) + 1 hour └─ Runs: Immediately on new data [3] Enrichment Layer Output ┌────────────────────────────────────────┐ - │ activityRelations_enrich_clean │ + │ activityRelations_enrich │ │ _snapshot_MV_ds │ └────────────────────────────────────────┘ ↓ @@ -98,7 +119,7 @@ This document explains the **Lambda Architecture** implementation used in our Ti ┌────────────────────────────────────────┐ │ activityRelations_snapshot_ │ │ merger_copy │ - │ (TYPE: COPY, every hour at :10) │ + │ (TYPE: COPY, every day at 1 AM) │ └────────────────────────────────────────┘ ↓ @@ -107,14 +128,14 @@ This document explains the **Lambda Architecture** implementation used in our Ti ├─ Fetches: OLD data from serving layer (current max snapshotId) ├─ Merges: UNION ALL → creates new snapshot ├─ Mode: append - └─ Schedule: 10 * * * * (hourly at minute 10) + └─ Schedule: 0 1 * * * (every day at 1 AM UTC) -[5] Serving Layer - Final Datasource +[5] Final Datasource ┌────────────────────────────────────────┐ - │ activityRelations_deduplicated │ - │ _cleaned_ds │ - │ (queried by all analytics) │ + │ activityRelations_enriched │ + │ _deduplicated_ds │ + │ (UNFILTERED - used by CDP, monitoring)│ └────────────────────────────────────────┘ ↓ • TYPE: MergeTree @@ -136,46 +157,83 @@ This document explains the **Lambda Architecture** implementation used in our Ti --- +## Downstream Consumers of activityRelations_enriched_deduplicated_ds + +The unfiltered lambda architecture output is consumed by: + +### 1. Pull Requests Pipeline (Primary consumer - detailed section below) +- Analyzes PR lifecycle: opened → reviewed → approved → merged +- Requires complete activity history for accurate PR state tracking +- Output: `pull_requests_analyzed` datasource + +### 2. 
CDP Pipes +- **`activities_relations_filtered.pipe`**: Provides filtered views for CDP operations +- **`activities_filtered.pipe`**: Main activity filtering endpoint +- **`activities_filtered_historical_cutoff.pipe`**: Historical data with cutoff dates +- **`activities_filtered_retention.pipe`**: Retention-based activity filtering +- **`activities_daily_counts.pipe`**: Aggregates daily activity metrics for reporting +- Requires complete dataset for accurate historical counts and trend analysis + +### 3. Monitoring Pipes +- **`monitoring_entities.pipe`**: Tracks entity health and data quality metrics +- **`monitoring_copy_pipe_executions.pipe`**: Monitors copy pipe execution status +- **`monitoring_copy_pipes_spread_info.pipe`**: Tracks copy pipe distribution and load +- **`monitoring_long_running_endpoints.pipe`**: Detects slow query performance +- Needs unfiltered data to monitor complete ingestion pipeline +- Detects anomalies in data flow and processing + +**Why unfiltered?** These consumers require the complete, unfiltered dataset (including bot activities and disabled repositories) for PR analysis, CDP operations, and system monitoring. The bucketing architecture filters data for Insights queries, but these operational pipelines need the raw, complete view. 
+ +--- + ## Timeline View (Hourly Execution) +**Example: Pull Requests Pipeline** + ``` ═══════════════════════════════════════════════════════════════════════════════ HOURLY EXECUTION TIMELINE ═══════════════════════════════════════════════════════════════════════════════ -Time: :00 :10 :59 Next Hour :00 +Time: :00 :30 :59 Next Hour :00 │ │ │ │ │ │ │ │ -Step 1: │ New data │ │ │ - │ arrives in │ │ │ - │ activityRel │ │ │ +Step 1: │ PR events │ │ │ + │ arrive (opened,│ │ │ + │ reviewed, │ │ │ + │ approved, │ │ │ + │ merged) │ │ │ ↓ │ │ │ │ │ │ │ Step 2: MV triggers │ │ │ immediately │ │ │ - • Enriches data │ │ │ - • Filters │ │ │ + • Enriches PR │ │ │ + • Calculates │ │ │ + • metrics │ │ │ • Adds snapshot │ │ │ ↓ │ │ │ │ │ │ │ Step 3: Writes to │ │ │ - MV_ds with │ │ │ - snapshotId = │ │ │ - (next hour) │ │ │ + pull_request_ │ │ │ + analyzed_MV_ds │ │ │ + with snapshotId │ │ │ + = (next hour) │ │ │ ↓ │ │ │ │ │ Step 4: Copy pipe runs │ │ - • Fetch NEW │ │ + (at :30) │ │ + • Fetch NEW PRs │ │ • (from MV_ds) │ │ - • Fetch OLD │ │ + • Fetch OLD PRs │ │ • (from serving) │ │ • UNION ALL │ │ • New snapshotId │ │ ↓ │ │ │ │ │ -Step 5: Appends to serving │ │ - layer with new │ │ - snapshot │ │ +Step 5: Appends to │ │ + pull_requests_ │ │ + analyzed │ │ + datasource │ │ ↓ ↓ [Continues [Next cycle] processing] @@ -211,8 +269,8 @@ Instead of using FINAL in copy pipes or query time, our approach uses **snapshot -- Query with snapshot filter: SELECT * -FROM activityRelations_deduplicated_cleaned_ds -WHERE snapshotId = (SELECT max(snapshotId) FROM activityRelations_deduplicated_cleaned_ds) +FROM activityRelations_enriched_deduplicated_ds +WHERE snapshotId = (SELECT max(snapshotId) FROM activityRelations_enriched_deduplicated_ds) -- Result (deduplicated logical view): ┌─────────┬─────────────┬──────────────────┐ @@ -231,12 +289,12 @@ WHERE snapshotId = (SELECT max(snapshotId) FROM activityRelations_deduplicated_c **Fast copy operations**: Append mode copys are much lightweight and 
fast then replace mode copys -**Reliable**: TTL automatically manages storage +**Reliable**: TTL automatically manages storage --- -## Pull Requests Specialized Pipeline +## Pull Requests Pipeline The Pull Requests pipeline demonstrates how to **branch from the main pipeline** for specialized, real-time analytics. ``` @@ -246,7 +304,7 @@ The Pull Requests pipeline demonstrates how to **branch from the main pipeline** └─────────────────────────────────────────────────────────────────────────────┘ [From Step 3 of Main Pipeline] - activityRelations_enrich_clean_snapshot_MV_ds + activityRelations_enrich_snapshot_MV_ds ↓ (filters for PR-related activity types only) │ ├─ pull_request-opened, merge_request-opened, changeset-created @@ -435,15 +493,15 @@ Initial snapshot pipes: ### Examples -#### 1. activityRelations_enrich_clean_initial_snapshot +#### 1. activityRelations_enrich_initial_snapshot ``` -File: activityRelations_enrich_clean_initial_snapshot.pipe +File: activityRelations_enrich_initial_snapshot.pipe TYPE: COPY COPY_MODE: replace COPY_SCHEDULE: @on-demand -TARGET_DATASOURCE: activityRelations_deduplicated_cleaned_ds +TARGET_DATASOURCE: activityRelations_enriched_deduplicated_ds What it does: ├─ Reads raw activityRelations (base table) @@ -506,7 +564,7 @@ Usage: Run once to bootstrap segment-level metrics **How to run:** ```bash # Via Tinybird CLI (assuming you have tb CLI configured) -tb pipe copy run activityRelations_enrich_clean_initial_snapshot --wait +tb pipe copy run activityRelations_enrich_initial_snapshot --wait tb pipe copy run pull_request_analysis_initial_snapshot --wait tb pipe copy run segmentId_aggregates_initial_snapshot --wait ``` @@ -530,7 +588,7 @@ tb pipe copy run segmentId_aggregates_initial_snapshot --wait **Symptom**: Queries return stale data **Check**: -1. Is the MV running? `SELECT * FROM activityRelations_enrich_clean_snapshot_MV_ds ORDER BY snapshotId DESC LIMIT 10` +1. Is the MV running?
`SELECT * FROM activityRelations_enrich_snapshot_MV_ds ORDER BY snapshotId DESC LIMIT 10` 2. Is the copy pipe scheduled? Check `COPY_SCHEDULE` in pipe definition 3. Check Tinybird logs for errors diff --git a/services/libs/tinybird/pipes/active_contributors.pipe b/services/libs/tinybird/pipes/active_contributors.pipe index 77b5aefd48..d1b06583d2 100644 --- a/services/libs/tinybird/pipes/active_contributors.pipe +++ b/services/libs/tinybird/pipes/active_contributors.pipe @@ -18,8 +18,6 @@ DESCRIPTION > - Without granularity: `contributorCount` (total unique contributors) - With granularity: `startDate`, `endDate`, and `contributorCount` for each time period -TAGS "Widget", "Contributors", "Active users" - NODE timeseries_generation_for_active_contributors SQL > % @@ -92,7 +90,7 @@ SQL > 'merge_request-review-approved', 'merge_request-review-changes-requested', -- Gerrit review activities - 'changeset_comment-created' + 'patchset_approval-created' ) NODE active_contributors_merged diff --git a/services/libs/tinybird/pipes/active_days.pipe b/services/libs/tinybird/pipes/active_days.pipe index dd636a5e3f..1d2899447d 100644 --- a/services/libs/tinybird/pipes/active_days.pipe +++ b/services/libs/tinybird/pipes/active_days.pipe @@ -18,8 +18,6 @@ DESCRIPTION > - Without granularity: `activeDaysCount`, `avgContributionsPerDay` - With granularity: `startDate`, `endDate`, and `activityCount` for each time period -TAGS "Widget", "Development metrics", "Active days" - NODE timeseries_generation_for_active_days SQL > % diff --git a/services/libs/tinybird/pipes/active_organizations.pipe b/services/libs/tinybird/pipes/active_organizations.pipe index 6c36d832e6..3e4733adbf 100644 --- a/services/libs/tinybird/pipes/active_organizations.pipe +++ b/services/libs/tinybird/pipes/active_organizations.pipe @@ -18,8 +18,6 @@ DESCRIPTION > - Without granularity: `organizationCount` (total unique organizations) - With granularity: `startDate`, `endDate`, and `organizationCount` for each time 
period -TAGS "Widget", "Organizations", "Active users" - NODE timeseries_generation_for_active_organizations SQL > % @@ -60,3 +58,7 @@ SQL > FROM activities_filtered {% else %} select * from timeseries_generation_for_active_organizations {% end %} + +NODE active_organizations_2 +SQL > + SELECT * FROM active_organizations_merged diff --git a/services/libs/tinybird/pipes/activities_count.pipe b/services/libs/tinybird/pipes/activities_count.pipe index 69f0ef6ffe..5ecf77895a 100644 --- a/services/libs/tinybird/pipes/activities_count.pipe +++ b/services/libs/tinybird/pipes/activities_count.pipe @@ -2,6 +2,7 @@ DESCRIPTION > - `activities_count.pipe` serves activity count analytics for any activity type based on the filtering parameters provided. - **When `granularity` is NOT provided, returns a single total count** of all activities matching the specified filters across the entire time range. - **When `granularity` is provided, returns time-series data** showing activity counts aggregated by different time periods (daily, weekly, monthly, quarterly, yearly). + - `onlyContributions`: **When filtering for non-contribution types (defined below), it needs to be set to 0**. - Uses `generate_timeseries` pipe to create consistent time periods and left joins activity data to handle periods with zero activity. - Primary use cases: activity trend charts (with granularity) and total activity counts for any activity type analytics (without granularity). 
- Parameters: @@ -13,14 +14,14 @@ DESCRIPTION > - `platform`: Optional string filter for source platform (e.g., 'github', 'discord', 'slack') - `activity_type`: Optional string filter for single activity type (e.g., 'authored-commit') - `activity_types`: Optional array of activity types (e.g., ['authored-commit', 'co-authored-commit']) - - `onlyContributions`: Optional boolean, defaults to 1 (contributions only), set to 0 for all activities + - Supported contribution activity types: 'pull_request-assigned', 'pull_request-reviewed', 'committed-commit', 'signed-off-commit', 'issues-closed', 'pull_request-merged', 'pull_request-review-requested', 'issue-comment', 'pull_request-comment', 'pull_request-review-thread-comment', 'authored-commit', 'pull_request-opened', 'star', 'fork', 'reaction', 'discussion-comment', 'message', 'member_leave', 'issues-opened', 'member_join', 'discussion-started', 'question', 'pull_request-closed', 'reviewed-commit', 'comment', 'reported-commit', 'co-authored-commit', 'post', 'answer', 'joined_guild', 'tested-commit', 'unstar', 'registered_as_attendee', 'channel_joined', 'registered_as_speaker', 'merge_request-closed', 'merge_request-opened', 'merge_request-review-approved', 'merge_request-assigned', 'approved-commit', 'informed-commit', 'influenced-commit', 'resolved-commit', 'page-created', 'page-updated', 'attachment-created', 'comment-created', 'enrolled_into_e-learning', 'issue-closed', 'issue-updated', 'enrolled_into_certification', 'certificate_issued', 'blogpost-created', 'blogpost-updated', 'hashtag', 'mention', 'issue-comment-created', 'changeset-abandoned', 'patchset-created', 'changeset-merged', 'changeset_comment-created', 'issue-comment-updated', 'patchset_comment-created', 'patchset_approval-created', 'issue-assigned', 'changeset-created', 'issue-created', 'issue-attachment-added', 'enrolled_into_instructor-led', 'create_topic', 'message_in_topic', 'merge_request-comment', 'merge_request-review-requested', 
'issue-state-unknown', 'registered' + - Non-contribution activity types (require onlyContributions=0): 'star', 'pull_request-assigned', 'fork', 'pull_request-closed', 'pull_request-review-requested', 'enrolled_into_e-learning', 'enrolled_into_certification', 'certificate_issued', 'unstar', 'channel_joined', 'member_join', 'merge_request-assigned', 'authored-commit', 'member_leave', 'reaction', 'registered_as_attendee', 'committed-commit', 'changeset_comment-created', 'patchset-created', 'patchset_approval-created', 'patchset_comment-created', 'registered_as_speaker', 'joined_guild', 'merge_request-review-requested', 'enrolled_into_instructor-led', 'co-authored-commit', 'signed-off-commit', 'comment-created', 'registered' + - `onlyContributions`: boolean, defaults to 1 (contributions only), set to 0 for all activities. - `granularity`: Optional string for time aggregation ('daily', 'weekly', 'monthly', 'quarterly', 'yearly') - Response: - Without granularity: `activityCount` (total count) - With granularity: `startDate`, `endDate`, and `activityCount` for each time period -TAGS "Widget", "Activity metrics" - NODE timeseries_generation_for_activity_count SQL > % diff --git a/services/libs/tinybird/pipes/activities_cumulative_count.pipe b/services/libs/tinybird/pipes/activities_cumulative_count.pipe index a3e65f618a..6eda60786e 100644 --- a/services/libs/tinybird/pipes/activities_cumulative_count.pipe +++ b/services/libs/tinybird/pipes/activities_cumulative_count.pipe @@ -3,6 +3,7 @@ DESCRIPTION > - **Key difference from `activities_count.pipe`:** Returns cumulative/running totals instead of period-specific counts, showing growth over time rather than activity per period. - **When `granularity` is NOT provided, returns a simple count from `activities_filtered_historical_cutoff`** (same behavior as `activities_filtered` for totals). 
- **When `granularity` is provided, returns time-series data** with cumulative activity counts calculated by adding historical baseline + period-specific counts. + - `onlyContributions`: **When filtering for non-contribution types (defined below), it needs to be set to 0**. - Cumulative calculation: historical total (from `activities_filtered_historical_cutoff`) + rolling sum of period activities to create running totals over time. - Each cumulative count includes all matching activities from project inception up to that time period. - Primary use case: showing growth trends and total activity accumulation over time in dashboard widgets. @@ -15,14 +16,14 @@ DESCRIPTION > - `platform`: Optional string filter for source platform (e.g., 'github', 'discord', 'slack') - `activity_type`: Optional string filter for single activity type (e.g., 'authored-commit') - `activity_types`: Optional array of activity types (e.g., ['authored-commit', 'co-authored-commit']) - - `onlyContributions`: Optional boolean, defaults to 1 (contributions only), set to 0 for all activities + - Supported contribution activity types (default when onlyContributions=1): 'pull_request-assigned', 'pull_request-reviewed', 'committed-commit', 'signed-off-commit', 'issues-closed', 'pull_request-merged', 'pull_request-review-requested', 'issue-comment', 'pull_request-comment', 'pull_request-review-thread-comment', 'authored-commit', 'pull_request-opened', 'star', 'fork', 'reaction', 'discussion-comment', 'message', 'member_leave', 'issues-opened', 'member_join', 'discussion-started', 'question', 'pull_request-closed', 'reviewed-commit', 'comment', 'reported-commit', 'co-authored-commit', 'post', 'answer', 'joined_guild', 'tested-commit', 'unstar', 'registered_as_attendee', 'channel_joined', 'registered_as_speaker', 'merge_request-closed', 'merge_request-opened', 'merge_request-review-approved', 'merge_request-assigned', 'approved-commit', 'informed-commit', 'influenced-commit', 'resolved-commit', 
'page-created', 'page-updated', 'attachment-created', 'comment-created', 'enrolled_into_e-learning', 'issue-closed', 'issue-updated', 'enrolled_into_certification', 'certificate_issued', 'blogpost-created', 'blogpost-updated', 'hashtag', 'mention', 'issue-comment-created', 'changeset-abandoned', 'patchset-created', 'changeset-merged', 'changeset_comment-created', 'issue-comment-updated', 'patchset_comment-created', 'patchset_approval-created', 'issue-assigned', 'changeset-created', 'issue-created', 'issue-attachment-added', 'enrolled_into_instructor-led', 'create_topic', 'message_in_topic', 'merge_request-comment', 'merge_request-review-requested', 'issue-state-unknown', 'registered' + - Non-contribution activity types (require onlyContributions=0): 'star', 'pull_request-assigned', 'fork', 'pull_request-closed', 'pull_request-review-requested', 'enrolled_into_e-learning', 'enrolled_into_certification', 'certificate_issued', 'unstar', 'channel_joined', 'member_join', 'merge_request-assigned', 'authored-commit', 'member_leave', 'reaction', 'registered_as_attendee', 'committed-commit', 'changeset_comment-created', 'patchset-created', 'patchset_approval-created', 'patchset_comment-created', 'registered_as_speaker', 'joined_guild', 'merge_request-review-requested', 'enrolled_into_instructor-led', 'co-authored-commit', 'signed-off-commit', 'comment-created', 'registered' + - `onlyContributions`: Optional boolean, defaults to 1 (contributions only), set to 0 for all activities. 
- `granularity`: Optional string for time aggregation ('daily', 'weekly', 'monthly', 'quarterly', 'yearly') - Response: - Without granularity: Simple count from `activities_filtered_historical_cutoff` - With granularity: `startDate`, `endDate`, and `cumulativeActivityCount` for each time period -TAGS "Widget", "Cumulative metrics", "Growth trends" - NODE historical_activity_count SQL > % diff --git a/services/libs/tinybird/pipes/activities_daily_counts.pipe b/services/libs/tinybird/pipes/activities_daily_counts.pipe index 5d113a60c0..74b22da688 100644 --- a/services/libs/tinybird/pipes/activities_daily_counts.pipe +++ b/services/libs/tinybird/pipes/activities_daily_counts.pipe @@ -20,15 +20,17 @@ TAGS "Activity metrics" NODE daily_counts SQL > % - SELECT toStartOfDay(timestamp) AS date, uniqExact(activityId) AS count + SELECT toStartOfDay(timestamp) AS date, uniq(activityId) AS count FROM - activityRelations_deduplicated_ds + activityRelations_enriched_deduplicated_ds PREWHERE "segmentId" IN {{ Array(segmentIds, 'String', required=True, description="Segment IDs") }} {% if defined(after) %} AND timestamp >= parseDateTimeBestEffort({{ String(after) }}) {% end %} {% if defined(before) %} AND timestamp <= parseDateTimeBestEffort({{ String(before) }}) {% end %} - WHERE {% if defined(platform) %} platform = {{ String(platform) }} {% else %} 1 {% end %} + WHERE + snapshotId = (select max(snapshotId) from activityRelations_enriched_deduplicated_ds) + {% if defined(platform) %} AND platform = {{ String(platform) }} {% end %} GROUP BY date ORDER BY date ASC diff --git a/services/libs/tinybird/pipes/activities_deduplicated_copy_pipe_append_mode.pipe b/services/libs/tinybird/pipes/activities_deduplicated_copy_pipe_append_mode.pipe new file mode 100644 index 0000000000..277c4ddbc5 --- /dev/null +++ b/services/libs/tinybird/pipes/activities_deduplicated_copy_pipe_append_mode.pipe @@ -0,0 +1,23 @@ +NODE activities_deduplicated_copy_pipe_append_mode_0 +SQL > + SELECT + a.id, + 
a.timestamp, + a.platform, + a.type, + a.channel, + a.sourceId, + a.sentimentLabel, + a.score, + a.attributes, + a.body, + a.title, + a.url, + a.updatedAt + FROM activities a + WHERE a.updatedAt > (SELECT max("updatedAt") FROM activities_deduplicated_ds) + +TYPE COPY +TARGET_DATASOURCE activities_deduplicated_ds +COPY_MODE append +COPY_SCHEDULE 45 */2 * * * diff --git a/services/libs/tinybird/pipes/activities_filtered.pipe b/services/libs/tinybird/pipes/activities_filtered.pipe index 4be0ad2d4e..9c7a049ce7 100644 --- a/services/libs/tinybird/pipes/activities_filtered.pipe +++ b/services/libs/tinybird/pipes/activities_filtered.pipe @@ -17,14 +17,13 @@ DESCRIPTION > - This pipe is consumed by many of downstream pipes and widgets across the platform for consistent activity filtering. - Performance is optimized through proper sorting keys on `segmentId`, `timestamp`, `type`, `platform`, and `memberId` in the source datasource. -NODE activities_filtered_LAMBDA +NODE activities_filtered_bucket_routing SQL > % SELECT activityId as id, timestamp, type, platform, memberId, organizationId, segmentId - FROM activityRelations_deduplicated_cleaned_ds a + FROM activityRelations_bucket_routing a where - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND segmentId = (SELECT segmentId FROM segments_filtered) + segmentId = (SELECT segmentId FROM segments_filtered) {% if defined(startDate) %} AND a.timestamp > {{ DateTime(startDate, description="Filter activity timestamp after", required=False) }} diff --git a/services/libs/tinybird/pipes/activities_filtered_historical_cutoff.pipe b/services/libs/tinybird/pipes/activities_filtered_historical_cutoff.pipe index d79503ec85..e53967fd8b 100644 --- a/services/libs/tinybird/pipes/activities_filtered_historical_cutoff.pipe +++ b/services/libs/tinybird/pipes/activities_filtered_historical_cutoff.pipe @@ -18,14 +18,13 @@ DESCRIPTION > - `includeCollaborations`: Optional boolean to include or exclude 
collaboration activities. Inherited from activityTypes_filtered. - Response: `id` (activityId), `timestamp`, `type`, `platform`, `memberId`, `organizationId`, `segmentId`. -NODE activities_filtered_historical_cutoff_LAMBDA +NODE activities_filtered_historical_cutoff_bucket_routing SQL > % SELECT activityId as id, timestamp, type, platform, memberId, organizationId, segmentId - FROM activityRelations_deduplicated_cleaned_ds a + FROM activityRelations_bucket_routing a where - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND segmentId = (SELECT segmentId FROM segments_filtered) + segmentId = (SELECT segmentId FROM segments_filtered) {% if defined(startDate) %} AND a.timestamp <= {{ DateTime(startDate, description="Filter activity timestamp after", required=False) }} diff --git a/services/libs/tinybird/pipes/activities_filtered_retention.pipe b/services/libs/tinybird/pipes/activities_filtered_retention.pipe index 5b0966fb29..3170853d05 100644 --- a/services/libs/tinybird/pipes/activities_filtered_retention.pipe +++ b/services/libs/tinybird/pipes/activities_filtered_retention.pipe @@ -18,14 +18,13 @@ DESCRIPTION > - `granularity`: Required string for time aggregation and period extension ('daily', 'weekly', 'monthly', 'quarterly', 'yearly') - Response: `id` (activityId), `timestamp`, `type`, `platform`, `memberId`, `organizationId`, `segmentId`. 
-NODE activities_filtered_retention_LAMBDA +NODE activities_filtered_retention_bucket_routing SQL > % SELECT activityId as id, timestamp, type, platform, memberId, organizationId, segmentId - FROM activityRelations_deduplicated_cleaned_ds a + FROM activityRelations_bucket_routing a where - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND segmentId = (SELECT segmentId FROM segments_filtered) + segmentId = (SELECT segmentId FROM segments_filtered) {% if defined(startDate) %} AND a.timestamp > {% if defined(granularity) and granularity == "daily" %} diff --git a/services/libs/tinybird/pipes/activities_relations_filtered.pipe b/services/libs/tinybird/pipes/activities_relations_filtered.pipe index 1a0af4d563..ba0b6cde73 100644 --- a/services/libs/tinybird/pipes/activities_relations_filtered.pipe +++ b/services/libs/tinybird/pipes/activities_relations_filtered.pipe @@ -2,6 +2,7 @@ DESCRIPTION > - `activities_enriched.pipe` provides a filtered and optionally paginated view of activity relations, then enriches those rows with content fields from the activities dataset. - This pipe powers endpoints that need both relation metadata and activity content (url/body/title) without performing a full join on large tables. - It filters from `activityRelations_deduplicated_ds` by project segment, time ranges, repositories, platforms, and activity types, then joins only the small, filtered set against `activities_deduplicated_ds`. + - By default, this pipe returns only contribution activities (`isContribution = 1`) unless explicitly overridden with `onlyContributions = 0`. - Security / scoping: if you use a `segments_filtered` pipe in your environment, replace the `segmentId` filter to read from it (see comment in Node 1). - Parameters: - `segments`: Optional array of segment IDs (e.g., ['7c3f6874-b10e-499b-a672-00281ab6c510']). If you use `segments_filtered`, remove this and rely on that pipe. 
@@ -14,12 +15,9 @@ DESCRIPTION > - `onlyContributions`: Optional boolean, defaults to 1 (contributions only), set to 0 for all activities. - `page`: Optional integer page index for pagination (OFFSET-based), defaults to 0. - `pageSize`: Optional integer page size, defaults to 10. - - `orderBy`: Optional enum ['timestamp_ASC','createdAt_ASC','createdAt_DESC'] else defaults to timestamp DESC. - - `searchTerm`: Optional case-insensitive search in channel/type/title/body. - - Dynamic OR blocks via `G1..G5_*` as in query below (include/exclude groups). - - Response (final node): all relation fields from Node 1 plus `url`, `body`, `title`, `attributes` from activities. + - Response (final node): all relation fields from Node 1 plus `url`, `body`, `title`, `attributes` from activities. - Performance: - - The enrichment only scans the subset of `activities_deduplicated_ds` whose `id` is found in the filtered (and paginated) set from Node 1, minimizing I/O. + - The enrichment only scans the subset of `activities_deduplicated_ds` whose `id` is found in the filtered page from Node 1, minimizing I/O. - Keep page sizes reasonable (50–200) for consistent latency. - Ensure `activityId` and `id` types are aligned (both UUID or both String). If they differ, this pipe casts to String at join time.
@@ -46,9 +44,10 @@ SQL > ar.sourceParentId AS sourceParentId, ar.timestamp AS timestamp, ar.type AS type - FROM activityRelations_deduplicated_ds AS ar + FROM activityRelations_enriched_deduplicated_ds AS ar WHERE - (length(segments_arr) = 0 OR ar.segmentId IN segments_arr) + snapshotId = (select max(snapshotId) from activityRelations_enriched_deduplicated_ds) + AND (length(segments_arr) = 0 OR ar.segmentId IN segments_arr) {% if defined(startDate) %} AND ar.timestamp > parseDateTimeBestEffort({{ String(startDate) }}) {% end %} @@ -85,7 +84,7 @@ SQL > {% end %} ) {% end %} - -- ================== G1..G5 groups ================== + -- ================== G1..G5 (identical to staging) ================== {% set has_g1 = 0 %} {% if defined(G1_memberIds) %} {% set has_g1 = 1 %} {% end %} {% if defined(G1_memberIds_exclude) %} {% set has_g1 = 1 %} {% end %} @@ -446,40 +445,29 @@ SQL > {% if defined(countOnly) %} {% if String(countOnly) == '1' or Int8(countOnly, 0) == 1 %} {% set is_count = 1 %} {% end %} {% end %} - WITH - {% if defined(segments) %} arrayDistinct({{ Array(segments, 'String') }}) AS segments_arr - {% else %} [] AS segments_arr - {% end %}, - base_ar AS ( - SELECT - ar.activityId AS id, - ar.channel, - ar.memberId, - ar.organizationId, - ar.platform, - ar.segmentId, - ar.sourceId, - ar.sourceParentId, - ar.timestamp, - ar.type - FROM activityRelations_deduplicated_ds AS ar - WHERE (length(segments_arr) = 0 OR ar.segmentId IN segments_arr) - ) - {% if is_count %} SELECT count() AS count FROM base_ar AS ar + {% if is_count %} SELECT count FROM filtered_relations {% else %} SELECT - ar.id, - ar.channel, - ar.memberId, - ar.organizationId, - ar.platform, - ar.segmentId, - ar.sourceId, - ar.sourceParentId, - ar.timestamp, - ar.type - FROM base_ar AS ar - ORDER BY ar.timestamp DESC, ar.id DESC - LIMIT {{ Int32(pageSize, 10) }} - OFFSET {{ Int32(page, 0) * Int32(pageSize, 10) }} + fr.id, + fr.channel, + fr.memberId, + fr.organizationId, + fr.platform, +
fr.segmentId, + fr.sourceId, + fr.sourceParentId, + fr.timestamp, + fr.type, + a.attributes, + a.url, + a.body, + a.title + FROM filtered_relations AS fr ANY + LEFT JOIN + ( + SELECT CAST(id AS String) AS activity_id, attributes, url, body, title + FROM activities_deduplicated_ds + WHERE CAST(id AS String) IN (SELECT DISTINCT CAST(id AS String) FROM filtered_relations) + ) AS a + ON CAST(fr.id AS String) = a.activity_id {% end %} diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_MV_0.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_MV_0.pipe new file mode 100644 index 0000000000..f3f5e44e1d --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_MV_0.pipe @@ -0,0 +1,6 @@ +NODE bucket_activityRelations_0 +SQL > + SELECT * FROM activityRelations WHERE cityHash64(segmentId) % 10 = 0 + +TYPE MATERIALIZED +DATASOURCE activityRelations_bucket_MV_ds_0 diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_MV_1.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_MV_1.pipe new file mode 100644 index 0000000000..b61e3386dd --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_MV_1.pipe @@ -0,0 +1,6 @@ +NODE bucket_activityRelations_1 +SQL > + SELECT * FROM activityRelations WHERE cityHash64(segmentId) % 10 = 1 + +TYPE MATERIALIZED +DATASOURCE activityRelations_bucket_MV_ds_1 diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_MV_2.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_MV_2.pipe new file mode 100644 index 0000000000..24d4e70341 --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_MV_2.pipe @@ -0,0 +1,6 @@ +NODE bucket_activityRelations_2 +SQL > + SELECT * FROM activityRelations WHERE cityHash64(segmentId) % 10 = 2 + +TYPE MATERIALIZED +DATASOURCE activityRelations_bucket_MV_ds_2 diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_MV_3.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_MV_3.pipe new file 
mode 100644 index 0000000000..2a12a745a6 --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_MV_3.pipe @@ -0,0 +1,6 @@ +NODE bucket_activityRelations_3 +SQL > + SELECT * FROM activityRelations WHERE cityHash64(segmentId) % 10 = 3 + +TYPE MATERIALIZED +DATASOURCE activityRelations_bucket_MV_ds_3 diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_MV_4.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_MV_4.pipe new file mode 100644 index 0000000000..86eab9ffdc --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_MV_4.pipe @@ -0,0 +1,6 @@ +NODE bucket_activityRelations_4 +SQL > + SELECT * FROM activityRelations WHERE cityHash64(segmentId) % 10 = 4 + +TYPE MATERIALIZED +DATASOURCE activityRelations_bucket_MV_ds_4 diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_MV_5.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_MV_5.pipe new file mode 100644 index 0000000000..11104c65df --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_MV_5.pipe @@ -0,0 +1,6 @@ +NODE bucket_activityRelations_5 +SQL > + SELECT * FROM activityRelations WHERE cityHash64(segmentId) % 10 = 5 + +TYPE MATERIALIZED +DATASOURCE activityRelations_bucket_MV_ds_5 diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_MV_6.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_MV_6.pipe new file mode 100644 index 0000000000..a55dd4ef40 --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_MV_6.pipe @@ -0,0 +1,6 @@ +NODE bucket_activityRelations_6 +SQL > + SELECT * FROM activityRelations WHERE cityHash64(segmentId) % 10 = 6 + +TYPE MATERIALIZED +DATASOURCE activityRelations_bucket_MV_ds_6 diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_MV_7.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_MV_7.pipe new file mode 100644 index 0000000000..ceb7d934a7 --- /dev/null +++ 
b/services/libs/tinybird/pipes/activityRelations_bucket_MV_7.pipe @@ -0,0 +1,6 @@ +NODE bucket_activityRelations_7 +SQL > + SELECT * FROM activityRelations WHERE cityHash64(segmentId) % 10 = 7 + +TYPE MATERIALIZED +DATASOURCE activityRelations_bucket_MV_ds_7 diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_MV_8.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_MV_8.pipe new file mode 100644 index 0000000000..3e79443f85 --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_MV_8.pipe @@ -0,0 +1,6 @@ +NODE bucket_activityRelations_8 +SQL > + SELECT * FROM activityRelations WHERE cityHash64(segmentId) % 10 = 8 + +TYPE MATERIALIZED +DATASOURCE activityRelations_bucket_MV_ds_8 diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_MV_9.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_MV_9.pipe new file mode 100644 index 0000000000..f040786de7 --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_MV_9.pipe @@ -0,0 +1,6 @@ +NODE bucket_activityRelations_9 +SQL > + SELECT * FROM activityRelations WHERE cityHash64(segmentId) % 10 = 9 + +TYPE MATERIALIZED +DATASOURCE activityRelations_bucket_MV_ds_9 diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_0.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_0.pipe new file mode 100644 index 0000000000..42dbd782ca --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_0.pipe @@ -0,0 +1,8 @@ +NODE untitled_pipe_8603_0 +SQL > + SELECT * FROM activityRelations WHERE cityHash64(segmentId) % 10 = 0 + +TYPE COPY +TARGET_DATASOURCE activityRelations_bucket_MV_ds_0 +COPY_MODE append +COPY_SCHEDULE @on-demand diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_1.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_1.pipe new file mode 100644 index 0000000000..c6aa624a3b --- /dev/null +++ 
b/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_1.pipe @@ -0,0 +1,8 @@ +NODE untitled_pipe_8603_0 +SQL > + SELECT * FROM activityRelations WHERE cityHash64(segmentId) % 10 = 1 + +TYPE COPY +TARGET_DATASOURCE activityRelations_bucket_MV_ds_1 +COPY_MODE append +COPY_SCHEDULE @on-demand diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_2.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_2.pipe new file mode 100644 index 0000000000..e576b8125f --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_2.pipe @@ -0,0 +1,8 @@ +NODE untitled_pipe_8603_0 +SQL > + SELECT * FROM activityRelations WHERE cityHash64(segmentId) % 10 = 2 + +TYPE COPY +TARGET_DATASOURCE activityRelations_bucket_MV_ds_2 +COPY_MODE append +COPY_SCHEDULE @on-demand diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_3.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_3.pipe new file mode 100644 index 0000000000..4a86df86aa --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_3.pipe @@ -0,0 +1,8 @@ +NODE untitled_pipe_8603_0 +SQL > + SELECT * FROM activityRelations WHERE cityHash64(segmentId) % 10 = 3 + +TYPE COPY +TARGET_DATASOURCE activityRelations_bucket_MV_ds_3 +COPY_MODE append +COPY_SCHEDULE @on-demand diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_4.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_4.pipe new file mode 100644 index 0000000000..24c26ff3c8 --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_4.pipe @@ -0,0 +1,8 @@ +NODE untitled_pipe_8603_0 +SQL > + SELECT * FROM activityRelations WHERE cityHash64(segmentId) % 10 = 4 + +TYPE COPY +TARGET_DATASOURCE activityRelations_bucket_MV_ds_4 +COPY_MODE append +COPY_SCHEDULE @on-demand diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_5.pipe 
b/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_5.pipe new file mode 100644 index 0000000000..e60111bdfa --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_5.pipe @@ -0,0 +1,8 @@ +NODE untitled_pipe_8603_0 +SQL > + SELECT * FROM activityRelations WHERE cityHash64(segmentId) % 10 = 5 + +TYPE COPY +TARGET_DATASOURCE activityRelations_bucket_MV_ds_5 +COPY_MODE append +COPY_SCHEDULE @on-demand diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_6.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_6.pipe new file mode 100644 index 0000000000..c3dc2e46e7 --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_6.pipe @@ -0,0 +1,8 @@ +NODE untitled_pipe_8603_0 +SQL > + SELECT * FROM activityRelations WHERE cityHash64(segmentId) % 10 = 6 + +TYPE COPY +TARGET_DATASOURCE activityRelations_bucket_MV_ds_6 +COPY_MODE append +COPY_SCHEDULE @on-demand diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_7.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_7.pipe new file mode 100644 index 0000000000..7d1d756d18 --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_7.pipe @@ -0,0 +1,8 @@ +NODE untitled_pipe_8603_0 +SQL > + SELECT * FROM activityRelations WHERE cityHash64(segmentId) % 10 = 7 + +TYPE COPY +TARGET_DATASOURCE activityRelations_bucket_MV_ds_7 +COPY_MODE append +COPY_SCHEDULE @on-demand diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_8.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_8.pipe new file mode 100644 index 0000000000..9410798121 --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_8.pipe @@ -0,0 +1,8 @@ +NODE untitled_pipe_8603_0 +SQL > + SELECT * FROM activityRelations WHERE cityHash64(segmentId) % 10 = 8 + +TYPE COPY +TARGET_DATASOURCE 
activityRelations_bucket_MV_ds_8 +COPY_MODE append +COPY_SCHEDULE @on-demand diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_9.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_9.pipe new file mode 100644 index 0000000000..e97d9dafe0 --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_MV_snapshot_9.pipe @@ -0,0 +1,8 @@ +NODE untitled_pipe_8603_0 +SQL > + SELECT * FROM activityRelations WHERE cityHash64(segmentId) % 10 = 9 + +TYPE COPY +TARGET_DATASOURCE activityRelations_bucket_MV_ds_9 +COPY_MODE append +COPY_SCHEDULE @on-demand diff --git a/services/libs/tinybird/pipes/activityRelations_enrich_clean_snapshot_MV.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_0.pipe similarity index 80% rename from services/libs/tinybird/pipes/activityRelations_enrich_clean_snapshot_MV.pipe rename to services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_0.pipe index bcf4e08c45..6b7eab25c9 100644 --- a/services/libs/tinybird/pipes/activityRelations_enrich_clean_snapshot_MV.pipe +++ b/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_0.pipe @@ -11,7 +11,7 @@ SQL > multiSearchFirstIndexUTF8(search_u, names_u) AS idx, if(idx = 0, ('Unknown', 'XX', 0), mapping[idx]) AS country_data SELECT - activityRelations.*, + activityRelations_bucket_MV_ds_0.*, (gitInsertions + gitDeletions) as gitChangedLines, case when gitChangedLines > 0 and gitChangedLines < 10 @@ -28,9 +28,9 @@ SQL > end as "gitChangedLinesBucket", CAST(country_data .2 AS LowCardinality(String)) AS organizationCountryCode, o.displayName as "organizationName", - toStartOfInterval("updatedAt", INTERVAL 1 hour) + INTERVAL 1 hour as snapshotId - from activityRelations - left join organizations o final on o.id = activityRelations.organizationId + toStartOfInterval(now(), INTERVAL 1 hour) as snapshotId + from activityRelations_bucket_MV_ds_0 final + left join organizations o 
final on o.id = activityRelations_bucket_MV_ds_0.organizationId where memberId IN (SELECT id FROM members_sorted) and ( @@ -38,7 +38,7 @@ SQL > platform IN ('git', 'gerrit', 'github', 'gitlab') AND channel IN (SELECT arrayJoin(i.repositories) FROM insightsProjects i where isNull (i.deletedAt)) - AND activityRelations.segmentId IN ( + AND activityRelations_bucket_MV_ds_0.segmentId IN ( SELECT segmentId FROM segmentRepositories sr FINAL WHERE (sr.excluded IS NULL OR sr.excluded = false) @@ -47,5 +47,7 @@ SQL > OR platform NOT IN ('git', 'gerrit', 'github', 'gitlab') ) -TYPE MATERIALIZED -DATASOURCE activityRelations_enrich_clean_snapshot_MV_ds +TYPE COPY +TARGET_DATASOURCE activityRelations_deduplicated_cleaned_bucket_0_ds +COPY_MODE replace +COPY_SCHEDULE 10 * * * * diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_1.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_1.pipe new file mode 100644 index 0000000000..18f74e6649 --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_1.pipe @@ -0,0 +1,53 @@ +NODE country_mapping_array +SQL > + SELECT groupArray((country, country_code, timezone_offset)) AS country_data FROM country_mapping_ds + +NODE activityRelations_deduplicated_cleaned_denormalized +SQL > + WITH + upperUTF8(o.location) AS search_u, + (SELECT arrayMap(x -> upperUTF8(x .1), country_data) FROM country_mapping_array) AS names_u, + (SELECT country_data FROM country_mapping_array) AS mapping, + multiSearchFirstIndexUTF8(search_u, names_u) AS idx, + if(idx = 0, ('Unknown', 'XX', 0), mapping[idx]) AS country_data + SELECT + activityRelations_bucket_MV_ds_1.*, + (gitInsertions + gitDeletions) as gitChangedLines, + case + when gitChangedLines > 0 and gitChangedLines < 10 + then '1-9' + when gitChangedLines > 9 and gitChangedLines < 60 + then '10-59' + when gitChangedLines > 59 and gitChangedLines < 100 + then '60-99' + when gitChangedLines > 99 and 
gitChangedLines < 500 + then '100-499' + when gitChangedLines > 499 + then '500+' + else '' + end as "gitChangedLinesBucket", + CAST(country_data .2 AS LowCardinality(String)) AS organizationCountryCode, + o.displayName as "organizationName", + toStartOfInterval(now(), INTERVAL 1 hour) as snapshotId + from activityRelations_bucket_MV_ds_1 final + left join organizations o final on o.id = activityRelations_bucket_MV_ds_1.organizationId + where + memberId IN (SELECT id FROM members_sorted) + and ( + ( + platform IN ('git', 'gerrit', 'github', 'gitlab') + AND channel + IN (SELECT arrayJoin(i.repositories) FROM insightsProjects i where isNull (i.deletedAt)) + AND activityRelations_bucket_MV_ds_1.segmentId IN ( + SELECT segmentId + FROM segmentRepositories sr FINAL + WHERE (sr.excluded IS NULL OR sr.excluded = false) + ) + ) + OR platform NOT IN ('git', 'gerrit', 'github', 'gitlab') + ) + +TYPE COPY +TARGET_DATASOURCE activityRelations_deduplicated_cleaned_bucket_1_ds +COPY_MODE replace +COPY_SCHEDULE 10 * * * * diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_2.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_2.pipe new file mode 100644 index 0000000000..a4739c0d7c --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_2.pipe @@ -0,0 +1,53 @@ +NODE country_mapping_array +SQL > + SELECT groupArray((country, country_code, timezone_offset)) AS country_data FROM country_mapping_ds + +NODE activityRelations_deduplicated_cleaned_denormalized +SQL > + WITH + upperUTF8(o.location) AS search_u, + (SELECT arrayMap(x -> upperUTF8(x .1), country_data) FROM country_mapping_array) AS names_u, + (SELECT country_data FROM country_mapping_array) AS mapping, + multiSearchFirstIndexUTF8(search_u, names_u) AS idx, + if(idx = 0, ('Unknown', 'XX', 0), mapping[idx]) AS country_data + SELECT + activityRelations_bucket_MV_ds_2.*, + (gitInsertions + gitDeletions) as 
gitChangedLines, + case + when gitChangedLines > 0 and gitChangedLines < 10 + then '1-9' + when gitChangedLines > 9 and gitChangedLines < 60 + then '10-59' + when gitChangedLines > 59 and gitChangedLines < 100 + then '60-99' + when gitChangedLines > 99 and gitChangedLines < 500 + then '100-499' + when gitChangedLines > 499 + then '500+' + else '' + end as "gitChangedLinesBucket", + CAST(country_data .2 AS LowCardinality(String)) AS organizationCountryCode, + o.displayName as "organizationName", + toStartOfInterval(now(), INTERVAL 1 hour) as snapshotId + from activityRelations_bucket_MV_ds_2 final + left join organizations o final on o.id = activityRelations_bucket_MV_ds_2.organizationId + where + memberId IN (SELECT id FROM members_sorted) + and ( + ( + platform IN ('git', 'gerrit', 'github', 'gitlab') + AND channel + IN (SELECT arrayJoin(i.repositories) FROM insightsProjects i where isNull (i.deletedAt)) + AND activityRelations_bucket_MV_ds_2.segmentId IN ( + SELECT segmentId + FROM segmentRepositories sr FINAL + WHERE (sr.excluded IS NULL OR sr.excluded = false) + ) + ) + OR platform NOT IN ('git', 'gerrit', 'github', 'gitlab') + ) + +TYPE COPY +TARGET_DATASOURCE activityRelations_deduplicated_cleaned_bucket_2_ds +COPY_MODE replace +COPY_SCHEDULE 14 * * * * diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_3.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_3.pipe new file mode 100644 index 0000000000..e47c730b9f --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_3.pipe @@ -0,0 +1,53 @@ +NODE country_mapping_array +SQL > + SELECT groupArray((country, country_code, timezone_offset)) AS country_data FROM country_mapping_ds + +NODE activityRelations_deduplicated_cleaned_denormalized +SQL > + WITH + upperUTF8(o.location) AS search_u, + (SELECT arrayMap(x -> upperUTF8(x .1), country_data) FROM country_mapping_array) AS names_u, + (SELECT country_data 
FROM country_mapping_array) AS mapping, + multiSearchFirstIndexUTF8(search_u, names_u) AS idx, + if(idx = 0, ('Unknown', 'XX', 0), mapping[idx]) AS country_data + SELECT + activityRelations_bucket_MV_ds_3.*, + (gitInsertions + gitDeletions) as gitChangedLines, + case + when gitChangedLines > 0 and gitChangedLines < 10 + then '1-9' + when gitChangedLines > 9 and gitChangedLines < 60 + then '10-59' + when gitChangedLines > 59 and gitChangedLines < 100 + then '60-99' + when gitChangedLines > 99 and gitChangedLines < 500 + then '100-499' + when gitChangedLines > 499 + then '500+' + else '' + end as "gitChangedLinesBucket", + CAST(country_data .2 AS LowCardinality(String)) AS organizationCountryCode, + o.displayName as "organizationName", + toStartOfInterval(now(), INTERVAL 1 hour) as snapshotId + from activityRelations_bucket_MV_ds_3 final + left join organizations o final on o.id = activityRelations_bucket_MV_ds_3.organizationId + where + memberId IN (SELECT id FROM members_sorted) + and ( + ( + platform IN ('git', 'gerrit', 'github', 'gitlab') + AND channel + IN (SELECT arrayJoin(i.repositories) FROM insightsProjects i where isNull (i.deletedAt)) + AND activityRelations_bucket_MV_ds_3.segmentId IN ( + SELECT segmentId + FROM segmentRepositories sr FINAL + WHERE (sr.excluded IS NULL OR sr.excluded = false) + ) + ) + OR platform NOT IN ('git', 'gerrit', 'github', 'gitlab') + ) + +TYPE COPY +TARGET_DATASOURCE activityRelations_deduplicated_cleaned_bucket_3_ds +COPY_MODE replace +COPY_SCHEDULE 14 * * * * diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_4.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_4.pipe new file mode 100644 index 0000000000..f7a878335e --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_4.pipe @@ -0,0 +1,53 @@ +NODE country_mapping_array +SQL > + SELECT groupArray((country, country_code, timezone_offset)) AS country_data FROM 
country_mapping_ds + +NODE activityRelations_deduplicated_cleaned_denormalized +SQL > + WITH + upperUTF8(o.location) AS search_u, + (SELECT arrayMap(x -> upperUTF8(x .1), country_data) FROM country_mapping_array) AS names_u, + (SELECT country_data FROM country_mapping_array) AS mapping, + multiSearchFirstIndexUTF8(search_u, names_u) AS idx, + if(idx = 0, ('Unknown', 'XX', 0), mapping[idx]) AS country_data + SELECT + activityRelations_bucket_MV_ds_4.*, + (gitInsertions + gitDeletions) as gitChangedLines, + case + when gitChangedLines > 0 and gitChangedLines < 10 + then '1-9' + when gitChangedLines > 9 and gitChangedLines < 60 + then '10-59' + when gitChangedLines > 59 and gitChangedLines < 100 + then '60-99' + when gitChangedLines > 99 and gitChangedLines < 500 + then '100-499' + when gitChangedLines > 499 + then '500+' + else '' + end as "gitChangedLinesBucket", + CAST(country_data .2 AS LowCardinality(String)) AS organizationCountryCode, + o.displayName as "organizationName", + toStartOfInterval(now(), INTERVAL 1 hour) as snapshotId + from activityRelations_bucket_MV_ds_4 final + left join organizations o final on o.id = activityRelations_bucket_MV_ds_4.organizationId + where + memberId IN (SELECT id FROM members_sorted) + and ( + ( + platform IN ('git', 'gerrit', 'github', 'gitlab') + AND channel + IN (SELECT arrayJoin(i.repositories) FROM insightsProjects i where isNull (i.deletedAt)) + AND activityRelations_bucket_MV_ds_4.segmentId IN ( + SELECT segmentId + FROM segmentRepositories sr FINAL + WHERE (sr.excluded IS NULL OR sr.excluded = false) + ) + ) + OR platform NOT IN ('git', 'gerrit', 'github', 'gitlab') + ) + +TYPE COPY +TARGET_DATASOURCE activityRelations_deduplicated_cleaned_bucket_4_ds +COPY_MODE replace +COPY_SCHEDULE 18 * * * * diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_5.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_5.pipe new file mode 100644 index 
0000000000..0f1a4d9aad --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_5.pipe @@ -0,0 +1,53 @@ +NODE country_mapping_array +SQL > + SELECT groupArray((country, country_code, timezone_offset)) AS country_data FROM country_mapping_ds + +NODE activityRelations_deduplicated_cleaned_denormalized +SQL > + WITH + upperUTF8(o.location) AS search_u, + (SELECT arrayMap(x -> upperUTF8(x .1), country_data) FROM country_mapping_array) AS names_u, + (SELECT country_data FROM country_mapping_array) AS mapping, + multiSearchFirstIndexUTF8(search_u, names_u) AS idx, + if(idx = 0, ('Unknown', 'XX', 0), mapping[idx]) AS country_data + SELECT + activityRelations_bucket_MV_ds_5.*, + (gitInsertions + gitDeletions) as gitChangedLines, + case + when gitChangedLines > 0 and gitChangedLines < 10 + then '1-9' + when gitChangedLines > 9 and gitChangedLines < 60 + then '10-59' + when gitChangedLines > 59 and gitChangedLines < 100 + then '60-99' + when gitChangedLines > 99 and gitChangedLines < 500 + then '100-499' + when gitChangedLines > 499 + then '500+' + else '' + end as "gitChangedLinesBucket", + CAST(country_data .2 AS LowCardinality(String)) AS organizationCountryCode, + o.displayName as "organizationName", + toStartOfInterval(now(), INTERVAL 1 hour) as snapshotId + from activityRelations_bucket_MV_ds_5 final + left join organizations o final on o.id = activityRelations_bucket_MV_ds_5.organizationId + where + memberId IN (SELECT id FROM members_sorted) + and ( + ( + platform IN ('git', 'gerrit', 'github', 'gitlab') + AND channel + IN (SELECT arrayJoin(i.repositories) FROM insightsProjects i where isNull (i.deletedAt)) + AND activityRelations_bucket_MV_ds_5.segmentId IN ( + SELECT segmentId + FROM segmentRepositories sr FINAL + WHERE (sr.excluded IS NULL OR sr.excluded = false) + ) + ) + OR platform NOT IN ('git', 'gerrit', 'github', 'gitlab') + ) + +TYPE COPY +TARGET_DATASOURCE activityRelations_deduplicated_cleaned_bucket_5_ds +COPY_MODE 
replace +COPY_SCHEDULE 18 * * * * diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_6.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_6.pipe new file mode 100644 index 0000000000..593176f467 --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_6.pipe @@ -0,0 +1,53 @@ +NODE country_mapping_array +SQL > + SELECT groupArray((country, country_code, timezone_offset)) AS country_data FROM country_mapping_ds + +NODE activityRelations_deduplicated_cleaned_denormalized +SQL > + WITH + upperUTF8(o.location) AS search_u, + (SELECT arrayMap(x -> upperUTF8(x .1), country_data) FROM country_mapping_array) AS names_u, + (SELECT country_data FROM country_mapping_array) AS mapping, + multiSearchFirstIndexUTF8(search_u, names_u) AS idx, + if(idx = 0, ('Unknown', 'XX', 0), mapping[idx]) AS country_data + SELECT + activityRelations_bucket_MV_ds_6.*, + (gitInsertions + gitDeletions) as gitChangedLines, + case + when gitChangedLines > 0 and gitChangedLines < 10 + then '1-9' + when gitChangedLines > 9 and gitChangedLines < 60 + then '10-59' + when gitChangedLines > 59 and gitChangedLines < 100 + then '60-99' + when gitChangedLines > 99 and gitChangedLines < 500 + then '100-499' + when gitChangedLines > 499 + then '500+' + else '' + end as "gitChangedLinesBucket", + CAST(country_data .2 AS LowCardinality(String)) AS organizationCountryCode, + o.displayName as "organizationName", + toStartOfInterval(now(), INTERVAL 1 hour) as snapshotId + from activityRelations_bucket_MV_ds_6 final + left join organizations o final on o.id = activityRelations_bucket_MV_ds_6.organizationId + where + memberId IN (SELECT id FROM members_sorted) + and ( + ( + platform IN ('git', 'gerrit', 'github', 'gitlab') + AND channel + IN (SELECT arrayJoin(i.repositories) FROM insightsProjects i where isNull (i.deletedAt)) + AND activityRelations_bucket_MV_ds_6.segmentId IN ( + SELECT segmentId + FROM 
segmentRepositories sr FINAL + WHERE (sr.excluded IS NULL OR sr.excluded = false) + ) + ) + OR platform NOT IN ('git', 'gerrit', 'github', 'gitlab') + ) + +TYPE COPY +TARGET_DATASOURCE activityRelations_deduplicated_cleaned_bucket_6_ds +COPY_MODE replace +COPY_SCHEDULE 22 * * * * diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_7.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_7.pipe new file mode 100644 index 0000000000..cdbc1883ba --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_7.pipe @@ -0,0 +1,53 @@ +NODE country_mapping_array +SQL > + SELECT groupArray((country, country_code, timezone_offset)) AS country_data FROM country_mapping_ds + +NODE activityRelations_deduplicated_cleaned_denormalized +SQL > + WITH + upperUTF8(o.location) AS search_u, + (SELECT arrayMap(x -> upperUTF8(x .1), country_data) FROM country_mapping_array) AS names_u, + (SELECT country_data FROM country_mapping_array) AS mapping, + multiSearchFirstIndexUTF8(search_u, names_u) AS idx, + if(idx = 0, ('Unknown', 'XX', 0), mapping[idx]) AS country_data + SELECT + activityRelations_bucket_MV_ds_7.*, + (gitInsertions + gitDeletions) as gitChangedLines, + case + when gitChangedLines > 0 and gitChangedLines < 10 + then '1-9' + when gitChangedLines > 9 and gitChangedLines < 60 + then '10-59' + when gitChangedLines > 59 and gitChangedLines < 100 + then '60-99' + when gitChangedLines > 99 and gitChangedLines < 500 + then '100-499' + when gitChangedLines > 499 + then '500+' + else '' + end as "gitChangedLinesBucket", + CAST(country_data .2 AS LowCardinality(String)) AS organizationCountryCode, + o.displayName as "organizationName", + toStartOfInterval(now(), INTERVAL 1 hour) as snapshotId + from activityRelations_bucket_MV_ds_7 final + left join organizations o final on o.id = activityRelations_bucket_MV_ds_7.organizationId + where + memberId IN (SELECT id FROM members_sorted) + 
and ( + ( + platform IN ('git', 'gerrit', 'github', 'gitlab') + AND channel + IN (SELECT arrayJoin(i.repositories) FROM insightsProjects i where isNull (i.deletedAt)) + AND activityRelations_bucket_MV_ds_7.segmentId IN ( + SELECT segmentId + FROM segmentRepositories sr FINAL + WHERE (sr.excluded IS NULL OR sr.excluded = false) + ) + ) + OR platform NOT IN ('git', 'gerrit', 'github', 'gitlab') + ) + +TYPE COPY +TARGET_DATASOURCE activityRelations_deduplicated_cleaned_bucket_7_ds +COPY_MODE replace +COPY_SCHEDULE 22 * * * * diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_8.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_8.pipe new file mode 100644 index 0000000000..1fb27ce592 --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_8.pipe @@ -0,0 +1,53 @@ +NODE country_mapping_array +SQL > + SELECT groupArray((country, country_code, timezone_offset)) AS country_data FROM country_mapping_ds + +NODE activityRelations_deduplicated_cleaned_denormalized +SQL > + WITH + upperUTF8(o.location) AS search_u, + (SELECT arrayMap(x -> upperUTF8(x .1), country_data) FROM country_mapping_array) AS names_u, + (SELECT country_data FROM country_mapping_array) AS mapping, + multiSearchFirstIndexUTF8(search_u, names_u) AS idx, + if(idx = 0, ('Unknown', 'XX', 0), mapping[idx]) AS country_data + SELECT + activityRelations_bucket_MV_ds_8.*, + (gitInsertions + gitDeletions) as gitChangedLines, + case + when gitChangedLines > 0 and gitChangedLines < 10 + then '1-9' + when gitChangedLines > 9 and gitChangedLines < 60 + then '10-59' + when gitChangedLines > 59 and gitChangedLines < 100 + then '60-99' + when gitChangedLines > 99 and gitChangedLines < 500 + then '100-499' + when gitChangedLines > 499 + then '500+' + else '' + end as "gitChangedLinesBucket", + CAST(country_data .2 AS LowCardinality(String)) AS organizationCountryCode, + o.displayName as "organizationName", + 
toStartOfInterval(now(), INTERVAL 1 hour) as snapshotId + from activityRelations_bucket_MV_ds_8 final + left join organizations o final on o.id = activityRelations_bucket_MV_ds_8.organizationId + where + memberId IN (SELECT id FROM members_sorted) + and ( + ( + platform IN ('git', 'gerrit', 'github', 'gitlab') + AND channel + IN (SELECT arrayJoin(i.repositories) FROM insightsProjects i where isNull (i.deletedAt)) + AND activityRelations_bucket_MV_ds_8.segmentId IN ( + SELECT segmentId + FROM segmentRepositories sr FINAL + WHERE (sr.excluded IS NULL OR sr.excluded = false) + ) + ) + OR platform NOT IN ('git', 'gerrit', 'github', 'gitlab') + ) + +TYPE COPY +TARGET_DATASOURCE activityRelations_deduplicated_cleaned_bucket_8_ds +COPY_MODE replace +COPY_SCHEDULE 26 * * * * diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_9.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_9.pipe new file mode 100644 index 0000000000..ecbab3fc43 --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_9.pipe @@ -0,0 +1,53 @@ +NODE country_mapping_array +SQL > + SELECT groupArray((country, country_code, timezone_offset)) AS country_data FROM country_mapping_ds + +NODE activityRelations_deduplicated_cleaned_denormalized +SQL > + WITH + upperUTF8(o.location) AS search_u, + (SELECT arrayMap(x -> upperUTF8(x .1), country_data) FROM country_mapping_array) AS names_u, + (SELECT country_data FROM country_mapping_array) AS mapping, + multiSearchFirstIndexUTF8(search_u, names_u) AS idx, + if(idx = 0, ('Unknown', 'XX', 0), mapping[idx]) AS country_data + SELECT + activityRelations_bucket_MV_ds_9.*, + (gitInsertions + gitDeletions) as gitChangedLines, + case + when gitChangedLines > 0 and gitChangedLines < 10 + then '1-9' + when gitChangedLines > 9 and gitChangedLines < 60 + then '10-59' + when gitChangedLines > 59 and gitChangedLines < 100 + then '60-99' + when gitChangedLines > 99 and 
gitChangedLines < 500 + then '100-499' + when gitChangedLines > 499 + then '500+' + else '' + end as "gitChangedLinesBucket", + CAST(country_data .2 AS LowCardinality(String)) AS organizationCountryCode, + o.displayName as "organizationName", + toStartOfInterval(now(), INTERVAL 1 hour) as snapshotId + from activityRelations_bucket_MV_ds_9 final + left join organizations o final on o.id = activityRelations_bucket_MV_ds_9.organizationId + where + memberId IN (SELECT id FROM members_sorted) + and ( + ( + platform IN ('git', 'gerrit', 'github', 'gitlab') + AND channel + IN (SELECT arrayJoin(i.repositories) FROM insightsProjects i where isNull (i.deletedAt)) + AND activityRelations_bucket_MV_ds_9.segmentId IN ( + SELECT segmentId + FROM segmentRepositories sr FINAL + WHERE (sr.excluded IS NULL OR sr.excluded = false) + ) + ) + OR platform NOT IN ('git', 'gerrit', 'github', 'gitlab') + ) + +TYPE COPY +TARGET_DATASOURCE activityRelations_deduplicated_cleaned_bucket_9_ds +COPY_MODE replace +COPY_SCHEDULE 26 * * * * diff --git a/services/libs/tinybird/pipes/activityRelations_bucket_routing.pipe b/services/libs/tinybird/pipes/activityRelations_bucket_routing.pipe new file mode 100644 index 0000000000..a9d642d367 --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_bucket_routing.pipe @@ -0,0 +1,18 @@ +NODE activityRelations_bucket_routing_2 +SQL > + % + SELECT selected_bucket.* + FROM + {% if bucketId == '0' %} activityRelations_deduplicated_cleaned_bucket_0_ds + {% elif bucketId == '1' %} activityRelations_deduplicated_cleaned_bucket_1_ds + {% elif bucketId == '2' %} activityRelations_deduplicated_cleaned_bucket_2_ds + {% elif bucketId == '3' %} activityRelations_deduplicated_cleaned_bucket_3_ds + {% elif bucketId == '4' %} activityRelations_deduplicated_cleaned_bucket_4_ds + {% elif bucketId == '5' %} activityRelations_deduplicated_cleaned_bucket_5_ds + {% elif bucketId == '6' %} activityRelations_deduplicated_cleaned_bucket_6_ds + {% elif bucketId == '7' %} 
activityRelations_deduplicated_cleaned_bucket_7_ds + {% elif bucketId == '8' %} activityRelations_deduplicated_cleaned_bucket_8_ds + {% elif bucketId == '9' %} activityRelations_deduplicated_cleaned_bucket_9_ds + -- fallback, should never happen + {% else %} activityRelations_deduplicated_cleaned_bucket_0_ds + {% end %} as selected_bucket diff --git a/services/libs/tinybird/pipes/activityRelations_data_copilot.pipe b/services/libs/tinybird/pipes/activityRelations_data_copilot.pipe index 9054db1505..e85ec31180 100644 --- a/services/libs/tinybird/pipes/activityRelations_data_copilot.pipe +++ b/services/libs/tinybird/pipes/activityRelations_data_copilot.pipe @@ -52,5 +52,4 @@ SQL > gitChangedLinesBucket, organizationCountryCode, organizationName - FROM activityRelations_deduplicated_cleaned_ds - where snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) + FROM activityRelations_deduplicated_cleaned_bucket_union diff --git a/services/libs/tinybird/pipes/activityRelations_deduplicated_cleaned_bucket_union.pipe b/services/libs/tinybird/pipes/activityRelations_deduplicated_cleaned_bucket_union.pipe new file mode 100644 index 0000000000..10304b52d1 --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_deduplicated_cleaned_bucket_union.pipe @@ -0,0 +1,31 @@ +NODE activityRelations_deduplicated_cleaned_ds_BUCKET_UNION_0 +SQL > + SELECT * + FROM activityRelations_deduplicated_cleaned_bucket_0_ds + UNION ALL + SELECT * + FROM activityRelations_deduplicated_cleaned_bucket_1_ds + UNION ALL + SELECT * + FROM activityRelations_deduplicated_cleaned_bucket_2_ds + UNION ALL + SELECT * + FROM activityRelations_deduplicated_cleaned_bucket_3_ds + UNION ALL + SELECT * + FROM activityRelations_deduplicated_cleaned_bucket_4_ds + UNION ALL + SELECT * + FROM activityRelations_deduplicated_cleaned_bucket_5_ds + UNION ALL + SELECT * + FROM activityRelations_deduplicated_cleaned_bucket_6_ds + UNION ALL + SELECT * + FROM 
activityRelations_deduplicated_cleaned_bucket_7_ds + UNION ALL + SELECT * + FROM activityRelations_deduplicated_cleaned_bucket_8_ds + UNION ALL + SELECT * + FROM activityRelations_deduplicated_cleaned_bucket_9_ds diff --git a/services/libs/tinybird/pipes/activityRelations_deduplicated_copy_pipe.pipe b/services/libs/tinybird/pipes/activityRelations_deduplicated_copy_pipe.pipe deleted file mode 100644 index e7ac394d0d..0000000000 --- a/services/libs/tinybird/pipes/activityRelations_deduplicated_copy_pipe.pipe +++ /dev/null @@ -1,30 +0,0 @@ -DESCRIPTION > - Deduplicates activityRelations and the destination datasource (activityRelations_deduplicated_ds) has optimized sorting key for merge operation that will follow - -NODE activityRelations_deduplicated -SQL > - SELECT - activityId, - conversationId, - createdAt, - updatedAt, - memberId, - objectMemberId, - objectMemberUsername, - organizationId, - parentId, - platform, - segmentId, - username, - channel, - isContribution, - sourceId, - sourceParentId, - timestamp, - type - FROM activityRelations final - -TYPE COPY -TARGET_DATASOURCE activityRelations_deduplicated_ds -COPY_MODE replace -COPY_SCHEDULE 0 * * * * diff --git a/services/libs/tinybird/pipes/activityRelations_enrich_initial_snapshot.pipe b/services/libs/tinybird/pipes/activityRelations_enrich_initial_snapshot.pipe new file mode 100644 index 0000000000..438ea83625 --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_enrich_initial_snapshot.pipe @@ -0,0 +1,38 @@ +NODE country_mapping_array +SQL > + SELECT groupArray((country, country_code, timezone_offset)) AS country_data FROM country_mapping_ds + +NODE activityRelations_deduplicated_cleaned_denormalized +SQL > + WITH + upperUTF8(o.location) AS search_u, + (SELECT arrayMap(x -> upperUTF8(x .1), country_data) FROM country_mapping_array) AS names_u, + (SELECT country_data FROM country_mapping_array) AS mapping, + multiSearchFirstIndexUTF8(search_u, names_u) AS idx, + if(idx = 0, ('Unknown', 'XX', 
0), mapping[idx]) AS country_data + SELECT + activityRelations.*, + (gitInsertions + gitDeletions) as gitChangedLines, + case + when gitChangedLines > 0 and gitChangedLines < 10 + then '1-9' + when gitChangedLines > 9 and gitChangedLines < 60 + then '10-59' + when gitChangedLines > 59 and gitChangedLines < 100 + then '60-99' + when gitChangedLines > 99 and gitChangedLines < 500 + then '100-499' + when gitChangedLines > 499 + then '500+' + else '' + end as "gitChangedLinesBucket", + CAST(country_data .2 AS LowCardinality(String)) AS organizationCountryCode, + o.displayName as "organizationName", + toStartOfInterval(now(), INTERVAL 1 day) as snapshotId + from activityRelations final + left join organizations o final on o.id = activityRelations.organizationId + +TYPE COPY +TARGET_DATASOURCE activityRelations_enriched_deduplicated_ds +COPY_MODE replace +COPY_SCHEDULE @on-demand diff --git a/services/libs/tinybird/pipes/activityRelations_enrich_snapshot_MV.pipe b/services/libs/tinybird/pipes/activityRelations_enrich_snapshot_MV.pipe new file mode 100644 index 0000000000..390cf0f55b --- /dev/null +++ b/services/libs/tinybird/pipes/activityRelations_enrich_snapshot_MV.pipe @@ -0,0 +1,36 @@ +NODE country_mapping_array +SQL > + SELECT groupArray((country, country_code, timezone_offset)) AS country_data FROM country_mapping_ds + +NODE activityRelations_deduplicated_cleaned_denormalized +SQL > + WITH + upperUTF8(o.location) AS search_u, + (SELECT arrayMap(x -> upperUTF8(x .1), country_data) FROM country_mapping_array) AS names_u, + (SELECT country_data FROM country_mapping_array) AS mapping, + multiSearchFirstIndexUTF8(search_u, names_u) AS idx, + if(idx = 0, ('Unknown', 'XX', 0), mapping[idx]) AS country_data + SELECT + activityRelations.*, + (gitInsertions + gitDeletions) as gitChangedLines, + case + when gitChangedLines > 0 and gitChangedLines < 10 + then '1-9' + when gitChangedLines > 9 and gitChangedLines < 60 + then '10-59' + when gitChangedLines > 59 and 
gitChangedLines < 100 + then '60-99' + when gitChangedLines > 99 and gitChangedLines < 500 + then '100-499' + when gitChangedLines > 499 + then '500+' + else '' + end as "gitChangedLinesBucket", + CAST(country_data .2 AS LowCardinality(String)) AS organizationCountryCode, + o.displayName as "organizationName", + toStartOfInterval("updatedAt", INTERVAL 1 DAY) + INTERVAL 1 day as snapshotId + from activityRelations + left join organizations o final on o.id = activityRelations.organizationId + +TYPE MATERIALIZED +DATASOURCE activityRelations_enrich_snapshot_MV_ds diff --git a/services/libs/tinybird/pipes/activityRelations_snapshot_merger_copy.pipe b/services/libs/tinybird/pipes/activityRelations_snapshot_merger_copy.pipe index 6b3c2fdfa9..9986be2413 100644 --- a/services/libs/tinybird/pipes/activityRelations_snapshot_merger_copy.pipe +++ b/services/libs/tinybird/pipes/activityRelations_snapshot_merger_copy.pipe @@ -3,24 +3,23 @@ DESCRIPTION > NODE realtime_snapshot SQL > - WITH (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) as maxSnapshotId - SELECT activityRelations_enrich_clean_snapshot_MV_ds.* - FROM activityRelations_enrich_clean_snapshot_MV_ds FINAL - where activityRelations_enrich_clean_snapshot_MV_ds.snapshotId = maxSnapshotId + INTERVAL 1 hour + WITH (select max(snapshotId) from activityRelations_enriched_deduplicated_ds) as maxSnapshotId + SELECT activityRelations_enrich_snapshot_MV_ds.* + FROM activityRelations_enrich_snapshot_MV_ds FINAL + where activityRelations_enrich_snapshot_MV_ds.snapshotId = maxSnapshotId + INTERVAL 1 day NODE historical_snapshot SQL > WITH ( - Select max(snapshotId)::DateTime from activityRelations_deduplicated_cleaned_ds + Select max(snapshotId)::DateTime from activityRelations_enriched_deduplicated_ds ) as maxSnapshotID - SELECT * REPLACE (maxSnapshotID::DateTime + interval 1 hour AS snapshotId) - FROM activityRelations_deduplicated_cleaned_ds + SELECT * REPLACE (maxSnapshotID::DateTime + interval 1 day AS 
snapshotId) + FROM activityRelations_enriched_deduplicated_ds where snapshotId = maxSnapshotID and (segmentId, timestamp, type, platform, channel, sourceId) NOT IN (SELECT segmentId, timestamp, type, platform, channel, sourceId FROM realtime_snapshot) - -- where (timestamp, activityId) NOT IN (SELECT timestamp, activityId FROM realtime_snapshot) NODE merged_snapshots SQL > @@ -31,9 +30,6 @@ SQL > from historical_snapshot TYPE COPY -TARGET_DATASOURCE activityRelations_deduplicated_cleaned_ds +TARGET_DATASOURCE activityRelations_enriched_deduplicated_ds COPY_MODE append -COPY_SCHEDULE 10 * * * * -NODE activityRelations_snapshot_merger_copy_3 -SQL > - SELECT count(*) FROM merged_snapshots +COPY_SCHEDULE 0 1 * * * diff --git a/services/libs/tinybird/pipes/activityTypes_by_project.pipe b/services/libs/tinybird/pipes/activityTypes_by_project.pipe index 270fbd3c14..d25eb29720 100644 --- a/services/libs/tinybird/pipes/activityTypes_by_project.pipe +++ b/services/libs/tinybird/pipes/activityTypes_by_project.pipe @@ -15,11 +15,10 @@ NODE activityTypes_by_project_0 SQL > % SELECT DISTINCT a.type as activityType, a.platform, at.label - FROM activityRelations_deduplicated_cleaned_ds a + FROM activityRelations_bucket_routing a INNER JOIN activityTypes at ON a.type = at.activityType AND a.platform = at.platform WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND a.segmentId = (SELECT segmentId FROM segments_filtered) + a.segmentId = (SELECT segmentId FROM segments_filtered) {% if defined(repos) %} AND a.channel IN {{ Array(repos, 'String', description="Filter activity repo list", required=False) }} diff --git a/services/libs/tinybird/pipes/category_groups_list.pipe b/services/libs/tinybird/pipes/category_groups_list.pipe index 2af8f3a523..b36d802cd6 100644 --- a/services/libs/tinybird/pipes/category_groups_list.pipe +++ b/services/libs/tinybird/pipes/category_groups_list.pipe @@ -7,8 +7,6 @@ DESCRIPTION > - `slug`: Optional string to 
filter by specific category group slug - Response: `name`, `slug`, `type` -TAGS "API", "Category groups", "Navigation" - NODE category_groups_list_result SQL > % diff --git a/services/libs/tinybird/pipes/category_groups_oss_index.pipe b/services/libs/tinybird/pipes/category_groups_oss_index.pipe index a8b917be74..21f4fc27f7 100644 --- a/services/libs/tinybird/pipes/category_groups_oss_index.pipe +++ b/services/libs/tinybird/pipes/category_groups_oss_index.pipe @@ -10,8 +10,6 @@ DESCRIPTION > - `orderBy`: Optional string for sorting, defaults to 'totalContributors'. Available: 'totalContributors', 'softwareValue', 'avgScore' - Response: `id`, `name`, `type`, `slug`, `totalContributors`, `softwareValue`, `avgScore`, `topCollections` (array), `topProjects` (array) -TAGS "API", "OSS index", "Category groups", "Public directory" - NODE category_groups_oss_index_agregates DESCRIPTION > Returns aggregates such as top contributors, software value and average score by category group diff --git a/services/libs/tinybird/pipes/category_list.pipe b/services/libs/tinybird/pipes/category_list.pipe index 843f358f81..6704d3fae3 100644 --- a/services/libs/tinybird/pipes/category_list.pipe +++ b/services/libs/tinybird/pipes/category_list.pipe @@ -14,8 +14,6 @@ DESCRIPTION > - `page`: Optional integer for pagination offset calculation, defaults to 0 - Response: `id`, `name`, `slug`, `categoryGroupId`, `categoryGroupName`, `categoryGroupSlug`, `categoryGroupType` -TAGS "API", "Categories", "Pagination", "Category management" - NODE category_list_categories_deduplicated SQL > % diff --git a/services/libs/tinybird/pipes/collections_list.pipe b/services/libs/tinybird/pipes/collections_list.pipe index 042479e8d9..4db477a975 100644 --- a/services/libs/tinybird/pipes/collections_list.pipe +++ b/services/libs/tinybird/pipes/collections_list.pipe @@ -16,8 +16,6 @@ DESCRIPTION > - Count mode (`count=true`): `count` (total number of collections) - Data mode (default): `id`, `name`, `slug`, 
`description`, `projectCount`, `starred`, `softwareValue`, `contributorCount`, `featuredProjects` array -TAGS "API", "Collections", "Pagination", "Sorting" - NODE collections_paginated SQL > % diff --git a/services/libs/tinybird/pipes/contributions_with_local_time.pipe b/services/libs/tinybird/pipes/contributions_with_local_time.pipe index 6a619e76ed..d59a913874 100644 --- a/services/libs/tinybird/pipes/contributions_with_local_time.pipe +++ b/services/libs/tinybird/pipes/contributions_with_local_time.pipe @@ -25,11 +25,9 @@ SQL > mwli.country_data .3 as timezone_offset, toDayOfWeek(addHours(af.timestamp, mwli.country_data .3)) as weekday, intDiv(toHour(addHours(af.timestamp, mwli.country_data .3)), 2) * 2 AS two_hours_block - from activityRelations_deduplicated_cleaned_ds af + from activityRelations_deduplicated_cleaned_bucket_union af join members_with_location_information mwli on mwli.id = af.memberId - where - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND platform in ('git', 'github', 'gitlab', 'gerrit') + where platform in ('git', 'github', 'gitlab', 'gerrit') TYPE COPY TARGET_DATASOURCE contributions_with_local_time_ds diff --git a/services/libs/tinybird/pipes/contributor_dependency.pipe b/services/libs/tinybird/pipes/contributor_dependency.pipe index 79037096ba..be50e0554c 100644 --- a/services/libs/tinybird/pipes/contributor_dependency.pipe +++ b/services/libs/tinybird/pipes/contributor_dependency.pipe @@ -16,8 +16,6 @@ DESCRIPTION > - `onlyContributions`: Optional boolean, defaults to 1 (contributions only), set to 0 for all activities - Response: `id`, `displayName`, `contributionCount`, `contributionPercentage`, `roles`, `contributionPercentageRunningTotal`, `totalContributorCount` -TAGS "Widget", "Dependency analysis", "Bus factor", "Risk assessment" - NODE contributions_percentage_running_total SQL > SELECT t.*, active_contributors.contributorCount as "totalContributorCount" diff --git 
a/services/libs/tinybird/pipes/contributor_retention.pipe b/services/libs/tinybird/pipes/contributor_retention.pipe index 7be7f702a9..c8fda499a2 100644 --- a/services/libs/tinybird/pipes/contributor_retention.pipe +++ b/services/libs/tinybird/pipes/contributor_retention.pipe @@ -18,8 +18,6 @@ DESCRIPTION > - `granularity`: Required string for time aggregation ('daily', 'weekly', 'monthly', 'quarterly', 'yearly') - Response: `startDate`, `endDate`, `retentionRate` (percentage of contributors retained from previous period) -TAGS "Widget", "Retention", "Contributors", "Cohort analysis" - NODE aggregated_members SQL > % diff --git a/services/libs/tinybird/pipes/contributors_geo_distribution.pipe b/services/libs/tinybird/pipes/contributors_geo_distribution.pipe index 98f9c194ef..bdd8984465 100644 --- a/services/libs/tinybird/pipes/contributors_geo_distribution.pipe +++ b/services/libs/tinybird/pipes/contributors_geo_distribution.pipe @@ -16,8 +16,6 @@ DESCRIPTION > - `onlyContributions`: Optional boolean, defaults to 1 (contributions only), set to 0 for all activities - Response: `country`, `flag`, `country_code`, `contributorCount`, `contributorPercentage` -TAGS "Widget", "Geography", "Contributors" - NODE country_mapping_array SQL > SELECT groupArray((country, flag, country_code)) AS country_data FROM country_mapping @@ -27,8 +25,9 @@ SQL > SELECT m.id, m.location, + m.country, arrayFilter( - x -> position(coalesce(nullIf(m.country, ''), m.location), upper(x .1)) > 0, + x -> position(upper(coalesce(nullIf(m.country, ''), m.location)), upper(x .1)) > 0, (SELECT country_data FROM country_mapping_array) ) AS matched_countries, arrayJoin( diff --git a/services/libs/tinybird/pipes/country_mapping.pipe b/services/libs/tinybird/pipes/country_mapping.pipe index 4757e123a8..d50541c39c 100644 --- a/services/libs/tinybird/pipes/country_mapping.pipe +++ b/services/libs/tinybird/pipes/country_mapping.pipe @@ -7,8 +7,6 @@ DESCRIPTION > - Parameters: None (returns static country 
mapping data) - Response: `country`, `flag` (emoji), `country_code` (ISO), `timezone_offset` (hours from UTC) -TAGS "Utility", "Country mapping", "Geo data", "Lookup table" - NODE map_country_name_flag_code SQL > SELECT diff --git a/services/libs/tinybird/pipes/generate_timeseries.pipe b/services/libs/tinybird/pipes/generate_timeseries.pipe index 644f79a221..0371727ec3 100644 --- a/services/libs/tinybird/pipes/generate_timeseries.pipe +++ b/services/libs/tinybird/pipes/generate_timeseries.pipe @@ -11,8 +11,6 @@ DESCRIPTION > - Inherits time range parameters from `generate_timeseries_bounds`: `startDate`, `endDate`, and activity filters - Response: `startDate`, `endDate` for each time period within the specified granularity and date range -TAGS "Utility", "Time-series", "Infrastructure", "Date generation" - NODE generate_timeseriez SQL > % diff --git a/services/libs/tinybird/pipes/generate_timeseries_bounds.pipe b/services/libs/tinybird/pipes/generate_timeseries_bounds.pipe index a9d37fff6a..24aae2740e 100644 --- a/services/libs/tinybird/pipes/generate_timeseries_bounds.pipe +++ b/services/libs/tinybird/pipes/generate_timeseries_bounds.pipe @@ -12,8 +12,6 @@ DESCRIPTION > - Inherits filtering parameters from `activities_filtered` pipe when calculating bounds from data - Response: `actual_start_date`, `actual_end_date` -TAGS "Utility", "Time-series", "Date bounds" - NODE generate_timeseries_boundz_0 SQL > % diff --git a/services/libs/tinybird/pipes/health_score_active_contributors.pipe b/services/libs/tinybird/pipes/health_score_active_contributors.pipe index fc90cef390..d1dfed680a 100644 --- a/services/libs/tinybird/pipes/health_score_active_contributors.pipe +++ b/services/libs/tinybird/pipes/health_score_active_contributors.pipe @@ -4,13 +4,12 @@ DESCRIPTION > SQL > % - SELECT segmentId, COALESCE(uniq(memberId), 0) AS activeContributors - FROM activityRelations_deduplicated_cleaned_ds - WHERE - snapshotId = (select max(snapshotId) from 
activityRelations_deduplicated_cleaned_ds) - AND memberId != '' - AND (type, platform) IN (SELECT activityType, platform FROM activityTypes_filtered) - {% if defined(project) %} + {% if defined(project) %} + SELECT segmentId, COALESCE(uniq(memberId), 0) AS activeContributors + FROM activityRelations_bucket_routing + WHERE + memberId != '' + AND (type, platform) IN (SELECT activityType, platform FROM activityTypes_filtered) AND segmentId = (SELECT segmentId FROM segments_filtered) {% if defined(repos) %} AND channel @@ -24,11 +23,17 @@ SQL > AND timestamp < {{ DateTime(endDate, description="Filter before date", required=False) }} {% end %} - {% else %} + GROUP BY segmentId + {% else %} + SELECT segmentId, COALESCE(uniq(memberId), 0) AS activeContributors + FROM activityRelations_deduplicated_cleaned_bucket_union + WHERE + memberId != '' + AND (type, platform) IN (SELECT activityType, platform FROM activityTypes_filtered) AND timestamp >= toStartOfQuarter(now() - toIntervalQuarter(1)) AND timestamp < toStartOfQuarter(now()) - {% end %} - GROUP BY segmentId + GROUP BY segmentId + {% end %} NODE health_score_active_contributors_with_benchmark SQL > diff --git a/services/libs/tinybird/pipes/health_score_active_days.pipe b/services/libs/tinybird/pipes/health_score_active_days.pipe index 8063a609c7..6a88d2695e 100644 --- a/services/libs/tinybird/pipes/health_score_active_days.pipe +++ b/services/libs/tinybird/pipes/health_score_active_days.pipe @@ -1,12 +1,11 @@ NODE health_score_active_days_score SQL > % - SELECT segmentId, countDistinct(DATE(timestamp)) AS activeDaysCount - FROM activityRelations_deduplicated_cleaned_ds - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - {% if defined(project) %} - AND segmentId = (SELECT segmentId FROM segments_filtered) + {% if defined(project) %} + SELECT segmentId, countDistinct(DATE(timestamp)) AS activeDaysCount + FROM activityRelations_bucket_routing + WHERE + segmentId = (SELECT 
segmentId FROM segments_filtered) {% if defined(repos) %} AND channel IN {{ Array(repos, 'String', description="Filter activity repo list", required=False) }} @@ -19,11 +18,13 @@ SQL > AND timestamp < {{ DateTime(endDate, description="Filter before date", required=False) }} {% end %} - {% else %} - AND timestamp >= toStartOfDay(now() - toIntervalDay(365)) - AND timestamp < toStartOfDay(now()) - {% end %} - GROUP BY segmentId + GROUP BY segmentId + {% else %} + SELECT segmentId, countDistinct(DATE(timestamp)) AS activeDaysCount + FROM activityRelations_deduplicated_cleaned_bucket_union + WHERE timestamp >= toStartOfDay(now() - toIntervalDay(365)) AND timestamp < toStartOfDay(now()) + GROUP BY segmentId + {% end %} NODE health_score_active_days_with_benchmark SQL > diff --git a/services/libs/tinybird/pipes/health_score_contributor_dependency.pipe b/services/libs/tinybird/pipes/health_score_contributor_dependency.pipe index 6519df540c..30a279f866 100644 --- a/services/libs/tinybird/pipes/health_score_contributor_dependency.pipe +++ b/services/libs/tinybird/pipes/health_score_contributor_dependency.pipe @@ -1,13 +1,12 @@ NODE health_score_contributor_dependency_contribution_count SQL > % - SELECT segmentId, memberId, count() AS contributionCount, MIN(timestamp), MAX(timestamp) - FROM activityRelations_deduplicated_cleaned_ds - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND memberId != '' - AND (type, platform) IN (SELECT activityType, platform FROM activityTypes_filtered) - {% if defined(project) %} + {% if defined(project) %} + SELECT segmentId, memberId, count() AS contributionCount, MIN(timestamp), MAX(timestamp) + FROM activityRelations_bucket_routing + WHERE + memberId != '' + AND (type, platform) IN (SELECT activityType, platform FROM activityTypes_filtered) AND segmentId = (SELECT segmentId FROM segments_filtered) {% if defined(repos) %} AND channel @@ -21,12 +20,19 @@ SQL > AND timestamp < {{ DateTime(endDate, 
description="Filter before date", required=False) }} {% end %} - {% else %} + GROUP BY segmentId, memberId + ORDER by contributionCount DESC + {% else %} + SELECT segmentId, memberId, count() AS contributionCount, MIN(timestamp), MAX(timestamp) + FROM activityRelations_deduplicated_cleaned_bucket_union + WHERE + memberId != '' + AND (type, platform) IN (SELECT activityType, platform FROM activityTypes_filtered) AND timestamp >= toStartOfDay(now() - INTERVAL 365 DAY) AND timestamp < toStartOfDay(now() + INTERVAL 1 DAY) - {% end %} - GROUP BY segmentId, memberId - ORDER by contributionCount DESC + GROUP BY segmentId, memberId + ORDER by contributionCount DESC + {% end %} NODE health_score_contributor_dependency_contribution_percentage SQL > diff --git a/services/libs/tinybird/pipes/health_score_forks.pipe b/services/libs/tinybird/pipes/health_score_forks.pipe index 35e1e030ce..d6e5e2a74c 100644 --- a/services/libs/tinybird/pipes/health_score_forks.pipe +++ b/services/libs/tinybird/pipes/health_score_forks.pipe @@ -4,13 +4,11 @@ DESCRIPTION > SQL > % - SELECT segmentId, count() AS forks - FROM activityRelations_deduplicated_cleaned_ds - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND type = 'fork' - {% if defined(project) %} - AND segmentId = (SELECT segmentId FROM segments_filtered) + {% if defined(project) %} + SELECT segmentId, count() AS forks + FROM activityRelations_bucket_routing + WHERE + type = 'fork' AND segmentId = (SELECT segmentId FROM segments_filtered) {% if defined(repos) %} AND channel IN {{ Array(repos, 'String', description="Filter activity repo list", required=False) }} @@ -23,8 +21,13 @@ SQL > AND timestamp < {{ DateTime(endDate, description="Filter before date", required=False) }} {% end %} - {% end %} - GROUP BY segmentId + GROUP BY segmentId + {% else %} + SELECT segmentId, count() AS forks + FROM activityRelations_deduplicated_cleaned_bucket_union + WHERE type = 'fork' + GROUP BY segmentId + 
{% end %} NODE health_score_forks_with_benchmark SQL > diff --git a/services/libs/tinybird/pipes/health_score_issues_resolution.pipe b/services/libs/tinybird/pipes/health_score_issues_resolution.pipe index 4974081516..9547652be9 100644 --- a/services/libs/tinybird/pipes/health_score_issues_resolution.pipe +++ b/services/libs/tinybird/pipes/health_score_issues_resolution.pipe @@ -1,12 +1,11 @@ NODE health_score_issues_resolution_activities SQL > % - SELECT activityId as id, segmentId, channel as repo - FROM activityRelations_deduplicated_cleaned_ds - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - {% if defined(project) %} - AND segmentId = (SELECT segmentId FROM segments_filtered) + {% if defined(project) %} + SELECT activityId as id, segmentId, channel as repo + FROM activityRelations_bucket_routing + WHERE + segmentId = (SELECT segmentId FROM segments_filtered) {% if defined(repos) %} AND channel IN {{ Array(repos, 'String', description="Filter activity repo list", required=False) }} @@ -19,10 +18,13 @@ SQL > AND timestamp < {{ DateTime(endDate, description="Filter before date", required=False) }} {% end %} - {% else %} - AND timestamp >= toStartOfDay(now()) - INTERVAL 365 DAY + {% else %} + SELECT activityId as id, segmentId, channel as repo + FROM activityRelations_deduplicated_cleaned_bucket_union + WHERE + timestamp >= toStartOfDay(now()) - INTERVAL 365 DAY AND timestamp < toStartOfDay(now()) + INTERVAL 1 DAY - {% end %} + {% end %} NODE health_score_issues_resolution_score SQL > diff --git a/services/libs/tinybird/pipes/health_score_organization_dependency.pipe b/services/libs/tinybird/pipes/health_score_organization_dependency.pipe index df8d2cb5e2..c5737d0a8f 100644 --- a/services/libs/tinybird/pipes/health_score_organization_dependency.pipe +++ b/services/libs/tinybird/pipes/health_score_organization_dependency.pipe @@ -1,13 +1,12 @@ NODE health_score_organization_dependency_contribution_count SQL > % - 
SELECT segmentId, organizationId, count() AS contributionCount - FROM activityRelations_deduplicated_cleaned_ds - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND organizationId != '' - AND (type, platform) IN (SELECT activityType, platform FROM activityTypes_filtered) - {% if defined(project) %} + {% if defined(project) %} + SELECT segmentId, organizationId, count() AS contributionCount + FROM activityRelations_bucket_routing + WHERE + organizationId != '' + AND (type, platform) IN (SELECT activityType, platform FROM activityTypes_filtered) AND segmentId = (SELECT segmentId FROM segments_filtered) {% if defined(repos) %} AND channel @@ -21,11 +20,17 @@ SQL > AND timestamp < {{ DateTime(endDate, description="Filter before date", required=False) }} {% end %} - {% else %} + GROUP BY segmentId, organizationId + {% else %} + SELECT segmentId, organizationId, count() AS contributionCount + FROM activityRelations_deduplicated_cleaned_bucket_union + WHERE + organizationId != '' + AND (type, platform) IN (SELECT activityType, platform FROM activityTypes_filtered) AND timestamp >= toStartOfDay(now() - INTERVAL 365 DAY) AND timestamp < toStartOfDay(now() + INTERVAL 1 DAY) - {% end %} - GROUP BY segmentId, organizationId + GROUP BY segmentId, organizationId + {% end %} NODE health_score_organization_dependency_contribution_percentage SQL > diff --git a/services/libs/tinybird/pipes/health_score_overview.pipe b/services/libs/tinybird/pipes/health_score_overview.pipe index abbb9a4433..410964f46d 100644 --- a/services/libs/tinybird/pipes/health_score_overview.pipe +++ b/services/libs/tinybird/pipes/health_score_overview.pipe @@ -9,8 +9,6 @@ DESCRIPTION > - `slugs`: Optional array of project slugs for multi-project health comparison (e.g., ['k8s', 'tensorflow']) - Response: All health score fields from `health_score_copy_ds` including overall scores and component metrics -TAGS "Widget", "Health score", "Overview", "Project metrics" 
- NODE health_score_overview_result SQL > % diff --git a/services/libs/tinybird/pipes/health_score_pull_requests.pipe b/services/libs/tinybird/pipes/health_score_pull_requests.pipe index ef6445e761..d0a9b01833 100644 --- a/services/libs/tinybird/pipes/health_score_pull_requests.pipe +++ b/services/libs/tinybird/pipes/health_score_pull_requests.pipe @@ -4,14 +4,15 @@ DESCRIPTION > SQL > % - SELECT segmentId, count() AS pullRequests - FROM activityRelations_deduplicated_cleaned_ds - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND ( - type = 'pull_request-opened' OR type = 'merge_request-opened' OR type = 'changeset-created' - ) - {% if defined(project) %} + {% if defined(project) %} + SELECT segmentId, count() AS pullRequests + FROM activityRelations_bucket_routing + WHERE + ( + type = 'pull_request-opened' + OR type = 'merge_request-opened' + OR type = 'changeset-created' + ) AND segmentId = (SELECT segmentId FROM segments_filtered) {% if defined(repos) %} AND channel @@ -25,11 +26,20 @@ SQL > AND timestamp < {{ DateTime(endDate, description="Filter before date", required=False) }} {% end %} - {% else %} + GROUP BY segmentId + {% else %} + SELECT segmentId, count() AS pullRequests + FROM activityRelations_deduplicated_cleaned_bucket_union + WHERE + ( + type = 'pull_request-opened' + OR type = 'merge_request-opened' + OR type = 'changeset-created' + ) AND timestamp >= toStartOfDay(now() - toIntervalDay(365)) AND timestamp < toStartOfDay(now() + toIntervalDay(1)) - {% end %} - GROUP BY segmentId + GROUP BY segmentId + {% end %} NODE health_score_pull_requests_with_benchmark SQL > diff --git a/services/libs/tinybird/pipes/health_score_retention.pipe b/services/libs/tinybird/pipes/health_score_retention.pipe index d7d1fb3bd4..0e2d6a2a2f 100644 --- a/services/libs/tinybird/pipes/health_score_retention.pipe +++ b/services/libs/tinybird/pipes/health_score_retention.pipe @@ -1,63 +1,106 @@ NODE 
health_score_retention_current_quarter SQL > % - SELECT segmentId, groupUniqArray(memberId) AS currentQuarterMembers - FROM activityRelations_deduplicated_cleaned_ds - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND memberId != '' - {% if defined(project) %} - AND segmentId = (SELECT segmentId FROM segments_filtered) + {% if defined(project) %} + SELECT segmentId, groupUniqArray(memberId) AS currentQuarterMembers + FROM activityRelations_bucket_routing + WHERE + memberId != '' AND segmentId = (SELECT segmentId FROM segments_filtered) {% if defined(repos) %} AND channel IN {{ Array(repos, 'String', description="Filter activity repo list", required=False) }} {% end %} - {% end %} - {% if defined(endDate) %} - AND timestamp >= toStartOfQuarter( - parseDateTimeBestEffort( - {{ DateTime(endDate, description="Filter before date", required=False) }} + {% if defined(endDate) %} + AND timestamp >= toStartOfQuarter( + parseDateTimeBestEffort( + {{ DateTime(endDate, description="Filter before date", required=False) }} + ) + - INTERVAL 1 QUARTER ) - - INTERVAL 1 QUARTER - ) - AND timestamp < toStartOfQuarter( - parseDateTimeBestEffort( - {{ DateTime(endDate, description="Filter before date", required=False) }} + AND timestamp < toStartOfQuarter( + parseDateTimeBestEffort( + {{ DateTime(endDate, description="Filter before date", required=False) }} + ) ) - ) - {% else %} - AND timestamp >= toStartOfQuarter(now() - INTERVAL 1 QUARTER) - AND timestamp < toStartOfQuarter(now()) - {% end %} - GROUP BY segmentId + {% else %} + AND timestamp >= toStartOfQuarter(now() - INTERVAL 1 QUARTER) + AND timestamp < toStartOfQuarter(now()) + {% end %} + GROUP BY segmentId + {% else %} + SELECT segmentId, groupUniqArray(memberId) AS currentQuarterMembers + FROM activityRelations_deduplicated_cleaned_bucket_union + WHERE + memberId != '' + {% if defined(endDate) %} + AND timestamp >= toStartOfQuarter( + parseDateTimeBestEffort( + {{ 
DateTime(endDate, description="Filter before date", required=False) }} + ) + - INTERVAL 1 QUARTER + ) + AND timestamp < toStartOfQuarter( + parseDateTimeBestEffort( + {{ DateTime(endDate, description="Filter before date", required=False) }} + ) + ) + {% else %} + AND timestamp >= toStartOfQuarter(now() - INTERVAL 1 QUARTER) + AND timestamp < toStartOfQuarter(now()) + {% end %} + GROUP BY segmentId + {% end %} NODE health_score_retention_previous_quarter SQL > % - SELECT segmentId, groupUniqArray(memberId) AS previousQuarterMembers - FROM activityRelations_deduplicated_cleaned_ds - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND memberId != '' - {% if defined(project) %} AND segmentId = (SELECT segmentId FROM segments_filtered) {% end %} - {% if defined(endDate) %} - AND timestamp >= toStartOfQuarter( - parseDateTimeBestEffort( - {{ DateTime(endDate, description="Filter before date", required=False) }} + {% if defined(project) %} + SELECT segmentId, groupUniqArray(memberId) AS previousQuarterMembers + FROM activityRelations_bucket_routing + WHERE + memberId != '' AND segmentId = (SELECT segmentId FROM segments_filtered) + {% if defined(endDate) %} + AND timestamp >= toStartOfQuarter( + parseDateTimeBestEffort( + {{ DateTime(endDate, description="Filter before date", required=False) }} + ) + - INTERVAL 2 QUARTER ) - - INTERVAL 2 QUARTER - ) - AND timestamp < toStartOfQuarter( - parseDateTimeBestEffort( - {{ DateTime(endDate, description="Filter before date", required=False) }} + AND timestamp < toStartOfQuarter( + parseDateTimeBestEffort( + {{ DateTime(endDate, description="Filter before date", required=False) }} + ) + - INTERVAL 1 QUARTER ) - - INTERVAL 1 QUARTER - ) - {% else %} - AND timestamp >= toStartOfQuarter(now() - INTERVAL 2 QUARTER) - AND timestamp < toStartOfQuarter(now() - INTERVAL 1 QUARTER) - {% end %} - GROUP BY segmentId + {% else %} + AND timestamp >= toStartOfQuarter(now() - INTERVAL 2 QUARTER) + 
AND timestamp < toStartOfQuarter(now() - INTERVAL 1 QUARTER) + {% end %} + GROUP BY segmentId + {% else %} + SELECT segmentId, groupUniqArray(memberId) AS previousQuarterMembers + FROM activityRelations_deduplicated_cleaned_bucket_union + WHERE + memberId != '' + {% if defined(endDate) %} + AND timestamp >= toStartOfQuarter( + parseDateTimeBestEffort( + {{ DateTime(endDate, description="Filter before date", required=False) }} + ) + - INTERVAL 2 QUARTER + ) + AND timestamp < toStartOfQuarter( + parseDateTimeBestEffort( + {{ DateTime(endDate, description="Filter before date", required=False) }} + ) + - INTERVAL 1 QUARTER + ) + {% else %} + AND timestamp >= toStartOfQuarter(now() - INTERVAL 2 QUARTER) + AND timestamp < toStartOfQuarter(now() - INTERVAL 1 QUARTER) + {% end %} + GROUP BY segmentId + {% end %} NODE health_score_retention_counts SQL > diff --git a/services/libs/tinybird/pipes/health_score_stars.pipe b/services/libs/tinybird/pipes/health_score_stars.pipe index acb68557ef..441b3f891a 100644 --- a/services/libs/tinybird/pipes/health_score_stars.pipe +++ b/services/libs/tinybird/pipes/health_score_stars.pipe @@ -4,13 +4,11 @@ DESCRIPTION > SQL > % - SELECT segmentId, count() AS stars - FROM activityRelations_deduplicated_cleaned_ds - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND type = 'star' - {% if defined(project) %} - AND segmentId = (SELECT segmentId FROM segments_filtered) + {% if defined(project) %} + SELECT segmentId, count() AS stars + FROM activityRelations_bucket_routing + WHERE + type = 'star' AND segmentId = (SELECT segmentId FROM segments_filtered) {% if defined(repos) %} AND channel IN {{ Array(repos, 'String', description="Filter activity repo list", required=False) }} @@ -23,8 +21,13 @@ SQL > AND timestamp < {{ DateTime(endDate, description="Filter before date", required=False) }} {% end %} - {% end %} - GROUP BY segmentId + GROUP BY segmentId + {% else %} + SELECT segmentId, count() AS 
stars + FROM activityRelations_deduplicated_cleaned_bucket_union + WHERE type = 'star' + GROUP BY segmentId + {% end %} NODE health_score_stars_with_benchmark SQL > diff --git a/services/libs/tinybird/pipes/insights_projects_populated_copy.pipe b/services/libs/tinybird/pipes/insights_projects_populated_copy.pipe index 5bad1f5c1a..9404d1efda 100644 --- a/services/libs/tinybird/pipes/insights_projects_populated_copy.pipe +++ b/services/libs/tinybird/pipes/insights_projects_populated_copy.pipe @@ -39,11 +39,8 @@ DESCRIPTION > SQL > SELECT timestamp, segmentId - FROM activityRelations_deduplicated_cleaned_ds - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND type = 'authored-commit' - AND timestamp > toDateTime('1971-01-01') + FROM activityRelations_deduplicated_cleaned_bucket_union + WHERE type = 'authored-commit' AND timestamp > toDateTime('1971-01-01') NODE insights_projects_populated_copy_first_commit_by_project DESCRIPTION > diff --git a/services/libs/tinybird/pipes/issue_analysis_copy_pipe.pipe b/services/libs/tinybird/pipes/issue_analysis_copy_pipe.pipe index a850a7b2cb..9fb40e30de 100644 --- a/services/libs/tinybird/pipes/issue_analysis_copy_pipe.pipe +++ b/services/libs/tinybird/pipes/issue_analysis_copy_pipe.pipe @@ -4,30 +4,21 @@ DESCRIPTION > NODE issues_opened SQL > SELECT activityId as id, sourceId, timestamp AS openedAt, segmentId - FROM activityRelations_deduplicated_cleaned_ds - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND type = 'issues-opened' + FROM activityRelations_deduplicated_cleaned_bucket_union + WHERE type = 'issues-opened' NODE issues_closed SQL > SELECT sourceParentId, MIN(timestamp) AS closedAt - FROM activityRelations_deduplicated_cleaned_ds - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND type = 'issues-closed' - AND sourceParentId != '' + FROM 
activityRelations_deduplicated_cleaned_bucket_union + WHERE type = 'issues-closed' AND sourceParentId != '' GROUP BY sourceParentId NODE issues_comment SQL > SELECT sourceParentId, MIN(timestamp) AS commentedAt - FROM activityRelations_deduplicated_cleaned_ds - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND type = 'issue-comment' - AND sourceParentId != '' - AND toYear(timestamp) >= 1971 + FROM activityRelations_deduplicated_cleaned_bucket_union + WHERE type = 'issue-comment' AND sourceParentId != '' AND toYear(timestamp) >= 1971 GROUP BY sourceParentId NODE issue_analysis_results_merged diff --git a/services/libs/tinybird/pipes/issues_average_resolve_velocity.pipe b/services/libs/tinybird/pipes/issues_average_resolve_velocity.pipe index a2bd9765d3..f2296237bc 100644 --- a/services/libs/tinybird/pipes/issues_average_resolve_velocity.pipe +++ b/services/libs/tinybird/pipes/issues_average_resolve_velocity.pipe @@ -15,8 +15,6 @@ DESCRIPTION > - `onlyContributions`: Optional boolean, defaults to 1 (contributions only), set to 0 for all activities - Response: `averageIssueResolveVelocitySeconds` (average resolution time in seconds) -TAGS "Widget", "Issues", "Velocity metrics" - NODE average_issue_resolve_velocity_0 SQL > select round(avg(ia.closedInSeconds)) "averageIssueResolveVelocitySeconds" diff --git a/services/libs/tinybird/pipes/leaderboards_avg_commits_per_author.pipe b/services/libs/tinybird/pipes/leaderboards_avg_commits_per_author.pipe index d96116073c..4fee7b2ac5 100644 --- a/services/libs/tinybird/pipes/leaderboards_avg_commits_per_author.pipe +++ b/services/libs/tinybird/pipes/leaderboards_avg_commits_per_author.pipe @@ -19,10 +19,9 @@ DESCRIPTION > SQL > SELECT segmentId, count() as commits, uniq(memberId) as unique_authors - FROM activityRelations_deduplicated_cleaned_ds + FROM activityRelations_deduplicated_cleaned_bucket_union WHERE - snapshotId = (select max(snapshotId) from 
activityRelations_deduplicated_cleaned_ds) - AND activityId != '' + activityId != '' AND (type = 'authored-commit' OR type = 'committed-commit') AND platform = 'git' GROUP BY segmentId diff --git a/services/libs/tinybird/pipes/leaderboards_codebase_size.pipe b/services/libs/tinybird/pipes/leaderboards_codebase_size.pipe index 935006671c..48c04dfb9b 100644 --- a/services/libs/tinybird/pipes/leaderboards_codebase_size.pipe +++ b/services/libs/tinybird/pipes/leaderboards_codebase_size.pipe @@ -19,11 +19,8 @@ DESCRIPTION > SQL > SELECT segmentId, SUM(gitInsertions) - SUM(gitDeletions) AS lineDifference - FROM activityRelations_deduplicated_cleaned_ds - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND platform = 'git' - AND (gitInsertions > 0 OR gitDeletions > 0) + FROM activityRelations_deduplicated_cleaned_bucket_union + WHERE platform = 'git' AND (gitInsertions > 0 OR gitDeletions > 0) GROUP BY segmentId NODE leaderboards_codebase_size_result diff --git a/services/libs/tinybird/pipes/leaderboards_commits.pipe b/services/libs/tinybird/pipes/leaderboards_commits.pipe index 1cbe44d3d2..069ee4c6c4 100644 --- a/services/libs/tinybird/pipes/leaderboards_commits.pipe +++ b/services/libs/tinybird/pipes/leaderboards_commits.pipe @@ -18,10 +18,9 @@ DESCRIPTION > SQL > SELECT segmentId, count() as commits - FROM activityRelations_deduplicated_cleaned_ds + FROM activityRelations_deduplicated_cleaned_bucket_union WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND timestamp >= now() - INTERVAL 12 MONTH + timestamp >= now() - INTERVAL 12 MONTH AND timestamp < now() AND activityId != '' AND type = 'authored-commit' @@ -34,10 +33,9 @@ DESCRIPTION > SQL > SELECT segmentId, count(activityId) as commits - FROM activityRelations_deduplicated_cleaned_ds + FROM activityRelations_deduplicated_cleaned_bucket_union WHERE - snapshotId = (select max(snapshotId) from 
activityRelations_deduplicated_cleaned_ds) - AND timestamp >= now() - INTERVAL 24 MONTH + timestamp >= now() - INTERVAL 24 MONTH AND timestamp < now() - INTERVAL 12 MONTH AND activityId != '' AND type = 'authored-commit' diff --git a/services/libs/tinybird/pipes/leaderboards_members.pipe b/services/libs/tinybird/pipes/leaderboards_members.pipe index 2d1a5c3eff..82730ed376 100644 --- a/services/libs/tinybird/pipes/leaderboards_members.pipe +++ b/services/libs/tinybird/pipes/leaderboards_members.pipe @@ -23,14 +23,10 @@ DESCRIPTION > SQL > SELECT memberId, count(*) AS memberActivityCount, groupUniqArray(segmentId) as segmentIds - FROM activityRelations_deduplicated_cleaned_ds ar + FROM activityRelations_deduplicated_cleaned_bucket_union ar INNER JOIN leaderboards_member_activity_types at ON ar.type = at.activityType AND ar.platform = at.platform - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND memberId != '' - AND timestamp >= now() - INTERVAL 12 MONTH - AND timestamp < now() + WHERE memberId != '' AND timestamp >= now() - INTERVAL 12 MONTH AND timestamp < now() GROUP BY memberId NODE leaderboards_members_previous_period @@ -39,12 +35,11 @@ DESCRIPTION > SQL > SELECT memberId, count(*) AS memberActivityCount - FROM activityRelations_deduplicated_cleaned_ds ar + FROM activityRelations_deduplicated_cleaned_bucket_union ar INNER JOIN leaderboards_member_activity_types at ON ar.type = at.activityType AND ar.platform = at.platform WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND memberId != '' + memberId != '' AND timestamp >= now() - INTERVAL 24 MONTH AND timestamp < now() - INTERVAL 12 MONTH GROUP BY memberId diff --git a/services/libs/tinybird/pipes/leaderboards_organizations.pipe b/services/libs/tinybird/pipes/leaderboards_organizations.pipe index cbde1f2607..8bc740943e 100644 --- a/services/libs/tinybird/pipes/leaderboards_organizations.pipe +++ 
b/services/libs/tinybird/pipes/leaderboards_organizations.pipe @@ -24,16 +24,12 @@ DESCRIPTION > SQL > SELECT organizationId, count(*) AS organizationActivityCount, groupUniqArray(segmentId) as segmentIds - FROM activityRelations_deduplicated_cleaned_ds ar + FROM activityRelations_deduplicated_cleaned_bucket_union ar INNER JOIN leaderboards_organizations_activity_types at ON ar.type = at.activityType AND ar.platform = at.platform - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND organizationId != '' - AND timestamp >= now() - INTERVAL 12 MONTH - AND timestamp < now() + WHERE organizationId != '' AND timestamp >= now() - INTERVAL 12 MONTH AND timestamp < now() GROUP BY organizationId NODE leaderboards_organizations_previous_period @@ -42,14 +38,13 @@ DESCRIPTION > SQL > SELECT organizationId, count(*) AS organizationActivityCount - FROM activityRelations_deduplicated_cleaned_ds ar + FROM activityRelations_deduplicated_cleaned_bucket_union ar INNER JOIN leaderboards_organizations_activity_types at ON ar.type = at.activityType AND ar.platform = at.platform WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND organizationId != '' + organizationId != '' AND timestamp >= now() - INTERVAL 24 MONTH AND timestamp < now() - INTERVAL 12 MONTH GROUP BY organizationId diff --git a/services/libs/tinybird/pipes/leaderboards_project_active_contributors.pipe b/services/libs/tinybird/pipes/leaderboards_project_active_contributors.pipe index 92a3409cbe..1fa965b680 100644 --- a/services/libs/tinybird/pipes/leaderboards_project_active_contributors.pipe +++ b/services/libs/tinybird/pipes/leaderboards_project_active_contributors.pipe @@ -25,15 +25,12 @@ DESCRIPTION > SQL > SELECT segmentId, uniq(memberId) as contributor_count - FROM activityRelations_deduplicated_cleaned_ds ar + FROM activityRelations_deduplicated_cleaned_bucket_union ar INNER JOIN 
leaderboards_project_active_contributors_activity_types at ON ar.type = at.activityType AND ar.platform = at.platform - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND timestamp >= now() - INTERVAL 12 MONTH - AND timestamp < now() + WHERE timestamp >= now() - INTERVAL 12 MONTH AND timestamp < now() GROUP BY segmentId NODE leaderboards_project_active_contributors_previous_period @@ -42,15 +39,12 @@ DESCRIPTION > SQL > SELECT segmentId, uniq(memberId) as contributor_count - FROM activityRelations_deduplicated_cleaned_ds ar + FROM activityRelations_deduplicated_cleaned_bucket_union ar INNER JOIN leaderboards_project_active_contributors_activity_types at ON ar.type = at.activityType AND ar.platform = at.platform - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND timestamp >= now() - INTERVAL 24 MONTH - AND timestamp < now() - INTERVAL 12 MONTH + WHERE timestamp >= now() - INTERVAL 24 MONTH AND timestamp < now() - INTERVAL 12 MONTH GROUP BY segmentId NODE leaderboards_project_active_contributors_results diff --git a/services/libs/tinybird/pipes/leaderboards_project_active_organizations.pipe b/services/libs/tinybird/pipes/leaderboards_project_active_organizations.pipe index 4cf0a96c27..fe0c71f041 100644 --- a/services/libs/tinybird/pipes/leaderboards_project_active_organizations.pipe +++ b/services/libs/tinybird/pipes/leaderboards_project_active_organizations.pipe @@ -25,15 +25,12 @@ DESCRIPTION > SQL > SELECT segmentId, uniq(organizationId) as organization_count - FROM activityRelations_deduplicated_cleaned_ds ar + FROM activityRelations_deduplicated_cleaned_bucket_union ar INNER JOIN leaderboards_project_active_organizations_activity_types at ON ar.type = at.activityType AND ar.platform = at.platform - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND timestamp >= now() - INTERVAL 12 MONTH - AND timestamp < now() + WHERE 
timestamp >= now() - INTERVAL 12 MONTH AND timestamp < now() GROUP BY segmentId NODE leaderboards_project_active_organizations_previous_period @@ -42,15 +39,12 @@ DESCRIPTION > SQL > SELECT segmentId, uniq(organizationId) as organization_count - FROM activityRelations_deduplicated_cleaned_ds ar + FROM activityRelations_deduplicated_cleaned_bucket_union ar INNER JOIN leaderboards_project_active_organizations_activity_types at ON ar.type = at.activityType AND ar.platform = at.platform - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND timestamp >= now() - INTERVAL 24 MONTH - AND timestamp < now() - INTERVAL 12 MONTH + WHERE timestamp >= now() - INTERVAL 24 MONTH AND timestamp < now() - INTERVAL 12 MONTH GROUP BY segmentId NODE leaderboards_project_active_organizations_results diff --git a/services/libs/tinybird/pipes/leaderboards_resolution_rate.pipe b/services/libs/tinybird/pipes/leaderboards_resolution_rate.pipe index 0f3e07acaa..5c5e207859 100644 --- a/services/libs/tinybird/pipes/leaderboards_resolution_rate.pipe +++ b/services/libs/tinybird/pipes/leaderboards_resolution_rate.pipe @@ -17,10 +17,8 @@ SQL > countIf(type = 'issues-opened') as issuesOpened, countIf(type = 'pull_request-merged') as prMerged, (prMerged / issuesOpened) as resolutionRate - FROM activityRelations_deduplicated_cleaned_ds - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND activityId != '' + FROM activityRelations_deduplicated_cleaned_bucket_union + WHERE activityId != '' GROUP BY segmentId HAVING issuesOpened > 0 diff --git a/services/libs/tinybird/pipes/leaderboards_small_project_commit.pipe b/services/libs/tinybird/pipes/leaderboards_small_project_commit.pipe index efcc580541..dbb4b7803c 100644 --- a/services/libs/tinybird/pipes/leaderboards_small_project_commit.pipe +++ b/services/libs/tinybird/pipes/leaderboards_small_project_commit.pipe @@ -19,12 +19,8 @@ DESCRIPTION > SQL > SELECT 
segmentId, count() as commits - FROM activityRelations_deduplicated_cleaned_ds - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND activityId != '' - AND type = 'authored-commit' - AND platform = 'git' + FROM activityRelations_deduplicated_cleaned_bucket_union + WHERE activityId != '' AND type = 'authored-commit' AND platform = 'git' GROUP BY segmentId NODE leaderboards_small_project_commit_results diff --git a/services/libs/tinybird/pipes/maintainers_roles_copy.pipe b/services/libs/tinybird/pipes/maintainers_roles_copy.pipe index f2062294b3..d23dcc1aed 100644 --- a/services/libs/tinybird/pipes/maintainers_roles_copy.pipe +++ b/services/libs/tinybird/pipes/maintainers_roles_copy.pipe @@ -19,8 +19,7 @@ SQL > memberId, argMax(organizationId, timestamp) AS organizationId, argMax(timestamp, timestamp) AS latestActivityTimestamp - FROM activityRelations_deduplicated_cleaned_ds - WHERE snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) + FROM activityRelations_deduplicated_cleaned_bucket_union GROUP BY memberId NODE maintainers_roles_copy_result diff --git a/services/libs/tinybird/pipes/members_public_names_copy_pipe.pipe b/services/libs/tinybird/pipes/members_public_names_copy_pipe.pipe index 2d0b37778c..02e54e32c5 100644 --- a/services/libs/tinybird/pipes/members_public_names_copy_pipe.pipe +++ b/services/libs/tinybird/pipes/members_public_names_copy_pipe.pipe @@ -14,10 +14,8 @@ SQL > CASE WHEN platform = 'github' THEN 2 WHEN platform = 'git' THEN 1 ELSE 0 END AS platform_priority - FROM activityRelations_deduplicated_cleaned_ds - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND segmentId IN (SELECT segmentId FROM segmentIds_in_nonlf_projects) + FROM activityRelations_deduplicated_cleaned_bucket_union + WHERE segmentId IN (SELECT segmentId FROM segmentIds_in_nonlf_projects) ) GROUP BY memberId diff --git 
a/services/libs/tinybird/pipes/members_sorted_copy_pipe.pipe b/services/libs/tinybird/pipes/members_sorted_copy_pipe.pipe index 3fad0e99c2..b4fd5dd3b8 100644 --- a/services/libs/tinybird/pipes/members_sorted_copy_pipe.pipe +++ b/services/libs/tinybird/pipes/members_sorted_copy_pipe.pipe @@ -10,7 +10,7 @@ SQL > not isBot and not isTeamMember and not isOrganization - and members.id in (select distinct memberId from activityRelations_deduplicated_ds) + and members.id in (select distinct memberId from activityRelations) TYPE COPY TARGET_DATASOURCE members_sorted diff --git a/services/libs/tinybird/pipes/monitoring_copy_pipes_spread_info.pipe b/services/libs/tinybird/pipes/monitoring_copy_pipes_spread_info.pipe new file mode 100644 index 0000000000..19ed9d6945 --- /dev/null +++ b/services/libs/tinybird/pipes/monitoring_copy_pipes_spread_info.pipe @@ -0,0 +1,140 @@ +NODE copy_jobs_with_duration +SQL > + SELECT + JSONExtract(jl.job_metadata, 'pipe_name', 'String') AS pipe_name, + jl.pipe_id, + jl.started_at, + ( + round( + JSONExtractFloat( + JSONExtractArrayRaw( + JSONExtractRaw(jl.job_metadata, 'dependent_datasources'), 'steps' + )[1], + 'elapsed_time' + ) + )::int + ) as duration_seconds, + (jl.started_at + interval duration_seconds seconds)::DateTime64(3) as ended_at, + jl.job_metadata + FROM tinybird.jobs_log AS jl + -- left JOIN tinybird.pipe_stats_rt AS ps + -- ON ps.pipe_id = jl.pipe_id + where jl.job_type = 'copy' and pipe_name <> '' + order by started_at desc + +NODE copy_pipes_spread_info_prometheus_style +SQL > + -- last 24 hours + -- 60s buckets + WITH now() AS now_ts, now_ts - 24 * 3600 AS window_start, 60 AS step + SELECT + 'tinybird_copy_jobs_running' AS name, + toFloat64(running_jobs) AS value, + 'gauge' AS type, + 'Concurrent Tinybird copy jobs per pipe per minute' AS help, + toUnixTimestamp(bucket_ts) AS timestamp, + map('pipe_name', pipe_name) AS labels + FROM + ( + SELECT pipe_name, toDateTime(bucket_ts) AS bucket_ts, count(*) AS running_jobs + FROM + 
( + SELECT + pipe_name, + arrayJoin( + range( + intDiv( + greatest( + toUnixTimestamp(started_at), toUnixTimestamp(window_start) + ), + step + ), + intDiv(least(toUnixTimestamp(ended_at), toUnixTimestamp(now_ts)), step) + + 1 + ) + * step + ) AS bucket_ts + FROM copy_jobs_with_duration + WHERE ended_at >= window_start AND started_at <= now_ts + ) + GROUP BY pipe_name, bucket_ts + ) + ORDER BY name, pipe_name, timestamp + +NODE monitoring_copy_pipes_spread_info_2 +SQL > + SELECT + pipe_name, + pipe_id, + started_at, + duration_seconds, + ended_at, + start_minute + toIntervalMinute(minute_offset) AS minute_ts, + job_metadata + FROM + ( + SELECT + pipe_name, + pipe_id, + started_at, + duration_seconds, + ended_at, + job_metadata, + start_minute, + -- number of minutes from this job's start until the next job's start + greatest(dateDiff('minute', start_minute, next_start_minute), 0) AS minutes_span + FROM + ( + SELECT + JSONExtract(jl.job_metadata, 'pipe_name', 'String') AS pipe_name, + jl.pipe_id, + jl.started_at, + -- duration in seconds from job metadata + round( + JSONExtractFloat( + JSONExtractArrayRaw( + JSONExtractRaw(jl.job_metadata, 'dependent_datasources'), 'steps' + )[1], + 'elapsed_time' + ) + ) AS duration_seconds, + -- original end time (kept just for reference) + jl.started_at + toIntervalSecond(duration_seconds) AS ended_at, + jl.job_metadata, + -- minute bucket for this job's start + toStartOfMinute(jl.started_at) AS start_minute, + -- next job's start for the same pipe_name (window function) + coalesce( + toStartOfMinute( + leadInFrame(jl.started_at) OVER ( + PARTITION BY JSONExtract(jl.job_metadata, 'pipe_name', 'String') + ORDER BY jl.started_at ASC + ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING + ) + ), + -- for the last job of a pipe, fall back to its own ended_at + toStartOfMinute( + jl.started_at + toIntervalSecond( + round( + JSONExtractFloat( + JSONExtractArrayRaw( + JSONExtractRaw( + jl.job_metadata, 'dependent_datasources' + ), + 
'steps' + )[1], + 'elapsed_time' + ) + ) + ) + ) + ) AS next_start_minute + FROM tinybird.jobs_log AS jl + WHERE + jl.job_type = 'copy' + AND JSONExtract(jl.job_metadata, 'pipe_name', 'String') <> '' + ) AS base_with_next + ) AS base_with_span + ARRAY + JOIN range(minutes_span) AS minute_offset -- notice: no +1 here + ORDER BY minute_ts DESC diff --git a/services/libs/tinybird/pipes/monitoring_entities.pipe b/services/libs/tinybird/pipes/monitoring_entities.pipe new file mode 100644 index 0000000000..ccc73b0465 --- /dev/null +++ b/services/libs/tinybird/pipes/monitoring_entities.pipe @@ -0,0 +1,84 @@ +TAGS "Monitoring" + +NODE activityRelations_total +SQL > + SELECT + 'rows_total' AS name, + 'Total rows in table' AS help, + 'gauge' AS type, + map('table', 'activityRelations') AS labels, + count() AS value + FROM activityRelations_enriched_deduplicated_ds + where snapshotId = (select max(snapshotId) from activityRelations_enriched_deduplicated_ds) + +NODE members_total +SQL > + SELECT + 'rows_total' AS name, + 'Total rows in table' AS help, + 'gauge' AS type, + map('table', 'members') AS labels, + count() AS value + FROM members_sorted + +NODE organizations_total +SQL > + SELECT + 'rows_total' AS name, + 'Total rows in table' AS help, + 'gauge' AS type, + map('table', 'organizations') AS labels, + count() AS value + FROM organizations final + +NODE lf_insightsProjects_total +SQL > + SELECT + 'rows_total' AS name, + 'Total rows in table' AS help, + 'gauge' AS type, + map('table', 'insightsProjects', 'isLf', 'true') AS labels, + count() AS value + FROM insightsProjects final + where enabled = 1 and isLF = 1 and segmentId <> '' + +NODE non_lf_insightsProjects_total +SQL > + SELECT + 'rows_total' AS name, + 'Total rows in table' AS help, + 'gauge' AS type, + map('table', 'insightsProjects', 'isLf', 'false') AS labels, + count() AS value + FROM insightsProjects final + where enabled = 1 and isLF = 0 and isNotNull(segmentId) and segmentId <> '' + +NODE 
segmentRepositories_total +SQL > + SELECT + 'rows_total' AS name, + 'Total rows in table' AS help, + 'gauge' AS type, + map('table', 'segmentRepositories') AS labels, + count() AS value + FROM segmentRepositories final + +NODE merge_results +SQL > + SELECT * + FROM activityRelations_total + UNION ALL + SELECT * + FROM members_total + UNION ALL + SELECT * + FROM organizations_total + UNION ALL + SELECT * + FROM lf_insightsProjects_total + UNION ALL + SELECT * + FROM non_lf_insightsProjects_total + UNION ALL + SELECT * + FROM segmentRepositories_total diff --git a/services/libs/tinybird/pipes/monitoring_long_running_endpoints.pipe b/services/libs/tinybird/pipes/monitoring_long_running_endpoints.pipe new file mode 100644 index 0000000000..9544123f1d --- /dev/null +++ b/services/libs/tinybird/pipes/monitoring_long_running_endpoints.pipe @@ -0,0 +1,10 @@ +NODE monitoring_long_running_endpoints_0 +SQL > + SELECT + pipe_name, + toStartOfHour(start_datetime) AS ts_hour, + 100.0 * countIf(duration > 1) / count() AS pct_over_1s, + 100.0 * countIf(duration > 2) / count() AS pct_over_2s + FROM tinybird.pipe_stats_rt + GROUP BY pipe_name, ts_hour + ORDER BY ts_hour ASC, pipe_name ASC diff --git a/services/libs/tinybird/pipes/org_dash_metric_copy_pipe.pipe b/services/libs/tinybird/pipes/org_dash_metric_copy_pipe.pipe index 7744b2ff90..01b8958607 100644 --- a/services/libs/tinybird/pipes/org_dash_metric_copy_pipe.pipe +++ b/services/libs/tinybird/pipes/org_dash_metric_copy_pipe.pipe @@ -16,10 +16,9 @@ SQL > type = 'pull_request-opened' OR type = 'merge_request-opened' OR type = 'changeset-created' ) as prsOpened, uniq(memberId) AS contributorCount - FROM activityRelations_deduplicated_cleaned_ds + FROM activityRelations_deduplicated_cleaned_bucket_union WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND timestamp >= toStartOfDay(now() - toIntervalDay(365)) + timestamp >= toStartOfDay(now() - toIntervalDay(365)) AND timestamp < 
toStartOfDay(now() + toIntervalDay(1)) GROUP BY segmentId @@ -36,10 +35,9 @@ SQL > type = 'pull_request-opened' OR type = 'merge_request-opened' OR type = 'changeset-created' ) as orgPrsOpened, uniq(memberId) AS orgContributorCount - FROM activityRelations_deduplicated_cleaned_ds + FROM activityRelations_deduplicated_cleaned_bucket_union WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND organizationId != '' + organizationId != '' AND timestamp >= toStartOfDay(now() - toIntervalDay(365)) AND timestamp < toStartOfDay(now() + toIntervalDay(1)) GROUP BY segmentId, organizationId diff --git a/services/libs/tinybird/pipes/organization_dependency.pipe b/services/libs/tinybird/pipes/organization_dependency.pipe index 50732430c3..3a1114eeff 100644 --- a/services/libs/tinybird/pipes/organization_dependency.pipe +++ b/services/libs/tinybird/pipes/organization_dependency.pipe @@ -16,8 +16,6 @@ DESCRIPTION > - `onlyContributions`: Optional boolean, defaults to 1 (contributions only), set to 0 for all activities - Response: `id`, `displayName`, `contributionPercentage`, `contributionPercentageRunningTotal`, `totalOrganizationCount` -TAGS "Widget", "Dependency analysis", "Institutional risk", "Organizations" - NODE organization_dependency_0 SQL > SELECT t.*, active_organizations.organizationCount as "totalOrganizationCount" diff --git a/services/libs/tinybird/pipes/organization_retention.pipe b/services/libs/tinybird/pipes/organization_retention.pipe index a9de385d0e..22b7a1fa80 100644 --- a/services/libs/tinybird/pipes/organization_retention.pipe +++ b/services/libs/tinybird/pipes/organization_retention.pipe @@ -18,8 +18,6 @@ DESCRIPTION > - `granularity`: Required string for time aggregation ('daily', 'weekly', 'monthly', 'quarterly', 'yearly') - Response: `startDate`, `endDate`, `retentionRate` (percentage of organizations retained from previous period) -TAGS "Widget", "Retention", "Organizations", "Cohort analysis" - NODE 
aggregated_organizations SQL > % diff --git a/services/libs/tinybird/pipes/organizations_geo_distribution.pipe b/services/libs/tinybird/pipes/organizations_geo_distribution.pipe index e65a46593b..bfbc0793b4 100644 --- a/services/libs/tinybird/pipes/organizations_geo_distribution.pipe +++ b/services/libs/tinybird/pipes/organizations_geo_distribution.pipe @@ -16,8 +16,6 @@ DESCRIPTION > - `onlyContributions`: Optional boolean, defaults to 1 (contributions only), set to 0 for all activities - Response: `country`, `flag`, `country_code`, `organizationCount`, `organizationPercentage` -TAGS "Widget", "Geography", "Organizations" - NODE country_mapping_array SQL > SELECT groupArray((country, flag, country_code)) AS country_data FROM country_mapping diff --git a/services/libs/tinybird/pipes/organizations_leaderboard.pipe b/services/libs/tinybird/pipes/organizations_leaderboard.pipe index 1af3604b42..6a96252aaa 100644 --- a/services/libs/tinybird/pipes/organizations_leaderboard.pipe +++ b/services/libs/tinybird/pipes/organizations_leaderboard.pipe @@ -20,8 +20,6 @@ DESCRIPTION > - Count mode (`count=true`): `count` (total number of organizations) - Data mode (default): `id`, `logo`, `displayName`, `contributionCount`, `contributionPercentage` -TAGS "Widget", "Leaderboard", "Organizations" - NODE organizations_leaderboard_paginated SQL > % diff --git a/services/libs/tinybird/pipes/package_metrics.pipe b/services/libs/tinybird/pipes/package_metrics.pipe index 1c6a0f16bf..3cca2958cc 100644 --- a/services/libs/tinybird/pipes/package_metrics.pipe +++ b/services/libs/tinybird/pipes/package_metrics.pipe @@ -17,8 +17,6 @@ DESCRIPTION > - Without granularity: `downloadsCount`, `dockerDownloadsCount`, `dockerDependentsCount`, `dependentPackagesCount`, `dependentReposCount` - With granularity: `startDate`, `endDate`, and all metric fields for each time period -TAGS "Widget", "Package metrics", "Downloads", "Dependencies" - NODE package_downloads_filtered SQL > % diff --git 
a/services/libs/tinybird/pipes/packages.pipe b/services/libs/tinybird/pipes/packages.pipe index 32da0faa3b..5f146fae8d 100644 --- a/services/libs/tinybird/pipes/packages.pipe +++ b/services/libs/tinybird/pipes/packages.pipe @@ -10,8 +10,6 @@ DESCRIPTION > - `repos`: Optional array of repository URLs to filter packages by specific repositories (e.g., ['https://github.com/kubernetes/kubernetes']) - Response: `repo`, `name`, `ecosystem` (distinct packages matching the filters) -TAGS "Utility", "Packages", "Package discovery" - NODE packages_0 SQL > % diff --git a/services/libs/tinybird/pipes/project_buckets.pipe b/services/libs/tinybird/pipes/project_buckets.pipe new file mode 100644 index 0000000000..4a584bd316 --- /dev/null +++ b/services/libs/tinybird/pipes/project_buckets.pipe @@ -0,0 +1,3 @@ +NODE get_segment_and_datasource +SQL > + SELECT cityHash64(segmentId) % 10 as bucketId FROM segments_filtered diff --git a/services/libs/tinybird/pipes/project_insights_copy.pipe b/services/libs/tinybird/pipes/project_insights_copy.pipe index 82809fa4d7..29ec737da8 100644 --- a/services/libs/tinybird/pipes/project_insights_copy.pipe +++ b/services/libs/tinybird/pipes/project_insights_copy.pipe @@ -20,10 +20,8 @@ SQL > uniq( CASE WHEN organizationId != '' THEN organizationId ELSE NULL END ) AS activeOrganizationsLast365Days - FROM activityRelations_deduplicated_cleaned_ds - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND timestamp <= now() + FROM activityRelations_deduplicated_cleaned_bucket_union + WHERE timestamp <= now() GROUP BY segmentId NODE project_insights_copy_previous_365_days_metrics @@ -39,10 +37,8 @@ SQL > uniq( CASE WHEN organizationId != '' THEN organizationId ELSE NULL END ) AS activeOrganizationsPrevious365Days - FROM activityRelations_deduplicated_cleaned_ds - WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) - AND timestamp < now() - INTERVAL 365 DAY + FROM 
activityRelations_deduplicated_cleaned_bucket_union + WHERE timestamp < now() - INTERVAL 365 DAY GROUP BY segmentId NODE project_insights_copy_results diff --git a/services/libs/tinybird/pipes/projects_list.pipe b/services/libs/tinybird/pipes/projects_list.pipe index 4e8b831bf2..9e6ad9712a 100644 --- a/services/libs/tinybird/pipes/projects_list.pipe +++ b/services/libs/tinybird/pipes/projects_list.pipe @@ -15,8 +15,6 @@ DESCRIPTION > - Count mode (`count=true`): `count` (total number of projects) - Data mode (default): All project fields from `insightsProjects_filtered` with sorting and pagination -TAGS "API", "Projects", "Pagination", "Sorting" - NODE projects_paginated SQL > % diff --git a/services/libs/tinybird/pipes/pull_request_analysis_baseline_merge_MV.pipe b/services/libs/tinybird/pipes/pull_request_analysis_baseline_merge_MV.pipe index d7a4507999..e94b6ead50 100644 --- a/services/libs/tinybird/pipes/pull_request_analysis_baseline_merge_MV.pipe +++ b/services/libs/tinybird/pipes/pull_request_analysis_baseline_merge_MV.pipe @@ -16,9 +16,9 @@ SQL > organizationId, platform, updatedAt - FROM activityRelations_enrich_clean_snapshot_MV_ds + FROM activityRelations_enrich_snapshot_MV_ds where - snapshotId = (select max(snapshotId) from activityRelations_enrich_clean_snapshot_MV_ds) + snapshotId = (select max(snapshotId) from activityRelations_enrich_snapshot_MV_ds) AND type in ( 'pull_request-opened', 'merge_request-opened', @@ -236,9 +236,9 @@ SQL > NULL ) ) AS resolvedAt - FROM activityRelations_enrich_clean_snapshot_MV_ds + FROM activityRelations_enrich_snapshot_MV_ds WHERE - snapshotId = (select max(snapshotId) from activityRelations_enrich_clean_snapshot_MV_ds) + snapshotId = (select max(snapshotId) from activityRelations_enrich_snapshot_MV_ds) AND type IN ( 'pull_request-opened', 'merge_request-opened', diff --git a/services/libs/tinybird/pipes/pull_request_analysis_baseline_merge_MV.pipe_bak 
b/services/libs/tinybird/pipes/pull_request_analysis_baseline_merge_MV.pipe_bak new file mode 100644 index 0000000000..23cd5dc02d --- /dev/null +++ b/services/libs/tinybird/pipes/pull_request_analysis_baseline_merge_MV.pipe_bak @@ -0,0 +1,398 @@ +DESCRIPTION > + Compacts activities from same PR into one, keeping necessary information in a single row. Helps to serve PR-wide widgets in the development tab. + Uses existing pull_requests_analyzed data as baseline and merges new events from activityRelations on top. + Only updates fields that have new data, preserving existing data for unchanged PRs. + + +NODE new_pull_request_related_activity +SQL > + + SELECT + activityId as id, + sourceId, + channel, + timestamp AS openedAt, + segmentId, + gitChangedLinesBucket, + memberId, + organizationId, + platform, + updatedAt + FROM activityRelations_enrich_snapshot_MV_ds + where + snapshotId = (select max(snapshotId) from activityRelations_enrich_snapshot_MV_ds) + AND type in ( + 'pull_request-opened', + 'merge_request-opened', + 'changeset-created', + 'pull_request-assigned', + 'merge_request-assigned', + 'pull_request-review-requested', + 'merge_request-review-requested', + 'pull_request-reviewed', + 'merge_request-review-changes-requested', + 'changeset_comment-created', + 'patchset_comment-created', + 'patchset-created', + 'pull_request-reviewed', + 'merge_request-review-approved', + 'patchset_approval-created', + 'pull_request-closed', + 'merge_request-closed', + 'changeset-closed', + 'changeset-abandoned', + 'pull_request-merged', + 'merge_request-merged', + 'changeset-merged' + ) + + + +NODE new_events_aggregated +DESCRIPTION > + Aggregates ONLY the new events from the latest snapshot in activityRelations. + This represents the "delta" - what's changed since the last update. 
+ +SQL > + + SELECT + if( + type IN ('pull_request-opened', 'merge_request-opened', 'changeset-created'), + sourceId, + sourceParentId + ) AS prSourceId, + argMinIf( + activityId, + timestamp, + type IN ('pull_request-opened', 'merge_request-opened', 'changeset-created') + ) AS id, + argMinIf( + channel, + timestamp, + type IN ('pull_request-opened', 'merge_request-opened', 'changeset-created') + ) AS channel, + argMinIf( + segmentId, + timestamp, + type IN ('pull_request-opened', 'merge_request-opened', 'changeset-created') + ) AS segmentId, + argMinIf( + gitChangedLinesBucket, + timestamp, + type IN ('pull_request-opened', 'merge_request-opened', 'changeset-created') + ) AS gitChangedLinesBucket, + argMinIf( + memberId, + timestamp, + type IN ('pull_request-opened', 'merge_request-opened', 'changeset-created') + ) AS memberId, + argMinIf( + organizationId, + timestamp, + type IN ('pull_request-opened', 'merge_request-opened', 'changeset-created') + ) AS organizationId, + argMinIf( + platform, + timestamp, + type IN ('pull_request-opened', 'merge_request-opened', 'changeset-created') + ) AS platform, + minIf( + timestamp, type IN ('pull_request-opened', 'merge_request-opened', 'changeset-created') + ) AS openedAt, + argMinIf( + updatedAt, + timestamp, + type IN ('pull_request-opened', 'merge_request-opened', 'changeset-created') + ) AS openedUpdatedAt, + toInt64(countIf(type = 'patchset-created')) AS numberOfPatchsets, + argMin( + updatedAt, if(type IN ('pull_request-assigned', 'merge_request-assigned'), timestamp, NULL) + ) AS assignedUpdatedAt, + min( + if(type IN ('pull_request-assigned', 'merge_request-assigned'), timestamp, NULL) + ) AS assignedAt, + argMin( + updatedAt, + if( + type IN ('pull_request-review-requested', 'merge_request-review-requested'), + timestamp, + NULL + ) + ) AS reviewRequestedUpdatedAt, + min( + if( + type IN ('pull_request-review-requested', 'merge_request-review-requested'), + timestamp, + NULL + ) + ) AS reviewRequestedAt, + argMin( 
+ updatedAt, + if( + type IN ( + 'pull_request-reviewed', + 'merge_request-review-changes-requested', + 'changeset_comment-created', + 'patchset_comment-created' + ), + timestamp, + NULL + ) + ) AS reviewedUpdatedAt, + min( + if( + type IN ( + 'pull_request-reviewed', + 'merge_request-review-changes-requested', + 'changeset_comment-created', + 'patchset_comment-created' + ), + timestamp, + NULL + ) + ) AS reviewedAt, + argMin( + updatedAt, + if( + (type = 'pull_request-reviewed' AND pullRequestReviewState = 'APPROVED') + OR type = 'merge_request-review-approved' + OR type = 'patchset_approval-created', + timestamp, + NULL + ) + ) AS approvedUpdatedAt, + min( + if( + (type = 'pull_request-reviewed' AND pullRequestReviewState = 'APPROVED') + OR type = 'merge_request-review-approved' + OR type = 'patchset_approval-created', + timestamp, + NULL + ) + ) AS approvedAt, + argMin( + updatedAt, + if( + type IN ( + 'pull_request-closed', + 'merge_request-closed', + 'changeset-closed', + 'changeset-abandoned' + ), + timestamp, + NULL + ) + ) AS closedUpdatedAt, + min( + if( + type IN ( + 'pull_request-closed', + 'merge_request-closed', + 'changeset-closed', + 'changeset-abandoned' + ), + timestamp, + NULL + ) + ) AS closedAt, + argMin( + updatedAt, + if( + type IN ('pull_request-merged', 'merge_request-merged', 'changeset-merged'), + timestamp, + NULL + ) + ) AS mergedUpdatedAt, + min( + if( + type IN ('pull_request-merged', 'merge_request-merged', 'changeset-merged'), + timestamp, + NULL + ) + ) AS mergedAt, + argMin( + updatedAt, + if( + type IN ( + 'pull_request-closed', + 'pull_request-merged', + 'merge_request-closed', + 'merge_request-merged', + 'changeset-merged', + 'changeset-closed', + 'changeset-abandoned' + ), + timestamp, + NULL + ) + ) AS resolvedUpdatedAt, + min( + if( + type IN ( + 'pull_request-closed', + 'pull_request-merged', + 'merge_request-closed', + 'merge_request-merged', + 'changeset-merged', + 'changeset-closed', + 'changeset-abandoned' + ), + 
timestamp, + NULL + ) + ) AS resolvedAt + FROM activityRelations_enrich_snapshot_MV_ds + WHERE + snapshotId = (select max(snapshotId) from activityRelations_enrich_snapshot_MV_ds) + AND type IN ( + 'pull_request-opened', + 'merge_request-opened', + 'changeset-created', + 'pull_request-assigned', + 'merge_request-assigned', + 'pull_request-review-requested', + 'merge_request-review-requested', + 'pull_request-reviewed', + 'merge_request-review-changes-requested', + 'changeset_comment-created', + 'patchset_comment-created', + 'patchset-created', + 'merge_request-review-approved', + 'patchset_approval-created', + 'pull_request-closed', + 'merge_request-closed', + 'changeset-closed', + 'changeset-abandoned', + 'pull_request-merged', + 'merge_request-merged', + 'changeset-merged' + ) + GROUP BY prSourceId + HAVING prSourceId != '' + + + +NODE pull_request_analysis_results_merged +DESCRIPTION > + Takes existing pull_requests_analyzed data as baseline (LEFT side). + Merges new event data on top using COALESCE - new data overwrites existing data if present. + For new PRs (not in baseline), uses the new data directly. 
+ +SQL > + + SELECT + if(existing.id != '', existing.id, new.id) as id, + if(existing.sourceId != '', existing.sourceId, new.prSourceId) as sourceId, + COALESCE(existing.openedAt, new.openedAt) as openedAt, + if(existing.segmentId != '', existing.segmentId, new.segmentId) as segmentId, + if(existing.channel != '', existing.channel, new.channel) as channel, + if(existing.memberId != '', existing.memberId, new.memberId) as memberId, + if( + existing.organizationId != '', existing.organizationId, new.organizationId + ) as organizationId, + if( + existing.gitChangedLinesBucket != '', + existing.gitChangedLinesBucket, + new.gitChangedLinesBucket + ) as gitChangedLinesBucket, + COALESCE( + if( + new.assignedAt IS NOT NULL AND new.assignedAt != toDateTime(0), + least(existing.assignedAt, new.assignedAt), + existing.assignedAt + ), + new.assignedAt + ) AS assignedAt, + COALESCE( + if( + new.reviewRequestedAt IS NOT NULL AND new.reviewRequestedAt != toDateTime(0), + least(existing.reviewRequestedAt, new.reviewRequestedAt), + existing.reviewRequestedAt + ), + new.reviewRequestedAt + ) AS reviewRequestedAt, + COALESCE( + if( + new.reviewedAt IS NOT NULL AND new.reviewedAt != toDateTime(0), + least(existing.reviewedAt, new.reviewedAt), + existing.reviewedAt + ), + new.reviewedAt + ) AS reviewedAt, + COALESCE( + if( + new.approvedAt IS NOT NULL AND new.approvedAt != toDateTime(0), + least(existing.approvedAt, new.approvedAt), + existing.approvedAt + ), + new.approvedAt + ) AS approvedAt, + COALESCE( + if( + new.closedAt IS NOT NULL AND new.closedAt != toDateTime(0), + least(existing.closedAt, new.closedAt), + existing.closedAt + ), + new.closedAt + ) AS closedAt, + COALESCE( + if( + new.mergedAt IS NOT NULL AND new.mergedAt != toDateTime(0), + least(existing.mergedAt, new.mergedAt), + existing.mergedAt + ), + new.mergedAt + ) AS mergedAt, + COALESCE( + if( + new.resolvedAt IS NOT NULL AND new.resolvedAt != toDateTime(0), + least(existing.resolvedAt, new.resolvedAt), + 
existing.resolvedAt + ), + new.resolvedAt + ) AS resolvedAt, + IF( + assignedAt IS NULL, NULL, toUnixTimestamp(assignedAt) - toUnixTimestamp(openedAt) + ) AS assignedInSeconds, + IF( + reviewRequestedAt IS NULL, + NULL, + toUnixTimestamp(reviewRequestedAt) - toUnixTimestamp(openedAt) + ) AS reviewRequestedInSeconds, + IF( + reviewedAt IS NULL, NULL, toUnixTimestamp(reviewedAt) - toUnixTimestamp(openedAt) + ) AS reviewedInSeconds, + IF( + closedAt IS NULL, NULL, toUnixTimestamp(closedAt) - toUnixTimestamp(openedAt) + ) AS closedInSeconds, + IF( + mergedAt IS NULL, NULL, toUnixTimestamp(mergedAt) - toUnixTimestamp(openedAt) + ) AS mergedInSeconds, + IF( + resolvedAt IS NULL, NULL, toUnixTimestamp(resolvedAt) - toUnixTimestamp(openedAt) + ) AS resolvedInSeconds, + if(existing.platform != '', existing.platform, new.platform) as platform, + if( + new.numberOfPatchsets > 0, + COALESCE(existing.numberOfPatchsets, 0) + new.numberOfPatchsets, + existing.numberOfPatchsets + ) as numberOfPatchsets, + toStartOfInterval( + greatest( + COALESCE(new.openedUpdatedAt, existing.snapshotId, toDateTime(0)), + COALESCE(new.assignedUpdatedAt, toDateTime(0)), + COALESCE(new.reviewRequestedUpdatedAt, toDateTime(0)), + COALESCE(new.reviewedUpdatedAt, toDateTime(0)), + COALESCE(new.approvedUpdatedAt, toDateTime(0)), + COALESCE(new.closedUpdatedAt, toDateTime(0)), + COALESCE(new.mergedUpdatedAt, toDateTime(0)), + COALESCE(new.resolvedUpdatedAt, toDateTime(0)) + ), + INTERVAL 1 hour + ) + + INTERVAL 1 hour as snapshotId + FROM new_events_aggregated new + LEFT JOIN pull_requests_analyzed existing ON new.prSourceId = existing.sourceId + WHERE if(existing.id != '', existing.id, new.id) != '' + + diff --git a/services/libs/tinybird/pipes/pull_request_analysis_initial_snapshot.pipe b/services/libs/tinybird/pipes/pull_request_analysis_initial_snapshot.pipe index 969e129f79..5a5ec63b95 100644 --- a/services/libs/tinybird/pipes/pull_request_analysis_initial_snapshot.pipe +++ 
b/services/libs/tinybird/pipes/pull_request_analysis_initial_snapshot.pipe @@ -14,9 +14,9 @@ SQL > organizationId, platform, updatedAt - FROM activityRelations_deduplicated_cleaned_ds + FROM activityRelations_enriched_deduplicated_ds WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) + snapshotId = (select max(snapshotId) from activityRelations_enriched_deduplicated_ds) AND ( type = 'pull_request-opened' OR type = 'merge_request-opened' OR type = 'changeset-created' ) @@ -24,9 +24,9 @@ SQL > NODE pull_request_first_assigned SQL > SELECT sourceParentId, argMin(updatedAt, timestamp) AS updatedAt, min(timestamp) AS assignedAt - FROM activityRelations_deduplicated_cleaned_ds + FROM activityRelations_enriched_deduplicated_ds WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) + snapshotId = (select max(snapshotId) from activityRelations_enriched_deduplicated_ds) AND type IN ('pull_request-assigned', 'merge_request-assigned') GROUP BY sourceParentId ORDER BY assignedAt DESC @@ -35,18 +35,18 @@ NODE pull_request_first_review_requested SQL > SELECT sourceParentId, argMin(updatedAt, timestamp) AS updatedAt, MIN(timestamp) AS reviewRequestedAt - FROM activityRelations_deduplicated_cleaned_ds + FROM activityRelations_enriched_deduplicated_ds WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) + snapshotId = (select max(snapshotId) from activityRelations_enriched_deduplicated_ds) AND (type = 'pull_request-review-requested' OR type = 'merge_request-review-requested') GROUP BY sourceParentId NODE pull_request_first_reviewed SQL > SELECT sourceParentId, argMin(updatedAt, timestamp) AS updatedAt, MIN(timestamp) AS reviewedAt - FROM activityRelations_deduplicated_cleaned_ds + FROM activityRelations_enriched_deduplicated_ds WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) + snapshotId = (select max(snapshotId) from 
activityRelations_enriched_deduplicated_ds) AND sourceParentId <> '' and ( type = 'pull_request-reviewed' @@ -59,9 +59,9 @@ SQL > NODE pull_request_first_review_approved SQL > SELECT sourceParentId, argMin(updatedAt, timestamp) AS updatedAt, MIN(timestamp) AS approvedAt - FROM activityRelations_deduplicated_cleaned_ds + FROM activityRelations_enriched_deduplicated_ds WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) + snapshotId = (select max(snapshotId) from activityRelations_enriched_deduplicated_ds) AND ( (type = 'pull_request-reviewed' and pullRequestReviewState = 'APPROVED') OR type = 'merge_request-review-approved' @@ -72,9 +72,9 @@ SQL > NODE pull_request_first_closed SQL > SELECT sourceParentId, argMin(updatedAt, timestamp) AS updatedAt, MIN(timestamp) AS closedAt - FROM activityRelations_deduplicated_cleaned_ds + FROM activityRelations_enriched_deduplicated_ds WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) + snapshotId = (select max(snapshotId) from activityRelations_enriched_deduplicated_ds) AND ( type = 'pull_request-closed' OR type = 'merge_request-closed' @@ -89,18 +89,18 @@ DESCRIPTION > SQL > SELECT sourceParentId, argMin(updatedAt, timestamp) AS updatedAt, MIN(timestamp) AS mergedAt - FROM activityRelations_deduplicated_cleaned_ds + FROM activityRelations_enriched_deduplicated_ds WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) + snapshotId = (select max(snapshotId) from activityRelations_enriched_deduplicated_ds) AND (type = 'pull_request-merged' OR type = 'merge_request-merged' OR type = 'changeset-merged') GROUP BY sourceParentId NODE pull_request_first_resolved SQL > SELECT sourceParentId, argMin(updatedAt, timestamp) AS updatedAt, MIN(timestamp) AS resolvedAt - FROM activityRelations_deduplicated_cleaned_ds + FROM activityRelations_enriched_deduplicated_ds WHERE - snapshotId = (select max(snapshotId) from 
activityRelations_deduplicated_cleaned_ds) + snapshotId = (select max(snapshotId) from activityRelations_enriched_deduplicated_ds) AND ( type = 'pull_request-closed' OR type = 'pull_request-merged' @@ -118,10 +118,11 @@ DESCRIPTION > SQL > SELECT sourceParentId, toInt64(COUNT(*)) AS numberOfPatchsets - FROM activityRelations_deduplicated_cleaned_ds + FROM activityRelations_enriched_deduplicated_ds WHERE - snapshotId = (select max(snapshotId) from activityRelations_deduplicated_cleaned_ds) + snapshotId = (select max(snapshotId) from activityRelations_enriched_deduplicated_ds) AND type = 'patchset-created' + AND sourceParentId != '' GROUP BY sourceParentId NODE pull_request_analysis_results_merged diff --git a/services/libs/tinybird/pipes/pull_requests_average_resolve_velocity.pipe b/services/libs/tinybird/pipes/pull_requests_average_resolve_velocity.pipe index 2e266555ed..3c743e4f77 100644 --- a/services/libs/tinybird/pipes/pull_requests_average_resolve_velocity.pipe +++ b/services/libs/tinybird/pipes/pull_requests_average_resolve_velocity.pipe @@ -15,8 +15,6 @@ DESCRIPTION > - `onlyContributions`: Optional boolean, defaults to 1 (contributions only), set to 0 for all activities - Response: `averagePullRequestResolveVelocitySeconds` (average resolution time in seconds) -TAGS "Widget", "Pull requests", "Velocity metrics" - NODE average_pull_request_resolve_velocity_0 SQL > select round(avg(pra.resolvedInSeconds)) "averagePullRequestResolveVelocitySeconds" diff --git a/services/libs/tinybird/pipes/pull_requests_average_time_to_first_review.pipe b/services/libs/tinybird/pipes/pull_requests_average_time_to_first_review.pipe index 906e4acac9..35d703333f 100644 --- a/services/libs/tinybird/pipes/pull_requests_average_time_to_first_review.pipe +++ b/services/libs/tinybird/pipes/pull_requests_average_time_to_first_review.pipe @@ -14,8 +14,6 @@ DESCRIPTION > - Without granularity: `averageTimeToFirstReviewSeconds` (total average) - With granularity: `startDate`, `endDate`, 
and `averageTimeToFirstReviewSeconds` for each time period -TAGS "Widget", "Pull requests", "Review time", "Development metrics" - NODE timeseries_generation_for_pr_avg_time_to_first_review SQL > % diff --git a/services/libs/tinybird/pipes/pull_requests_average_time_to_merge.pipe b/services/libs/tinybird/pipes/pull_requests_average_time_to_merge.pipe index c467908f8e..c7297b3fb2 100644 --- a/services/libs/tinybird/pipes/pull_requests_average_time_to_merge.pipe +++ b/services/libs/tinybird/pipes/pull_requests_average_time_to_merge.pipe @@ -14,8 +14,6 @@ DESCRIPTION > - Without granularity: `averageTimeToMergeSeconds` (total average) - With granularity: `startDate`, `endDate`, and `averageTimeToMergeSeconds` for each time period -TAGS "Widget", "Pull requests", "Merge time", "Development metrics" - NODE timeseries_generation_for_pr_avg_time_to_merge SQL > % @@ -41,26 +39,20 @@ SQL > where isNotNull(pra.mergedAt) {% if defined(startDate) %} - AND pra.mergedAt > parseDateTimeBestEffort( - {{ - DateTime( - startDate, - description="Filter pull request merged at after", - required=False, - ) - }} - ) + AND pra.mergedAt + > {{ + DateTime( + startDate, description="Filter pull request merged at after", required=False + ) + }} {% end %} {% if defined(endDate) %} - AND pra.mergedAt < parseDateTimeBestEffort( - {{ - DateTime( - endDate, - description="Filter pull request merged at before", - required=False, - ) - }} - ) + AND pra.mergedAt + < {{ + DateTime( + endDate, description="Filter pull request merged at before", required=False + ) + }} {% end %} GROUP BY ds."startDate", "endDate" ORDER BY ds."startDate" ASC diff --git a/services/libs/tinybird/pipes/pull_requests_filtered.pipe b/services/libs/tinybird/pipes/pull_requests_filtered.pipe index 5a3dbebd44..458d31b1de 100644 --- a/services/libs/tinybird/pipes/pull_requests_filtered.pipe +++ b/services/libs/tinybird/pipes/pull_requests_filtered.pipe @@ -10,8 +10,6 @@ DESCRIPTION > - `repos`: Optional array of repository URLs 
for filtering PRs by specific repositories (e.g., ['https://github.com/kubernetes/kubernetes']) - Response: All fields from `pull_requests_analyzed` including PR metadata, timing metrics, and calculated fields -TAGS "Infrastructure", "Pull requests", "Core filtering" - NODE pull_requests_filtered_0 SQL > % diff --git a/services/libs/tinybird/pipes/pull_requests_merge_lead_time.pipe b/services/libs/tinybird/pipes/pull_requests_merge_lead_time.pipe index 0633e331c0..2f6a9d4918 100644 --- a/services/libs/tinybird/pipes/pull_requests_merge_lead_time.pipe +++ b/services/libs/tinybird/pipes/pull_requests_merge_lead_time.pipe @@ -11,8 +11,6 @@ DESCRIPTION > - `endDate`: Optional DateTime filter for PRs opened before timestamp (e.g., '2024-12-31 23:59:59') - Response: `openedToMergedSeconds`, `openedToReviewAssignedSeconds`, `reviewAssignedToFirstReviewSeconds`, `firstReviewToApprovedSeconds`, `approvedToMergedSeconds` -TAGS "Widget", "Pull requests", "Lead time", "Development metrics" - NODE pull_requests_merge_lead_time_0 SQL > % diff --git a/services/libs/tinybird/pipes/pull_requests_review_time_by_size.pipe b/services/libs/tinybird/pipes/pull_requests_review_time_by_size.pipe index efcd709e62..67908bf639 100644 --- a/services/libs/tinybird/pipes/pull_requests_review_time_by_size.pipe +++ b/services/libs/tinybird/pipes/pull_requests_review_time_by_size.pipe @@ -11,8 +11,6 @@ DESCRIPTION > - `endDate`: Optional DateTime filter for PRs reviewed before timestamp (e.g., '2024-12-31 23:59:59') - Response: `gitChangedLinesBucket`, `reviewedInSecondsAvg`, `pullRequestCount` ordered by size bucket -TAGS "Widget", "Pull requests", "Review time", "Size analysis" - NODE review_time_by_pull_request_size_0 SQL > % diff --git a/services/libs/tinybird/pipes/repository_groups_list.pipe b/services/libs/tinybird/pipes/repository_groups_list.pipe index 97ab16c6e1..6b1d7d9711 100644 --- a/services/libs/tinybird/pipes/repository_groups_list.pipe +++ 
b/services/libs/tinybird/pipes/repository_groups_list.pipe @@ -1,5 +1,5 @@ NODE repository_groups_list_repositories SQL > SELECT name, slug, repositories - FROM repositoryGroups + FROM repositoryGroups FINAL WHERE deletedAt IS NULL AND insightsProjectId = (SELECT insightsProjectId FROM segments_filtered) diff --git a/services/libs/tinybird/pipes/review_efficiency.pipe b/services/libs/tinybird/pipes/review_efficiency.pipe new file mode 100644 index 0000000000..585437ad6a --- /dev/null +++ b/services/libs/tinybird/pipes/review_efficiency.pipe @@ -0,0 +1,121 @@ +DESCRIPTION > + - `review_efficiency.pipe` serves the "Review Efficiency" widget in the Development tab. + - Tracks opened and merged pull requests over time to measure code review efficiency. + - **When `granularity` is NOT provided, returns a single set of KPI values** (openedCount, mergedCount) across the entire time range. + - **When `granularity` is provided, returns time-series data** showing opened vs merged PRs aggregated by different time periods (daily, weekly, monthly, quarterly, yearly). + - Uses `generate_timeseries` pipe to create consistent time periods and aggregates both opened and merged PRs for each period. + - Available for projects connected with Gerrit, GitHub, or GitLab platforms. + - Primary use case: monitoring code review throughput and identifying bottlenecks in the review process. 
+ - Parameters: + - `project`: Required string for project slug (e.g., 'k8s', 'tensorflow') - inherited from `segments_filtered` + - `repos`: Optional array of repository URLs for filtering (e.g., ['https://github.com/kubernetes/kubernetes']) + - `startDate`: Optional DateTime filter for PRs after timestamp (e.g., '2024-01-01 00:00:00') + - `endDate`: Optional DateTime filter for PRs before timestamp (e.g., '2024-12-31 23:59:59') + - `platform`: Optional string filter for source platform (e.g., 'gerrit', 'github', 'gitlab') + - `granularity`: Optional string for time aggregation ('daily', 'weekly', 'monthly', 'quarterly', 'yearly') + - Response: + - Without granularity: `openedCount` and `mergedCount` (single values) + - With granularity: `startDate`, `endDate`, `openedCount`, and `mergedCount` for each time period + +TAGS "Development metrics", "Widget", "Pull requests", "Review efficiency" + +NODE review_efficiency_timeseries +SQL > + % + {% if defined(granularity) %} + SELECT + ds."startDate", + ds."endDate", + ifNull(countIf(isNotNull(prf.id)), 0) AS "openedCount", + ifNull(countIf(isNotNull(prf.id) AND isNotNull(prf.mergedAt)), 0) AS "mergedCount" + FROM generate_timeseries ds + LEFT JOIN + pull_requests_filtered prf + ON CASE + WHEN {{ granularity }} = 'daily' + THEN toDate(prf.openedAt) + WHEN {{ granularity }} = 'weekly' + THEN toStartOfWeek(prf.openedAt) + WHEN {{ granularity }} = 'monthly' + THEN toStartOfMonth(prf.openedAt) + WHEN {{ granularity }} = 'quarterly' + THEN toStartOfQuarter(prf.openedAt) + WHEN {{ granularity }} = 'yearly' + THEN toStartOfYear(prf.openedAt) + END + = ds."startDate" + {% if defined(platform) %} + AND prf.platform + = {{ + String( + platform, + description="Filter by platform (e.g., 'gerrit', 'github', 'gitlab')", + required=False, + ) + }} + {% end %} + {% if defined(startDate) %} + AND prf.openedAt + >= {{ + DateTime( + startDate, + description="Filter PRs opened after this timestamp", + required=False, + ) + }} + {% end %}
+ {% if defined(endDate) %} + AND prf.openedAt + <= {{ + DateTime( + endDate, + description="Filter PRs opened before this timestamp", + required=False, + ) + }} + {% end %} + GROUP BY ds."startDate", ds."endDate" + ORDER BY ds."startDate" + {% else %} SELECT 1 + {% end %} + +NODE review_efficiency_merged +SQL > + % + {% if not defined(granularity) %} + SELECT count(prf.id) AS "openedCount", countIf(isNotNull(prf.mergedAt)) AS "mergedCount" + FROM pull_requests_filtered prf + WHERE + 1 = 1 + {% if defined(platform) %} + AND prf.platform + = {{ + String( + platform, + description="Filter by platform (e.g., 'gerrit', 'github', 'gitlab')", + required=False, + ) + }} + {% end %} + {% if defined(startDate) %} + AND prf.openedAt + >= {{ + DateTime( + startDate, + description="Filter PRs opened after this timestamp", + required=False, + ) + }} + {% end %} + {% if defined(endDate) %} + AND prf.openedAt + <= {{ + DateTime( + endDate, + description="Filter PRs opened before this timestamp", + required=False, + ) + }} + {% end %} + {% else %} SELECT * FROM review_efficiency_timeseries + {% end %} diff --git a/services/libs/tinybird/pipes/search_collections_projects_repos.pipe b/services/libs/tinybird/pipes/search_collections_projects_repos.pipe index c224aded6d..9e9bab7c8d 100644 --- a/services/libs/tinybird/pipes/search_collections_projects_repos.pipe +++ b/services/libs/tinybird/pipes/search_collections_projects_repos.pipe @@ -10,8 +10,6 @@ DESCRIPTION > - `limit`: Optional integer for result limit per entity type, defaults to 10 - Response: `type` ('collection'|'project'|'repository'), `slug`, `logo`, `projectSlug`, `name` -TAGS "Widget", "Search", "Unified search" - NODE merge_results_from_collections_projects_repos_filtered SQL > % @@ -20,9 +18,7 @@ SQL > collections_filtered.slug, null as logo, null as projectSlug, - collections_filtered.name, - CAST(NULL AS Nullable(UInt8)) as archived, - CAST(NULL AS Nullable(UInt8)) as excluded + collections_filtered.name from 
collections_filtered order by collections_filtered.projectCount desc limit {{ Integer(limit, 10, description="Limit number of records for each type", required=False) }} @@ -32,9 +28,7 @@ SQL > insightsProjects_filtered.slug, insightsProjects_filtered.logo, insightsProjects_filtered.slug as "projectSlug", - insightsProjects_filtered.name, - CAST(NULL AS Nullable(UInt8)) as archived, - CAST(NULL AS Nullable(UInt8)) as excluded + insightsProjects_filtered.name from insightsProjects_filtered where not ( @@ -49,10 +43,7 @@ SQL > activityRepositories_filtered.repo as slug, null as logo, activityRepositories_filtered.projectSlug as "projectSlug", - null as name, - sr.archived as archived, - sr.excluded as excluded + null as name from activityRepositories_filtered - join segmentRepositories as sr on sr.insightsProjectId = activityRepositories_filtered.projectId order by activityRepositories_filtered.repo asc limit {{ Integer(limit, 10, description="Limit number of records for each type", required=False) }} diff --git a/services/libs/tinybird/pipes/search_volume.pipe b/services/libs/tinybird/pipes/search_volume.pipe index 85c6e25f4b..bb49056ee5 100644 --- a/services/libs/tinybird/pipes/search_volume.pipe +++ b/services/libs/tinybird/pipes/search_volume.pipe @@ -10,8 +10,6 @@ DESCRIPTION > - `endDate`: Optional DateTime filter for search volume data before timestamp (e.g., '2024-12-31 23:59:59') - Response: `insightsProjectId`, `project`, `dataTimestamp` (formatted as YYYY-MM-DD), `volume`, `updatedAt` -TAGS "Widget", "Search analytics", "Time-series" - NODE searchVolume_pipe SQL > % diff --git a/services/libs/tinybird/pipes/security_and_best_practices.pipe b/services/libs/tinybird/pipes/security_and_best_practices.pipe index fd5ec40456..11ded0486c 100644 --- a/services/libs/tinybird/pipes/security_and_best_practices.pipe +++ b/services/libs/tinybird/pipes/security_and_best_practices.pipe @@ -10,8 +10,6 @@ DESCRIPTION > - `repos`: Optional array of repository URLs to filter 
security assessments (e.g., ['https://github.com/kubernetes/kubernetes']) - Response: `evaluationId`, `category`, `repo`, `controlId`, `message`, `result`, `assessments` -TAGS "Widget", "Security", "Compliance", "Best practices" - NODE evaluation_controlId_category_map SQL > SELECT @@ -97,3 +95,7 @@ SQL > AND s.repo IN {{ Array(repos, 'String', description="Filter activity repo list", required=False) }} {% end %} + +NODE security_and_best_practices_2 +SQL > + SELECT * FROM security_and_best_practices_1 diff --git a/services/libs/tinybird/pipes/segmentId_aggregates_mv.pipe b/services/libs/tinybird/pipes/segmentId_aggregates_snapshot.pipe similarity index 78% rename from services/libs/tinybird/pipes/segmentId_aggregates_mv.pipe rename to services/libs/tinybird/pipes/segmentId_aggregates_snapshot.pipe index 36bac659a9..15ada12677 100644 --- a/services/libs/tinybird/pipes/segmentId_aggregates_mv.pipe +++ b/services/libs/tinybird/pipes/segmentId_aggregates_snapshot.pipe @@ -7,7 +7,7 @@ SQL > segmentId, countDistinctState(memberId) AS contributorCount, countDistinctState(organizationId) AS organizationCount - FROM activityRelations_enrich_clean_snapshot_MV_ds + FROM activityRelations_deduplicated_cleaned_bucket_union WHERE (type, platform) IN ( SELECT activityType, platform @@ -16,5 +16,7 @@ SQL > ) GROUP BY segmentId -TYPE MATERIALIZED -DATASOURCE segmentsAggregatedMV +TYPE COPY +TARGET_DATASOURCE segmentsAggregatedMV +COPY_MODE replace +COPY_SCHEDULE 55 * * * * diff --git a/services/libs/tinybird/pipes/sitemap_categories.pipe b/services/libs/tinybird/pipes/sitemap_categories.pipe new file mode 100644 index 0000000000..ecfb795f31 --- /dev/null +++ b/services/libs/tinybird/pipes/sitemap_categories.pipe @@ -0,0 +1,3 @@ +NODE sitemap_categories_slugs +SQL > + SELECT slug FROM categories FINAL diff --git a/services/libs/tinybird/pipes/sitemap_category_groups.pipe b/services/libs/tinybird/pipes/sitemap_category_groups.pipe new file mode 100644 index 0000000000..875eee15f1 
--- /dev/null +++ b/services/libs/tinybird/pipes/sitemap_category_groups.pipe @@ -0,0 +1,3 @@ +NODE sitemap_category_groups_slugs +SQL > + SELECT slug FROM categoryGroups FINAL diff --git a/services/libs/tinybird/pipes/sitemap_collections.pipe b/services/libs/tinybird/pipes/sitemap_collections.pipe new file mode 100644 index 0000000000..f5e0d69a76 --- /dev/null +++ b/services/libs/tinybird/pipes/sitemap_collections.pipe @@ -0,0 +1,3 @@ +NODE sitemap_collections_slugs +SQL > + SELECT slug FROM collections FINAL diff --git a/services/libs/tinybird/pipes/sitemap_projects.pipe b/services/libs/tinybird/pipes/sitemap_projects.pipe new file mode 100644 index 0000000000..24af679be9 --- /dev/null +++ b/services/libs/tinybird/pipes/sitemap_projects.pipe @@ -0,0 +1,3 @@ +NODE sitemap_projects_slugs +SQL > + SELECT slug FROM insightsProjects FINAL diff --git a/services/libs/tinybird/scripts/format_all.sh b/services/libs/tinybird/scripts/format_all.sh index b3fd2a8bde..186ef9ef0a 100755 --- a/services/libs/tinybird/scripts/format_all.sh +++ b/services/libs/tinybird/scripts/format_all.sh @@ -3,14 +3,25 @@ PIPES_FOLDER="../pipes" DATA_SOURCES_FOLDER="../datasources" +# Parse command line arguments +SEQUENTIAL=false +[[ "$1" == "--sequential" ]] && SEQUENTIAL=true + format_files_in_folder() { local folder="$1" for file in "$folder"/*; do - [ -f "$file" ] && tb fmt --yes "$file" & + if [ -f "$file" ]; then + if [ "$SEQUENTIAL" = true ]; then + tb fmt --yes "$file" + else + tb fmt --yes "$file" & + fi + fi done } format_files_in_folder "$PIPES_FOLDER" format_files_in_folder "$DATA_SOURCES_FOLDER" -wait +# Only wait for background processes in parallel mode +[ "$SEQUENTIAL" = false ] && wait
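The new `project_buckets.pipe` routes a project to a bucket with `cityHash64(segmentId) % 10`, and the insights pipes now read from `activityRelations_deduplicated_cleaned_bucket_union` instead of filtering on `max(snapshotId)`. The following is a minimal Python sketch of that hash-based routing, written under stated assumptions rather than as the Tinybird implementation. It substitutes the first 8 bytes of SHA-256 for ClickHouse's `cityHash64` (so concrete bucket numbers will differ), and the helpers `bucket_id`, `partition`, `union_all`, and `distinct_members_per_segment` are hypothetical names used only for illustration.

```python
import hashlib
from collections import defaultdict

NUM_BUCKETS = 10  # mirrors the `% 10` in project_buckets.pipe


def bucket_id(segment_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a segmentId to a bucket in [0, num_buckets).

    ASSUMPTION: Tinybird uses ClickHouse's cityHash64, which is not in the
    Python standard library. The first 8 bytes of SHA-256 stand in for it,
    so routing is deterministic but bucket numbers will not match Tinybird's.
    """
    digest = hashlib.sha256(segment_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % num_buckets


def partition(rows, num_buckets: int = NUM_BUCKETS):
    """Split rows into buckets keyed by the hash of their segmentId."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row["segmentId"], num_buckets)].append(row)
    return dict(buckets)


def union_all(buckets):
    """Read every bucket back, analogous to the *_bucket_union datasource."""
    return [row for b in sorted(buckets) for row in buckets[b]]


def distinct_members_per_segment(rows):
    """Rough analogue of countDistinct(memberId) ... GROUP BY segmentId."""
    members = defaultdict(set)
    for row in rows:
        members[row["segmentId"]].add(row["memberId"])
    return {seg: len(ids) for seg, ids in members.items()}


if __name__ == "__main__":
    rows = [{"segmentId": f"seg-{i % 7}", "memberId": f"m-{i % 13}"} for i in range(200)]
    buckets = partition(rows)

    # Aggregate each bucket independently, then concatenate the results.
    per_bucket = {}
    for bucket_rows in buckets.values():
        per_bucket.update(distinct_members_per_segment(bucket_rows))

    # Same answer as aggregating over the union of all buckets.
    assert per_bucket == distinct_members_per_segment(union_all(buckets))
    print(f"{len(rows)} rows spread over buckets {sorted(buckets)}")
```

Because every row for a given `segmentId` lands in the same bucket, a `GROUP BY segmentId` aggregation computed per bucket and concatenated gives the same result as one computed over the union. That property is what allows a single-project query to be routed to one bucket instead of scanning all ten.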