feat: Add pipeline run outcome metrics by morgan-wowk · Pull Request #76 · TangleML/tangle

morgan-wowk · 2026-02-02T23:39:09Z

Pipeline Metrics Instrumentation

Tracks pipeline lifecycle from creation to completion with labels for status and user.

Metrics added:

pipeline_runs_total: Counter tracking pipeline runs by status (running/succeeded/failed/cancelled) and created_by
pipeline_run_duration_seconds: Histogram tracking total pipeline duration by final status

These metrics provide visibility into pipeline success rates, completion times, and usage patterns per user.
Durations measure total lifecycle time from creation to terminal state (including queue and execution time).
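As a rough illustration of the two metrics described above, here is a minimal sketch that uses only the standard library. The `PipelineRunMetrics` class, its `record_status` method, and the in-memory dictionaries are hypothetical stand-ins for a real counter and histogram; they are not the code from this PR and not the API of any metrics library.

```python
import time
from collections import defaultdict

# Terminal statuses taken from the PR description.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


class PipelineRunMetrics:
    """In-memory stand-in for a counter and a histogram (illustration only)."""

    def __init__(self):
        # (status, created_by) -> count, mirroring the pipeline_runs_total labels.
        self.pipeline_runs_total = defaultdict(int)
        # final status -> observed durations, mirroring pipeline_run_duration_seconds.
        self.pipeline_run_duration_seconds = defaultdict(list)

    def record_status(self, status, created_by, created_at, now=None):
        """Count a status transition; observe a duration only on terminal states."""
        self.pipeline_runs_total[(status, created_by)] += 1
        if status in TERMINAL_STATUSES:
            now = time.time() if now is None else now
            # Total lifecycle time: creation to terminal state (queue + execution).
            self.pipeline_run_duration_seconds[status].append(now - created_at)


metrics = PipelineRunMetrics()
metrics.record_status("running", "alice", created_at=100.0, now=105.0)
metrics.record_status("succeeded", "alice", created_at=100.0, now=160.0)
```

In this sketch the duration is computed from the stored creation time rather than from an in-process timer; that is a design assumption of the example, not something the PR description states.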

morgan-wowk · 2026-02-02T23:39:24Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

feat: Add orchestrator error tracking by type #81
feat: Add database performance monitoring #80
feat: Add container performance metrics #79
feat: Add execution outcome and cache metrics #78
feat: Add orchestrator queue health metrics #77
feat: Add pipeline run outcome metrics #76 👈 (View in Graphite)
feat: Add HTTP request metrics instrumentation #75
feat: API-Server - Added OTel trace id auto-instrumentation for FastAPI #82: 1 other dependent PR (#101)
master

This stack of pull requests is managed by Graphite.


Ark-kun · 2026-02-06T09:04:22Z

I'm not fully sure we should be doing this on the app side.
The backend can be restarted at any time (e.g. when a new version is deployed). With the current implementation in the PR, a restart seems to affect the metrics, but I think it shouldn't.

I'm not sure about reporting pipeline run times like this. Tracking execution durations might be more OK.

I'm not sure the backend should be aggregating a run's execution status statistics into a single status. This single status will likely not be useful for many of the users. I think it's better if users can get the full status stats and the UI provides the derivatives.
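To make the distinction between full status stats and a single aggregated status concrete, here is a small sketch. The function names, the `queued` and `unknown` status values, and the precedence order in `derive_run_status` are all assumptions made for illustration; the thread does not specify how a single status would be derived.

```python
from collections import Counter


def execution_status_stats(execution_statuses):
    """Full per-status counts for one run: what the backend would expose."""
    return dict(Counter(execution_statuses))


def derive_run_status(stats):
    """One possible client-side derivative; the precedence here is an assumption."""
    if not stats:
        return "unknown"
    if stats.get("failed", 0):
        return "failed"
    if stats.get("cancelled", 0):
        return "cancelled"
    if stats.get("running", 0) or stats.get("queued", 0):
        return "running"
    return "succeeded"


stats = execution_status_stats(["succeeded", "succeeded", "failed", "running"])
run_status = derive_run_status(stats)
```

Keeping the derivation on the consumer side means a different UI or dashboard could apply its own rule to the same stats without a backend change.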

morgan-wowk · 2026-02-06T19:54:47Z


Thanks for jumping into these draft PRs and giving some early feedback, especially on this one. After all our discussion, that does sound like the best thing to do.

Ark-kun · 2026-02-17T22:43:58Z

Overall idea: We want to leverage the telemetry processing systems and dashboards as much as possible to avoid re-inventing the wheel.

Ideal case: We report events via OTel and the dedicated metrics system (which we do not re-invent) creates metrics from those events.
Realistic compromise case: In addition to events, we report some simple metrics that we get anyways (or are cheap to get). E.g. time to process a queued/running execution.
Case we want to avoid (at least for now): Calculating and tracking complex metrics ourselves on the backend side.
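The split between the ideal case and the case to avoid can be sketched as follows: the backend emits plain events, and a separate consumer derives the metric. This example does not use the OTel API; the event name `execution.status_changed`, the attribute keys, and the list used as a sink are hypothetical stand-ins for a real telemetry pipeline.

```python
import json
import time


def make_event(name, attributes, timestamp=None):
    """Build a simple, non-calculated event record."""
    return {
        "name": name,
        "timestamp": time.time() if timestamp is None else timestamp,
        "attributes": attributes,
    }


def emit(event, sink):
    """Backend side: serialize and hand off the event, nothing else."""
    sink.append(json.dumps(event, sort_keys=True))


def duration_by_execution(lines):
    """Consumer side: seconds between the first and last event of each execution."""
    first, last = {}, {}
    for line in lines:
        event = json.loads(line)
        execution_id = event["attributes"]["execution_id"]
        first.setdefault(execution_id, event["timestamp"])
        last[execution_id] = event["timestamp"]
    return {key: last[key] - first[key] for key in first}


sink = []  # stand-in for an exporter or collector
for status, ts in [("queued", 10.0), ("running", 12.5), ("succeeded", 40.0)]:
    emit(
        make_event(
            "execution.status_changed",
            {"execution_id": "e1", "status": status},
            timestamp=ts,
        ),
        sink,
    )

durations = duration_by_execution(sink)
```

Because the backend only appends events, restarting it between two events would not change what the consumer computes, as long as the events themselves are delivered.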

morgan-wowk · 2026-02-17T22:52:11Z


It sounds great to me! Let's do this

morgan-wowk · 2026-02-17T22:53:25Z

I feel like if we are ever calculating complex or non-cheap metrics ourselves, it should be because they are a feature of Tangle, not because we need them for reporting. Otherwise, the observability platform should be ingesting simple, non-calculated events.

morgan-wowk force-pushed the metrics-http-requests branch from 2a35556 to 7e30835 February 3, 2026 06:40

morgan-wowk force-pushed the metrics-pipeline-outcomes branch from a26f61f to 91a8a84 February 3, 2026 06:40

morgan-wowk force-pushed the metrics-http-requests branch from 7e30835 to 71b35de February 3, 2026 06:44

morgan-wowk force-pushed the metrics-pipeline-outcomes branch from 91a8a84 to 4153fe9 February 3, 2026 06:44

morgan-wowk changed the base branch from metrics-http-requests to graphite-base/76 February 3, 2026 06:46

morgan-wowk mentioned this pull request Feb 3, 2026

feat: API-Server - Added OTel trace id auto-instrumentation for FastAPI #82

Merged

morgan-wowk force-pushed the metrics-pipeline-outcomes branch from 4153fe9 to 2b22982 February 3, 2026 06:55

morgan-wowk force-pushed the graphite-base/76 branch from 71b35de to 4b0797e February 3, 2026 06:55

morgan-wowk closed this Feb 17, 2026

morgan-wowk deleted the metrics-pipeline-outcomes branch February 19, 2026 00:47

morgan-wowk wants to merge 1 commit into graphite-base/76 from metrics-pipeline-outcomes
