feat: Add pipeline run outcome metrics #76

Closed
morgan-wowk wants to merge 1 commit into graphite-base/76 from metrics-pipeline-outcomes

morgan-wowk (Collaborator) commented Feb 2, 2026

Pipeline Metrics Instrumentation

Tracks pipeline lifecycle from creation to completion with labels for status and user.

Metrics added:

  • pipeline_runs_total: Counter tracking pipeline runs by status (running/succeeded/failed/cancelled) and created_by
  • pipeline_run_duration_seconds: Histogram tracking total pipeline duration by final status

These metrics provide visibility into pipeline success rates, completion times, and usage patterns per user.
Durations measure total lifecycle time from creation to terminal state (including queue and execution time).
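
As a rough illustration of the two metrics described above, the sketch below models a labeled counter and a duration histogram using only the Python standard library. It is not the PR's implementation: the `PipelineMetrics` class, its method names, and the bucket boundaries are assumptions made for illustration, and a real backend would use a metrics client library instead.

```python
import bisect
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical bucket upper bounds in seconds; the PR does not state its buckets.
DURATION_BUCKETS = (60.0, 300.0, 1800.0, 3600.0, float("inf"))
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


@dataclass
class PipelineMetrics:
    """In-memory stand-in for the two metrics named in the description."""

    # pipeline_runs_total, keyed by (status, created_by)
    runs_total: Counter = field(default_factory=Counter)
    # pipeline_run_duration_seconds, keyed by final status -> per-bucket counts
    duration_buckets: dict = field(default_factory=dict)

    def record_status(self, status: str, created_by: str) -> None:
        self.runs_total[(status, created_by)] += 1

    def record_completion(
        self, status: str, created_at: datetime, finished_at: datetime
    ) -> float:
        if status not in TERMINAL_STATUSES:
            raise ValueError(f"{status!r} is not a terminal status")
        # Total lifecycle time: creation to terminal state (queue + execution).
        seconds = (finished_at - created_at).total_seconds()
        counts = self.duration_buckets.setdefault(
            status, [0] * len(DURATION_BUCKETS)
        )
        # Pick the first bucket whose upper bound is >= the observed duration.
        counts[bisect.bisect_left(DURATION_BUCKETS, seconds)] += 1
        return seconds


metrics = PipelineMetrics()
created = datetime(2026, 2, 2, 12, 0, 0)
metrics.record_status("running", created_by="alice")
metrics.record_status("succeeded", created_by="alice")
elapsed = metrics.record_completion(
    "succeeded", created, created + timedelta(minutes=4)
)
```

Note that the duration is computed from a creation timestamp passed in by the caller, so in this sketch it does not depend on how long the recording process itself has been running.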

Ark-kun (Contributor) commented Feb 6, 2026

I'm not fully sure we should be doing this on the app side.
The backend can be restarted at any time (e.g. when a new version is deployed). With the current implementation in the PR, a restart seems to affect the metrics, but I think it shouldn't.

I'm not sure about reporting pipeline run times like this. Tracking execution durations might be more acceptable.

I'm not sure the backend should be aggregating a run's execution status statistics into a single status. This single status will likely not be useful for many users. I think it's better for users to get the full status stats and for the UI to provide derivatives.
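
To make the last point concrete, here is a minimal sketch of the alternative: the backend returns the per-status execution counts unchanged, and a client derives whatever summary it needs. The status names and the precedence rule in `summarize` are hypothetical and are not taken from the PR.

```python
from collections import Counter

# Hypothetical per-run execution status counts, as a backend might return them.
status_stats = Counter({"SUCCEEDED": 5, "FAILED": 1, "RUNNING": 2})


def summarize(stats: Counter) -> str:
    """One possible client-side derivative; other UIs may choose differently."""
    # Assumed precedence: any failure wins, then any in-progress work.
    for status in ("FAILED", "RUNNING"):
        if stats.get(status, 0) > 0:
            return status
    return "SUCCEEDED" if stats.get("SUCCEEDED", 0) > 0 else "UNKNOWN"


def progress(stats: Counter) -> float:
    """A different derivative of the same data: fraction of executions finished."""
    total = sum(stats.values())
    done = stats.get("SUCCEEDED", 0) + stats.get("FAILED", 0)
    return done / total if total else 0.0


overall = summarize(status_stats)
fraction_done = progress(status_stats)
```

Both derivatives come from the same full counts, which is the flexibility a single pre-aggregated status would not offer.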

morgan-wowk (Collaborator, Author) commented

Thanks for jumping into these draft PRs and giving some early feedback, especially on this one. After all our discussion, that does sound like the best thing to do.

Ark-kun (Contributor) commented Feb 17, 2026

Overall idea: We want to leverage the telemetry processing systems and dashboards as much as possible to avoid reinventing the wheel.

Ideal case: We report events via OTel, and the dedicated metrics system (which we do not reinvent) creates metrics from those events.
Realistic compromise case: In addition to events, we report some simple metrics that we get anyway (or that are cheap to get), e.g. the time to process a queued/running execution.
Case we want to avoid (at least for now): Calculating and tracking complex metrics ourselves on the backend side.
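
The "ideal case" above can be sketched as follows: the backend emits only plain lifecycle events, and a separate consumer (standing in for the metrics system) derives durations from them. This is an illustration in plain Python under stated assumptions; the `ExecutionEvent` shape, the event names, and `derive_durations` are hypothetical, and the code does not use the OpenTelemetry API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ExecutionEvent:
    """Hypothetical event shape: what happened, to which execution, and when."""

    execution_id: str
    name: str  # assumed names: "queued", "started", "finished"
    timestamp: datetime


def derive_durations(events: list[ExecutionEvent]) -> dict[str, float]:
    """Consumer side: seconds from 'queued' to 'finished' per execution."""
    queued: dict[str, datetime] = {}
    durations: dict[str, float] = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.name == "queued":
            queued[event.execution_id] = event.timestamp
        elif event.name == "finished" and event.execution_id in queued:
            start = queued[event.execution_id]
            durations[event.execution_id] = (event.timestamp - start).total_seconds()
    return durations


t0 = datetime(2026, 2, 17, 9, 0, 0)
events = [
    ExecutionEvent("exec-1", "queued", t0),
    ExecutionEvent("exec-1", "started", t0 + timedelta(seconds=30)),
    ExecutionEvent("exec-1", "finished", t0 + timedelta(seconds=90)),
    ExecutionEvent("exec-2", "queued", t0 + timedelta(seconds=10)),
]
durations = derive_durations(events)
```

In this sketch the emitting side only records timestamped facts; all aggregation happens in the consumer, so executions that have not finished (like `exec-2`) simply produce no duration yet.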

morgan-wowk (Collaborator, Author) commented

It sounds great to me! Let's do this.

morgan-wowk (Collaborator, Author) commented Feb 17, 2026

I feel like if we are ever calculating complex or non-cheap metrics ourselves, it should be because they are a feature of Tangle, not because we need them for reporting. Otherwise, the observability platform should be ingesting simple, non-calculated events.

morgan-wowk deleted the metrics-pipeline-outcomes branch February 19, 2026 00:47