feat: API-Server - Added OTel trace id auto-instrumentation for FastAPI #82

Merged
morgan-wowk merged 1 commit into master from otel-api-tracing
Feb 20, 2026

Collaborator

@morgan-wowk morgan-wowk commented Feb 3, 2026

Add OpenTelemetry tracing support

TL;DR

Adds OpenTelemetry (OTel) distributed tracing capabilities to the API server for improved observability.

What changed?

  • Created a new module otel_tracing.py that configures OpenTelemetry tracing for FastAPI applications
  • Added OTel tracing initialization to both api_server_main.py and start_local.py
  • Added required OpenTelemetry dependencies to pyproject.toml
  • The implementation supports both gRPC and HTTP protocols for exporting traces

How to test?

  1. Run an OTel collector locally (see https://hub.docker.com/r/otel/opentelemetry-collector)
  2. Set the following environment variables:
    • TANGLE_OTEL_EXPORTER_ENDPOINT: URL for the OTLP collector (e.g., "http://localhost:4317")
    • TANGLE_ENV: Environment name (defaults to "development")
    • TANGLE_OTEL_EXPORTER_PROTOCOL: Protocol to use (defaults to "grpc", can be "http")
  3. Start the application
  4. Make API requests and verify traces are being sent to your collector
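
The variables in step 2 could be read along these lines. This is a minimal sketch, not the PR's actual otel_tracing.py: the `OtelConfig` dataclass and `load_otel_config` helper are hypothetical names, and treating an unset endpoint as "tracing disabled" is an assumption, as is lowercasing the protocol value.

```python
import dataclasses
import os
from typing import Mapping, Optional

_SUPPORTED_PROTOCOLS = ("grpc", "http")


@dataclasses.dataclass(frozen=True)
class OtelConfig:
    """Hypothetical container for the three settings listed above."""

    endpoint: str
    protocol: str
    environment: str


def load_otel_config(
    environ: Mapping[str, str] = os.environ,
) -> Optional[OtelConfig]:
    """Reads the TANGLE_* variables; returns None when no endpoint is set."""
    endpoint = environ.get("TANGLE_OTEL_EXPORTER_ENDPOINT")
    if not endpoint:
        # Assumption: no endpoint means tracing stays off.
        return None

    # Assumption: the protocol value is matched case-insensitively.
    protocol = environ.get("TANGLE_OTEL_EXPORTER_PROTOCOL", "grpc").lower()
    if protocol not in _SUPPORTED_PROTOCOLS:
        raise ValueError(
            f"Unsupported TANGLE_OTEL_EXPORTER_PROTOCOL: {protocol!r}"
        )

    return OtelConfig(
        endpoint=endpoint,
        protocol=protocol,
        environment=environ.get("TANGLE_ENV", "development"),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check without touching the real process environment.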

Why make this change?

Distributed tracing provides deeper insights into request flows through the system, making it easier to:

  • Debug performance issues
  • Understand service dependencies
  • Monitor request latency across components
  • Identify bottlenecks in the application

This implementation uses the OpenTelemetry standard, which is vendor-neutral and widely supported by observability platforms.

Contributor

@Ark-kun Ark-kun left a comment

Thank you for implementing this.

)

    except Exception as e:
        logger.error(f"Failed to configure OpenTelemetry tracing: {e}", exc_info=True)
Contributor

Nit: Can just use logger.exception. It sets exc_info=True automatically.

Collaborator Author

Nice! Changed

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

logger = logging.getLogger(__name__)
Contributor

_logger (to make it private)

Collaborator Author

Python knowledge +1

from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (
    OTLPSpanExporter as GRPCSpanExporter,
Contributor

Just curious - Why are we renaming this import?

Collaborator Author

It's renamed for clarity because OTel provides two versions of OTLPSpanExporter, one for http and one for grpc. It was not entirely needed previously, but now that I have added http support, it is helpful for distinction.

Collaborator Author

Update: I now import modules rather than classes directly, so this has changed.

from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (
    OTLPSpanExporter as GRPCSpanExporter,
)
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
Contributor

Style: Let's import modules, not particular classes/functions. https://google.github.io/styleguide/pyguide.html#22-imports
Usage then becomes: module1.Class1 or module1.function1.
Ambiguous module names can be renamed.

import fastapi
#? from opentelemetry.exporter.otlp.proto.grpc import trace_exporter
from opentelemetry import trace as otel_trace
from opentelemetry.sdk import resources as otel_sdk_resources
from opentelemetry.sdk import trace as otel_sdk_trace
from opentelemetry.sdk.trace import export as otel_sdk_trace_export

TBH, I dislike the module naming conventions of opentelemetry SDK. Ambiguity like .trace vs .sdk.trace. I also kind of dislike the module names that look like verbs: trace, export and single nouns like status. These names clash with function names and local variables.

So if the result looks too ugly to you, feel free to skip.

Collaborator Author

Understood! Thanks for the style guide. +1 existing style awareness!

resource = Resource(attributes={SERVICE_NAME: service_name})

# Create the OTLP exporter
otlp_exporter = GRPCSpanExporter(
Contributor

Is GRPCSpanExporter the only exporter that would work well for Tangle API telemetry?

Collaborator Author

We can actually support multiple. Like HTTP. Good point.

Collaborator Author

I'll change this to be configurable and support multiple exporters

Contributor

I do not strictly require this to be configurable.
I just wanted to know whether only a single exporter makes sense for Tangle or whether there are more.

Collaborator Author

I think it is a great addition to have this configurable. I have added support for both http and grpc
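
One way to select between the two exporters is sketched below. It is not the PR's implementation: the helper names are hypothetical, and deferring the import with importlib is a choice made for this sketch so that only the selected exporter package has to be installed. The two module paths are the OTLP exporter packages discussed in this thread, and each defines its own `OTLPSpanExporter` class.

```python
import importlib

# Module paths of the gRPC and HTTP OTLP trace exporters.
# Each module exposes a class named OTLPSpanExporter.
_EXPORTER_MODULES = {
    "grpc": "opentelemetry.exporter.otlp.proto.grpc.trace_exporter",
    "http": "opentelemetry.exporter.otlp.proto.http.trace_exporter",
}


def exporter_module_name(protocol: str) -> str:
    """Maps a protocol name to the exporter module path (hypothetical helper)."""
    try:
        return _EXPORTER_MODULES[protocol]
    except KeyError:
        raise ValueError(f"Unsupported OTLP protocol: {protocol!r}") from None


def create_span_exporter(protocol: str, endpoint: str):
    """Imports the matching module and builds its OTLPSpanExporter.

    Requires the corresponding opentelemetry exporter package to be installed.
    """
    module = importlib.import_module(exporter_module_name(protocol))
    return module.OTLPSpanExporter(endpoint=endpoint)
```

Usage would look like `create_span_exporter("grpc", "http://localhost:4317")`. The HTTP exporter generally expects the full traces URL rather than just host and port, so the endpoint format is worth checking against the exporter documentation when switching protocols.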


try:
    # Build service name with environment suffix
    app_env = os.environ.get("APP_ENV", "development")
Contributor

Isn't the environment variable usually ENV?

Collaborator Author

That's a tough one. In my Laravel time, they used APP_ENV, but personally I prefer prefixing application variables with TANGLE_.

I've changed this to TANGLE_ENV but am open to going with whatever you prefer. If it's ENV that's totally cool. The reason I like application prefixes is to avoid side effects in other software running on the same server.
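
The effect of the prefix can be illustrated with a small sketch. The `build_service_name` helper, the "tangle-api" base name, and the "-" separator are assumptions made for illustration; only the TANGLE_ENV variable and its "development" default come from this PR.

```python
import os
from typing import Mapping


def build_service_name(
    base_name: str,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Appends the environment name to the service name (hypothetical helper)."""
    # Only the prefixed variable is consulted, so a generic ENV or APP_ENV
    # set for other software on the same host has no effect here.
    app_env = environ.get("TANGLE_ENV", "development")
    return f"{base_name}-{app_env}"


# An unrelated ENV variable is ignored; the default applies.
name_with_default = build_service_name("tangle-api", {"ENV": "production"})

# The prefixed variable is picked up.
name_with_prefix = build_service_name("tangle-api", {"TANGLE_ENV": "staging"})
```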

)

# Configure OpenTelemetry tracing
otel_tracing.setup_api_tracing(app)
Contributor

Thank you for making it non-invasive.

Collaborator Author

😀


# Instrument the FastAPI application
# This automatically creates spans for all incoming HTTP requests
FastAPIInstrumentor.instrument_app(app)
Contributor

What kind of data will this capture and export?

Collaborator Author

It is limited to the API request lifecycle.

Here is an example trace: https://docs.google.com/document/d/1h5RrkD2aR77eed6HXGCePHAMivX92HSesVjWeRo17z4/edit?usp=sharing

@Volv-G Volv-G left a comment

Given Alexey's comments this looks good to me

Collaborator Author

@Ark-kun Please feel free to give it a final pass when you have a moment.

After this, we will be able to take the next step which is metrics 😃

Collaborator Author

morgan-wowk commented Feb 20, 2026

Note to self:

Before merging, try running the collector, and then shut down the collector while Tangle has already sent some successful logs. Let's see what happens when we can no longer report trace data to the collector.

Collaborator Author

morgan-wowk commented Feb 20, 2026

I have verified that if an OTel collector experiences downtime while Tangle has already started, it will not cause downtime for Tangle.

Some logs:

2026-02-19 17:50:28,547 [WARNING] opentelemetry.exporter.otlp.proto.grpc.exporter: Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 0.81s.
2026-02-19 17:50:35,099 [ERROR] opentelemetry.exporter.otlp.proto.grpc.exporter: Failed to export traces to localhost:4317, error code: StatusCode.UNAVAILABLE
2026-02-19 17:50:41,868 [INFO] uvicorn.access: 127.0.0.1:0 - "GET /services/ping HTTP/1.0" 200
2026-02-19 17:50:41,893 [INFO] uvicorn.access: 127.0.0.1:0 - "GET /api/pipeline_runs/?include_pipeline_names=true&include_execution_stats=true HTTP/1.0" 200
2026-02-19 17:50:45,110 [WARNING] opentelemetry.exporter.otlp.proto.grpc.exporter: Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 0.92s.
2026-02-19 17:50:51,908 [ERROR] opentelemetry.exporter.otlp.proto.grpc.exporter: Failed to export traces to localhost:4317, error code: StatusCode.UNAVAILABLE

@morgan-wowk morgan-wowk merged commit f7913db into master Feb 20, 2026
2 checks passed