feat: API-Server - Added OTel trace id auto-instrumentation for FastAPI #82
morgan-wowk merged 1 commit into master
567fbc0 to
bfea644
bfea644 to
fd231ed
Ark-kun left a comment
Thank you for implementing this.
    )

except Exception as e:
    logger.error(f"Failed to configure OpenTelemetry tracing: {e}", exc_info=True)
Nit: Can just use logger.exception. It sets exc_info=True automatically.
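The nit above can be illustrated with a minimal, stdlib-only sketch. The logger name `otel_tracing_demo` and the `RuntimeError` are hypothetical stand-ins, not taken from the PR; the point is only that `logger.exception` logs at ERROR level and appends the traceback without passing `exc_info=True`.

```python
import io
import logging


def demo_logger_exception() -> str:
    """Log a caught exception with logger.exception and return the captured text."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    _logger = logging.getLogger("otel_tracing_demo")  # hypothetical logger name
    _logger.addHandler(handler)
    _logger.setLevel(logging.ERROR)
    _logger.propagate = False
    try:
        try:
            raise RuntimeError("collector unreachable")  # stand-in failure
        except Exception:
            # Logs at ERROR level and includes the current traceback,
            # so exc_info=True does not need to be passed explicitly.
            _logger.exception("Failed to configure OpenTelemetry tracing")
    finally:
        _logger.removeHandler(handler)
    return stream.getvalue()


output = demo_logger_exception()
```

Note that `logger.exception` is meant to be called from inside an exception handler, as it is here.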
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

logger = logging.getLogger(__name__)
_logger (to make it private)
Python knowledge +1
from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (
    OTLPSpanExporter as GRPCSpanExporter,
Just curious - Why are we renaming this import?
It's renamed for clarity because OTel provides two versions of OTLPSpanExporter, one for http and one for grpc. It was not entirely needed previously, but now that I have added http support, it is helpful for distinction.
Update: I am now importing modules rather than classes directly, so this has changed.
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (
    OTLPSpanExporter as GRPCSpanExporter,
)
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
Style: Let's import modules, not particular classes/functions. https://google.github.io/styleguide/pyguide.html#22-imports
Usage then becomes: module1.Class1 or module1.function1.
Ambiguous module names can be renamed.
import fastapi
#? from opentelemetry.exporter.otlp.proto.grpc import trace_exporter
from opentelemetry import trace as otel_trace
from opentelemetry.sdk import resources as otel_sdk_resources
from opentelemetry.sdk import trace as otel_sdk_trace
from opentelemetry.sdk.trace import export as otel_sdk_trace_export
TBH, I dislike the module naming conventions of the opentelemetry SDK: there is ambiguity like .trace vs .sdk.trace. I also kind of dislike module names that look like verbs (trace, export) and single nouns like status. These names clash with function names and local variables.
So if the result looks too ugly to you, feel free to skip.
Understood! Thanks for the style guide. +1 existing style awareness!
resource = Resource(attributes={SERVICE_NAME: service_name})

# Create the OTLP exporter
otlp_exporter = GRPCSpanExporter(
Is GRPCSpanExporter the only exporter that would work well for Tangle API telemetry?
We can actually support multiple. Like HTTP. Good point.
I'll change this to be configurable and support multiple exporters
I do not strictly require making this configurable.
I just wanted to know whether only a single exporter makes sense for Tangle or whether there are more.
I think it is a great addition to have this configurable. I have added support for both http and grpc
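As a rough illustration of the configurable exporter discussed in this thread, here is a minimal sketch of protocol selection. It assumes the `TANGLE_OTEL_EXPORTER_PROTOCOL` variable named in the PR description (default "grpc", alternative "http"); the function name `resolve_exporter_module` and the error handling are hypothetical and not taken from the PR's code.

```python
import os
from typing import Mapping

# Module paths of the two OTLP span exporters published by OpenTelemetry.
# Each module exposes a class named OTLPSpanExporter.
_EXPORTER_MODULES = {
    "grpc": "opentelemetry.exporter.otlp.proto.grpc.trace_exporter",
    "http": "opentelemetry.exporter.otlp.proto.http.trace_exporter",
}


def resolve_exporter_module(environ: Mapping[str, str] = os.environ) -> str:
    """Return the exporter module path for the configured protocol.

    Hypothetical helper: reads TANGLE_OTEL_EXPORTER_PROTOCOL and falls back
    to "grpc" when the variable is unset.
    """
    protocol = environ.get("TANGLE_OTEL_EXPORTER_PROTOCOL", "grpc").strip().lower()
    if protocol not in _EXPORTER_MODULES:
        supported = ", ".join(sorted(_EXPORTER_MODULES))
        raise ValueError(
            f"Unsupported exporter protocol {protocol!r}; expected one of: {supported}"
        )
    return _EXPORTER_MODULES[protocol]
```

Returning a module path rather than a class keeps the sketch importable without the OpenTelemetry packages installed; real setup code could load the module with `importlib.import_module` and use its `OTLPSpanExporter`.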
try:
    # Build service name with environment suffix
    app_env = os.environ.get("APP_ENV", "development")
Isn't environment usually ENV?
That's a tough one. In my Laravel time, they used APP_ENV, but personally I prefer prefixing application variables with TANGLE_.
I've changed this to TANGLE_ENV but am open to going with whatever you prefer. If it's ENV that's totally cool. The reason I like application prefixes is to avoid side effects in other software running on the same server.
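A minimal sketch of the prefixed lookup described above, assuming the `TANGLE_ENV` variable with a "development" default as stated in the PR description. The helper name `build_service_name`, the base name `tangle-api`, and the hyphen separator are hypothetical; the PR's actual service name format is not shown in this conversation.

```python
import os
from typing import Mapping


def build_service_name(base_name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Append the application environment to a base service name.

    Reads the application-prefixed TANGLE_ENV variable so that a generic ENV
    or APP_ENV set by other software on the same host is ignored.
    """
    # Treat an empty value the same as an unset variable (an assumption).
    app_env = environ.get("TANGLE_ENV") or "development"
    return f"{base_name}-{app_env}"


# Example with an explicit mapping instead of the real process environment:
name = build_service_name("tangle-api", {"TANGLE_ENV": "production", "APP_ENV": "staging"})
```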
)

# Configure OpenTelemetry tracing
otel_tracing.setup_api_tracing(app)
Thank you for making it non-invasive.
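One way to keep a setup call like this non-invasive is to make it a no-op when tracing is not configured and to log, rather than raise, on failure. The sketch below is an assumption about the general pattern, not the PR's code: `setup_tracing_safely` and its `configure` callback are hypothetical, and skipping setup when `TANGLE_OTEL_EXPORTER_ENDPOINT` is unset is assumed behavior.

```python
import logging
import os
from typing import Callable, Mapping

_logger = logging.getLogger(__name__)


def setup_tracing_safely(
    configure: Callable[[str], None],
    environ: Mapping[str, str] = os.environ,
) -> bool:
    """Run configure(endpoint) when an endpoint is set; never raise.

    Returns True when tracing was configured and False when it was skipped
    or when configuration failed.
    """
    endpoint = environ.get("TANGLE_OTEL_EXPORTER_ENDPOINT")
    if not endpoint:
        # Assumed behavior: tracing is disabled unless an endpoint is provided.
        return False
    try:
        configure(endpoint)
    except Exception:
        # A tracing problem is logged with its traceback but must not
        # prevent the API server from starting.
        _logger.exception("Failed to configure OpenTelemetry tracing")
        return False
    return True
```

With this shape, the caller can invoke the setup function unconditionally at startup, and a misconfigured exporter degrades to missing traces instead of a failed server start.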
# Instrument the FastAPI application
# This automatically creates spans for all incoming HTTP requests
FastAPIInstrumentor.instrument_app(app)
What kind of data will this capture and export?
It is limited to the API request lifecycle.
Here is an example trace: https://docs.google.com/document/d/1h5RrkD2aR77eed6HXGCePHAMivX92HSesVjWeRo17z4/edit?usp=sharing
3a0b981 to
6041ed7
Volv-G left a comment
Given Alexey's comments this looks good to me
6041ed7 to
a12aa86
@Ark-kun Please feel free to give it a final pass when you have a moment. After this, we will be able to take the next step which is metrics 😃
Note to self: Before merging, try running the collector, and then shut down the collector while Tangle has already sent some successful logs. Let's see what happens when we can no longer report trace data to the collector.
I have verified that if an OTel collector experiences downtime while Tangle has already started, it will not cause downtime for Tangle. Some logs:
[Screenshot: CleanShot 2026-01-26 at 09 51 40@2x]
Add OpenTelemetry tracing support
TL;DR
Adds OpenTelemetry (OTel) distributed tracing capabilities to the API server for improved observability.
What changed?
- otel_tracing.py, which configures OpenTelemetry tracing for FastAPI applications
- api_server_main.py and start_local.py
- pyproject.toml
How to test?
- TANGLE_OTEL_EXPORTER_ENDPOINT: URL for the OTLP collector (e.g., "http://localhost:4317")
- TANGLE_ENV: Environment name (defaults to "development")
- TANGLE_OTEL_EXPORTER_PROTOCOL: Protocol to use (defaults to "grpc", can be "http")
Why make this change?
Distributed tracing provides deeper insights into request flows through the system, making it easier to:
This implementation uses the OpenTelemetry standard, which is vendor-neutral and widely supported by observability platforms.