
Conversation

@jasonyuezhang (Owner)

This PR fully replaces the legacy N+1 detectors (DB, MN+1 DB, API) with the updated versions, which are currently 100% released. It swaps the legacy code for the updated code and deletes the experiment files.


Copied from getsentry#104199
Original PR: getsentry#104199

Comment on lines +346 to +349
```python
            if not common_parent_set:
                return None

            latest_parent_list = current_parent_list
```


[PerformanceOptimization]

O(N*M) complexity issue with parent span lookup: The `_find_common_parent_span()` method iterates through all spans in the event on every call (lines 346-349). With 1000+ spans in the event, this becomes expensive. Consider building a `span_id -> span` lookup map once in `__init__`:

```python
def __init__(self, ...):
    self.span_lookup = {span.get("span_id"): span for span in event.get("spans", [])}

# Then in _find_common_parent_span:
return self.span_lookup.get(common_parent_span_id)
```

File: src/sentry/issue_detection/detectors/mn_plus_one_db_span_detector.py
Line: 349
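As a standalone illustration of the suggested map (hypothetical event shape; the detector's real `event` object may differ):

```python
# Hypothetical event: a dict with a "spans" list, as the suggestion assumes.
event = {
    "spans": [
        {"span_id": "a1", "op": "db"},
        {"span_id": "b2", "op": "http.client"},
    ]
}

# Build once: O(M) over the event's spans.
span_lookup = {span.get("span_id"): span for span in event.get("spans", [])}

# Each later lookup is O(1) instead of an O(M) scan per call.
assert span_lookup["a1"]["op"] == "db"
assert span_lookup.get("missing") is None
```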

```python
    ) -> None:
        self.settings = settings
        self.event = event
        self.recent_spans = deque(initial_spans or [], self.settings["max_sequence_length"])
```


[BestPractice]

Potential infinite loop with deque maxlen: The `recent_spans` deque has `maxlen=self.settings["max_sequence_length"]` (line 75). If `max_sequence_length` is set to 0 or negative, operations on this deque could behave unexpectedly. Add validation:

```python
max_length = self.settings["max_sequence_length"]
if max_length <= 0:
    raise ValueError(f"max_sequence_length must be positive, got {max_length}")
self.recent_spans = deque(initial_spans or [], max_length)
```

File: src/sentry/issue_detection/detectors/mn_plus_one_db_span_detector.py
Line: 75
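For reference, `deque` itself handles part of this at the boundary values (standard library behavior, easy to verify):

```python
from collections import deque

# maxlen=0 is accepted by deque but silently discards every appended element,
# which is why an explicit check turns the misconfiguration into a loud error.
d = deque([1, 2, 3], maxlen=0)
assert len(d) == 0

# A negative maxlen is already rejected by deque with a ValueError.
try:
    deque([1], maxlen=-1)
    rejected = False
except ValueError:
    rejected = True
assert rejected
```

So the suggested validation mainly guards the silent `maxlen=0` case; the negative case would fail at construction anyway.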

```python
                extra={"event_has_meta": event_has_meta},
            )
            return False
        parsed_url = urlparse(str(url))
```


[CriticalError]

Missing error handling: If `urlparse()` throws a `ValueError` for a malformed URL, the entire detector will crash and fail to process the event. The old code wrapped this call in a try/except with logging (removed in this change), but the new code at line 139 calls `urlparse()` directly without protection:

```python
parsed_url = urlparse(str(url))  # Can throw ValueError
```

Specific scenario: URLs with null bytes, invalid schemes, or extremely long URLs can cause `urlparse()` to fail.

Fix: Re-add error handling:

```python
try:
    parsed_url = urlparse(str(url))
except (ValueError, TypeError) as e:
    logging.exception(
        "N+1 API Calls Detector: URL parsing failed",
        extra={"url": url, "error": str(e)}
    )
    return False
```

File: src/sentry/issue_detection/detectors/n_plus_one_api_calls_detector.py
Line: 139
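One concrete input that makes the unprotected call raise is an unclosed IPv6 bracket, which `urlsplit` (called by `urlparse`) rejects:

```python
from urllib.parse import urlparse

# Well-formed URLs parse fine.
assert urlparse("https://example.com/path").hostname == "example.com"

# An unclosed IPv6 bracket makes urlparse raise ValueError ("Invalid IPv6 URL"),
# which would crash the detector if left unhandled.
try:
    urlparse("http://[::1")
    raised = False
except ValueError:
    raised = True
assert raised
```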

Comment on lines +295 to +350
```diff
         if (
             span["op"].startswith("db")
             and get_span_evidence_value(span, include_op=False) != "prisma:engine:connection"
         ):
             return span
         return None

     def _find_common_parent_span(self, spans: Sequence[Span]) -> Span | None:
-        parent_span_id = spans[0].get("parent_span_id")
-        if not parent_span_id:
-            return None
-        for id in [span.get("parent_span_id") for span in spans[1:]]:
-            if not id or id != parent_span_id:
+        """
+        Using the self.parent_map, identify the common parent within the configured depth
+        of the every span in the list. Returns None if no common parent is found, or the common
+        parent is not within the event.
+        """
+        # Use a set to track the common parent across all spans.
+        # It'll start empty, fill with the first span's parents, and then intersect every span's
+        # parent list after that.
+        common_parent_set: set[str] = set()
+        # We also store the latest parent list for ordering later on.
+        latest_parent_list: list[str] = []
+        for span in spans:
+            span_id = span.get("span_id")
+            if not span_id:
+                return None
+
+            current_parent_list = []
+            current_span_id = span_id
+
+            # This will run at most `max_allowable_depth` times for n spans.
+            # For that reason, `max_allowable_depth` cannot be user-configurable -- to avoid
+            # O(n^2) complexity and load issues.
+            for _ in range(self.settings["max_allowable_depth"]):
+                parent_span_id = self.parent_map.get(current_span_id)
+                if not parent_span_id:
+                    break
+                current_parent_list.append(parent_span_id)
+                # If this parent_span_id is already in the global intersection, stop early, we don't
+                # need to build the rest of the parent list.
+                if parent_span_id in common_parent_set:
+                    break
+                current_span_id = parent_span_id
+
+            # If common_parent_set is empty (first iteration), set it to the current parent list.
+            # Otherwise, intersect it with the current_parent_list.
+            common_parent_set = (
+                common_parent_set.intersection(set(current_parent_list))
+                if common_parent_set
+                else set(current_parent_list)
+            )
+
+            # At this point, if common_parent_set is empty, we can bail out early since that means
+            # at least two parent lists have no intersection, thus no common parent.
+            if not common_parent_set:
+                return None
+
+            latest_parent_list = current_parent_list
```
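The intersection logic in the new code can be sketched in isolation (hypothetical `parent_map` mapping span_id to parent_span_id; `max_depth` stands in for `max_allowable_depth`):

```python
def common_ancestors(span_ids, parent_map, max_depth=3):
    """Ancestors (within max_depth) shared by every span in span_ids."""
    common = None
    for span_id in span_ids:
        ancestors = []
        current = span_id
        for _ in range(max_depth):
            parent = parent_map.get(current)
            if parent is None:
                break
            ancestors.append(parent)
            current = parent
        # First span seeds the set; every later span intersects into it.
        common = set(ancestors) if common is None else common & set(ancestors)
        if not common:
            return set()  # two spans share no ancestor within the depth limit
    return common or set()

# Tiny trace: root -> a -> {b, c}, and root -> d.
parent_map = {"b": "a", "c": "a", "a": "root", "d": "root"}
assert common_ancestors(["b", "c"], parent_map) == {"a", "root"}
assert common_ancestors(["b", "d"], parent_map) == {"root"}
```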


[PerformanceOptimization]

N+1 lookup pattern: `_find_common_parent_span()` repeats an expensive lookup. For each offender span (N), the method iterates up to `max_allowable_depth` levels through `self.parent_map`, then scans all spans in the event:

```python
for span in spans:  # N offender spans
    for _ in range(self.settings["max_allowable_depth"]):  # Up to 3 levels
        parent_span_id = self.parent_map.get(current_span_id)
        # ...
    # Then searches all_spans again
    all_spans = self.event.get("spans") or []
    for span in all_spans:  # M spans in event
        if span.get("span_id") == common_parent_span_id:
            return span
```

With 100 offender spans and 1000 event spans, this becomes ~100k iterations.

Fix: Build a span lookup dictionary once:

```python
def __init__(self, ...):
    # ...
    self.span_lookup = {s.get("span_id"): s for s in self.event.get("spans", [])}

# Then in _find_common_parent_span:
return self.span_lookup.get(common_parent_span_id)
```

File: src/sentry/issue_detection/detectors/mn_plus_one_db_span_detector.py
Line: 350
