mlim19 commented Nov 7, 2025

Summary

This PR adds automatic retry functionality for the test_dso_name_in_pyperf_profile test, which occasionally fails due to PyPerf timeouts in CI environments.

Problem

The test test_dso_name_in_pyperf_profile[2.7-glibc-python-True-pyperf-True] sporadically fails with:

WARNING  gprofiler.profilers.python_ebpf:python_ebpf.py:301 PyPerf dead/not responding, killing it
TimeoutError

This occurs when PyPerf cannot generate profile files within the 5-second timeout, particularly:

  • In resource-constrained CI environments
  • With Python 2.7 (slower than newer versions)
  • When there's system load or contention

Example failure: https://github.com/intel/gprofiler/actions/runs/19146096101/job/54734480910
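
To make the failure mode concrete, here is a minimal sketch of a deadline-based wait for an output file. It is not gprofiler's code: the function name `wait_for_profile`, the polling approach, and the file path are all assumptions made for illustration. Only the 5-second limit and the `TimeoutError` come from the description above.

```python
import os
import tempfile
import time


def wait_for_profile(path, timeout=5.0, poll_interval=0.1):
    """Return `path` once it exists, or raise TimeoutError after `timeout` seconds.

    Hypothetical helper: illustrates a fixed deadline, not gprofiler's logic.
    """
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no profile at {path} after {timeout}s")
        time.sleep(poll_interval)
    return path


# Simulate a profiler that never writes its output within the deadline.
with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "profile.out")  # hypothetical file name
    try:
        wait_for_profile(missing, timeout=0.2, poll_interval=0.05)
        timed_out = False
    except TimeoutError:
        timed_out = True
```

With a fixed deadline like this, a run that is only slightly slower than usual (loaded CI runner, slower interpreter) fails in exactly the same way as a run where the profiler is truly broken, which is why the failure looks sporadic.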

Solution

Implement auto-retry using the pytest-rerunfailures plugin:

  1. Add dependency: pytest-rerunfailures==15.0 to dev-requirements.txt
  2. Mark test as flaky: Add @pytest.mark.flaky(reruns=3, reruns_delay=2) decorator

The test will now:

  • Automatically retry up to 3 times after a failure (up to 4 attempts in total)
  • Wait 2 seconds between retry attempts (gives system time to stabilize)
  • Only fail if ALL attempts fail (catches genuine issues)
  • Be retried independently in each of its 4 parameter combinations
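
The retry behavior listed above can be sketched in plain Python. This is an illustration of the semantics only, not the pytest-rerunfailures implementation; `run_with_reruns` and `flaky_test` are hypothetical names, and the injectable `sleep` parameter exists just so the example runs without real delays.

```python
import time


def run_with_reruns(test_fn, reruns=3, reruns_delay=2.0, sleep=time.sleep):
    """Call test_fn, retrying up to `reruns` more times after a failure.

    Returns the number of attempts used. If the initial run and every
    rerun fail, the last exception is re-raised.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            test_fn()
            return attempts
        except Exception:
            if attempts > reruns:
                raise  # initial run plus all reruns failed
            sleep(reruns_delay)  # pause before the next attempt


# A simulated test that times out twice and then passes.
calls = {"n": 0}


def flaky_test():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated PyPerf timeout")


attempts_used = run_with_reruns(flaky_test, sleep=lambda seconds: None)
```

Here `attempts_used` is 3: two simulated timeouts are absorbed and the third attempt passes. A test that fails on every attempt would still surface its error after the fourth attempt.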

Impact

  • Scope: Only affects test_dso_name_in_pyperf_profile (4 parameter combinations)
  • Other tests: Run normally without retries
  • CI time: Minimal increase (reruns and their 2-second delays occur only when the test fails)
  • Reliability: Fewer spurious CI failures from this flaky test
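
As a rough bound on the CI-time impact, the worst case is every parameter combination failing on every attempt. The sketch below computes that bound. The per-attempt duration is not stated in this PR, so `test_seconds` is a hypothetical input, and the estimate ignores any fixture setup or teardown repeated on a rerun.

```python
def worst_case_extra_seconds(test_seconds, reruns=3, reruns_delay=2.0, variants=4):
    """Upper bound on added CI time if every variant fails every attempt.

    Each rerun costs one delay plus one more execution of the test.
    """
    per_variant = reruns * (reruns_delay + test_seconds)
    return variants * per_variant


# Assuming (hypothetically) that one attempt takes 30 seconds:
extra = worst_case_extra_seconds(30)
```

When the test passes on its first attempt, no reruns happen and the added time is zero.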

Changes

+ pytest-rerunfailures==15.0  # Added to dev-requirements.txt

+ @pytest.mark.flaky(reruns=3, reruns_delay=2)  # Added to test function
  @pytest.mark.parametrize("in_container", [True])
  @pytest.mark.parametrize("profiler_type", ["pyperf"])
  def test_dso_name_in_pyperf_profile(...):

Test Plan

  • Test will retry automatically on failure
  • Other tests remain unaffected
  • CI should show "RERUN" status when retry occurs
  • Verify in GitHub Actions that the test passes after retry

🤖 Generated with Claude Code

The test_dso_name_in_pyperf_profile test occasionally fails due to PyPerf
timeouts, particularly with Python 2.7. This is caused by timing issues in
CI environments where PyPerf may not generate profile files within the
5-second timeout.

Changes:
- Add pytest-rerunfailures==15.0 to dev-requirements.txt
- Mark test_dso_name_in_pyperf_profile with @pytest.mark.flaky(reruns=3, reruns_delay=2)

The test will now automatically retry up to 3 times with a 2-second delay
between attempts, which should handle sporadic timeout issues while still
catching genuine failures.

Related issue: Sporadic test failure observed in CI runs
Example failure: https://github.com/intel/gprofiler/actions/runs/19146096101/job/54734480910

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
mlim19 merged commit 6ae0dd5 into master Nov 7, 2025
35 checks passed
mlim19 deleted the fix-flaky-pyperf-test-retry branch November 7, 2025 20:03
