
Retry after review 1 draft #3

Open
vaidas-shopify wants to merge 15 commits into master from retry-after-review-1-draft

@vaidas-shopify

Thanks for taking the time to contribute to Git! Please be advised that the
Git community does not use github.com for their contributions. Instead, we use
a mailing list (git@vger.kernel.org) for code submissions, code reviews, and
bug reports. Nevertheless, you can use GitGitGadget (https://gitgitgadget.github.io/)
to conveniently send your pull request's commits to our mailing list.

For a single-commit pull request, please leave the pull request description
empty: your commit message itself should describe your changes.

Please read the "guidelines for contributing" linked above!

Add retry logic for HTTP 429 (Too Many Requests) responses to handle
server-side rate limiting gracefully. When Git's HTTP client receives
a 429 response, it can now automatically retry the request after an
appropriate delay, respecting the server's rate limits.

The implementation supports the RFC-compliant Retry-After header in
both delay-seconds (integer) and HTTP-date (RFC 2822) formats. If a
past date is provided, Git retries immediately without waiting.
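
The two header formats described above can be illustrated with a minimal
Python sketch. It is not the C code from this series: the function name
parse_retry_after and the choice to treat a zone-less date as UTC are
assumptions made for illustration only.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: str, now: Optional[datetime] = None) -> Optional[int]:
    """Return the requested delay in whole seconds, or None if unusable.

    Hypothetical helper; the name and return convention are illustrative.
    """
    now = now or datetime.now(timezone.utc)
    value = value.strip()

    # delay-seconds form: a non-negative decimal integer.
    if value.isascii() and value.isdigit():
        return int(value)

    # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Assumption: a date without zone information is treated as UTC.
        when = when.replace(tzinfo=timezone.utc)

    # A date in the past means "retry immediately", so clamp at zero.
    return max(0, int((when - now).total_seconds()))
```

Passing a fixed "now" makes the behavior easy to check: a date one minute
ahead yields 60, and a date in the past yields 0.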

Retry behavior is controlled by three new configuration options:

  * http.maxRetries: Maximum number of retry attempts (default: 0,
    meaning retries are disabled by default). Users must explicitly
    opt in to retry behavior.

  * http.retryAfter: Default delay in seconds when the server doesn't
    provide a Retry-After header (default: -1, meaning fail if no
    header is provided). This serves as a fallback mechanism.

  * http.maxRetryTime: Maximum delay in seconds for a single retry
    (default: 300). If the server requests a delay exceeding this
    limit, Git fails immediately rather than waiting. This prevents
    indefinite blocking on unreasonable server requests.

All three options can be overridden via environment variables:
GIT_HTTP_MAX_RETRIES, GIT_HTTP_RETRY_AFTER, and
GIT_HTTP_MAX_RETRY_TIME.

The retry logic implements a fail-fast approach: if any delay
(whether from server header or configuration) exceeds maxRetryTime,
Git fails immediately with a clear error message rather than capping
the delay. This provides better visibility into rate limiting issues.
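
As a rough illustration of the fail-fast rule, the following Python sketch
picks the delay for one retry. It is a simplified model under stated
assumptions, not the patch itself: the names retry_delay and RetryError and
the error wording are hypothetical, and only the defaults (-1 and 300) come
from the option descriptions above.

```python
from typing import Optional


class RetryError(Exception):
    """Raised when the request should fail instead of being retried."""


def retry_delay(server_delay: Optional[int],
                retry_after: int = -1,
                max_retry_time: int = 300) -> int:
    """Pick the delay in seconds before the next attempt, or fail fast.

    server_delay   -- delay parsed from Retry-After, or None if absent
    retry_after    -- stand-in for http.retryAfter (-1 means "no fallback")
    max_retry_time -- stand-in for http.maxRetryTime
    """
    if server_delay is not None:
        delay, source = server_delay, "Retry-After header"
    elif retry_after >= 0:
        delay, source = retry_after, "http.retryAfter"
    else:
        # No header and no configured fallback: give up.
        raise RetryError("rate limited, but no delay is available")

    if delay > max_retry_time:
        # Fail instead of capping the delay at max_retry_time.
        raise RetryError(
            f"delay of {delay}s from {source} exceeds limit of {max_retry_time}s")
    return delay
```

Capping the delay would hide the fact that the server asked for a longer
wait; raising an error surfaces it to the caller.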

The implementation includes extensive test coverage for basic retry
behavior, Retry-After header formats (integer and HTTP-date),
configuration combinations, maxRetryTime limits, invalid header
handling, environment variable overrides, and edge cases.

Signed-off-by: Vaidas Pilkauskas <vaidas.pilkauskas@shopify.com>

Fix a memory leak in show_http_message() that was triggered when
displaying HTTP error messages before die(). The function would call
strbuf_reencode() which modifies the caller's strbuf in place,
allocating new memory for the re-encoded string. Since this function
is only called immediately before die(), the allocated memory was
never explicitly freed, causing leak detectors to report it.

The leak became visible when HTTP 429 rate limit retry support was
added, which introduced the HTTP_RATE_LIMITED error case. However,
the issue existed in pre-existing error paths as well
(HTTP_MISSING_TARGET, HTTP_NOAUTH, HTTP_NOMATCHPUBLICKEY) - the new
retry logic just made it more visible in tests because retries
exercise the error paths more frequently.

The leak was detected by LeakSanitizer in t5584 tests that enable
retries (maxRetries > 0). Tests with retries disabled passed because
they took a different code path or timing.

Fix this by making show_http_message() work on a local copy of the
message buffer instead of modifying the caller's buffer in place:

1. Create a local strbuf and copy the message into it
2. Perform re-encoding on the local copy if needed
3. Display the message from the local copy
4. Properly release the local copy before returning

This ensures all memory allocated by strbuf_reencode() is freed
before the function returns, even though die() is called immediately
after, eliminating the leak.

Signed-off-by: Vaidas Pilkauskas <vaidas.pilkauskas@shopify.com>

Add trace2 instrumentation to HTTP 429 retry operations to enable
monitoring and debugging of rate limit scenarios in production
environments.

The trace2 logging captures:

  * Retry attempt numbers (http/429-retry-attempt) to track retry
    progression and identify how many attempts were needed

  * Retry-After header values (http/429-retry-after) from server
    responses to understand server-requested delays

  * Actual sleep durations (http/retry-sleep-seconds) within trace2
    regions (http/retry-sleep) to measure time spent waiting

  * Error conditions (http/429-error) such as "retries-exhausted",
    "exceeds-max-retry-time", "no-retry-after-config", and
    "config-exceeds-max-retry-time" for diagnosing failures

  * Retry source (http/429-retry-source) indicating whether delay
    came from server header or config default

This instrumentation provides complete visibility into retry behavior,
enabling operators to monitor rate limiting patterns, diagnose retry
failures, and optimize retry configuration based on real-world data.

Signed-off-by: Vaidas Pilkauskas <vaidas.pilkauskas@shopify.com>

vaidas-shopify force-pushed the retry-after-review-1-draft branch from 7803c38 to ff0dd0a on December 15, 2025 11:46
vaidas-shopify force-pushed the retry-after-review-1-draft branch from ff0dd0a to 869d97c on December 15, 2025 11:51
vaidas-shopify force-pushed the retry-after-review-1-draft branch from 02046d7 to df8e052 on December 16, 2025 11:46
vaidas-shopify force-pushed the retry-after-review-1-draft branch from df8e052 to 36bad7a on December 16, 2025 13:33
vaidas-shopify pushed a commit that referenced this pull request on Dec 18, 2025
When pushing to a set of remotes using a nickname for the group, the
client initializes the connection to each remote, talks to the
remote, reads and parses the capabilities line, and holds the
capabilities in a file-scope static variable server_capabilities_v1.

There are a few other such file-scope static variables, and these
connections cannot be parallelized until they are refactored to a
structure that keeps track of active connections.

Which is *not* the theme of this patch ;-)

For a single connection, the server_capabilities_v1 variable is
initialized to NULL (at the program initialization), populated when
we talk to the other side, used to look up capabilities of the other
side possibly multiple times, and the memory is held by the variable
until program exit, without leaking.  When talking to multiple remotes,
however, the server capabilities from the second connection overwrite
those from the first connection without freeing them, which leaks.

    ==1080970==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 421 byte(s) in 2 object(s) allocated from:
	#0 0x5615305f849e in strdup (/home/gitster/g/git-jch/bin/bin/git+0x2b349e) (BuildId: 54d149994c9e85374831958f694bd0aa3b8b1e26)
	#1 0x561530e76cc4 in xstrdup /home/gitster/w/build/wrapper.c:43:14
	#2 0x5615309cd7fa in process_capabilities /home/gitster/w/build/connect.c:243:27
	#3 0x5615309cd502 in get_remote_heads /home/gitster/w/build/connect.c:366:4
	#4 0x561530e2cb0b in handshake /home/gitster/w/build/transport.c:372:3
	#5 0x561530e29ed7 in get_refs_via_connect /home/gitster/w/build/transport.c:398:9
	#6 0x561530e26464 in transport_push /home/gitster/w/build/transport.c:1421:16
	#7 0x561530800bec in push_with_options /home/gitster/w/build/builtin/push.c:387:8
	#8 0x5615307ffb99 in do_push /home/gitster/w/build/builtin/push.c:442:7
	#9 0x5615307fe926 in cmd_push /home/gitster/w/build/builtin/push.c:664:7
	#10 0x56153065673f in run_builtin /home/gitster/w/build/git.c:506:11
	#11 0x56153065342f in handle_builtin /home/gitster/w/build/git.c:779:9
	#12 0x561530655b89 in run_argv /home/gitster/w/build/git.c:862:4
	#13 0x561530652cba in cmd_main /home/gitster/w/build/git.c:984:19
	#14 0x5615308dda0a in main /home/gitster/w/build/common-main.c:9:11
	#15 0x7f051651bca7 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

    SUMMARY: AddressSanitizer: 421 byte(s) leaked in 2 allocation(s).

Free the capabilities data for the previous server before overwriting
it with the next server to plug this leak.

The added test fails without the freeing with SANITIZE=leak; I
somehow couldn't get it to fail reliably with SANITIZE=leak,address
though.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

vaidas-shopify pushed a commit that referenced this pull request on Feb 11, 2026
When performing auto-maintenance we check whether commit graphs need to
be generated by counting the number of commits that are reachable by any
reference, but not covered by a commit graph. This search is performed
by iterating through all references and then doing a depth-first search
until we have found enough commits that are not present in the commit
graph.

This logic has a memory leak though:

  Direct leak of 16 byte(s) in 1 object(s) allocated from:
      #0 0x55555562e433 in malloc (git+0xda433)
      #1 0x555555964322 in do_xmalloc ../wrapper.c:55:8
      #2 0x5555559642e6 in xmalloc ../wrapper.c:76:9
      #3 0x55555579bf29 in commit_list_append ../commit.c:1872:35
      #4 0x55555569f160 in dfs_on_ref ../builtin/gc.c:1165:4
      #5 0x5555558c33fd in do_for_each_ref_iterator ../refs/iterator.c:431:12
      #6 0x5555558af520 in do_for_each_ref ../refs.c:1828:9
      #7 0x5555558ac317 in refs_for_each_ref ../refs.c:1833:9
      #8 0x55555569e207 in should_write_commit_graph ../builtin/gc.c:1188:11
      #9 0x55555569c915 in maintenance_is_needed ../builtin/gc.c:3492:8
      #10 0x55555569b76a in cmd_maintenance ../builtin/gc.c:3542:9
      #11 0x55555575166a in run_builtin ../git.c:506:11
      #12 0x5555557502f0 in handle_builtin ../git.c:779:9
      #13 0x555555751127 in run_argv ../git.c:862:4
      #14 0x55555575007b in cmd_main ../git.c:984:19
      #15 0x5555557523aa in main ../common-main.c:9:11
      #16 0x7ffff7a2a4d7 in __libc_start_call_main (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a4d7) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
      #17 0x7ffff7a2a59a in __libc_start_main@GLIBC_2.2.5 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a59a) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
      #18 0x5555555f0934 in _start (git+0x9c934)

The root cause of this memory leak is our use of `commit_list_append()`.
This function expects as parameters the item to append and the _tail_ of
the list to append. This tail will then be overwritten with the new tail
of the list so that it can be used in subsequent calls. But we call it
with `commit_list_append(parent->item, &stack)`, so we end up losing
everything but the new item.

This issue only surfaces when counting merge commits. Besides being a
memory leak, it also shows that we're in fact miscounting as we only
respect children of the last parent. All previous parents are discarded,
so their children will be disregarded unless they are hit via another
reference.

While crafting a test case for the issue I was puzzled that I couldn't
establish the proper border at which the auto-condition would be
fulfilled. As it turns out, there's another bug: if an object is at the
tip of any reference we don't mark it as seen. Consequently, if it is
the tip of or reachable via another ref, we'd count that object multiple
times.

Fix both of these bugs so that we properly count objects without leaking
any memory.
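
The intended counting behavior can be modeled with a short Python sketch. It
is a simplified stand-in for the C code in builtin/gc.c, not a port of it:
the function name, the dictionary-based history, and the assumption that
everything below a commit already in the commit graph is covered are all
illustrative. It pushes every unseen parent of a merge (not just the last
one) and marks tips as seen so that they are counted only once.

```python
from typing import Dict, Iterable, List, Set


def count_commits_not_in_graph(tips: Iterable[str],
                               parents: Dict[str, List[str]],
                               in_graph: Set[str],
                               limit: int) -> int:
    """Count reachable commits missing from the graph, stopping at limit."""
    seen: Set[str] = set()
    count = 0
    for tip in tips:
        if tip in seen:
            continue
        seen.add(tip)  # mark the tip itself, not only its ancestors
        stack = [tip]
        while stack:
            commit = stack.pop()
            if commit in in_graph:
                # Assumption: history below a covered commit is covered too.
                continue
            count += 1
            if count >= limit:
                return count
            # Push every unseen parent so merges keep all of their sides.
            for parent in parents.get(commit, []):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
    return count
```

For a merge M of B and C on top of a covered root A, with refs pointing at M,
C, and M again, this counts M, B, and C exactly once each.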

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

vaidas-shopify pushed a commit that referenced this pull request on Feb 11, 2026
It is possible to hit a memory leak when reading data from a submodule
via git-grep(1):

  Direct leak of 192 byte(s) in 1 object(s) allocated from:
    #0 0x55555562e726 in calloc (git+0xda726)
    #1 0x555555964734 in xcalloc ../wrapper.c:154:8
    #2 0x555555835136 in load_multi_pack_index_one ../midx.c:135:2
    #3 0x555555834fd6 in load_multi_pack_index ../midx.c:382:6
    #4 0x5555558365b6 in prepare_multi_pack_index_one ../midx.c:716:17
    #5 0x55555586c605 in packfile_store_prepare ../packfile.c:1103:3
    #6 0x55555586c90c in packfile_store_reprepare ../packfile.c:1118:2
    #7 0x5555558546b3 in odb_reprepare ../odb.c:1106:2
    #8 0x5555558539e4 in do_oid_object_info_extended ../odb.c:715:4
    #9 0x5555558533d1 in odb_read_object_info_extended ../odb.c:862:8
    #10 0x5555558540bd in odb_read_object ../odb.c:920:6
    #11 0x55555580a330 in grep_source_load_oid ../grep.c:1934:12
    #12 0x55555580a13a in grep_source_load ../grep.c:1986:10
    #13 0x555555809103 in grep_source_is_binary ../grep.c:2014:7
    #14 0x555555807574 in grep_source_1 ../grep.c:1625:8
    #15 0x555555807322 in grep_source ../grep.c:1837:10
    #16 0x5555556a5c58 in run ../builtin/grep.c:208:10
    #17 0x55555562bb42 in void* ThreadStartFunc<false>(void*) lsan_interceptors.cpp.o
    #18 0x7ffff7a9a979 in start_thread (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x9a979) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
    #19 0x7ffff7b22d2b in __GI___clone3 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x122d2b) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)

The root cause of this leak is the way we set up and release the
submodule:

  1. We use `repo_submodule_init()` to initialize a new repository. This
     repository is stored in `repos_to_free`.

  2. We now read data from the submodule repository.

  3. We then call `repo_clear()` on the submodule repositories.

  4. `repo_clear()` calls `odb_free()`.

  5. `odb_free()` calls `odb_free_sources()` followed by `odb_close()`.

The issue here is the 5th step: we call `odb_free_sources()` _before_ we
call `odb_close()`. But `odb_free_sources()` already frees all sources,
so the logic that closes them in `odb_close()` now becomes a no-op. As a
consequence, we never explicitly close sources at all.

Fix the leak by closing the store before we free the sources.
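
The ordering problem can be shown with a small Python model. It only mimics
the shape of the bug: the classes Source and Store and the two release
functions are hypothetical stand-ins, not Git's object database API.

```python
class Source:
    """Stand-in for one object source that holds a resource until closed."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False

    def close(self) -> None:
        self.closed = True


class Store:
    """Stand-in for a store that owns a list of sources."""

    def __init__(self, sources) -> None:
        self.sources = list(sources)

    def close(self) -> None:
        # Closes whatever sources the store still knows about.
        for source in self.sources:
            source.close()

    def free_sources(self) -> None:
        # Forgets all sources without closing them.
        self.sources.clear()


def release_buggy(store: Store) -> None:
    store.free_sources()
    store.close()  # no-op: the list is already empty


def release_fixed(store: Store) -> None:
    store.close()  # close while the sources are still reachable
    store.free_sources()
```

With release_buggy() the sources are never closed, which corresponds to the
leaked resource; with release_fixed() every source is closed before the list
is dropped.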

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>