Skip to content

Conversation

@zeroshade
Copy link
Member

Rationale for this change

Experimenting with using AddCleanup for automatic cleanup of the Buffers without requiring Retain and Release everywhere even for custom Allocators.

What changes are included in this PR?

Changes to the underlying Buffer object to leverage AddCleanup when using go1.24 or higher.

Are these changes tested?

Yes, though more tests need to be added to ensure more complex scenarios don't fail.

Are there any user-facing changes?

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea is that via AddCleanup we can avoid explicit retain/release calls? I suppose if it makes basic usage of the library simpler, then it makes sense. I think we should caution that for "library" code, it should not be used and retain/release should be used instead.

Is there a runtime cost to attaching the cleanup callback? An application that does intend to use retain/release may not want to pay the price if there is one.

FWIW, Java may be of interest here. Java had community libraries like Netty that effectively relied on explicit retain/release callbacks to avoid GC overhead and access native memory. The modern FFM API offers an API that is instead based around explicit "arenas", and one of the arenas is managed by the garbage collector (= no manual memory management), while others require you to close the arena (but not individual allocations). (Incidentally, arrow-java has "the worst of both worlds": both allocators and individual allocations must be closed.) That may be too restrictive of a paradigm here, though.

@zeroshade
Copy link
Member Author

I suppose if it makes basic usage of the library simpler, then it makes sense. I think we should caution that for "library" code, it should not be used and retain/release should be used instead.

Would make sense to put this into the documentation.

Alternatively, we could use a build tag to control whether it uses addcleanup or not? (turn whichever isn't in use into no-ops)

Is there a runtime cost to attaching the cleanup callback? An application that does intend to use retain/release may not want to pay the price if there is one.

Yes, there's a runtime cost to attaching the cleanup callback (albeit, AFAIK it's very small). In theory, it would be overall cheaper for the AddCleanup solution vs the repeated calls for Retain and Release.

One important caveat to keep in mind (that technically still exists even with the current retain/release paradigm) is that users should never maintain a reference to the []byte from the Bytes() method of memory.Buffer longer than the scope of the buffer (or call to Release). In the case of a non-go allocator, those bytes could get wiped out from under them when the releasing occurs. It may make sense to utilize the new weak.Ptr in some way for how we expose the byte slices from the buffer.

@zeroshade
Copy link
Member Author

@chriscasola Given this came out of your comment on #269 would you be willing to take a look at this and give your thoughts on mine and @lidavidm's comments above?

@chriscasola
Copy link
Contributor

@zeroshade sorry for the super late reply. Your implementation looks good to me. I do question whether we should add even more complexity to the documented API, and tell users that "sometimes" they don't need Retain and Release.

Can we make Retain and Release noops in go 1.25+ and mark them as deprecated?

@pixelherodev
Copy link
Contributor

Is there a runtime cost to attaching the cleanup callback? An application that does intend to use retain/release may not want to pay the price if there is one.

Can we make Retain and Release noops in go 1.25+ and mark them as deprecated?

For what it's worth, Retain and Release are slower in my profiling than the native GC. We have a database using Arrow for part of the API, and patching out the refcounting from arrow-go cut CPU usage significantly.

@pixelherodev
Copy link
Contributor

Is there anything I can do to help get this ready / merged? This is a pretty nice performance win for us, and I'm happy to help with both testing and development work :)

@zeroshade
Copy link
Member Author

I'll go through this again this week and mark it ready for review after I rebase it. This was only an initial run though of this so there'll probably be more refinement necessary.

@lidavidm
Copy link
Member

I'm still concerned about not being able to achieve deterministic memory management, though. I have to worry enough about memory behavior with the current APIs (and there are already some things that are nondeterministic) and I'm worried this will exacerbate it.

@zeroshade
Copy link
Member Author

@lidavidm @pixelherodev what would you both think about a build flag that would make the retain/release calls no-ops and solely use AddCleanup so that a consumer could essentially "opt-in" if they aren't concerned about the non-deterministic nature of the GC?

@lidavidm
Copy link
Member

lidavidm commented Nov 4, 2025

@lidavidm @pixelherodev what would you both think about a build flag that would make the retain/release calls no-ops and solely use AddCleanup so that a consumer could essentially "opt-in" if they aren't concerned about the non-deterministic nature of the GC?

I think that's reasonable, I guess we'd need to duplicate some testing but that's OK. I assume the Go compiler will inline/dead-code-eliminate calls properly so that the overhead doesn't increase.

@pixelherodev
Copy link
Contributor

pixelherodev commented Nov 7, 2025

@lidavidm @pixelherodev what would you both think about a build flag that would make the retain/release calls no-ops and solely use AddCleanup so that a consumer could essentially "opt-in" if they aren't concerned about the non-deterministic nature of the GC?

Having the option to turn off retain/release sounds good to me.

I assume the Go compiler will inline/dead-code-eliminate calls properly so that the overhead doesn't increase.

We've used that internally for similar behavioral switches; IIRC we checked and there's no asm generated when it's disabled.

Edit to clarify: we is referring to my employer, not to Arrow :)

One other thought, though: if we're already using memory.DefaultAllocator, the Go allocator, Free is a no-op. We shouldn't even need the cleanups, or the release/retain. IMO it almost makes more sense to move the AddCleanup logic into memory.Allocator, and not invoke it at all for the Go allocator.

That, combined with Retain/Release being disabled, would allow users of the default allocator to not be paying overhead for special cases - does that design make sense?

if they aren't concerned about the non-deterministic nature of the GC

I'd like to address this a bit more though, too: what are the cases where this is a problem? In my testing, the cost of Retain/Release is often strictly higher than a GC interruption would be anyways, and that's with pretty heavy memory allocations (which I'm still working on cutting down :)

@pixelherodev
Copy link
Contributor

One other thought, though: if we're already using memory.DefaultAllocator, the Go allocator, Free is a no-op. We shouldn't even need the cleanups, or the release/retain. IMO it almost makes more sense to move the AddCleanup logic into memory.Allocator, and not invoke it at all for the Go allocator.

That, combined with Retain/Release being disabled, would allow users of the default allocator to not be paying overhead for special cases - does that design make sense?

Bonus: Cleanups are 16 bytes; this avoids an additional 16 bytes for each Buffer for the Go allocator, which will never use them.

@pixelherodev
Copy link
Contributor

pixelherodev commented Nov 7, 2025

Copying from #269:

The two problems with totally dropping retain/release:

  1. AddCleanup is non-deterministic on when it runs, so for situations where the user is using a custom allocator (instead of the default Go allocator) we'd still want to make sure that the Retain/Release paradigm allows users to control when memory is freed
  2. It's also not guaranteed to run before program exit, which may not be a problem given that typically it's just calling Free on some allocations so maybe this isn't an issue.

2 is a non-issue IMO. The non-determinism of 1 is also, I think, a bit overstated? AddCleanup is allowed to batch cleanups but the documentation says this is only likely to happen with objects that are less than sixteen bytes and that do not contain pointers, neither of which applies to most of the types currently using Retain/Release.

Practically speaking, the cleanups will most likely run within 1 or 2 GC cycles. Are there cases where this is insufficient? How important is it that memory be freed as quickly as possible?

@lidavidm
Copy link
Member

lidavidm commented Nov 7, 2025

I (not speaking from a maintainer perspective but a personal one) have users who are concerned about memory usage, and are not working with Go directly (instead Go components are part of a larger system). I could insert explicit GC calls all over the application, but at that point I'd lean towards dropping Go entirely (other reasons factor into this, though).

This sounds like a win for pure Go users, so I'm in favor of it. I was hoping we could still keep the current guarantees for non-Go users, but it sounds like that will be difficult.

@pixelherodev
Copy link
Contributor

I (not speaking from a maintainer perspective but a personal one)

That's the perspective I was looking for here :)

have users who are concerned about memory usage

In terms of quantity? Maximum heap usage? I'm assuming you're aware of GOGC and GOMEMLIMIT?

I'm wondering if we can find another solution for such users without needing thousands of lines of code like this to support it...

@lidavidm
Copy link
Member

lidavidm commented Nov 7, 2025

Ah, the other issue is that the memory is allocated via things like mallocator, or Go is given handles to external allocations - so the GC isn't necessarily aware of the (full extent of the) memory used. (And with external handles it's nice to be able to say, release this handle right now instead of hoping the GC gets it.)

@pixelherodev
Copy link
Contributor

pixelherodev commented Nov 10, 2025

Ah, the other issue is that the memory is allocated via things like mallocator, or Go is given handles to external allocations - so the GC isn't necessarily aware of the (full extent of the) memory used. (And with external handles it's nice to be able to say, release this handle right now instead of hoping the GC gets it.)

That's mostly just about timing, yeah? Having the memory be freed faster?

If the goal is to bound memory usage of the Go components, whcih aren't as controlled as the others, would it not be sufficient to set a soft memory limit and use the Go allocator instead?

@lidavidm
Copy link
Member

Not permissible with cgo.

@pixelherodev
Copy link
Contributor

Not permissible with cgo.

Ahhhh, gotcha. That's tricky then... is this for Go calling into C, or vice versa? Is there a defined operational timespan? Trying to think if something like an arena allocator would be sufficient, to at least reuse the same memory...

@pixelherodev
Copy link
Contributor

It sounds like the best option that doesn't break anything for anyone, while still getting at least most of the benefits, is to add a build flag to switch from retain/release to AddCleanup, and then to move the cleanup logic into the Allocators, so that anyone using the GoAllocator has no additional runtime overhead for memory management?

We can always try and revisit later and figure out a solution for non-Go use cases..

@pixelherodev
Copy link
Contributor

If it's helpful, I can probably build on top of this PR with those changes?

@zeroshade
Copy link
Member Author

that would be super helpful, thanks! Sorry for not finding the time yet to focus on this

@pixelherodev
Copy link
Contributor

that would be super helpful, thanks! Sorry for not finding the time yet to focus on this

No worries, completely understandable!

@pixelherodev
Copy link
Contributor

pixelherodev commented Nov 21, 2025

It sounds like the best option that doesn't break anything for anyone, while still getting at least most of the benefits, is to add a build flag to switch from retain/release to AddCleanup, and then to move the cleanup logic into the Allocators, so that anyone using the GoAllocator has no additional runtime overhead for memory management?

How to best disable retain/release is another interesting question. We could move every single copy of the Retain/Release implementation into their own files with the build tag, but that's kinda gross IMO, even if we just have it as one per package.

Might be best to have a single implementation of retain/release and refcounting, and then just embed that as a substructure wherever needed - which, actually, is probably a good change to make on its own anyways, so I'll start by sending in a PR with that I think

Edit: ah, except the problem is that each one needs to be able to free all of its refcounted subfields, too, not just to hand it back to the allocator. I'll think this over and se if I can find a way to clean this up...

@pixelherodev
Copy link
Contributor

pixelherodev commented Nov 21, 2025

There's currently 121 different implementations of Release, and 56 of Retain. A lot of them are just this shell:

func (t *Type) Release() {
    debug.Assert(t.refCount.Load() > 0, "too many releases")
    if t.refCount.Add(-1) == 0 {
        if t.someSubfield != nil {
            t.someSubfield.Release()
            t.someSubfield = nil
        }
    }
}

[not to mention that that debug assertion is still doing an extra atomic per release; the Load() can't be optimized out. That said, on x64 at least, it's just

XCHGL AX, AX
MOVQ	16(AX), CX

, hardly a big deal.]

Realistically, these are just wrappers of the subfields. A lot of these are allocated on the Go heap, referencing Allocator memory as a subfield. What we're basically saying is "the subfield's memory is valid for as long as the field itself is."

...except that the field will actually outlive the subfield, so we nil out the reference to the allocated memory, because the field itself is on the Go heap.

These shells could all become

func (t *Type) Release() {
    t.someSubfield.Release()
}

without issue IMO, and the same change made to the Retain. The Retain/Release is really tracking how many additional references to the subfield are present. [The main issue is that on go=>Allocator memory boundaries, we really do want to nil out the pointer immediately to prevent use-after-frees; on go->go boundaries, worst-case is a panic, not memory error.]

What we really have is some allocations that need managed explicitly, and some Go-heap-allocated memory which references it in a dependency tree.

e.g. ipc/MessageReader references ipc/Message, which holds two actual Buffers that need to be freed.

type Allocation struct {
    allocator *mem.Allocator
    data []byte
}

type Refcount struct {
    dependencies []*Refcount
    memories []*Allocation
}

We could then e.g. have Buffer embed an Allocation at the root so that it can be used as one, instead of storing the two pointers directly; then, it'd embed a Refcount which points at that allocation - and any child buffer would have a Refcount with a dependency of the parent.

Then instead of each type implementing the release logic manually, it would set up the dependencies and .Release() would be a single implementation that would walk the dependencies?

@pixelherodev
Copy link
Contributor

Opened a proof-of-concept to add a build flag to disable refcounting: #578

I'm wondering: for those of using just the Go allocator, is AddCleanup even necessary?

If retain/release is intended to be kept as is for other users, should we just have RR by default, and a norc flag for Go users to turn it off?

@zeroshade
Copy link
Member Author

I'm wondering: for those of using just the Go allocator, is AddCleanup even necessary?

In that scenario, it wouldn't be. But the problem is that there's no good way to know whether or not we need the addcleanup without pushing it as a responsibility onto the allocator which would defeat the purpose.

If retain/release is intended to be kept as is for other users, should we just have RR by default, and a norc flag for Go users to turn it off?

At least for our first pass at this, I think that's the right way to go, and we can start introducing using the AddCleanup (behind a flag) behind a potential flag. If it becomes worthwhile, we might look at making the breaking change and a new major version release

@pixelherodev
Copy link
Contributor

At least for our first pass at this, I think that's the right way to go, and we can start introducing using the AddCleanup (behind a flag) behind a potential flag. If it becomes worthwhile, we might look at making the breaking change and a new major version release

In that case, mind reviewing the PoC for adding a flag to opt-out of refcounting? I'd like the approach reviewed before I finish writing the code :)

@zeroshade
Copy link
Member Author

will do, will try to review today

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants