The C++ Arena
Arenas are a powerful tool for memory management; evaporating confusion and issues with lifetimes. Though some implementations exist, they're not used extensively in C++, which is unfortunate. The value that arenas provide to C++ programs and their architecture is immense and only continues growing along with the complexity of the program. Following are some ideas of note when implementing an arena that fits well into C++.
Required Reading
To start, you should read Ryan Fleury's excellent article on arenas. He covers in a very in-depth manner problems with traditional manual memory management, what arenas are, and how they solve those problems and simplify lifetimes.
I have to agree with Ryan's critiques of the current state and perception of memory management in C++. RAII1 containers and smart pointers do make memory management more automatic, but they serve to automate entirely the wrong problem. It essentially generates all the glue code that tightly couples lifetimes and memory allocations. There are ways to improve the performance characteristics of this; use a smarter malloc or overload the new/delete operators, but these are complex and do not solve the actual problem with lifetimes.
Using arenas by default simplifies lifetime management immensely: When does an object's lifetime end? When its containing arena kicks it out. This is simple, because it can only happen when popping from or clearing the arena, which also happens when the arena is destroyed.
But enough proselytization.
Basic Design
It's very easy to adapt the fundamental arena interface:
It's workable, but not particularly easy to use. It can be improved by introducing type information about what it is allocating by using templates:
It is also useful to wrap a common pattern for allocating an array. This one chooses not to initialize anything, and notes that in the name. I find this to be a good thing to do, as it makes the user think about the fact the memory is uninitialized.
When designing interfaces like this that use templates only for code ergonomics, I prefer to keep the amount of templatized code as small as possible and leave most of the details to the concrete implementation. This is one technique for preventing code bloat and long compile times from monomorphization.
C++isms
At the language level, C++ has a lot more to say about object lifetimes than C. Lifetimes must be properly "begun", and unexpected things will happen if destructors are not called before lifetimes end.2
Beginning lifetimes
Currently, Place
does not properly handle beginning the lifetime of T
, except in
trivial cases where it is a plain struct. This can be fixed by beginning the lifetime of the object before
returning it by using placement new.
The lifetime is now properly begun and we are free of undefined behavior, but it can now only be used on types
that are constructible with no arguments (default-constructible). The pattern used in STL containers that
support emplace
is to accept arguments of any type and pass them to the constructor.
Place
is essentially the same thing, so we will use the same technique.
Destructors
So far, the arena will work, but it behaves in a very unexpected way when ending the lifetimes of contained objects - the destructors are not called. Generally, container-like constructs in C++ are supposed to call the destructors of the objects they contain, but that's not happening yet.
First we need to be able to store destructors. In C++, it's not possible to take the address of a destructor (or of a constructor for that matter, though both have addresses.) Herb Sutter has a post that covers a nice solution. The gist of it is that we can use a captureless lambda inside a template to capture the type and call the destructor. This compiles directly to the address of the destructor, which is great. It looks like this:
Now whenever placing an object in the arena, we also will create a destructor object for it if it needs one.
PushWithDestructor
automatically fills in the address of the destructor, so it is passed as nullptr
and the type explicitly specified. It also stores the destructor object in the arena's memory. In my arena, this
is implemented by committing pages as needed from the back of the arena block's reservation and filling them
with destructor objects as a downwards-growing stack. In the same way that traditionally the stack and heap
segments of a process grew towards each other, so do the data section in the arena and the bookkeeping section
filled with destructors grow towards each other.
The destructor objects are invoked and removed in the reverse order that objects were added to the arena, just like the built in stack. When popping from the arena, we can simply walk the destructors and check if each pointer is beyond the current end of the data section, and stop once it is not.
Suballocations
One of the best way to simplify lifetime management with arenas is when dealing with extra allocations that an object needs to make. I came across a great real-world example of how using arenas can both simplify code and make it more performant, rather than the common "optimization-complexity" tradeoff.
Serenity LibJS
In a recent
CallExpression
Inside Ladybird's JavaScript interpreter LibJS, it stores various "nodes" corresponding to elements of
JavaScript syntax such as statements, variable declarations, numbers, and function calls. The gist of the
issue is that certain nodes can have different amounts of data in them, even within the same type. We'll focus
on the CallExpression
node, as it was the impetus for the largest change Andreas made when
optimizing memory usage. As the name suggests, CallExpression
represents a function call,
something like my_func(a, b, c)
. Each of the arguments to the call are Expression
s,
and the call is an Expression
itself.
When representing this in a traditional way, it might look something like the following:
And this is indeed what the initial implementation looked like, though it had more features. Andreas realized
that the number of arguments never changed after construction and decided to refactor it so that the arguments
were contained in the same block of memory as the CallExpression
. This would avoid incurring the extra
heap allocation and size overhead of the dynamic array, and involved adding a function to create it:
The class needed to be updated with different members to be able to access the tail array, using the "zero size array" idiom which is common in OS development and C programming:
This isn't standard C++ and relies on vendor extensions implemented by major compilers.
The address of arguments
is equivalent to this + 1
, or directly following
the class's normally-sized members.
The arguments also need to be initialized in the constructor:
It was also necessary to overload the deletion operator to prevent mismatches between malloc and delete and deallocate properly. However, there's no such thing as a virtual deletion operator, so this meant that the base-most class in the AST hierarchy had to use malloc in its creation function as well.
This already looks nearly identical to the implementation of Arena::Place
in logic. Here I think it
is useful to note the purpose of using reference-counted smart pointers everywhere in the tree structure. It
actually has very little to do with lifetime semantics, i.e. allowing code to store a reference to part of the
syntax tree, preventing its deallocation. That functionality is not used. Instead, the purpose of the smart
pointers is to handle allocating and deallocating the tree. But there's already a decent amount of support code
written for that, and it is flawed.
As Ryan said, misuse is still possible, and often not checkable in a language like C++.
And indeed
immediately after this change, Andreas was able to use a memory debugger to find an instance where someone
forgot to call create
and instead used new
directly, resulting in a new/free mismatch.
Nothing prevents future code from mistakenly allocating without using create
. Also,
CallExpression
has a different creation function, so any time it is created, one must remember to
use it instead of the generic ASTNode::create
.
Tail Array
Jonathan Müller shared a tail array
.) As of this writing, Andreas has implemented something
similar, but the full solution could not actually be implemented as-is due to the fact that
CallExpression
actually has a subclass in NewExpression
, which prevents the CRTP
derived-class trick. The PR currently hasn't been merged yet.
Rethinking It
Using arenas, the entire situation can be simplified massively. There is no need for shared pointers, as the
entire syntax tree's lifetime will be consolidated into the same arena that was provided to the parser. Nodes
that need different sizes can simply take an arena parameter and use it to allocate more space for themselves.
In the case of CallExpression
the members can look like this:
And the constructor can be parameterized with an arena:
And all of the supporting create
functions can be removed, along with the custom operator delete.
The constructor delegates the responsibility of allocating memory to its caller.
For some added ease-of-use, a helper can be added to ensure correct usage:
And that's really it! Even if CallExpression
is later derived from, the subclasses can simply also
be parameterized by an arena and forward it to CallExpression
's constructor. The level of
indirection prevents issues with the tail array getting mixed up inside the subclasses. If the class were unable
to be constructed outside of an arena (think private constructor and making Arena
a friend) and
also either marked final
or somehow able to assert that all derived classes are the same size, then
it could use the this + 1
trick to have the tail array inline. This would even work for multiple
tail arrays as the count for each one is known, but it gets more complicated to calculate the pointer offset.
More Sugar
In a codebase written to use arenas as the memory management strategy, types parameterized with arenas are very common. Placing such an object with our current setup looks like this:
It's awkward to need to pass arena
twice in the same function call (recall that the left-hand
instance becomes the this
pointer.) Since the implementation of Place
already has the
arena as a parameter, it can pass it whenever possible.
This will automatically insert the arena as the first parameter if possible, making for seamless parameterization.
As a bonus, it's still possible to call with the explicit arena parameter, since that will modify
Args
and reject the implicit self-insertion.
Closing
That's about it for some of the details and reasoning behind C++ arenas! They are a fantastically useful concept and I hope that you gained enough value from this to try using this memory management strategy in your own C++ projects.
This is the first post on /prog/, if you liked it you can subscribe through
Bye for now, and see you soon!
~029
Footnotes
-
1
Specifically as it relates to memory management. "Scope Bound Resource Management" is okay.
But again can be improved by arenas. - 2 It is actually legal to end a lifetime without calling the appropriate destructor, as long as it is because another object is being placed there. This does not apply to what we are doing.