If carcinization happens when languages evolve to be more like Rust, then what do you call it when Rust evolves to be more like Java? Caffeination?
Over this summer, I’ve had a decent amount of time to kill. What better way to spend a beautiful hot summer than sitting inside, staring at core dumps in a supposedly memory-safe language? I built a garbage collector - in short, a piece of software that manages allocations for another, more exciting piece of software. The cool part: I made it in Rust, for Rust - a language designed to eliminate garbage collection and which provides few facilities for making it work properly. Not only that, I managed to make my garbage collector work with relatively few compromises, which I’ll describe in further detail lower down.
If you don’t like reading, don’t care about the journey to implement it, or just want to get a sales pitch for the final result, check out the source code on GitHub here. You can also download it directly from crates.io.
Background
If you’re familiar with the details of Rust and its standard library, feel free to skip this section.
The core backing behind Rust’s memory model is affine typing and the borrow checker. Values may only be bound to one identifier at a time, and borrows (a.k.a. references) may not outlive the scope binding their referent.
For example, the following code is invalid:
```rust
let x = vec![1, 2, 3];
let y = x;
println!("{x:?}"); // compile error - x has already been moved
```
Normally, we work around this by borrowing against a binding, such as by making `y = &x` in the example above. However, we often need to share some heap-allocated value without knowing which binding will live the longest. The solution to this problem is shared ownership via garbage collection.
Rust’s standard library offers two simple reference-counted garbage collectors: the single-threaded `Rc` and its atomically reference-counted counterpart `Arc`. They operate by maintaining a reference count in each heap allocation. Under most circumstances, these work great, but they can’t handle cyclic references. Combined with interior mutability, it’s trivial to refute them.
```rust
use std::{cell::Cell, rc::Rc};

struct Foo(Cell<Option<Rc<Foo>>>);

let x = Rc::new(Foo(Cell::new(None)));
x.0.set(Some(x.clone()));
// My foo has a reference to itself. It can never be freed!
```
This is why people actually get paid money to build garbage collectors. If using a reference counter were all you needed, a number of people working at Oracle would be out of a job.
Battle plan
We’d like to create some `Gc` data structure with a similar API to `Rc` and `Arc`, which can accept nearly any data type contained within, and still manage to detect and collect cycles.
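For a sense of the goal, the cyclic `Foo` from the earlier example should be expressible with `Gc` and eventually get freed. A hypothetical sketch of the target usage (the trait machinery it needs is described below):

```rust
// Hypothetical usage of the Gc we're setting out to build.
// (The Collectable implementation for Foo is elided here.)
struct Foo(Cell<Option<Gc<Foo>>>);

let x = Gc::new(Foo(Cell::new(None)));
x.0.set(Some(x.clone()));
drop(x); // Rc would leak this cycle; Gc should eventually reclaim it
```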
We have a few weapons at our disposal:
- `Drop`: Every non-copiable data type in Rust can implement the `Drop` trait to ensure some code is called every time it is dropped. In our case, we can implement `Drop` for `Gc` to try to glean some knowledge about when an allocation becomes inaccessible.
- Traits: We can construct some trait (in our case, let’s call it `Collectable`) as a mandatory requirement to be contained in a `Gc`. Creating this trait has some major downsides (libraries upstream of `dumpster` can’t implement it) but it’s a necessary evil.
However, our formidable tools are matched by equally formidable challenges:
- Undetectable moves. When a value is moved in Rust, there is no way to detect that fact from within that value. If we had some sort of trait like `OnMove` which could allow for a function to be called every time a value moved, we could use it to detect when a `Gc` moved inside another `Gc`, making it unrooted and allowing us to create a simple mark-sweep collector. At that rate, though, we would just reinvent C++’s copy constructors.
- Variable-sized data. I foolishly decided to make it so that `Gc` could store `?Sized` types, which is more flexible for library users (enabling things like `Gc<[T]>`).
- Type erasure. In a typical Rust program, generics are implemented via monomorphization, and no type information is retained at runtime. This makes it harder to clean up an allocation without prior context.
With these two relatively simple tools, we have enough to build a collector which can handle just about any data type inside.
My approach
We need to detect whether a Gc
is reachable without actually scanning
the stack. I’ll start with a few definitions, using graph-theoretical
language.
- An allocation graph $G = (V, E)$ is a directed graph with a root node $r \in V$ whose indegree is zero.
- A node $n$ in an allocation graph is said to be accessible if and only if there exists a path from $r$ to $n$ in the graph.
It should be clear to see why these definitions are useful to us. We can
imagine each node being one allocation, pretending that all data not
indirected through a Gc
is part of an imaginary allocation for the
root. Additionally, each Gc
acts as an edge connecting two
allocations. If an allocation is accessible in the graph-theoretical
sense, it’s still possible for a program to reach it, and if not, it’s
safe to free that allocation. Note that the indegree of a node is
precisely equal to the reference count of its corresponding allocation.
I’ll roughly outline our approach to determining whether some node $n$ is accessible.

1. Find $D$, the set of all descendants of $n$ (including $n$ itself). Construct the subgraph $G_D$ of $G$ whose vertex set is $D$, preserving all edges connecting nodes in $D$.
2. Find the set $A$ of nodes whose indegree in $G_D$ is not equal to their indegree in $G$.
3. For each $a \in A$, mark it and its descendants as accessible.
4. If and only if $n$ was marked as accessible, it is accessible.
Here’s a serpentine implementation in pseudocode, if that’s what you prefer.
```python
# Reconstructed pseudocode; helper names are illustrative.
def is_accessible(n):
    descendants = find_descendants(n)  # includes n itself
    counts = {m: ref_count(m) for m in descendants}
    for m in descendants:
        for child in children(m):
            counts[child] -= 1

    # any node with unaccounted-for references is reachable from outside
    accessible = set()
    for m in descendants:
        if counts[m] > 0:
            accessible |= {m} | find_descendants(m)

    return n in accessible
```
It should be pretty clear to see that `is_accessible` runs in $O(|V| + |E|)$ time.
Reference counting with extra steps
Our single-threaded `Gc` will behave much like an `Rc`, but with some minor details changed.

First, we define a `Gc` and the allocation it points to, a `GcBox`.
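The original definitions aren’t reproduced here, but a minimal sketch might look like this (the field names are my assumption, not necessarily the crate’s exact layout):

```rust
use std::cell::Cell;
use std::ptr::NonNull;

/// A single-threaded garbage-collected pointer (sketch).
pub struct Gc<T: Collectable + ?Sized> {
    ptr: NonNull<GcBox<T>>,
}

/// The heap allocation that a `Gc` points to (sketch).
struct GcBox<T: Collectable + ?Sized> {
    ref_count: Cell<usize>,
    value: T,
}
```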
We can then hook into the Drop
behavior for our Gc
to make it all
work.
I, being very clever, decided to use a sentinel value of 0 for the reference count when an allocation is being cleaned up (to prevent spurious double-frees).
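As a sketch of the idea (assuming the types above, with `is_accessible` as the function from the earlier pseudocode):

```rust
impl<T: Collectable + ?Sized> Drop for Gc<T> {
    fn drop(&mut self) {
        let gc_box = unsafe { self.ptr.as_ref() };
        match gc_box.ref_count.get() {
            // 0 is the sentinel: this allocation is already being destroyed,
            // so do nothing to avoid a spurious double-free
            0 => (),
            n => {
                gc_box.ref_count.set(n - 1);
                if n == 1 || !is_accessible(gc_box) {
                    // ... destroy the value and free the allocation ...
                }
            }
        }
    }
}
```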
However, this code has a problem - how the hell are we supposed to implement `is_accessible`?
Trait hackery
Our pseudocode for is_accessible
required us to be able to access the
set of edges going out from a node. Doing so is kind of hard. If we were
writing this in C, we would scan the allocation on the heap, looking for
data that looked like they could be pointers into other allocations. In
Rust, though, we can be a lot more precise by adding a small constraint
to every garbage-collected type.
If the garbage collector wasn’t enough, we’re bringing the visitor
pattern over from Java. We force every garbage-collected value to
implement the Collectable
trait, which will delegate some Visitor
to
each of its garbage-collected fields.
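In sketch form, the pair of traits might look like this (simplified; we’ll revisit the signatures once concurrency enters the picture):

```rust
pub trait Collectable {
    /// Delegate `visitor` to each of this value's garbage-collected fields.
    fn accept<V: Visitor>(&self, visitor: &mut V);
}

pub trait Visitor {
    /// Called once for each `Gc` edge encountered during a traversal.
    fn visit_gc<T: Collectable + ?Sized>(&mut self, gc: &Gc<T>);
}
```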
For example, an array might implement it by delegating to accept
on
each of its elements.
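For instance, a fixed-size array has no `Gc` fields of its own, so a sketch of its implementation just forwards to every element:

```rust
impl<T: Collectable, const N: usize> Collectable for [T; N] {
    fn accept<V: Visitor>(&self, visitor: &mut V) {
        for elem in self {
            elem.accept(visitor); // delegate to each element in turn
        }
    }
}
```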
We can now write code that finds the outgoing edges from each allocation
by simply applying a visitor to the allocation. I ended up writing two
visitors - one for the dfs
step and one for the mark
step.
I would have liked to make `Collectable` object-safe, but that would also require `Visitor` to be object-safe. That, in turn, would require every garbage collector to be coercible to `Gc<dyn Collectable>`, which would only be possible on nightly, making it impossible to write a stable crate.
Bulk dropping
A cursory analysis of the implementation of `Drop` above shows that the call to `is_accessible` runs in linear time with respect to the total number of `Gc`s existing. You would be excused for thinking that `drop`, as written, runs in linear time. However, the call to `drop_in_place` could also result in another `Gc` being dropped, yielding a quadratic-time cleanup. This is unacceptable, especially since our first call to `is_accessible` actually did all the work of determining whether the allocations are reachable from the original one - if the original allocation being freed was inaccessible, any allocation referred to by it should also be inaccessible.
We can save our time on marking by a pretty simple method: we’ll do one more pass through the reference graph, recording a pointer to each allocation to be destroyed and its implementation of `drop`. Then, once we’ve found every inaccessible allocation, we’ll drop them all in one quick go.
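A sketch of the shape this takes (the node type and function-pointer signature are my assumptions):

```rust
/// One pending destruction: a type-erased pointer plus its destructor.
struct CleanupNode {
    ptr: *mut (),                // erased pointer to the allocation
    drop_fn: unsafe fn(*mut ()), // monomorphized drop glue, stored as a fn pointer
}

// pass 1: traverse the reference graph, queueing every inaccessible allocation
let mut to_drop: Vec<CleanupNode> = Vec::new();
// ... traversal elided ...

// pass 2: destroy everything in one quick go, with no recursive drops
for node in &to_drop {
    unsafe { (node.drop_fn)(node.ptr) };
}
```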
There’s one final problem, though: each `Gc` owned by a dropped value could itself try to manipulate the reference counts of its pointees, resulting in undefined behavior, or worse, slow code. However, all problems in computer science can be solved by adding more state or more indirection. We opt to solve it with extra state this time, creating a thread-local value called `COLLECTING`.

When we do a bulk-drop, we set `COLLECTING` to `true`. Then, when we call `drop` on a `Gc`, it checks the state of `COLLECTING`. If `COLLECTING` is true, the `Gc` does nothing when dropped (not even affecting its reference count). To handle outbound edges to still-accessible allocations, we add one more traversal before the final cleanup.
Let’s take a look at what the code for that roughly looks like now.

```rust
use std::cell::Cell;

thread_local! {
    /// Whether this thread is currently performing a bulk cleanup.
    static COLLECTING: Cell<bool> = Cell::new(false);
}
```
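The `Drop` implementation then consults `COLLECTING` before touching anything (again a sketch; the real code also manages the dirty set):

```rust
impl<T: Collectable + ?Sized> Drop for Gc<T> {
    fn drop(&mut self) {
        COLLECTING.with(|collecting| {
            if collecting.get() {
                // a bulk cleanup is in progress: this Gc lives inside an
                // allocation being destroyed, so leave all counts alone
                return;
            }
            // ... normal reference-count decrement, then maybe collect() ...
        });
    }
}
```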
`collect` is the function we’ll write to check if the allocation is accessible and perform a bulk cleanup if needed. There’s one more problem before we can implement `collect`, though: our pseudocode required us to be able to store each allocation as a key in `counts` and also to access the allocation’s neighbors. This means we have to use a little bit of type erasure to achieve our ends. I won’t go into too much detail, but we need to be able to do two things:

- Store pointers to allocations as keys in a hash-map.
- Reconstruct a stored pointer using type information to make it possible to do work.
Due to the limitations of Rust, these are actually mutually exclusive. Because it’s undefined behavior (due to provenance issues) to compare the raw data in pointers, we can’t erase a pointer and then use it as a key in a hash-map. However, we can’t just get a pointer to the first byte of the allocation, because our allocation could be `?Sized`.
As a result, my implementation actually has two different kinds of erased pointers: a thin pointer which can be used for comparisons and a fat pointer which can be used to reconstruct a pointer to any type.
```rust
// Sketch - the representations are assumed, but the split matches the text above.
struct AllocationId(NonNull<()>); // used as a key in hash-maps
struct ErasedPtr([usize; 2]);     // used as a reconstructible pointer to the allocation
```
Then, all the functions which need to know the type information of the allocation (such as `dfs` and `mark`) can be stored opaquely as function pointers, and when called, they can “rehydrate” the `ErasedPtr` passed in with type information.
Below, I’ve included an example of how it can be used to implement `dfs`.

```rust
/// Reconstructed sketch: rehydrate `ptr` with its concrete type `T`, then
/// visit the allocation's children. (`specify` is a hypothetical helper
/// that rebuilds a typed pointer from the erased one.)
unsafe fn dfs<T: Collectable + ?Sized>(ptr: ErasedPtr, visitor: &mut DfsVisitor) {
    let gc_box: NonNull<GcBox<T>> = ptr.specify();
    gc_box.as_ref().value.accept(visitor);
}
```
I won’t include the implementation of collect
here because it’s
lengthy and not even the final revision of the cleanup algorithm, but
you can trust me that it works. After much gnashing of teeth and tearing
of hair, this is finally enough to achieve bulk dropping, making calling
drop()
on a Gc
a linear-time operation.
Amortizing
What’s that, you say? You need your reference operations to work in $O(1)$ time? Why are programmers these days so picky?

Luckily, it isn’t too hard to make our garbage-collection efforts terminate in average-case $O(1)$ time. Instead of calling `collect` directly, we’ll just punt the allocation, which may be inaccessible, into a set of “dirty” allocations. In my code, I call that set a dumpster. Then, once every $n$ times an allocation is dropped, we’ll go through the whole dumpster and collect them all in $O(n)$ time. Amortized across all allocations, that’s technically $O(1)$, enabled by the same accounting wizardry that makes array-backed lists useful and that caused the 2008 financial crisis.
To save ourselves a little bit of work, whenever an allocation is accessed (such as by dereferencing or cloning), we can yank the allocation away from that same dirty set since it was just proven to be accessible.
Are we concurrent yet?
So far, the garbage collector I’ve been outlining to you has been a single-threaded collector, which can only store thread-local garbage-collected values. I’m told that all code worth writing ends up being concurrent, so we have to do better than that. Lucky for us, this algorithm can be made to operate concurrently with little effort, assuming that only one thread collects at a time.
However, even if only one thread is collecting, we can still run into some nasty concurrency issues. Let’s imagine that we have 2 allocations, $a$ and $b$, plus the imaginary “root” allocation $r$, where $b$ can only be accessed from $a$ and $a$ can only be accessed from $r$.

Now, consider performing a depth-first search starting from $a$. First, we record that $a$ has 1 unaccounted-for reference in `counts`. Then, we move on to looking at $b$ because it’s one of $a$’s children.

However, between when we record $a$’s reference count and examine $b$, another malicious thread mutates $b$, adding a back-reference from $b$ to $a$.

Then, the depth-first search will return to $a$, decrement the number of unaccounted references, and assume that all incoming references to $a$ and $b$ were totally accounted for (and therefore inaccessible). Then the garbage collector would happily destroy $a$’s and $b$’s allocations, leaving a dangling reference to the heap, potentially causing a use-after-free or other gnarly memory errors.
Resolving concurrency
The simplest solution to this concurrency issue is to just wrap
everything in a big fat lock by forcing every single operation involving
a Gc
to acquire a single global mutex, preventing any sort of
concurrency bugs at all. However, that’s ridiculously slow.
We’d like to have an approach that only needs to achieve mutual
exclusion over a small subset of all allocations if at all possible. To
do so, let’s take a closer look at our concurrency problem: we can see
that we only really get erroneous results when another thread mutates
the subgraph as we interact with it. The solution is quite clever -
we tag every
Gc
with a special number, which I’ll call a collecting
tag.
First, we’ll add a global atomic integer called `COLLECTING_TAG`.

```rust
use std::sync::atomic::AtomicUsize;

static COLLECTING_TAG: AtomicUsize = AtomicUsize::new(0);
```
Whenever we create or clone a new `Gc`, we’ll annotate it with the current value of `COLLECTING_TAG`.

```rust
// this new `Gc` type is intended to be `Sync`.
// (sketch - field names are assumptions, not the crate's exact layout)
pub struct Gc<T: Collectable + Send + Sync + ?Sized> {
    ptr: NonNull<GcBox<T>>,
    tag: AtomicUsize, // the value of COLLECTING_TAG when this Gc was created
}
```
Next, at the start of the collection process, we’ll increment the value of `COLLECTING_TAG` just before we call `dfs`.
Lastly, whenever we encounter a `Gc` during the `dfs` process, we’ll check its tag. If the tag is equal to the current value of `COLLECTING_TAG`, that means the `Gc` was created after `dfs` started. If so, then whatever spot we found the `Gc` in must have been accessible, and we can mark it as such. To prevent us from visiting the same edge twice (due to shenanigans), we’ll also update the tag on every edge we visit to the value of `COLLECTING_TAG`.
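In sketch form, the check inside the `dfs` visitor might look like this (`DfsVisitor` and `mark_accessible` are assumed names):

```rust
use std::sync::atomic::Ordering;

impl Visitor for DfsVisitor<'_> {
    fn visit_gc<T: Collectable + Send + Sync + ?Sized>(&mut self, gc: &Gc<T>) {
        let current = COLLECTING_TAG.load(Ordering::Acquire);
        // swap stamps the new tag while returning the old one, so the same
        // edge is never processed twice
        if gc.tag.swap(current, Ordering::AcqRel) == current {
            // this Gc was created (or already visited) after dfs began,
            // so wherever we found it must be accessible
            self.mark_accessible(gc);
            return;
        }
        // ... otherwise, decrement the unaccounted-for count and recurse ...
    }
}
```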
Let’s return to our broken example to show how this modification fixes it. We return to our reference graph with $a$ pointing to $b$ and $b$ pointing to $a$. Since every `Gc` must be annotated with a tag, let’s say they’re all tagged with 0.

Next, when `dfs` begins, `COLLECTING_TAG` is incremented to 1. As a result, the new reference created by a malicious thread must be tagged with 1.

Now, when `dfs` visits the edge pointing from $b$ to $a$, it sees that the reference is labeled with a tag of 1, and therefore both $a$ and $b$ must be accessible - no accidental early deallocations here.
Moving in and out
We might hope that tagging the pointers is sufficient to prevent all concurrency bugs. However, this is sadly not enough. To show why, we’ll begin with an example.
First, let’s imagine an allocation called $a$, which contains two optional references to itself.

At the start of our example, there will be only one `Gc` pointing from inside $a$ to itself, and the other slot that can hold a `Gc` will be empty. $a$ is accessible directly from the root allocation $r$, and all existing references are tagged with 0. For the sake of clarity, I’ll draw the arrow representing the first slot in $a$ with a solid line and the second slot with a dashed line.
Now, let’s have `dfs` begin, starting at $a$.

1. We increment `COLLECTING_TAG` to 1.
2. `dfs` records that $a$ has 2 unaccounted-for references at the start.
3. `dfs` visits $a$’s first slot and tags the currently-existing self-reference in that slot. $a$ now has 1 unaccounted-for reference.
4. A malicious thread intervenes, first moving the `Gc` from $a$’s first slot to $r$, then moving the original `Gc` which connected $r$ to $a$ into $a$’s second slot.
5. `dfs` continues, now visiting the second slot in $a$. Seeing another reference to $a$, we decrement the number of unaccounted-for references to 0.
6. `dfs` found no ancestors of $a$ with unaccounted-for references, so we free $a$. However, $a$ is still accessible!
This problem stumped me for a decent while. It seems impossible to
prevent other threads from do-si-do-ing every Gc
pointing to an
allocation in order to fool our search. Luckily, there is a way through.
Since tagging every Gc
worked out for us before, we’ll try tagging
every allocation.
Next, every time a `Gc` is dereferenced, we’ll update the tag in the allocation to the current value of `COLLECTING_TAG`.
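A sketch of the dereference path, assuming the allocation’s `GcBox` also carries a tag field:

```rust
use std::sync::atomic::Ordering;

impl<T: Collectable + Send + Sync + ?Sized> std::ops::Deref for Gc<T> {
    type Target = T;

    fn deref(&self) -> &T {
        let gc_box = unsafe { self.ptr.as_ref() };
        // record the access: a concurrent dfs that sees this tag knows the
        // allocation was just touched, and is therefore accessible
        gc_box.tag.store(COLLECTING_TAG.load(Ordering::Acquire), Ordering::Release);
        &gc_box.value
    }
}
```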
Whenever `dfs` observes that the tag on an allocation is equal to `COLLECTING_TAG`, it knows that the allocation was accessed, and is therefore accessible. In order to shuffle references around in our above example, the malicious thread would have to dereference a `Gc`, notifying the search and preventing an early deallocation.
In fact, this is enough to prove that dfs
will never incorrectly
mark an allocation as inaccessible, even under concurrent execution.
This blog post is already rather long, so I’ll only provide a rough
sketch of the proof.
Theorem: dfs
will never erroneously mark an allocation as
inaccessible.
Proof:
1. If `dfs` encounters a `Gc` created after `dfs` begins, it will know that the referent is accessible.
2. Due to (1), if `dfs` marked some allocation $n$ as inaccessible, it must have only visited `Gc`s pointing to $n$ which existed before `dfs` began.
3. Weakening the statement from (2), if `dfs` marked some allocation $n$ as inaccessible, it must have only visited `Gc`s pointing to $n$ which existed when $n$’s reference count was recorded.
4. `dfs` never visits a `Gc` while it points from the root allocation $r$ to any other allocation.
5. `dfs` must visit every `Gc` which pointed to an allocation at the point when its reference count was recorded in order to mark it as inaccessible.
6. Due to (4) and (5), if $n$ is marked as inaccessible, any `Gc` which at any point connected $r$ to $n$ must have been moved to connect another node (not $r$) to $n$.
7. Due to (6), there is no lifetime `'x` which can borrow against a `Gc` pointing from $r$ to $n$ and last between when `dfs` first visits $n$ and when it last visits $n$.
8. The final `Gc` to move into $n$ before `dfs` finishes visiting $n$ must have been moved while another borrow against a `Gc` pointing from $r$ to $n$ existed. Let the lifetime of that borrow be `'y`.
9. Because `'y` lasts longer than the final move, `'y` can be extended to last arbitrarily long, even until after `dfs` has finished visiting every `Gc` pointing to $n$.
10. Due to (7), `'y` must not have existed before `dfs` first visited $n$. Therefore, if $n$ is erroneously marked as inaccessible, a `Gc` pointing to $n$ must have been dereferenced to generate a borrow with lifetime `'y`.
11. If a `Gc` pointing to $n$ was dereferenced before the last time `dfs` visited a `Gc` pointing to $n$, then `dfs` would have observed that $n$’s tag had changed. Therefore, if an allocation was erroneously marked as inaccessible, it must also have been marked as accessible - a contradiction. Therefore no allocations were ever erroneously marked as inaccessible by `dfs`. QED.
This proof is relatively handwavy, but it should be possible for an astute reader to fill in the details. I will leave it to such astute readers to prove the complementary theorem; namely, that `dfs` will never erroneously mark an allocation as accessible.
I also think that it’s really cool that the entire mechanism of the correctness of this approach is totally supported by Rust’s borrow checking semantics. It makes the whole thing feel like it fits right in.
It’s the weak ones that count
There’s one more cause for memory safety issues from concurrency. If an allocation is freed because its reference count reached zero, it could still be accessible by another thread because it was stored in the global dumpster. We can’t guarantee that we’ve obliterated all references to that allocation from every other thread’s memory, so it seems like we’re hosed.
However, there’s no problem in computer science that can’t be solved by adding more state. Taking a leaf from `Rc`’s book and adapting it liberally, we annotate each allocation with a “weak” reference count, which is the number of times it’s referenced by a dumpster or cleanup thread. We’ll rename our original reference count to be the strong count instead.
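The allocation header then grows accordingly. A sketch, assuming the fields introduced so far:

```rust
struct GcBox<T: Collectable + Send + Sync + ?Sized> {
    strong: AtomicUsize, // number of Gcs pointing to this allocation
    weak: AtomicUsize,   // number of dumpsters/cleanup threads holding it
    tag: AtomicUsize,    // last COLLECTING_TAG value that touched it
    value: T,
}
```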
Then, when dropping the last strong reference to an allocation, we first check that its weak reference count is zero before we deallocate it. If it isn’t, it’s the cleanup threads’ problem to take care of it.
Whenever an allocation is removed from a dumpster and about to be thrown away by a thread running a cleanup procedure, the thread decrements the weak count of the allocation and checks if both the weak and strong count have reached zero. If so, that thread can then safely destroy the allocation.
Deadlocked!
Imagine that we want to create a `Gc<Mutex<()>>`, and then drop a `Gc` pointing to it while the `Mutex` is locked.

```rust
let gc1 = Gc::new(Mutex::new(()));
let gc2 = gc1.clone();
let guard = gc1.lock()?;
drop(gc2); // suppose this triggers a cleanup
drop(guard);
```
While traversing the reference graph, the `dfs` visitor would attempt to visit the mutex, lock it, and try to carry on. However, since the same thread that triggered the cleanup still holds a guard to that mutex, the thread would deadlock disastrously. Lucky for us, we know that any allocation with a held mutex guard must still be accessible (due to `Mutex`’s excellent API), so we can be certain that if we fail to acquire a mutex, we have immediate proof that an allocation is accessible.
To take advantage of that, we’ll make visiting a collectable type a fallible operation, and make it so that failing to acquire a lock also fails the visit.
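A sketch of what that fallibility might look like (the exact signatures in the published crate may differ):

```rust
use std::sync::Mutex;

pub trait Collectable {
    /// Visiting is fallible: an `Err` aborts the traversal, which we treat
    /// as proof that the allocation is accessible.
    fn accept<V: Visitor>(&self, visitor: &mut V) -> Result<(), ()>;
}

impl<T: Collectable + Send> Collectable for Mutex<T> {
    fn accept<V: Visitor>(&self, visitor: &mut V) -> Result<(), ()> {
        // a failed try_lock means someone holds a guard - and a held guard
        // means this allocation is reachable, so bailing out is correct
        self.try_lock().map_err(|_| ())?.accept(visitor)
    }
}
```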
Sharing your dumpsters
You’ve made it this far into this (admittedly rather lengthy) post. Enjoy this hand-drawn picture by yours truly.
When implementing this garbage collector, one of the biggest bottlenecks for concurrent performance is accessing the one global dumpster every single time a `Gc` is dropped. Using a `Mutex<HashMap>` for it would remove nearly all parallelism from the whole system and make you wonder why you paid all that money for a 16-core computer.
As a result, any sane person might reach for a concurrent hashmap. There are many Rust implementations - I tried `chashmap` but gave up on it due to a rare panic bug, and then I tried `dashmap`, which worked fine but was too slow as thread counts increased.
The heart of the issue is that concurrently updating any data structure is going to be slow. Even so-called “concurrent” data structures which use clever sharded locks have this issue - there’s always some contention overhead. The best workaround to concurrency slowdowns is to just not do things concurrently. We can fix this by making every thread have its own dumpster, and then having a global collection of to-be-cleaned allocations which I’ll call the garbage truck. Whenever another allocation is marked dirty, we can check if this thread’s dumpster is full according to some heuristic, and if so, transfer all the allocations from this thread’s dumpster to the garbage truck. I’ve included some pseudocode for the whole process below.
```python
# Reconstructed pseudocode; names are illustrative.
dumpster = set()       # local to this thread
garbage_truck = set()  # global to all threads

def mark_dirty(allocation):
    dumpster.add(allocation)
    if is_full(dumpster):
        # rarely: dump the whole local set into the shared garbage truck
        garbage_truck.update(dumpster)
        dumpster.clear()
```
If is_full
returns true
rarely enough, there will be almost no
contention over the garbage truck. This means that the concurrency
overhead is vastly reduced and we get to pocket some great performance
returns.
Okay, but how fast is it?
In short, this garbage collector runs pretty fast!
In the end, I made two different garbage collector implementations - one which is thread-local, saving us from all the concurrency headaches, and another thread-safe one that has all the bells and whistles at the cost of some performance. Borrowing from `once_cell`’s API, the former lives in the `unsync` module of the crate and the latter in `sync`.
Next, I collected all the garbage collectors I could find in order to compare them. I selected all the ones which had a similar API to `dumpster`, were published on crates.io, and actually worked.
| Name | Concurrent? | Like `dumpster`? |
| --- | --- | --- |
| `dumpster` (unsync) | ❌ | ✅ |
| `dumpster` (sync) | ✅ | ✅ |
| `Rc` | ❌ | ✅ |
| `Arc` | ✅ | ✅ |
| `bacon-rajan-cc` | ❌ | ✅ |
| `cactusref` | ❌ | ❌ |
| `elise` | ✅ | ❌ |
| `gc` | ❌ | ✅ |
| `gcmodule` | yes, but with a different API | ✅ |
| `rcgc` | ❌ | ❌ |
| `shredder` | ✅ | ✅ |
I excluded the ones that didn’t have a `dumpster`-like API because it would have been a pain to set up a benchmarking harness to handle them. If you feel this is poor diligence, you can go benchmark them yourself.
I also included `Rc` and `Arc`, even though they don’t collect cycles, because they’re a decent baseline for the minimum performance of a garbage collector. During my research, I also found that `bacon-rajan-cc` doesn’t initiate collections of its own accord, so I added a version of each of `dumpster`’s garbage collectors which doesn’t automatically trigger collections, in the spirit of competitive fairness. Those versions are labeled as “manual.” Lastly, I found that `gcmodule` simply stack-overflows and sometimes segfaults when facing extremely large collection loads, so I removed it from the benchmark set.
As my benchmark, I decided to do the very simplest thing I could think of: have a bunch of allocations randomly add and remove references to each other in a loop, dropping and creating new allocations as well. To account for random noise, I ran 100 trials of each garbage collector doing 1 million operations. Each operation is either the creation of a reference or a deletion.
First, I ran a benchmark of all the garbage collectors, measuring pure single-threaded performance. I’ve shown my benchmark results as a set of violin plots below, relating runtime to frequency.
This plot doesn’t tell us much other than that shredder
is slow. To
take a closer look, let’s remove shredder
from the list and see what
we find.
`dumpster`’s unsync implementation is in a decent position, beaten out only by the non-cycle-detecting allocators and `bacon-rajan-cc`. `bacon-rajan-cc` has very little overhead and is quite performant.
I think there are two main reasons why the concurrent version of the
garbage collector is much slower than the single-threaded one. First,
moving allocations from a thread-local dumpster to the global garbage
truck takes some time. Second, the dfs
operation in a concurrent
environment must record the current set of children of an allocation,
typically in a separate heap-allocated structure, to prevent some
concurrency bugs. It’s certainly possible to mitigate the losses from
the first issue, but the second is harder to handle.
My guess as to why the concurrent version is significantly slower when using automatic collection triggering is that it’s a consequence of my testing harness. Currently, it runs all the tests in a loop, so the single-threaded benchmark could run directly after the multi-threaded one and be forced to clean up some of its garbage.
In any event, let’s move on to the results from the multi-threaded benchmark.
The only other concurrent garbage collector available for me to test against was `shredder`, and it looks like it wasn’t much of a competition.
dumpster
doesn’t scale as smoothly as I’d like with thread count, but
it’s a relief that performance does actually improve with thread count,
except for the jump from one thread to two. I think being only 3x slower
than Arc is a decent victory, especially for a project that I wasn’t
paid any money to do. I’m not much of an expert at high-performance
computing in concurrent situations, so trying to optimize this further
would likely just be an adventure in trying random things and seeing if
they yield an improvement. If anyone out there has clever ideas on how
to optimize this approach, let me know! My email is on the “About” page.
Coda
Over the last few months, I built a garbage collector for Rust from scratch. I implemented a novel (ish) algorithm of my own design, got it to work in both single-threaded and multi-threaded contexts, and showed that it was actually pretty performant.
The implementation of this crate is now up on crates.io and available for anyone to download. It’s licensed under the GNU GPLv3.
Special thanks to Wisha, Charlie, and Shreyas for reviewing this post, fielding my bad questions, and putting up with me rambling about this for nearly two months straight.
Postscript
I have a few things that I thought were interesting along the way but didn’t quite have a good spot to include in the story above, so they’ll all be mish-mashed in here.
Ode to bad good ideas
The final, polished solution you see in this blog post is actually quite far from my original idea. In particular, almost all of my first ideas for handling concurrency didn’t work. Here’s a quick roll of them all:
- Wrap a `Mutex` around every single allocation’s reference count and hold them while doing `dfs`
- Save on space in `Gc`s by tagging only the lowest bit instead of adding an entire `usize`
- Make everything enable dynamic dispatch so we can have real Java-like objects
- Clean up allocations locally on every thread
- Avoid directly dropping allocations and instead cleverly decrement their reference count
Cool nightly tricks
Rc
and the other pointer types are somewhat “blessed” in that they can
access nightly features from the Rust compiler and still be consumed by
stable projects. dumpster
has no such blessings.
I would like to implement CoerceUnsized
unconditionally for all my
garbage collectors, but that’s simply not possible on stable yet.
Likewise, pending the implementation of strict provenance, there’s actually no way for a stable crate to manipulate the address portion of a fat pointer. This means that even if I found some space optimization using lower bit tagging, I wouldn’t be able to use it because it’s gated behind a nightly feature.
Another path to Gc<dyn T>
Without the ability to coerce `Gc` to an unsized type, I can’t make `Collectable` object-safe due to the function signature of `accept`, which forces `Visitor` to also be object-safe, forcing `visit` operations to also be object-safe.
I’ve thought of a slightly different approach, though. We can make
Visitor
a single concrete type, and that makes Collectable
trivially
object-safe. The downside here is that I would then have to store all 5
of my current visitor implementations in the same structure, and that
loses a lot of my nice type safety features and adds a large runtime
penalty. I’m not very happy with that idea, though.
Miri rules
I would probably not have been able to finish this project without
cargo miri
. If you haven’t heard
of it, you should download it right now and run all of your Rust
projects with it. It’s such a fantastic tool for debugging unsafe code
that I wish I could have it in other languages, too - valgrind is a poor
substitute for it when using C.
Testing and debugging
Early on, I had just a few handwritten tests. The first reference graph that I found that gave me trouble looked like this:
However, once I had simple graphs with multiple cycles down, it was hard to come up with new tests. As a result, I resorted to fuzzing, using the same approach as my benchmark - randomly generating references and then seeing if they got cleaned up properly.
Debugging the results of those fuzzed tests absolutely sucked, though. The experience of trying to find the source of an issue was awful, so I ended up having to make some (crude) tools to make the process easier. Eventually, I had settled on printing out the reference graph at choice times during the run of my tests and then visualizing them to get a good guess of what was wrong. However, when making the concurrent implementation, printing something can change the execution behavior of the program. Not only that, it’s not always obvious what the issue is. I had one bug which only occurred after several thousand operations had completed, resulting in reference graphs that looked like this:
The graph above was made by fuzzing my garbage collector, using the same approach as my benchmarks.