Watching and Understanding the Ruby 2.1 Garbage Collector at Work

12 Mar 2014

The most common way to check up on Ruby’s Garbage Collector (GC) is probably calling GC.stat, which returns a hash of of information about the current state of the GC. Since version 2.1 Ruby comes with a generational GC and the output now contains a lot more information than in previous version. Let’s have a look at its output:

require 'pp'

# Ruby 1.9.3-p484
pp GC.stat
{:count=>24,
 :heap_used=>138,
 :heap_length=>138,
 :heap_increment=>0,
 :heap_live_num=>31875,
 :heap_free_num=>24540,
 :heap_final_num=>0}

# Ruby 2.0.0-p353
pp GC.stat
{:count=>6,
 :heap_used=>138,
 :heap_length=>138,
 :heap_increment=>0,
 :heap_live_num=>26842,
 :heap_free_num=>39883,
 :heap_final_num=>0,
 :total_allocated_object=>124292,
 :total_freed_object=>97450}

# Ruby 2.1.1
pp GC.stat
{:count=>20,
 :heap_used=>126,
 :heap_length=>132,
 :heap_increment=>0,
 :heap_live_slot=>51007,
 :heap_free_slot=>349,
 :heap_final_slot=>0,
 :heap_swept_slot=>15942,
 :heap_eden_page_length=>126,
 :heap_tomb_page_length=>0,
 :total_allocated_object=>392989,
 :total_freed_object=>341982,
 :malloc_increase=>221360,
 :malloc_limit=>16777216,
 :minor_gc_count=>17,
 :major_gc_count=>3,
 :remembered_shady_object=>388,
 :remembered_shady_object_limit=>476,
 :old_object=>30313,
 :old_object_limit=>37264,
 :oldmalloc_increase=>600640,
 :oldmalloc_limit=>16777216}

As we can see, the output of GC.stat in Ruby 2.1.1 contains a lot more information than in previous versions. But, to be honest, it had little to no meaning to me until I had a better grasp on how Ruby manages memory and how the GC in Ruby 2.1 works. So let’s have a look at these topics and let me show you what I found out about them before coming back to the specifics of GC.stat.

Ruby’s Memory Management and Garbage Collector

Disclaimer: I am not an expert on Garbage Collection nor on Ruby’s internals. What follows is what I learned by reading about Ruby’s memory management, its garbage collector, garbage collection in general and by digging through the Ruby source code (especially gc.c). Please correct me if anything is wrong. I’m happy about feedback and eager to learn more.

The first thing we need to know about Ruby’s memory management is that it exists. Ruby manages the memory and not the programmer. There is no way, at least none that I know of, to manually allocate memory as in C with malloc(2). Ruby does that for us. It asks the kernel for memory, it initializes the memory and we, as programmers, do not have to care about it.

The memory Ruby manages and to which we have access is organized on the Ruby heap, which itself is split up into slots. Each slot is big enough to hold one Ruby object. In the context of Ruby’s memory management and GC an object is represented as a simple struct called RVALUE. Each slot on the Ruby heap has the size of one RVALUE, which is 40 bytes.

GC::INTERNAL_CONSTANTS[:RVALUE_SIZE]
# => 40

Now, whenever we create a new object Ruby doesn’t immediately ask the kernel for 40 bytes of memory. Instead Ruby already has a large enough heap to hold a lot of objects. Whenever we do something like this:

foobar = MyFoobar.new

Ruby doesn’t need to allocate more memory, at least in the majority of cases. Chances are Ruby still has free slots on its own heap to hold our newly created object. Ruby manages its memory in such a way that it doesn’t need to ask the Kernel for new memory every time a new instance of an object is created.

But eventually, after creating enough objects, there are no free slots left and this is the point where the GC kicks in. The GC checks which of the slots/objects are not being referenced by other objects anymore and frees them. Freed slots are then ready to be used again as slots for newly initialized objects. If there are still not enough slots for new objects, even after freeing unreferenced slots and making them available again, only then does Ruby ask the kernel for more memory.

That’s the basic and super simplified model of how Ruby manages memory and we need to keep that in the back of our heads when we now take a closer look at the GC in Ruby 2.1.

Unlike in previous versions the GC in Ruby 2.1 is a Generational GC that uses a mark-and-sweep strategy to maintain the Ruby heap. The implementation of the current GC in Ruby is called RGenGC and was developed by Koichi Sasada as part of the Ruby Core team at Heroku.

In a simple world without implementation details and edge cases the basic strategy behind a mark-and-sweep garbage collector is to traverse the object graph, the Ruby heap in our case, and check which objects are still in use and which ones are not. The unused objects are being freed, making their memory available to us again, and the used objects are kept where they are.

Starting at a object that is known to be referenced the GC traverses along every reference to the other objects on the heap. Every time the GC comes across an object, which means that the object is still being referenced, since the GC traverses along references, it marks the object (by switching a bit on its underlying structure, for example) and moves on until it can find no more references.

After the mark-phase comes the sweep-phase: the GC goes through the whole heap again and “sweeps” away every object that is not marked and frees it. Marked objects are being unmarked, enabling the next cycle of garbage collection.

A generational GC which uses mark-and-sweep (a generational GC does not necessarily have to use a mark-and-sweep algorithm) works basically in the same way, but implements some other ideas in order to speed up the traversal of objects.

The main assumption behind a generational GC is this one: most objects die young. In more words: it is much more likely that young objects (created since the last GC run) are referencing old objects (which survived the last GC run) than the other way around. Only a small amount of new objects need to be marked and not swept way, which are the objects that are being referenced by old objects.

Based on that the GC can save itself a lot of time and useless work when it concentrates on the young objects for the majority of its time, since these are the ones that most likely need to be collected, thus yielding the greatest benefit for a GC run. Traversing old objects would not result in the same amount of freed memory.

So, a generational GC needs two modes. In Ruby’s case they are called minor and major GC runs. In a minor GC run the GC only traverses the young objects and in a major GC run it traverses the whole object graph, including the old generation. A minor GC should be faster than a major GC and is typically run more often.

In order to classify objects as new or old the GC does the following: whenever it marks an object in a mark-phase (which means that the object will survive this GC run) it promotes it to the old generation. Unmarked objects are swept away.

In the next minor GC run the GC can now ignore the old generation and only traverse the young generation.

But there is one problem: Imagine an old object starts referencing a new object in between minor GC runs. In its next run the GC (since it traverses from object to object along their references) will not “mark” this newly created object, since it ignored the old generation and their references and never came across this new object.

The fix to this problem involves a “remember set” and a “write-barrier”. And here is how it works: around every object is put up a write-barrier, through which every write operation on the object has to pass. And the important part is this: adding a reference to an object is also a write operation.

This snippet of C code should clarify how adding a reference is a write-operation:

old_array[0] = &new_object

&new_object is the memory address of new_object and that line simply says: “Save (write) the memory address of new_object as the first element of old_array”.

With a write-barrier around objects the GC can now detect references from old objects to newly created objects. Whenever it does that, it adds a reference to the old object to the remember set.

Without the remember set the GC previously skipped the old generation in minor GC runs and failed to “mark” new objects that are being referenced by old objects. But with the write-barrier and the remember set the GC can now traverse the young generation AND the old objects in the remembered set and not miss references from old to new objects.

That’s the theory. Now let’s switch back to the real world, you know, the one with implementation details and legacy code.

When trying to implement a generational GC for Ruby the developers came across a problem with the write-barrier. It is a huge undertaking to put up effective write-barriers around objects while Ruby C extensions have low-level access to their memory addresses. A possible solution are write-barriers on the C level (e.g. in the form of macros for pointer access) but that not only entails rewriting the internal C API used by Ruby itself but also means that a lot of C extensions would need to be rewritten to make use of the new API or be deprecated. So they came up with a better solution: shady objects.

At the time of creation an object in Ruby 2.1 is either classified as sunny or shady. Sunny are objects protected by write-barriers and shady objects are not.

With the distinction between shady and sunny objects at hand the minor GC run of RGenGC now works slightly different compared to the “simple” theory described above. Whenever the GC comes across a shady object while traversing it “marks” but does not promote it to the old generation. The reason for this new behavior is the missing write-barrier. Promoting a shady object to the old generation would result in missing references from old to new objects. And the whole purpose of shady objects is to not miss those references in the first place. Instead the GC checks if the shady object is referenced by an old object and if that is the case it adds the shady object to the remember set.

An object that was created as sunny doesn’t have to stay sunny. The GC can also “shade” objects. Shading an object means turning them from sunny to shady objects, demoting them from the old generation, and adding them to the remember set. When does that happen? Whenever low-level access to the memory address of the object is gained through the C API, which the Ruby Virtual Machine can detect. After the user of the C API has the pointer to the object an effective write-barrier is not possible anymore, which would result in missing references from old to new objects, so the object gets “shaded”.

Instead of solely relying on write-barriers around all objects to detect references, RGenGC adds shady objects to the remember set when it can’t tell whether that object is referencing a new object.

In its next run the GC can now traverse the young generation AND objects in the remember set. The set now contains old objects referencing new objects and shady objects referenced by old objects. Traversing through this set the GC should not miss to mark newly created objects.

And that is basically how RGenGC in Ruby 2.1 works. Now, that was a lot to take in, even though this is a really simplified description of the implementation. But it helps us tremendously when we now go back to the output of GC.stat. You’ll see, it will make more sense now.

Analyzing GC.stat

Here is the output of GC.stat again, straight from a fresh irb process, so we don’t have to scroll back up again. I also reordered it, making it easier to explain what each key and value means.

{
  :count => 20,
  :minor_gc_count => 17,
  :major_gc_count => 3,

  :total_allocated_object => 393977,
  :total_freed_object => 342796,

  :heap_length => 131,
  :heap_used => 126,
  :heap_eden_page_length => 126,
  :heap_tomb_page_length => 0,

  :heap_live_slot => 51181,
  :heap_free_slot => 175,
  :heap_final_slot => 0,
  :heap_swept_slot => 16742,
  :heap_increment => 0,

  :remembered_shady_object => 388,
  :remembered_shady_object_limit => 476,
  :old_object => 30318,
  :old_object_limit => 37262,

  :malloc_increase => 230416,
  :malloc_limit => 16777216,
  :oldmalloc_increase => 596304,
  :oldmalloc_limit => 16777216
}

Now, with our knowledge about Ruby’s generational Garbage Collector and its memory management, let’s go through the output and see what each line means.

`GC.stat[:count]`

This one is pretty much self-explanatory. :count is the number of GC runs, major and minor combined.

`GC.stat[:minor_gc_count]`

The number of GC runs that only traversed the young generation of objects and the objects in the remember set.

`GC.stat[:major_gc_count]`

The number of GC runs that traversed the whole Ruby heap, including old, young and remembered objects.

`GC.stat[:total_allocated_object]`

The total number of objects Ruby has allocated in the lifetime of the current process.

`GC.stat[:total_freed_object]`

Number of freed objects in the lifetime of the current process.

`GC.stat[:heap_length]`

There’s something I didn’t mention before: Ruby’s heap is not only organized in slots, where each slot holds an object, but also into pages. A Ruby heap page holds a specific number of slots.

GC::INTERNAL_CONSTANTS[:HEAP_OBJ_LIMIT]
# => 408

Each Ruby heap page holds 408 slots (further up we saw that the size of one slot is 40 bytes, which means that the page size is around 16kb). GC.stat[:heap_length] returns the number of pages the current Ruby process has allocated. Remember: allocated memory here does not mean that it is in use, since Ruby manages the memory for us and may hold the allocated memory back for when times are tough and we run out of memory and so on, which leads us to…

`GC.stat[:heap_used]`

This is the number of heap pages that are currently in use. Either filled with live objects or free slots.

`GC.stat[:heap_eden_page_length]`

Ruby separates its heap into “Eden” and “Tomb”. Eden is the part of the heap where pages reside that contain (at least one) live objects. Tomb are only pages that contain no live objects but are there to be used when Eden runs out of space. :heap_eden_page_length is the number of pages in the “Eden” part of the Ruby heap.

`GC.stat[:heap_tomb_page_length]`

The counterpart of :heap_eden_page_length. This is the number of pages that do not contain live objects. Since the Ruby heap is divided into “Eden” and “Tomb”, together they make up the heap: the sum of :heap_tomb_page_length and :heap_eden_page_length equals :heap_used.

`GC.stat[:heap_live_slot]`

The number of objects that survived all the GC runs in the past and are still alive. We can calculate this number ourselves:

GC.stat[:total_allocated_object] - GC.stat[:total_freed_object]

Also: GC.stat[:heap_live_slot] / 408 returns roughly GC.stat[:heap_used]

`GC.stat[:heap_free_slot]`

Number of allocated but unused/free slots on the Ruby heap.

`GC.stat[:heap_final_slot]`

I am not too sure about this number but after reading through Ruby’s gc.c my best guess is that it is the number of slots that have a finalizer that still needs to be run, which makes the Ruby VM consider the slot as still being used. The relevant piece of code in gc.c that gave me this idea are these 5 lines:

static size_t
objspace_free_slot(rb_objspace_t *objspace)
{
    return objspace_total_slot(objspace) - (objspace_live_slot(objspace) - heap_pages_final_slots);
}

We can add a finalizer to an object like this:

a = 'mystring'
ObjectSpace.define_finalizer(a, proc {|id|  puts "Object with ID #{id} destroyed." })
GC.start(full_mark: true, immediate_sweep: false)
# outputs: "Object with ID 70170672470620 destroyed."

When the object is swept away by the GC the finalizer is run. And I’ve played around with that a lot but I couldn’t get GC.stat[:heap_final_slot] to return something other than 0. I’m happy about feedback and suggestions here.

`GC.stat[:heap_swept_slot]`

This gets reset before every page sweep (Ruby sweeps heap pages one by one) to zero. After the sweep it gets incremented by the number of freed and already empty slots in the swept page.

`GC.stat[:heap_increment]`

The number of pages that get added to the Ruby heap if it needs expanding. This number is dynamically adjusted whenever the Ruby heap needs to grow with this formula:

heap_increment = (current_number_of_slots * factor) - current_number_of_slots

The factor is 1.8 by default.

`GC.stat[:remembered_shady_object]`

Number of shady objects in the remember set. This gets reset with every major GC run.

`GC.stat[:remembered_shady_object_limit]`

If remembered_shady_object crosses this limit a major GC run is triggered. The number is dynamically adjusted after each major GC run with this formula:

remembered_shady_object_limit = factor * remembered_shady_object

The factor is 2.0 by default.

`GC.stat[:old_object]`

The number of old generation objects.

`GC.stat[:old_object_limit]`

If old_object crosses this limit a major GC run is triggered. It is dynamically adjusted in the same way that remembered_shady_object_limit is, using the same factor and formula.

`GC.stat[:malloc_increase]`

Not every object fits into a 40 byte Ruby slots and needs more memory (e.g. long strings). Objects that need more memory can get it with Ruby’s internal wrapper of malloc(2) and use it as their own buffer. Whenever the wrapper is called, :malloc_increase gets incremented by the number of newly allocated memory. And whenever that memory is freed the size of :malloc_increase is reduced by the size of the freed memory. So :malloc_increase basically reflects the current size of memory allocated besides Ruby heap slots.

`GC.stat[:malloc_limit]`

If :malloc_increase crosses this limit a minor GC is triggered. It is dynamically adjusted before every sweep with this formula:

malloc_limit = factor * malloc_increase

The factor is 1.4 by default.

`GC.stat[:oldmalloc_increase]`

This is the old generation counterpart of :malloc_increase: the size of currently additional memory allocated by old objects.

`GC.stat[:oldmalloc_limit]`

If :oldmalloc_increase crosses this limit a major GC is triggered. It is dynamically adjusted before every sweep with this formula:

oldmalloc_limit = factor * oldmalloc_increase

The factor is 1.2 by default.

It does make more sense now, doesn’t it? But watching and understanding the Ruby GC normally goes hand in hand with some tuning experiments which might make this whole matter easier to grasp. So let’s have a look at some environment variables.

Environment variables

Ruby allows users to tweak its GC via environment variables. And since 2.1 there are a lot more variables to use than in previous versions. Ruby 2.1.1 even has one more than 2.1. The following is a list of GC tuning variables straight from Ruby 2.1.1’s gc.c:

RUBY_GC_HEAP_INIT_SLOTS
RUBY_GC_HEAP_FREE_SLOTS
RUBY_GC_HEAP_GROWTH_FACTOR
RUBY_GC_HEAP_GROWTH_MAX_SLOTS
RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR
RUBY_GC_MALLOC_LIMIT
RUBY_GC_MALLOC_LIMIT_MAX
RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR
RUBY_GC_OLDMALLOC_LIMIT
RUBY_GC_OLDMALLOC_LIMIT_MAX
RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR

Again, let’s go through each one and see how settings it effects the GC.

`RUBY_GC_HEAP_INIT_SLOTS`

The initial number of slots Ruby allocates on its heap. Default value is 10000. Increasing this number to cover the live objects after a process is fully booted can reduce the number of GC runs when booting and thus the boot time:

$ ruby -e 'puts GC.stat[:count]'
5
$ RUBY_GC_HEAP_INIT_SLOTS=20000 ruby -e 'puts GC.stat[:count]'
3
$ RUBY_GC_HEAP_INIT_SLOTS=100000 ruby -e 'puts GC.stat[:count]'
1

Keep in mind that allocating memory takes time too.

`RUBY_GC_HEAP_FREE_SLOTS`

The minimum number of free slots that should be available after a GC run. Default value is 4096.

`RUBY_GC_HEAP_GROWTH_FACTOR`

The factor by which the size of the heap grows when it needs to be expanded. Default value is 1.8. This has a direct influence on GC.stat[:heap_increment] since it is the relevant factor for heap resizing.

$ RUBY_GC_HEAP_GROWTH_FACTOR=1.8 ruby -e 'GC.disable; 100_000.times { a = "foobar" }; puts GC.stat[:heap_increment]'
116
$ RUBY_GC_HEAP_GROWTH_FACTOR=1.85 ruby -e 'GC.disable; 100_000.times { a = "foobar" }; puts GC.stat[:heap_increment]'
176
$ RUBY_GC_HEAP_GROWTH_FACTOR=1.9 ruby -e 'GC.disable; 100_000.times { a = "foobar" }; puts GC.stat[:heap_increment]'
205
$ RUBY_GC_HEAP_GROWTH_FACTOR=1.95 ruby -e 'GC.disable; 100_000.times { a = "foobar" }; puts GC.stat[:heap_increment]'
275

Increasing this number increases the size of the Ruby heap and a larger heap triggers less GC runs (since Ruby won’t run out of memory so fast):

$ RUBY_GC_HEAP_GROWTH_FACTOR=1.3 ruby -e '100_000.times { a = "foobar" }; puts GC.stat[:count]'
13
$ RUBY_GC_HEAP_GROWTH_FACTOR=1.8 ruby -e '100_000.times { a = "foobar" }; puts GC.stat[:count]'
10
$ RUBY_GC_HEAP_GROWTH_FACTOR=1.99 ruby -e '100_000.times { a = "foobar" }; puts GC.stat[:count]'
8

But it does make sense to lower this number when RUBY_GC_HEAP_INIT_SLOTS is already really high, since the memory consumption might be go through the roof.

`RUBY_GC_HEAP_GROWTH_MAX_SLOTS`

The maximum number of slots on the Ruby heap. Default value is 0, which means the feature is disabled. But if the number is higher than 0 it sets the maximum number of slots Ruby is allowed to add to its heap at once.

The higher the number the less GC runs does Ruby need for its heap to grow to the needed size:

$ RUBY_GC_HEAP_GROWTH_MAX_SLOTS=1000 ruby -e '100_000.times { a = "foobar" }; puts GC.stat[:count]'
24
$ RUBY_GC_HEAP_GROWTH_MAX_SLOTS=2000 ruby -e '100_000.times { a = "foobar" }; puts GC.stat[:count]'
19
$ RUBY_GC_HEAP_GROWTH_MAX_SLOTS=20000 ruby -e '100_000.times { a = "foobar" }; puts GC.stat[:count]'
10
$ RUBY_GC_HEAP_GROWTH_MAX_SLOTS=50000 ruby -e '100_000.times { a = "foobar" }; puts GC.stat[:count]'
10

If it is not set Ruby uses the heap growth factor to determine by how much to grow the heap.

`RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR`

The factor by which Ruby increases GC.stat[:oldobject_limit] and GC.stat[:remembered_shady_object_limit] after each major GC. Default value is 2.0.

`RUBY_GC_MALLOC_LIMIT`

The minimum value for GC.stat[:malloc_limit], which causes a minor GC when triggered. Default value is 16777216 (16mb).

`RUBY_GC_MALLOC_LIMIT_MAX`

GC.stat[:malloc_limit] changes dynamically through a growth factor. This sets its maximum value. Default value is 33554432 (32mb).

`RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR`

The growth factor by which GC.stat[:malloc_limit] is resized before every sweep. Default value is 1.4.

`RUBY_GC_OLDMALLOC_LIMIT`

The old generation counterpart to RUBY_GC_MALLOC_LIMIT: it is the minimum value for GC.stat[:oldmalloc_limit], which causes a major GC when triggered. Default value is 16777216 (16mb).

`RUBY_GC_OLDMALLOC_LIMIT_MAX`

The maximum value GC.stat[:oldmalloc_limit] can have. Default value is 33554432 (32mb).

`RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR`

The growth factor by which GC.stat[:oldmalloc_limit] is resized before every sweep. Default value is 1.2.

Words of Caution

And that’s it. Please keep in mind that the output of GC.stat and the GC environment variables are easily a subject to change while the Ruby core team is working on the GC. Ruby 2.2 is supposed to have a three-generation GC, which means that at least GC.stat will probably change.

Advice, recommendations, corrections or thanks are welcome! Leave a comment, send me an email to me@thorstenball.com or ping me on Twitter @thorstenball.

Essential Resources

If you want to learn more about Ruby’s GC and memory management here is a list of resources I found invaluable while researching for this blog post:

Slides for Koichi Sasada’s presentation about RGenGC at RubyConf 2013
Koichi Sasada’s strategy paper about implementing a generational GC in Ruby 2.1
Aman Gupta’s excellent blog post about RGenGC in Ruby 2.1
Ruby 2.1.1 - gc.c
Pat Shaughnessy’s blog posts about GC in Ruby: Part 1 and Part 2

Follow me on twitter: @thorstenball. Or send me an email to me@thorstenball.com. Or check out my books at interpreterbook.com and compilerbook.com.

I also write a weekly newsletter called Register Spill. Read it and sign up at registerspill.thorstenball.com.