Unicorn Unix Magic Tricks

20 Nov 2014

This post is based on the talk of the same name I gave at the Arrrrcamp conference in Ghent, Belgium on October 2nd, 2014. You can find the slides here and the video recording here.

Unicorn is a webserver written in Ruby for Rails and Rack applications. When I first used it I was amazed. This is magic, I thought. It had to be. Why?

Well, first of all: the master-worker architecture. Unicorn uses one master process to manage a lot of worker processes. When you tell Unicorn to use 16 worker processes it does so, just like that. And now you’re looking at 17 processes when you run ps aux | grep unicorn, each with a different name that shows whether it’s the master process or one of the worker processes, which even have their own number in their process names.

$ pstree | grep unicorn
 \-+= 27185 mrnugget unicorn master -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27210 mrnugget unicorn worker[0] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27211 mrnugget unicorn worker[1] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27212 mrnugget unicorn worker[2] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27213 mrnugget unicorn worker[3] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27214 mrnugget unicorn worker[4] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27215 mrnugget unicorn worker[5] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27216 mrnugget unicorn worker[6] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27217 mrnugget unicorn worker[7] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27218 mrnugget unicorn worker[8] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27219 mrnugget unicorn worker[9] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27220 mrnugget unicorn worker[10] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27221 mrnugget unicorn worker[11] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27222 mrnugget unicorn worker[12] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27223 mrnugget unicorn worker[13] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27224 mrnugget unicorn worker[14] -c simple_unicorn_config.rb -l0.0.0.0:8080
   \--- 27225 mrnugget unicorn worker[15] -c simple_unicorn_config.rb -l0.0.0.0:8080

How would one build something like this? I had no idea.

And then there’s a feature called “hot reload”, which means that you can tell Unicorn, while it’s running, to spin up a new version of your application. As soon as you do, Unicorn starts a new master process, which is going to serve the new version of your application. All the while the old master process is still running, responding to requests with your old application. Of course, the old master now has “old” in its name. Now, as soon as the new master process is fully booted up, you can send a QUIT signal to the old master process, which will in turn shut down and let the new one take over. And just like that you’ve switched to a new version of your application — without any downtime at all.

Oh, and Unicorn uses a lot more than the QUIT signal! There are tons of signals you can send to it: TTIN to increase the number of workers, TTOU to decrease it, USR1 to rotate the log files, USR2 to perform hot reloading, HUP to re-evaluate the configuration file. I didn’t know half of these signal names and there were even more in Unicorn’s own SIGNALS file.

And then there’s “preloading”: a feature of Unicorn that allows you to spin up new worker processes in less than a second, a fraction of the time it takes to boot up my Rails application. Somehow Unicorn is able to preload my application in memory and make use of that when creating new worker processes. And I had no idea how that works! Not a clue! And as if that wasn’t enough I discovered that Unicorn even has a file called PHILOSOPHY in its repository. Who else has that?! I was sure that there was some black magic going on. Because: how could Unicorn work like it does without magic?

Unix

After my first encounter with Unicorn I learned quite a bit about Unix systems and after a while I came back to Unicorn — still in amazement. But this time I read through the source code and it turns out, that, well, the secret ingredient to Unicorn is not magic but plain, old Unix.

Now, most people know Unix from a “user’s perspective”: the command line, shells, pipes, redirection, the kill command, scripting, text files and so on. But there’s this whole other side of Unix, too, which we could call the “developer’s perspective”. From this side of Unix you can see signal handling, inter-process communication, usage of pipes without the |-character, system calls and a whole lot more.

In what follows we’re going to have a look at Unicorn. We’ll take it apart and see that it’s just using some basic Unix tricks, the ones you can use as a developer, to do its work. The way we’re going to do that is by going through some of these Unix tricks, basic building blocks of every Unix system, and see how they work and how Unicorn uses them.

At the end we’ll go back to the “magic” of the beginning: hot reload, preloading, master-worker architecture. And we will see how these features work and how they are just Unix and not magic.

So let’s get started.

fork(2)

fork is how processes are created. Every process after the first one (with PID 1) was created with fork. So what is it, what is fork?

fork is a system call. Most of the time we can recognize system calls by the 2 behind their name (e.g. fork(2)) which means that we can find documentation about them in section 2 of the Unix manual, nowadays known as “man pages”. So in order to see the documentation for fork(2) you can run man 2 fork on your command line.

But what’s a system call? A way to communicate with the kernel of our operating system. System calls are the API of the kernel, if you will. We tell the kernel to do something for us with system calls: reading, writing, allocating memory, networking, device management.

And fork is the system call that tells the kernel to create a new process. When one process asks the kernel for a new process with fork(2) the kernel splits the process making the call into two. That’s probably where the name comes from: calling fork(2) is a “fork in the road” in the lifetime of a process. As soon as the kernel returns control to the process after handling the system call there now is a parent process and a child process. A parent can have a lot of child processes, but a child process only one parent process.

And both processes, parent and child, are pretty much the same, right after the creation of the child. That’s because child processes in a Unix system inherit a lot of stuff from their parent processes: the code they’re executing, the data, the stack, the heap, the user id, the working directory, open file descriptors, the connected terminal and a lot more. Copying all of that can be a burden (which is why copy-on-write is a thing) but it also has some neat advantages, as we’ll see later.

So how do we use fork? Since (deep down) making a system call involves putting parameters and the unique identifier of the call in CPU registers (which ones may change depending on the architecture we’re working with) and firing a software interrupt, most programming languages provide wrappers that do all the work and allow us to not worry about which system call is identified by which number.

Ruby is no exception here and allows us to use fork(2) with a method called, well, fork:

# fork.rb

child_pid = fork do
  puts "[child] child_pid: #{child_pid}"
  puts "[child] Process ID: #{Process.pid}"
  puts "[child] Parent Process ID: #{Process.ppid}"
end

Process.wait(child_pid)

puts "[parent] child_pid: #{child_pid}"
puts "[parent] Process ID: #{Process.pid}"

What we’re doing here is calling fork in Ruby and passing it a block. This will create a new process, a child process, which runs everything inside the block and then exits. In the parent process we call Process.wait and pass it the return value of fork, which is the ID of the child process. We need to wait for child processes to exit because otherwise they’d turn into zombie processes. Yep, that’s a valid Unix rule right there: parent processes need to wait for their children to die so they don’t turn into zombies.

When we run this we’ll get this:

$ ruby fork.rb
[child] child_pid:
[child] Process ID: 29715
[child] Parent Process ID: 29695
[parent] child_pid: 29715
[parent] Process ID: 29695

As we can see, the child process has a new process ID and its parent process ID matches the process ID printed in the parent process. And most interestingly child_pid is nil inside the child process but contains a value in the parent process. This is how we can check whether we are in the parent process or the child process. Since the child inherits the data from the parent process, both processes are running the same code right after fork and we can decide which process does what depending on the return value of fork.

If we put a sleep somewhere inside the block, run it again and use a tool like ps or pstree we’d see something like this:

$ pstree | grep fork
 |   \-+= 29695 mrnugget ruby fork.rb
 |     \--- 29715 mrnugget ruby fork.rb

Two processes, one parent and one child, with different process IDs. Just by calling fork. That’s not too hard right? And it’s certainly not magic. So how does Unicorn use fork?

Unicorn and fork(2)

When Unicorn boots up it calls the spawn_missing_workers method, which contains this piece of code:

worker_nr = -1
until (worker_nr += 1) == @worker_processes
  WORKERS.value?(worker_nr) and next
  worker = Worker.new(worker_nr)
  before_fork.call(self, worker)
  if pid = fork
    WORKERS[pid] = worker
    worker.atfork_parent
  else
    after_fork_internal
    worker_loop(worker)
    exit
  end
end

So, what happens here? Unicorn calls this method with @worker_processes set to the number of workers we told it to boot up. It then goes into a loop and calls fork that many times. But instead of passing a block to fork, Unicorn checks the return value of fork to see if it’s now executing in the parent or in the child process. Remember: a forked process inherits the data of the parent process! A child process executes the same code as the parent, and we have to check for that in order to have the child do something else.

Passing a block to fork does the same thing under the hood, but explicitly checking the return-value of fork is quite a common idiom in many Unix programs, since the C API doesn’t allow passing blocks around.
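Here is a small sketch of that idiom outside of Unicorn. The method name fork_and_compare and the counter variable are made up for illustration; the pipe is only there so the child can report its view back to the parent. It also shows that the child works on its own copy of the parent's data.

```ruby
# fork_return_value.rb

def fork_and_compare
  counter = 1                        # lives in the parent's memory before the fork
  reader, writer = IO.pipe           # only used to report the child's view back

  if (pid = fork)
    # Parent: fork returned the child's PID.
    writer.close
    child_view = reader.read.to_i
    reader.close
    Process.wait(pid)
    [counter, child_view]
  else
    # Child: fork returned nil. It starts out with a copy of the parent's data,
    reader.close
    counter += 41                    # so this only changes the child's copy.
    writer.write(counter.to_s)
    writer.close
    exit!(0)                         # leave now instead of running the parent's code
  end
end

parent_value, child_value = fork_and_compare
puts "[parent] counter is still #{parent_value}, the child ended up with #{child_value}"
```

The explicit exit! in the else branch matters: without a block, nothing stops the child from carrying on with whatever code comes after the if.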

If fork returned in the parent process, Unicorn saves the newly created worker object under the PID of the newly created child process in the WORKERS hash constant, calls a callback and starts the loop again.

In the child process another callback is called and then the child goes into its main loop, the worker_loop. If the worker loop should somehow return, the child process exits and is done.

And boom! We’ve now got 16 worker processes humming along, waiting for work in their worker_loop, just by going into a loop, doing some cleanup and calling fork 16 times.
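To get a feeling for it, here is a toy version of such a loop. This is not Unicorn's code: spawn_workers, the "mini worker" process names and the sleep that stands in for the worker loop are all assumptions made for this sketch.

```ruby
# mini_master.rb

def spawn_workers(count)
  workers = {}                       # pid => worker number, similar in spirit to WORKERS
  count.times do |nr|
    if (pid = fork)
      workers[pid] = nr              # parent: remember the child and keep looping
    else
      $0 = "mini worker[#{nr}]"      # child: rename the process for ps and pstree
      sleep 0.1                      # stand-in for a real worker loop
      exit!(0)
    end
  end
  workers
end

workers = spawn_workers(4)
puts "spawned #{workers.size} workers: #{workers.keys.inspect}"

# Reap every child so none of them is left behind as a zombie.
workers.each_key { |pid| Process.wait(pid) }
```

Assigning to $0 is what changes the name shown in the process list, which is how a master and its numbered workers can be told apart in the pstree output.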

That’s not too hard, is it? So let’s go from fork to another basic Unix feature…

Pipes!

My guess is that most people even vaguely familiar with Unix systems know about pipes and have probably done something like this at one point or another in their lives:

$ grep 'wat' journal.txt | wc -l
84

Pipes are amazing. Pipes are a really simple abstraction that allows us to take the output of one program and pass it as input to another program. Everybody loves pipes and I personally think the pipe character is one of the best features Unix shells have to offer.

But did you know that you can use pipes outside of the shell?

pipe(2)

pipe(2) is a system call with which we can ask the kernel to create a pipe for us. This is exactly what shells are using. And we can use it too, without a shell!

Remember the saying that under Unix “everything is a file”? Well, pipes are files too. One pipe is nothing more than two file descriptors. A file descriptor is a number that points to an entry in the file table maintained by the kernel for each running process. In the case of pipes the two file table entries do not point to files on a disk, but rather to a memory buffer: you write into it through one end of the pipe and read from it through the other.

One of the file descriptors returned by pipe(2) is the read-end and the other one is the write-end. That’s because pipes are half duplex – the data only flows in one direction.
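A quick way to see the two file descriptors is to use a pipe inside a single process. This is only a sketch; pipe_roundtrip is a made-up helper, and it relies on the message being small enough to fit into the kernel's pipe buffer (otherwise the write would block, because nobody is reading yet).

```ruby
# pipe_fds.rb

def pipe_roundtrip(message)
  read_end, write_end = IO.pipe
  fds = [read_end.fileno, write_end.fileno]   # two plain integers

  write_end.write(message)   # lands in a kernel buffer, not on a disk
  write_end.close            # closing the write-end means end-of-file for the reader
  received = read_end.read
  read_end.close

  [fds, received]
end

fds, received = pipe_roundtrip('ping')
puts "file descriptors: #{fds.inspect}"
puts "read back: #{received}"
```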

Outside of the shell pipes are heavily used for inter-process communication. One process writes to one end, and another process reads from the other end. How? Remember that a child process inherits a lot of stuff from its parent process? That includes file descriptors! And since pipes are just file descriptors, child processes inherit them. If we open a pipe with pipe(2) in a parent process and then call fork(2), both the parent and the child process have access to the same file descriptors of the pipe.

# pipe.rb

read_end, write_end = IO.pipe

fork do
  read_end.close

  write_end.write('Hello from your child!')
  write_end.close
end

write_end.close

Process.wait

message = read_end.read
read_end.close

puts "Received from child: '#{message}'"

In Ruby we can use IO.pipe, which is a wrapper around the pipe(2) system call, just like fork is a wrapper around fork(2), to create a pipe.

And in this example we create a pipe with IO.pipe and then create the child process with fork. Since just after the call to fork both processes have both pipe file descriptors we need to close the end of the pipe we’re not going to need. In the child process that’s the read-end and in the parent it’s the write-end.

We then write something to the pipe in the child, close the write-end and exit. The parent closes the write-end, waits for the child to exit and then reads the message the child wrote to the pipe. To clean up it closes the read-end. If we run this we get exactly what we expected:

$ ruby pipe.rb
Received from child: 'Hello from your child!'

That’s pretty amazing, isn’t it? Just a few lines of code and we created two processes that talk to each other! By the way, this is the exact same concept a shell uses to make the pipe-character work. It creates a pipe, it forks (once for each process on one side of the pipe) then uses another system call (dup2) to turn the write-end of the pipe into STDOUT and the read-end into STDIN respectively and then executes different programs which are now connected through a pipe.
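What the shell does there can be sketched in Ruby, too. The following is a rough imitation of printf 'a\nb\nc\n' | wc -l, assuming printf and wc are available as programs on your system. run_pipeline is a made-up name, and IO#reopen is Ruby's way of getting at dup2.

```ruby
# mini_pipeline.rb

def run_pipeline(producer, consumer)
  read_end, write_end = IO.pipe

  producer_pid = fork do
    read_end.close
    $stdout.reopen(write_end)   # the pipe's write-end becomes STDOUT
    write_end.close
    exec(*producer)
  end

  consumer_pid = fork do
    write_end.close
    $stdin.reopen(read_end)     # the pipe's read-end becomes STDIN
    read_end.close
    exec(*consumer)
  end

  # The parent has to close both ends, or the consumer never sees end-of-file.
  read_end.close
  write_end.close

  [producer_pid, consumer_pid].map { |pid| Process.wait2(pid).last.success? }
end

results = run_pipeline(['printf', 'a\nb\nc\n'], ['wc', '-l'])
puts "both programs exited cleanly: #{results.all?}"
```

The line count printed by wc goes straight to the terminal, because the second child still has the original STDOUT.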

So how does Unicorn make use of pipes?

Unicorn and pipe(2)

Unicorn uses pipes a lot.

First of all, there is a pipe between each worker process and the master process, with which they communicate. The master process writes commands to the pipe (something like QUIT) and the worker process then reads the commands and acts upon them. Communication between the master and its worker processes through pipes.

Then there’s another pipe the master process only uses internally and not for IPC, but for signal handling. It’s called the “self-pipe” and we’ll have a closer look at that one later.

And then there’s the ready_pipe Unicorn uses, which is actually quite an amazing trick. See, if you want to daemonize a process under Unix, you need to call fork(2) two times (and do some other things) so the process is completely detached from the controlling terminal and the shell thinks the process is done and gives you a new prompt.

What Unicorn does when you tell it to run as a daemon is to create a pipe, called the ready_pipe. It then calls fork(2) two times, creating a grandchild process. The grandchild process inherited the pipe, of course, and as soon as it’s fully booted up and everything looks good, it writes to this pipe that it’s okay for the grandparent to quit. The grandparent, which waited for a message from the grandchild, reads this and then exits.

This allows Unicorn to wait for the grandchild to boot up while still having a controlling terminal to which it can write error messages should something go wrong between the first call to fork(2) and booting up the HTTP server in the grandchild. Only if everything worked does the grandchild turn into a real daemon process. Process synchronization through pipes.
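The general shape of this trick fits into a few lines. This is a simplified sketch and not how Unicorn's launcher is actually written: start_daemon is a made-up name, the short sleep stands in for the daemon's real work, and a real daemon would also redirect its standard streams and change its working directory.

```ruby
# ready_pipe.rb

def start_daemon
  ready_read, ready_write = IO.pipe

  child_pid = fork do
    ready_read.close
    Process.setsid                     # new session, away from the controlling terminal
    fork do                            # second fork: this is the grandchild
      # ... booting the application would happen here ...
      ready_write.puts(Process.pid)    # tell the grandparent that we are up
      ready_write.close
      sleep 0.2                        # stand-in for the daemon's real work
    end
    exit!(0)                           # the intermediate child leaves right away
  end

  ready_write.close
  Process.wait(child_pid)              # reap the intermediate child
  message = ready_read.gets            # blocks until the grandchild reports in
  ready_read.close
  message && message.to_i              # nil means the grandchild died before booting
end

daemon_pid = start_daemon
puts daemon_pid ? "daemon is up with PID #{daemon_pid}" : 'daemon failed to boot'
```

If the grandchild crashes before writing, every write-end of the pipe gets closed, gets returns nil, and the grandparent knows that something went wrong.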

That does come pretty close to being magic, yep, but this is just a really clever use of fork(2) and pipe(2).

sockets & select(2)

At the heart of everything that has to do with networking under Unix are sockets. You want to read a website? You need to open a socket first. Send something to the logserver? Open a socket. Wait for incoming connections? Open a socket. Sockets are, simply put, endpoints between computers (or processes!) talking to each other.

There are a ton of different sockets: TCP sockets, UDP sockets, SCTP sockets, Unix domain sockets, raw sockets, datagram sockets, and so on. But there is one thing they all have in common: they are files. Yes, “everything is a file” and that includes sockets. Just like a pipe, a socket is a file descriptor, from which you can read and to which you can write just like with a file. The sockets API for reading and writing is deep down the same as the file API.

So, let’s say we are writing a server. How do we use sockets for that? The basic lifecycle of a server socket looks like this:

First we ask the kernel for a socket with the socket(2) system call. We specify the family of the socket (IPv4, IPv6, local), the type (stream, datagram) and the protocol (TCP, UDP, …). The kernel then returns a file descriptor, a number, which represents our socket.

Then we need to call bind(2), to bind our socket to a network address and a port. After that we need to tell the kernel that our socket is a server socket, that will accept new connections, by calling listen(2). So now the kernel forwards incoming connections to us. (This is the main difference between the lifecycles of a server and a client socket).

Now that our socket is a real server socket and waiting for new incoming connections we can call accept(2), which accepts connections and returns a new socket. This new socket represents the connection. We can read from it and write to it.
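In Ruby the whole lifecycle looks roughly like this. It's a sketch under a few assumptions: echo_once is a made-up helper, it binds to 127.0.0.1 with port 0 so the kernel picks a free port, and the client lives in the same process, which works because the kernel queues the connection until we call accept.

```ruby
# socket_lifecycle.rb
require 'socket'

def echo_once(message)
  server = Socket.new(:INET, :STREAM)                    # socket(2)
  server.bind(Socket.pack_sockaddr_in(0, '127.0.0.1'))   # bind(2), port 0 = any free port
  server.listen(5)                                       # listen(2)
  port = server.local_address.ip_port

  client = TCPSocket.new('127.0.0.1', port)   # the kernel queues this connection
  client.write(message)
  client.close_write                          # we are done sending

  connection, _addr = server.accept           # accept(2) returns a new socket
  received = connection.read                  # read until the client's end-of-file
  connection.close
  client.close
  server.close
  received
end

puts echo_once('hello socket')
```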

But here’s the thing: accept(2) is a blocking call. It only returns if the kernel has a new connection for us. A server that doesn’t have too many incoming connections will be blocking for a long time on accept(2). This makes it really difficult to work with multiple sockets. How are you going to accept a connection on one socket if you’re still blocking on another socket that nobody wants to connect to?

This is where select(2) comes into play.

select(2) is a pretty old and famous (maybe infamous) Unix system call for working with file descriptors. It allows us to do multiplexing: we can monitor several file descriptors with select(2) and let the kernel notify us as soon as one of them has changed its state. And since sockets are file descriptors too, we can use select(2) to work with multiple sockets. Like this:

require 'socket'

sock1 = Socket.new(:INET, :STREAM)
addr1 = Socket.pack_sockaddr_in(8888, '0.0.0.0')
sock1.bind(addr1)
sock1.listen(10)

sock2 = Socket.new(:INET, :STREAM)
addr2 = Socket.pack_sockaddr_in(9999, '0.0.0.0')
sock2.bind(addr2)
sock2.listen(10)

5.times do
  fork do
    loop do
      readable, _, _ = IO.select([sock1, sock2])

      connection, _ = readable.first.accept
      puts "[#{Process.pid}] #{connection.read}"
      connection.close
    end
  end
end

Process.wait

That’s a TCP server in roughly two dozen lines, listening on two ports, with 5 worker processes accepting connections. Besides missing some minor things like HTTP request parsing, HTTP response writing and error handling it’s pretty much ready to ship.

No, but seriously, this actually does a lot of stuff in just a few lines with the help of system calls.

We create two sockets with Socket.new, which somewhere deep down in Ruby calls socket(2). Then we bind the sockets to two different ports, 8888 and 9999 respectively, on all interfaces (that’s what 0.0.0.0 means). Afterwards we call listen(2) (hidden by the #listen method) and tell the kernel to queue up 10 connections at maximum for us to handle.

With our sockets ready to go we call fork 5 times, which in turn creates 5 child processes that all run the code in the block. So every child calls IO.select (which is the wrapper around select(2)) with the two sockets as argument. IO.select is going to block and only return if one of the two sockets is readable (on a listening socket that means that there are new connections). And this is exactly why we use select(2) here: with accept(2) we would block on one socket and miss out if the other socket had a new connection.

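IO.select returns the readable sockets in an array. We take the first one and call accept(2) on it, which normally returns right away because a connection is already waiting. (If another worker grabbed that connection first, accept(2) simply blocks until the next one arrives.) Then we just read from the connection, close the connection socket and start our worker loop again.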

If we run this and send some messages to our server with netcat like this:

$ echo 'foobar1' | nc localhost 9999
$ echo 'foobar2' | nc localhost 9999
$ echo 'foobar3' | nc localhost 8888
$ echo 'foobar4' | nc localhost 8888
$ echo 'foobar5' | nc localhost 9999

Then we can see our server accepting the connections and reading from them:

$ ruby tcp_sockets_example.rb
[31605] foobar1
[31607] foobar2
[31605] foobar3
[31607] foobar4
[31609] foobar5

The connections are spread across the child processes. Load balancing done by the kernel for us, thanks to select(2).
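One caveat with several processes calling select(2) on the same sockets: more than one of them can wake up for a single connection, and the ones that lose the race would then sit in a blocking accept. A common way around this is a non-blocking accept. The following is only a sketch of that idea; try_accept is a made-up helper built on Ruby's accept_nonblock.

```ruby
# nonblocking_accept.rb
require 'socket'

# Returns the accepted connection, or nil if nothing is waiting
# (for example because another process got there first).
def try_accept(listener)
  result = listener.accept_nonblock(exception: false)
  return nil if result == :wait_readable

  connection, _addr = result
  connection
end

listener = Socket.new(:INET, :STREAM)
listener.bind(Socket.pack_sockaddr_in(0, '127.0.0.1'))
listener.listen(10)

puts try_accept(listener).inspect   # nobody has connected yet
listener.close
```

In a worker loop like the one above, a worker would go through everything IO.select returned, call try_accept on each socket and simply go back to select when it gets nil.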

Unicorn, sockets and select

Before the master process calls fork to create the worker processes, it calls socket, bind and listen to create one or more listening sockets (yes, you can configure Unicorn to listen on multiple ports!). It also creates the pipes that will be used to communicate with the worker processes.

After forking, the workers, of course, have inherited both the pipe and the listening sockets. Because, after all, sockets and pipes are file descriptors.

The workers then call select(2) as part of their worker_loop with both the pipe and the sockets as arguments. Now, whenever a connection comes in, one of the workers’ call to select(2) returns and this worker handles the connection by reading the request and passing it to the Rack/Rails application.

And here’s the thing: since the workers call select(2) not only with the sockets, but also with the master-to-worker pipe, they’ll never miss a message from the master while waiting for a new connection. And if there is a new connection, they handle it, close it and then read the message from the master process.

That’s a really neat way to do load balancing through the kernel and to guarantee that messages to workers are not lost or delayed too long while the worker process is doing its work.
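Here is a stripped-down sketch of such a worker loop with one worker, one listening socket and one master-to-worker pipe. It is not Unicorn's implementation: worker_loop and run_demo are made up for this example, the "request handling" is a single write, and QUIT is sent as a plain line of text.

```ruby
# worker_with_pipe.rb
require 'socket'

def worker_loop(listener, from_master)
  loop do
    readable, = IO.select([from_master, listener])

    if readable.include?(listener)
      connection, _addr = listener.accept
      connection.write("handled by #{Process.pid}")
      connection.close
    end

    if readable.include?(from_master)
      command = from_master.gets
      break if command.nil? || command.strip == 'QUIT'
    end
  end
end

def run_demo
  listener = Socket.new(:INET, :STREAM)
  listener.bind(Socket.pack_sockaddr_in(0, '127.0.0.1'))
  listener.listen(10)
  port = listener.local_address.ip_port
  from_master, to_worker = IO.pipe

  worker_pid = fork do
    to_worker.close
    worker_loop(listener, from_master)
  end
  from_master.close

  reply = TCPSocket.open('127.0.0.1', port) { |client| client.read }
  to_worker.puts('QUIT')               # the message travels through the pipe
  to_worker.close
  Process.wait(worker_pid)
  listener.close
  [reply, worker_pid, $?.success?]
end

reply, worker_pid, clean_exit = run_demo
puts "#{reply} (worker #{worker_pid} exited cleanly: #{clean_exit})"
```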

Signals

Let’s talk about signals. Signals are another way to do IPC under Unix. We can send signals to processes and we can receive them.

$ kill -9 8433

This sends the signal 9, which is the KILL signal, to process 8433. That’s pretty well-known and a lot of people have used this before (probably with sweat running down their face). But did you know that pressing Ctrl-C and Ctrl-Z in your shell sends signals too?

So what are signals? Most often they are described as software interrupts. If we send a signal to the process, the kernel delivers it for us and makes the process jump to the code that deals with receiving this signal, effectively interrupting the current code flow of the process. Signals are asynchronous — we don’t have to block somewhere to send or receive a signal. And there are a lot of them: the current Linux kernel for example supports around 30 different signals.

Sending signals is pretty good, and I’d bet we’ve all done it a bunch of times, but what’s really cool is this: we can tell the kernel how we want our process to react to certain signals. That’s called “signal handling”.

We have a few options when it comes to signal handling. We can ignore signals: we can tell the kernel we don’t care about a signal and when the kernel delivers an ignored signal to our process it doesn’t jump to any specific code, but instead does nothing. Ignoring signals has one limitation though: we can’t ignore SIGKILL and SIGSTOP, since there has to be a way for an administrator to kill and stop a process, no matter what the developer of that process wants it to do.

The second option is to catch a signal, effectively defining a signal handler. If ignoring a signal means “Nope, kernel, don’t care about QUIT.” then defining a signal action is telling the kernel “Hey, if I receive this signal, please execute this piece of code here”. For example: a lot of Unix programs do some clean-up work (remove temp files, write to a log, kill child processes) when receiving SIGQUIT. That’s done by catching the signal and defining an appropriate signal handler that does the clean-up work. Catching signals has the same limitation that ignoring signals has: we can’t catch SIGKILL and SIGSTOP.

We can also let the defaults apply. Each signal has a default action associated with it. E.g. the default action for SIGQUIT is to terminate the process and make a core dump. We can leave it as it is, or redefine the signal action by catching it. See man 3 signal on OS X or man 7 signal on Linux for a list of the default actions associated with each signal.
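Ignoring a signal and going back to the default are both done with Ruby's trap method, which accepts the strings 'IGNORE' and 'DEFAULT' in place of a block. The two helper methods below are made up for this sketch; the second one runs in a child process so that the default action doesn't take our own process down.

```ruby
# signal_dispositions.rb

def survives_usr1_when_ignored?
  previous = trap('USR1', 'IGNORE')    # tell the kernel we don't care
  Process.kill('USR1', Process.pid)    # would normally terminate us
  sleep 0.1
  true                                 # still here
ensure
  trap('USR1', previous || 'DEFAULT')  # put the old disposition back
end

def child_status_with_default_action
  pid = fork do
    trap('USR1', 'DEFAULT')
    Process.kill('USR1', Process.pid)
    sleep 1                            # cut short: USR1 terminates the child
  end
  Process.wait2(pid).last
end

puts "survived an ignored USR1: #{survives_usr1_when_ignored?}"
status = child_status_with_default_action
puts "child terminated by a signal: #{status.signaled?}"
```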

So, how do we catch a signal? In Ruby it’s pretty simple:

# signals.rb

trap(:SIGUSR1) do
  puts "SIGUSR1 received"
end

trap(:SIGQUIT) do
  puts "SIGQUIT received"
end

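begin
  trap(:SIGKILL) do
    puts "You won't see this"
  end
rescue ArgumentError, SystemCallError
  # Some Ruby versions refuse to install a handler for SIGKILL and raise
  # here, others accept the call silently. Either way it never runs.
end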

puts "My PID is #{Process.pid}. Send me some signals!"

sleep 100

We use trap to catch a signal and pass it a block to define a signal action that will be executed as soon as our process receives the signal. In this example, we try to redefine the signal handler for SIGUSR1, SIGQUIT and SIGKILL. The sleep statement gives us time to send the signals to our process.

If we run this and then send signals to our process with the kill command like this:

$ kill -USR1 31950
$ kill -QUIT 31950
$ kill -KILL 31950

Then our process will output the following:

$ ruby signals.rb
My PID is 31950. Send me some signals!
SIGUSR1 received
SIGQUIT received
zsh: killed     ruby signals.rb

As we can see, the kernel delivered all of the signals to our process. On receiving SIGUSR1 and SIGQUIT it executed the signal handlers, but, as I said before, catching SIGKILL proved useless and the kernel killed the process.

You can probably imagine what we can do with signal handlers. One of the most common things to do with custom signal handlers, for example, is to catch SIGQUIT to do some clean-up work before exiting. But there are a lot more signals and defining appropriate signal handlers can distinguish well-behaving processes from rude ones. Example: if a child process dies the kernel notifies the parent process by sending a SIGCHLD. The default action is to ignore the signal and do nothing, but a well-behaving application would probably wait for the child, clean up after it and write something to a log file.
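A SIGCHLD handler that cleans up after its children could look something like this. It's a sketch: reap_with_sigchld is a made-up name, the children do nothing but exit with a given code, and the five second deadline is only there so the example can't hang.

```ruby
# sigchld.rb

def reap_with_sigchld(exit_codes)
  reaped = {}
  previous = trap('CHLD') do
    begin
      # Several children may have exited by now; collect them all without blocking.
      loop do
        pid, status = Process.wait2(-1, Process::WNOHANG)
        break unless pid
        reaped[pid] = status.exitstatus
      end
    rescue Errno::ECHILD
      # no children left to wait for
    end
  end

  pids = exit_codes.map { |code| fork { exit!(code) } }
  deadline = Time.now + 5
  sleep 0.01 until (pids - reaped.keys).empty? || Time.now > deadline
  reaped.values_at(*pids)
ensure
  trap('CHLD', previous || 'DEFAULT')
end

puts "exit codes collected by the handler: #{reap_with_sigchld([0, 3]).inspect}"
```

The loop inside the handler is there because signals are not counted: if two children exit at almost the same time, the process may only see a single SIGCHLD.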

Unicorn and signals

Unicorn sets up a lot of different signal handlers in the master process, before it calls fork and spawns the worker processes. These signal handlers do a lot of things. Here are a few examples:

QUIT: Graceful shutdown. The master process waits for the workers to finish their work (the current request), cleans up and only then exits.
TERM and INT: Immediate shutdown. Workers don’t finish their work.
USR1: Reopen the log files. This is mostly used and sent by a log rotation daemon.
USR2: Hot-Reload. Start up a new master process with a new version of the application and keep the old master running.
TTIN/TTOU: Increase/decrease the number of worker processes.
HUP: Reload the configuration file while running.
WINCH: Keep the master process running, but gracefully stop the workers.

These signal handlers are like a separate API through which you tell the master and worker processes what to do. And it’s pretty reliable too, considering the fact that signals are essentially asynchronous events and can be sent multiple times. This just screams for race-conditions and locks. So how does Unicorn do it?

Unicorn uses a self-pipe to manage its signal actions. The pipe the master process sets up is this self-pipe, which it will only use internally and not to talk to other processes. It also sets up a queue data structure. After that come the signal handlers. Unicorn catches a lot of signals, as we saw, but each signal handler doesn’t do much. It only pushes the signal’s name into the queue and sends one byte through the self-pipe.

After setting up the signal handlers, spawning worker processes, and so on, the master process goes into its main loop, in which it checks upon the workers regularly and sleeps in between. But it doesn’t just sleep, no, the master process actually goes to sleep by calling select(2) on the self-pipe, with a timeout as argument. This way it can go to sleep but will be woken up as soon as a signal arrives: the signal handler sends a byte through the pipe, which makes the pipe readable (from the master’s perspective), and select(2) returns. After waking up, the master just has to pop the signals off the queue it set up in the beginning and handle them one after another. This is of tremendous value if you consider again that signals are asynchronous and you’ll never know what you’re currently executing when a signal arrives, and that they can be sent multiple times, even while you’re executing your signal handler code. Using a queue and a self-pipe in this combination makes handling signals a lot saner and easier.
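Reduced to its core, the self-pipe trick looks something like this. It's a sketch of the technique and not Unicorn's code: SIG_QUEUE, SELF_PIPE and master_sleep are names picked for this example, and it only handles USR1 and USR2.

```ruby
# self_pipe.rb

SIG_QUEUE = []
SELF_PIPE = IO.pipe   # [read_end, write_end], only used inside this process

%w[USR1 USR2].each do |name|
  trap(name) do
    SIG_QUEUE << name                      # remember which signal arrived
    begin
      SELF_PIPE[1].write_nonblock('.')     # wake up the main loop
    rescue IO::WaitWritable
      # the pipe is full, so a wake-up is pending anyway
    end
  end
end

# Sleeps until a signal arrives or the timeout expires and returns the
# names of the signals that have queued up in the meantime.
def master_sleep(timeout)
  return [] unless IO.select([SELF_PIPE[0]], nil, nil, timeout)

  SELF_PIPE[0].read_nonblock(64, exception: false)   # drain the wake-up bytes
  signals = []
  while (name = SIG_QUEUE.shift)
    signals << name
  end
  signals
end

Process.kill('USR1', Process.pid)
Process.kill('USR2', Process.pid)
puts "woke up for: #{master_sleep(1).inspect}"
```

The handlers themselves stay tiny. All the real work happens in the main loop, at a point where the program knows exactly what state it is in.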

Worker processes, on the other hand, inherit the master’s signal handlers. Again: child processes inherit a lot from their parents. But instead of leaving them as they are, the workers redefine (most of) the signal handlers to be no-ops. They get their signals through the pipe which connects them to the master process. If the master process, for example, receives SIGQUIT it writes the name of the signal to each pipe connected to a worker process to gracefully shut them down. The worker processes call select(2) on this master-worker pipe and the listening sockets, which means that as soon as they finish their work (or don’t have anything to do) they will read the signal name from the pipe and act upon it. This “signal delivery from master to worker via pipe”-mechanism avoids the many problems that can occur if a worker process should receive a signal while currently working on a request.

Magic?

By now we have looked at fork(2) and how easy it is to spawn a new process. We saw that we can use pipes pretty easily outside a shell and without any use of the pipe character by calling pipe(2) and just working with the two file descriptors as if they were files. We also created sockets, worked with select(2), looked at a pre-forking TCP server in roughly two dozen lines of Ruby and had the kernel of our operating system do our load balancing for us. Then we saw that Unicorn has its own API composed of signals and that it’s not that hard to work with signals.

These were just some basic Unix concepts. Trivial on their own, powerful when combined.

So, let’s have a closer look at these features of Unicorn that amazed me so much, that I was sure were created by some wizards with long robes and tall hats, in a basement far, far away, on old rusty PDP-11s.

Let’s see how this “magic” is just Unix.

Preloading

If we put preload_app true in the configuration file, Unicorn will “preload” our Rack/Rails application in the master process to spare the worker processes from doing it themselves. As soon as the application is preloaded, spawning off a new worker process is really, really fast, since the workers don’t have to load it anymore.

The question is: how does this work exactly? Let me explain.

Right after Unicorn has evaluated command line options, it builds a lambda called app. This lambda contains the instructions needed to load our Rack/Rails application into memory. It loads the config.ru file (or uses default settings) and then creates a Rack application with Rack::Builder, on which it calls #to_app.

So what should come out of the lambda is a Rack application in which we just need to call #call to pass it a request and get a response. But since lambdas are evaluated only as soon as they are called, this doesn’t happen when the lambda is defined.

Unicorn passes this app lambda on to the Unicorn::HttpServer, which eventually calls fork(2) to spawn the worker processes. But before it creates a new process, the HttpServer checks if we told Unicorn to use preloading. If we did, it calls the lambda right then, before forking. If we didn’t, each worker calls the lambda itself after the call to fork(2).

Calling the lambda, which hasn’t been called before, now loads our application into memory. Files are being read, objects are created, connections established – everything is somehow getting stored in memory.

And here comes the real trick: since the master loaded the application into memory, which can take some time if we’re working with a large Rails application, the worker processes inherit it. Yep, the worker processes inherit our application. How neat is that? Since workers are created with fork(2) they already have the whole application in memory as soon as they are created. Preloading is just deciding whether Unicorn calls a lambda before or after the call to fork(2). And if Unicorn called it before, creating new worker processes is really fast, since they are basically ready to go right after creation, except for some callbacks and setup work.

With copy-on-write, which the Ruby VM has played well with since 2.0, this is even faster. The reason is that “inheriting” involves copying from the parent’s to the child’s memory address space. It’s probably not as slow as you imagine, but with copy-on-write only the memory regions which the child process wants to modify are copied.

And the best part of it is this: the kernel is doing all the work for us. The kernel answers the call to fork(2) and the kernel copies the memory. We just need to decide when to create our objects: before or after the call to fork(2).
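The whole preloading decision fits into a few lines of Ruby. The following is only a sketch under my own assumptions: spawn_demo_workers and the tiny "application" (a lambda that remembers which PID loaded it) are stand-ins I made up, not Unicorn’s real app lambda.

```ruby
# Forks `count` workers and returns, per worker, the PID of the process
# that "loaded" the application.
def spawn_demo_workers(count, preload:)
  # Stand-in for the `app` lambda: nothing is loaded until it is called.
  loader = lambda do
    loaded_in = Process.pid # pretend this is the expensive part
    -> { loaded_in }        # the "application"
  end

  app = loader.call if preload # preloading: load once, in the master

  reader, writer = IO.pipe
  pids = count.times.map do
    fork do
      reader.close
      app ||= loader.call # without preloading every worker loads it itself
      writer.puts(app.call)
      writer.close
      exit!(0)
    end
  end

  writer.close
  loaded_by = reader.read.lines.map(&:to_i)
  reader.close
  pids.each { |pid| Process.wait(pid) }
  loaded_by
end
```

With preload: true every worker reports the master’s PID, because the loaded application was simply inherited through fork(2). With preload: false each worker reports its own PID.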

This comes in really handy when we now look at another great feature of Unicorn.

Scaling workers with signals

Unicorn allows us to increase and decrease the number of its worker processes by sending two signals to the master process:

$ kill -TTIN 93821
$ kill -TTOU 93821

These two lines add and then remove a new worker process. The signals used, SIGTTIN and SIGTTOU, are normally sent by our terminal driver to notify a process running in the background when it’s trying to read from (SIGTTIN) or write to (SIGTTOU) the controlling terminal. Since Unicorn insists on a logfile when running as a daemon, this shouldn’t be an issue, which means that Unicorn is free to redefine the signal actions (the default for both signals is to stop the process).

It does so by defining signal handlers for SIGTTIN and SIGTTOU that, as we saw, only add the name of the signal to the signal queue and write a byte to the self-pipe to wake up the master process.
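As a rough sketch of this handler-plus-self-pipe arrangement (again, not Unicorn’s real implementation), consider the class below. TinyMaster and run_once are names I invented for the example, and the guard against going below zero workers is my own choice.

```ruby
# A toy master that only keeps track of the desired number of workers.
class TinyMaster
  attr_reader :worker_processes

  def initialize(worker_processes)
    @worker_processes = worker_processes
    @sig_queue = []
    @self_pipe_r, @self_pipe_w = IO.pipe

    [:TTIN, :TTOU].each do |sig|
      Signal.trap(sig) do
        # Do as little as possible in the handler: remember the signal
        # and wake up the main loop by writing one byte to the self-pipe.
        @sig_queue << sig
        @self_pipe_w.write_nonblock(".", exception: false)
      end
    end
  end

  # One iteration of the main loop: sleep until woken up, then act.
  def run_once(timeout = 1)
    IO.select([@self_pipe_r], nil, nil, timeout) if @sig_queue.empty?
    @self_pipe_r.read_nonblock(64, exception: false) # drain wake-up bytes

    while (sig = @sig_queue.shift)
      case sig
      when :TTIN then @worker_processes += 1
      when :TTOU then @worker_processes -= 1 if @worker_processes > 0
      end
    end
    @worker_processes
  end
end
```

After TinyMaster.new(4), sending TTIN to the current process and calling run_once bumps the counter to 5. Everything interesting happens in the main loop, and the handler itself stays tiny.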

The master process, as soon as it wakes up from its main-loop sleep, sees the signals and increases or decreases the internal variable worker_processes, which is just an integer. And right before it goes back to sleep, it calls #maintain_worker_count, which either spawns a new worker or writes SIGQUIT to the pipe connected to the now superfluous worker process to gracefully shut it down.

So let’s say we send SIGTTIN to Unicorn to increase the number of workers. What will happen is that the master wakes up (triggered by the write to the self-pipe), increases worker_processes and calls #maintain_worker_count, which in turn will call another method called #spawn_missing_workers. Yes, that’s right. We looked at this method before, it’s the same one that’s used to spawn the worker processes when booting up. In its entirety it looks like this:

def spawn_missing_workers
  worker_nr = -1
  until (worker_nr += 1) == @worker_processes
    WORKERS.value?(worker_nr) and next
    worker = Worker.new(worker_nr)
    before_fork.call(self, worker)
    if pid = fork
      WORKERS[pid] = worker
      worker.atfork_parent
    else
      after_fork_internal
      worker_loop(worker)
      exit
    end
  end
  rescue => e
    @logger.error(e) rescue nil
    exit!
end

Again, this is just a loop that calls fork(2) N times. Now that N is increased by one, a new worker process will be created. The other calls to fork are skipped by checking whether WORKERS already contains an instance of Worker with the same worker_nr.

Take note of worker_nr here, it is important. All worker processes have a worker_nr by which they are easily identified in the row of spawned processes.

If we now send SIGTTOU to the master process, the following is going to happen. First of all, the master is woken up by a fresh byte on the self-pipe. Instead of increasing worker_processes now, it decreases it. And again, it calls #maintain_worker_count, which doesn’t jump straight to #spawn_missing_workers. Since no worker process is missing, #maintain_worker_count now takes care of reducing the number of workers:

def maintain_worker_count
  (off = WORKERS.size - worker_processes) == 0 and return
  off < 0 and return spawn_missing_workers
  WORKERS.each_value { |w| w.nr >= worker_processes and w.soft_kill(:QUIT) }
end

It may not be idiomatic Ruby, but these 3 lines are still fairly easy to understand. The first line computes the difference between the number of currently running worker processes and the desired number, worker_processes, and returns if it’s zero. If the difference is negative, the missing workers are spawned (which is where the path of SIGTTIN ends in this method). But since the difference is positive after decreasing worker_processes, the master process now takes the workers with a worker_nr that’s too high and calls soft_kill(:QUIT) on the worker instance.
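The bookkeeping in both methods can be tried out without forking anything. This is a sketch with hypothetical names: SketchWorker, missing_worker_numbers and surplus_worker_pids are mine. It assumes that a worker object compares equal to a plain integer, which is what a check like WORKERS.value?(worker_nr) needs in order to work.

```ruby
# Stand-in for a worker object; it compares equal to its own number so
# that Hash#value? can be called with a plain integer.
SketchWorker = Struct.new(:nr) do
  def ==(other)
    other.is_a?(Integer) ? nr == other : super
  end
end

# Which worker numbers below `desired` have no running process yet?
def missing_worker_numbers(workers, desired)
  (0...desired).reject { |nr| workers.value?(nr) }
end

# Which PIDs belong to workers whose number is too high now?
def surplus_worker_pids(workers, desired)
  workers.select { |_pid, worker| worker.nr >= desired }.keys
end
```

With workers = { 100 => SketchWorker.new(0), 101 => SketchWorker.new(2) }, scaling up to 4 workers reports numbers 1 and 3 as missing, while scaling down to 2 marks PID 101 for a graceful shutdown.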

This in turn sends the signal name through the pipe to the corresponding worker process, which will catch that signal through select(2) and gracefully shut down.

After this, the master process calls Process.waitpid (which in turn calls waitpid(2)), which returns the PID of dead children (and doesn’t leave them hanging as zombies). The worker process with this PID now just needs to be removed from the WORKERS hash and Unicorn is ready to go again.
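Reaping can be sketched like this. The method name reap_dead_workers and the plain hash of PIDs are assumptions for the example, not Unicorn’s code. The important part is the WNOHANG flag, which keeps the master from blocking when no child has exited yet.

```ruby
# Collects all children that have already exited and removes them from
# the `workers` hash (PID => worker). Returns the reaped worker PIDs.
def reap_dead_workers(workers)
  reaped = []
  loop do
    pid, _status = Process.waitpid2(-1, Process::WNOHANG)
    break if pid.nil? # children exist, but none has exited yet
    reaped << pid if workers.delete(pid)
  end
  reaped
rescue Errno::ECHILD # there are no children left at all
  reaped
end
```

Called once per main-loop iteration, this cleans up whatever died in the meantime and leaves running workers alone.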

All of this is pretty simple: fork(2) in a loop, pipes, signal handlers and keeping track of numbers. Again: it’s the combination that makes these Unix idioms so powerful.

The same can be said for my favorite Unicorn feature.

Hot Reload

This fantastic feature has many names: hot reload, zero downtime deployment, hot swapping and hot deployment. It allows us to deploy a new version of our application, while the old one is still running.

With Unicorn “hot reload” means that we can spin up a new master process, with new worker processes serving a new version of our application, while the old master process is still running and still handling requests with the old version.

It’s all triggered by sending a simple SIGUSR2 to the master process. But how?

Let’s take a step back and say that our Unicorn master and worker processes are just humming along. The master process is sleeping, waking up, checking up on the workers and going back to sleep. The worker processes are handling requests without a care in the world. Suddenly a SIGUSR2 is sent to the master process.

Again, the signal handler catches the signal, pushes the signal onto the signal queue, writes a byte to the self-pipe and returns. The master wakes up from its main-loop-slumber and sees that it received SIGUSR2. Straight away it calls the #reexec method. It’s a fairly long method and you don’t have to read through it now. But most of “hot reload” is contained in it, so let’s walk through it.

The first thing the method does is check if the master process is already reexecuting (reexecuting means that a new master process is started by an old one). If it is, it returns and its job is done. But if not, it writes the current PID to /path/to/pidfile.pid.oldbin. .oldbin stands for “old binary”. With the PID saved to a file, the master process now calls fork(2), saves the returned PID of the newly created child process (to later check if it’s already reexecuting…) and returns. The old master process adds “(old)” to its process name (by changing $0 in Ruby) and is now done with #reexec. But since a process created with fork(2) is executing exactly the same code, the new child process goes ahead with #reexec.

Right after the call to fork(2) the child writes the numbers of the sockets it’s listening on (remember: sockets are files, files are represented as file descriptors, which are just numbers) to an environment variable called UNICORN_FD as one string, in which the numbers are separated by commas. (Yes, it keeps track of listening sockets by writing to an environment variable. Take a deep breath. It’ll make sense in a second.)

Afterwards it modifies the listening sockets so they stay open by setting the FD_CLOEXEC flag on them to false.

It then closes all the other file descriptors it doesn’t need (e.g.: sockets and files opened by the Rack/Rails application).

With all preparations and cleaning done, the child process now calls execve(2).

The execve(2) system call turns the calling process into a completely different program. Which program it’s turned into is determined by the arguments passed to execve(2): the path of the program, the arguments and environment variables. This is not a new process we’re talking about: the new program has the same process ID, but its complete heap, stack, text and data segments are replaced by the kernel.

This is how we can spawn new programs on a Unix system and what every Unix shell does when we try to launch Vim: it calls fork(2) to create a child process and then it calls execve(2) with the path to the Vim executable. Without the call to execve(2) we’d end up with a lot of copies of the original shell process when trying to start programs.
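The fork-then-exec dance is easy to try from Ruby. In this small sketch, fork_and_exec_pid is a made-up helper name, and the program we exec is simply another Ruby interpreter that prints its own PID.

```ruby
require "rbconfig"

# Forks a child, replaces it with a fresh Ruby program via exec and
# returns [pid_seen_by_parent, pid_reported_by_new_program].
def fork_and_exec_pid
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    # From here on the child stops being a copy of us: exec replaces the
    # program, but the process (and its PID) stays the same.
    exec(RbConfig.ruby, "-e", "puts Process.pid", out: writer)
  end

  writer.close
  reported = reader.read.to_i
  reader.close
  Process.wait(pid)
  [pid, reported]
end
```

Both numbers in the returned pair are identical: the PID fork(2) handed to the parent is the PID the newly executed program sees as its own.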

That’s also why Unicorn needs to set the FD_CLOEXEC flag to false on the sockets before it calls execve(2). Otherwise the sockets would get closed when the program image of the process is being replaced.

Unicorn calls execve(2) with the original command line arguments it was started with (it keeps track of them), in effect spawning a fresh Unicorn master process that’s going to serve a new version of our application. Except that it’s not completely fresh: the environment variables the old master process set (UNICORN_FD) are still accessible by the new master process.

So the new master process boots up and loads the new application code into memory (preloading!). But before it creates worker processes with fork(2), it checks the UNICORN_FD environment variable. And it finds the numbers of our listening sockets! And since file descriptors are just numbers, it can work with them. It turns them into Ruby IO objects by calling IO.new with each number as an argument and has thereby recovered its listening sockets.
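Here is a compact sketch of that hand-over, under a few assumptions of mine: recover_listeners_after_exec and CHILD_SCRIPT are invented names, the "new master" is just a one-off Ruby script, and it rebuilds the sockets with TCPServer.for_fd instead of IO.new. Besides clearing FD_CLOEXEC, the sketch also passes an explicit fd => fd mapping to exec, so it doesn’t depend on how a particular Ruby version treats inherited descriptors.

```ruby
require "socket"
require "rbconfig"

# What the "new master" runs: rebuild the listeners from UNICORN_FD and
# print the ports they are bound to.
CHILD_SCRIPT = <<~'RUBY'
  fds = ENV.fetch("UNICORN_FD").split(",").map { |n| Integer(n) }
  listeners = fds.map { |fd| TCPServer.for_fd(fd) }
  puts listeners.map { |l| l.addr[1] }.join(",")
RUBY

# Forks, execs a fresh Ruby and returns the ports the new program found.
def recover_listeners_after_exec(listeners)
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    env = { "UNICORN_FD" => listeners.map(&:fileno).join(",") }
    # Keep the listening sockets open across exec.
    listeners.each { |l| l.close_on_exec = false }
    keep_open = listeners.map { |l| [l.fileno, l.fileno] }.to_h
    exec(env, RbConfig.ruby, "-rsocket", "-e", CHILD_SCRIPT,
         keep_open.merge(out: writer))
  end

  writer.close
  ports = reader.read.split(",").map(&:to_i)
  reader.close
  Process.wait(pid)
  ports
end
```

If we create server = TCPServer.new("127.0.0.1", 0) and call recover_listeners_after_exec([server]), the freshly executed program reports the very same port. It never called bind or listen itself.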

And now it calls fork(2) and creates worker processes which inherit these listening sockets again and can start their select(2) and accept(2) dance again, now handling requests with the new version of our application.

There is no “address already in use” error bubbling up. The new master process inherited these sockets, they are already bound to an address and transformed into listening sockets by the old master process. The new master process and its workers can work with them in the same way the worker processes of the old master process do.

Now there are two sets of master and worker processes running. Both are handling incoming connections on the same sockets.

We can now send SIGQUIT to the old master process to shut it down and as soon as it exits the new master process takes over and only our new application version is being served. And all of this happened without the old worker processes stopping their work once.

It’s just Unix

All of this is just Unix. The master-worker architecture, the signal handling, the communication through pipes, the preloading, the scaling of workers with signals and the hot reloading of Unicorn. There is no magic involved.

I think that’s the most amazing part about all of this. The combination of concepts like fork, pipe and signals, that are easy to understand on their own, and leveraging the operating system is where the perceived magic and ultimately the power of great Unix software like Unicorn comes from.

Why?

You might be thinking: “Why? Why should I care about this low-level stuff? I build web applications, why should I care about fork and select?”

I think there are some really compelling reasons.

The first one is debugging. Have you ever wondered why you shouldn’t open a database connection (a socket!) before Unicorn calls fork(2)? Or why you get a “too many open files” error when you try to make an HTTP request (sockets!)? Now you know why.

Knowing how your system works on each layer of the stack is immensely helpful when trying to find and eliminate bugs.

The next reason is what I call the design and architecture reason, and it boils down to having answers to questions like these: should we use threads or processes? How could these processes talk to each other? What are the limitations? What are the benefits? Will this perform? What’s the alternative?

With some understanding of your operating system and the APIs it offers, it’s far easier to make architectural decisions and design choices when building a system or single components of it.

One more level of abstraction. Someone somewhere at some time said that “it’s always good to know one more level of abstraction beneath the one you’re currently working on” and I totally agree.

I like to think that learning C made me a better Ruby programmer. I suddenly knew what was happening behind the curtains of the Ruby VM. And if I didn’t know, I could make a good guess.

And I think that knowing deeply about the system to which I deploy my (web) application makes me a better developer, for the same reasons.

But the most important reason for me, which is a personal one, is the realization that everything Unicorn does is not magic! No, it’s just Unix and there is no secret ingredient. Which, in turn, means that I could write software like this. I could write a webserver like this! Realizing this is worth a lot.

Follow me on twitter: @thorstenball. Or send me an email to me@thorstenball.com. Or check out my books at interpreterbook.com and compilerbook.com.

I also write a weekly newsletter called Register Spill. Read it and sign up at registerspill.thorstenball.com.