Unicorn Unix Magic Tricks
This post is based on the talk of the same name I gave at the Arrrrcamp conference in Ghent, Belgium on October 2nd, 2014. You can find the slides here and the video recording here.
Unicorn is a webserver written in Ruby for Rails and Rack applications. When I first used it I was amazed. This is magic, I thought. It had to be. Why?
Well, first of all: the master-worker architecture. Unicorn uses one master process to manage a lot of worker processes. When you tell Unicorn to use 16 worker processes it does so, just like that. And now you’re looking at 17 processes when you run `ps aux | grep unicorn` — each with a different name, showing whether it’s the master process or one of the worker processes, which even have their own number in their process names.
```
$ pstree | grep unicorn
\-+= 27185 mrnugget unicorn master -c simple_unicorn_config.rb -l0.0.0.0:8080
 |--- 27210 mrnugget unicorn worker[0] -c simple_unicorn_config.rb -l0.0.0.0:8080
 |--- 27211 mrnugget unicorn worker[1] -c simple_unicorn_config.rb -l0.0.0.0:8080
 |--- 27212 mrnugget unicorn worker[2] -c simple_unicorn_config.rb -l0.0.0.0:8080
 |--- 27213 mrnugget unicorn worker[3] -c simple_unicorn_config.rb -l0.0.0.0:8080
 |--- 27214 mrnugget unicorn worker[4] -c simple_unicorn_config.rb -l0.0.0.0:8080
 |--- 27215 mrnugget unicorn worker[5] -c simple_unicorn_config.rb -l0.0.0.0:8080
 |--- 27216 mrnugget unicorn worker[6] -c simple_unicorn_config.rb -l0.0.0.0:8080
 |--- 27217 mrnugget unicorn worker[7] -c simple_unicorn_config.rb -l0.0.0.0:8080
 |--- 27218 mrnugget unicorn worker[8] -c simple_unicorn_config.rb -l0.0.0.0:8080
 |--- 27219 mrnugget unicorn worker[9] -c simple_unicorn_config.rb -l0.0.0.0:8080
 |--- 27220 mrnugget unicorn worker[10] -c simple_unicorn_config.rb -l0.0.0.0:8080
 |--- 27221 mrnugget unicorn worker[11] -c simple_unicorn_config.rb -l0.0.0.0:8080
 |--- 27222 mrnugget unicorn worker[12] -c simple_unicorn_config.rb -l0.0.0.0:8080
 |--- 27223 mrnugget unicorn worker[13] -c simple_unicorn_config.rb -l0.0.0.0:8080
 |--- 27224 mrnugget unicorn worker[14] -c simple_unicorn_config.rb -l0.0.0.0:8080
 \--- 27225 mrnugget unicorn worker[15] -c simple_unicorn_config.rb -l0.0.0.0:8080
```
How would one build something like this? I had no idea.
And then there’s a feature called “hot reload”, which means that you can tell Unicorn, while it’s running, to spin up a new version of your application. As soon as you do, Unicorn starts a new master process, which is going to serve the new version of your application. All the while the old master process is still running, responding to requests with your old application. Of course, the old master now has “old” in its name. Now, as soon as the new master process is fully booted up, you can send a `QUIT` signal to the old master process, which will in turn shut down and let the new one take over. And just like that you’ve switched to a new version of your application — without any downtime at all.
Oh, and Unicorn uses a lot more than the `QUIT` signal! There are tons of signals you can send to it: `TTIN` to increase the number of workers, `TTOU` to decrease it, `USR1` to rotate the log files, `USR2` to perform hot reloading, `HUP` to re-evaluate the configuration file. I didn’t know half of these signal names and there were even more in Unicorn’s own `SIGNALS` file.
And then there’s “preloading”: a feature of Unicorn that allows you to spin up new worker processes in less than a second, a fraction of the time it takes to boot up my Rails application. Somehow Unicorn is able to preload my application in memory and make use of that when creating new worker processes. And I had no idea how that works! Not a clue! And as if that wasn’t enough I discovered that Unicorn even has a file called `PHILOSOPHY` in its repository. Who else has that?! I was sure that there was some black magic going on. Because: how could Unicorn work like it does without magic?
Unix
After my first encounter with Unicorn I learned quite a bit about Unix systems and after a while I came back to Unicorn — still in amazement. But this time I read through the source code and it turns out that, well, the secret ingredient to Unicorn is not magic but plain, old Unix.
Now, most people know Unix from a “user’s perspective”: the command line, shells, pipes, redirection, the `kill` command, scripting, text files and so on. But there’s this whole other side of Unix, too, which we could call the “developer’s perspective”. From this side of Unix you can see signal handling, inter-process communication, usage of pipes without the `|` character, system calls and a whole lot more.
In what follows we’re going to have a look at Unicorn. We’ll take it apart and see that it’s just using some basic Unix tricks, the ones you can use as a developer, to do its work. We’ll do that by going through some of these Unix tricks, basic building blocks of every Unix system, and seeing how they work and how Unicorn uses them.
At the end we’ll go back to the “magic” of the beginning: hot reload, preloading, master-worker architecture. And we will see how these features work and how they are just Unix and not magic.
So let’s get started.
fork(2)
fork is how processes are created. Every process after the first one (with PID 1) was created with fork. So what is it, what is fork?
fork is a system call. Most of the time we can recognize system calls by the 2 behind their name (e.g. `fork(2)`), which means that we can find documentation about them in section 2 of the Unix manual, nowadays known as “man pages”. So in order to see the documentation for `fork(2)` you can run `man 2 fork` on your command line.
But what’s a system call? A way to communicate with the kernel of our operating system. System calls are the API of the kernel, if you will. We tell the kernel to do something for us with system calls: reading, writing, allocating memory, networking, device management.
And fork is the system call that tells the kernel to create a new process. When one process asks the kernel for a new process with `fork(2)`, the kernel splits the process making the call into two. That’s probably where the name comes from: calling `fork(2)` is a “fork in the road” in the lifetime of a process. As soon as the kernel returns control to the process after handling the system call, there now is a parent process and a child process. A parent can have a lot of child processes, but a child process has only one parent process.
And both processes, parent and child, are pretty much the same, right after the creation of the child. That’s because child processes in a Unix system inherit a lot of stuff from their parent processes: the data (the code it’s executing), the stack, the heap, the user id, the working directory, open file descriptors, the connected terminal and a lot more. This can be a burden (which is why copy-on-write is a thing) but also has some neat advantages — as we’ll see later.
So how do we use fork? Since (deep down) making a system call involves putting parameters and the unique identifier of the call in CPU registers (which ones may change depending on the architecture we’re working with) and firing a software interrupt, most programming languages provide wrappers that do all the work and allow us to not worry about which system call is identified by which number.
Ruby is no exception here and allows us to use `fork(2)` with a method called, well, `fork`:
```ruby
# fork.rb
child_pid = fork do
  puts "[child] child_pid: #{child_pid}"
  puts "[child] Process ID: #{Process.pid}"
  puts "[child] Parent Process ID: #{Process.ppid}"
end

Process.wait(child_pid)
puts "[parent] child_pid: #{child_pid}"
puts "[parent] Process ID: #{Process.pid}"
```
What we’re doing here is calling `fork` in Ruby and passing it a block. This will create a new process, a child process, run everything inside the block in the new process and then exit. In the parent process we call `Process.wait` and pass it the return value of `fork`, which is the ID of the child process. We also need to wait for child processes to exit because otherwise they’d turn into zombie processes. Yep, that’s a valid Unix rule right there: parent processes need to wait for their children to die so they don’t turn into zombies.
When we run this we’ll get this:
```
$ ruby fork.rb
[child] child_pid:
[child] Process ID: 29715
[child] Parent Process ID: 29695
[parent] child_pid: 29715
[parent] Process ID: 29695
```
As we can see, the child process has a new process ID and its parent process ID matches the process ID printed in the parent process. And most interestingly, `child_pid` is `nil` inside the child process but contains a value in the parent process. This is how we can check whether we are in the parent process or the child process. Since the child inherits the data from the parent process, both processes are running the same code right after `fork` and we can decide which process does what depending on the return value of `fork`.
If we put a `sleep` somewhere inside the block, run it again and use a tool like `ps` or `pstree`, we’d see something like this:
```
$ pstree | grep fork
| \-+= 29695 mrnugget ruby fork.rb
|   \--- 29715 mrnugget ruby fork.rb
```
Two processes, one parent and one child, with different process IDs. Just by calling `fork`. That’s not too hard, right? And it’s certainly not magic. So how does Unicorn use `fork`?
Unicorn and fork(2)
When Unicorn boots up it calls the `spawn_missing_workers` method, which contains this piece of code:
```ruby
worker_nr = -1
until (worker_nr += 1) == @worker_processes
  WORKERS.value?(worker_nr) and next
  worker = Worker.new(worker_nr)
  before_fork.call(self, worker)
  if pid = fork
    WORKERS[pid] = worker
    worker.atfork_parent
  else
    after_fork_internal
    worker_loop(worker)
    exit
  end
end
```
So, what happens here? Unicorn calls this method with `@worker_processes` set to the number of workers we told it to boot up. It then goes into a loop and calls `fork` that many times. But instead of passing a block to `fork`, Unicorn checks the return value of `fork` to see if it’s now executing in the parent or the child process. Remember: a forked process inherits the data of the parent process! A child process executes the same code as the parent, and we have to check for that in order to have the child do something else. Passing a block to `fork` does the same thing under the hood, but explicitly checking the return value of `fork` is quite a common idiom in many Unix programs, since the C API doesn’t allow passing blocks around.
If fork returned in the parent process, Unicorn saves the newly created `worker` object with the PID of the newly created child process in the `WORKERS` hash constant, calls a callback and starts the loop again. In the child process another callback is called and then the child goes into its main loop, the `worker_loop`. Should the worker loop somehow return, the child process exits and is done.
And boom! We’ve now got 16 worker processes humming along, waiting for work in their `worker_loop`, just by going into a loop, doing some cleanup and calling `fork` 16 times.
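Stripped of Unicorn’s bookkeeping, the pattern can be sketched like this (a simplified sketch, not Unicorn’s actual code; `sleep` stands in for the worker loop):

```ruby
# A minimal sketch of the master/worker forking pattern: fork N
# workers, remember their PIDs, then wait for all of them.
worker_count = 3
workers = {}

worker_count.times do |nr|
  pid = fork
  if pid
    # Parent: remember which worker number belongs to which PID.
    workers[pid] = nr
  else
    # Child: this is where the worker loop would run; we just pretend.
    sleep 0.1
    exit
  end
end

worker_count.times do
  pid = Process.wait
  puts "worker #{workers[pid]} (PID #{pid}) exited"
end
```

The `workers` hash plays the role of Unicorn’s `WORKERS` constant: it lets the master map a PID it reaped back to a worker number.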
That’s not too hard, is it? So let’s go from `fork` to another basic Unix feature…
Pipes!
My guess is that most people even vaguely familiar with Unix systems know about pipes and have probably done something like this at one point or another in their lives:
```
$ grep 'wat' journal.txt | wc -l
84
```
Pipes are amazing. Pipes are a really simple abstraction that allows us to take the output of one program and pass it as input to another program. Everybody loves pipes and I personally think the pipe character is one of the best features Unix shells have to offer.
But did you know that you can use pipes outside of the shell?
pipe(2)
`pipe(2)` is a system call with which we can ask the kernel to create a pipe for us. This is exactly what shells are using. And we can use it too, without a shell!
Remember the saying that under Unix “everything is a file”? Well, pipes are files too. One pipe is nothing more than two file descriptors. A file descriptor is a number that points to an entry in the file table maintained by the kernel for each running process. In the case of pipes the two file table entries do not point to files on a disk, but rather to a memory buffer to which you can write and from which you can read with both ends of the pipe.
One of the file descriptors returned by `pipe(2)` is the read-end and the other one is the write-end. That’s because pipes are half duplex – the data only flows in one direction.
Outside of the shell, pipes are heavily used for inter-process communication. One process writes to one end, and another process reads from the other end. How? Remember that a child process inherits a lot of stuff from its parent process? That includes file descriptors! And since pipes are just file descriptors, child processes inherit them. If we open a pipe with `pipe(2)` in a parent process and then call `fork(2)`, both the parent and the child process have access to the same file descriptors of the pipe.
```ruby
# pipe.rb
read_end, write_end = IO.pipe

fork do
  read_end.close
  write_end.write('Hello from your child!')
  write_end.close
end

write_end.close
Process.wait

message = read_end.read
read_end.close

puts "Received from child: '#{message}'"
```
In Ruby we can use `IO.pipe`, which is a wrapper around the `pipe(2)` system call, just like `fork` is a wrapper around `fork(2)`, to create a pipe. And in this example we create a pipe with `IO.pipe` and then create the child process with `fork`. Since just after the call to `fork` both processes have both pipe file descriptors, we need to close the end of the pipe we’re not going to need. In the child process that’s the read-end and in the parent it’s the write-end.
We then write something to the pipe in the child, close the write-end and exit. The parent closes the write-end, waits for the child to exit and then reads the message the child wrote to the pipe. To clean up it closes the read-end. If we run this we get exactly what we expected:
```
$ ruby pipe.rb
Received from child: 'Hello from your child!'
```
That’s pretty amazing, isn’t it? Just a few lines of code and we created two processes that talk to each other! By the way, this is the exact same concept a shell uses to make the pipe character work. It creates a pipe, it forks (once for each process on one side of the pipe), then uses another system call (`dup2`) to turn the write-end of the pipe into STDOUT and the read-end into STDIN respectively, and then executes different programs which are now connected through a pipe.
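To make that concrete, here is a rough sketch of how a shell might wire up `ls | wc -l` (assuming `ls` and `wc` exist on the system; `IO#reopen` is Ruby’s way of getting `dup2(2)` semantics):

```ruby
# How a shell connects two programs with a pipe (a sketch): pipe(2),
# fork(2) once per command, dup2-style redirection, then exec.
read_end, write_end = IO.pipe

fork do
  # Left-hand side: its STDOUT becomes the write-end of the pipe.
  read_end.close
  $stdout.reopen(write_end)  # like dup2(write_end, STDOUT_FILENO)
  write_end.close
  exec 'ls'
end

fork do
  # Right-hand side: its STDIN becomes the read-end of the pipe.
  write_end.close
  $stdin.reopen(read_end)    # like dup2(read_end, STDIN_FILENO)
  read_end.close
  exec 'wc', '-l'
end

# The shell itself needs neither end and just waits for both children.
read_end.close
write_end.close
2.times { Process.wait }
```

After `exec`, the two processes don’t even know they were started by our script — they just see a perfectly normal STDOUT and STDIN that happen to be connected through a pipe.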
So how does Unicorn make use of pipes?
Unicorn and pipe(2)
Unicorn uses pipes a lot.
First of all, there is a pipe between each worker process and the master process, with which they communicate. The master process writes commands to the pipe (something like `QUIT`) and the child process then reads the commands and acts upon them. Communication between the master and its worker processes through pipes.
Then there’s another pipe the master process only uses internally and not for IPC, but for signal handling. It’s called the “self-pipe” and we’ll have a closer look at that one later.
And then there’s the `ready_pipe` Unicorn uses, which is actually quite an amazing trick. See, if you want to daemonize a process under Unix, you need to call `fork(2)` two times (and do some other things) so the process is completely detached from the controlling terminal and the shell thinks the process is done and gives you a new prompt.
What Unicorn does when you tell it to run as a daemon is to create a pipe, called the `ready_pipe`. It then calls `fork(2)` two times, creating a grandchild process. The grandchild process inherits the pipe, of course, and as soon as it’s fully booted up and everything looks good, it writes to this pipe that it’s okay for the grandparent to quit. The grandparent, which waited for a message from the grandchild, reads this and then exits.
This allows Unicorn to wait for the grandchild to boot up while still having a controlling terminal to which it can write error messages should something go wrong between the first call to `fork(2)` and booting up the HTTP server in the grandchild. Only if everything worked does the grandchild turn into a real daemon process. Process synchronization through pipes.
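The ready_pipe idea can be sketched in a few lines (a simplification, not Unicorn’s actual code; the real thing also calls `Process.setsid`, redirects standard streams and so on):

```ruby
# The ready_pipe trick, sketched: the foreground process only exits
# once its grandchild reports over the pipe that it booted fine.
read_end, write_end = IO.pipe

if fork
  # Grandparent: still attached to the terminal. Block until the
  # grandchild says it's ready (or the pipe closes on failure).
  write_end.close
  message = read_end.read
  read_end.close
  Process.wait # reap the first child
  puts "grandchild reported: #{message.inspect}"
elsif fork
  # First child: exits right away, so the grandchild is re-parented
  # and fully detached from the terminal session.
  exit
else
  # Grandchild: boot up (Process.setsid, HTTP server, ... would go
  # here), then report readiness through the inherited pipe.
  read_end.close
  write_end.write('ready')
  write_end.close
  exit
end
```

If booting fails, the grandchild exits without writing anything, the pipe closes, and the grandparent’s `read` returns an empty string — so it can print an error to the terminal instead of silently daemonizing a broken server.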
That does come pretty close to being magic, yep, but this is just a really clever use of `fork(2)` and `pipe(2)`.
sockets & select(2)
At the heart of everything that has to do with networking under Unix are sockets. You want to read a website? You need to open a socket first. Send something to the logserver? Open a socket. Wait for incoming connections? Open a socket. Sockets are, simply put, endpoints between computers (or processes!) talking to each other.
There are a ton of different sockets: TCP sockets, UDP sockets, SCTP sockets, Unix domain sockets, raw sockets, datagram sockets, and so on. But there is one thing they all have in common: they are files. Yes, “everything is a file” and that includes sockets. Just like a pipe, a socket is a file descriptor, which you can read from and write to just like a file. The sockets API for reading and writing is deep down the same as the file API.
So, let’s say we are writing a server. How do we use sockets for that? The basic lifecycle of a server socket looks like this:
First we ask the kernel for a socket with the `socket(2)` system call. We specify the family of the socket (IPv4, IPv6, local), the type (stream, datagram) and the protocol (TCP, UDP, …). The kernel then returns a file descriptor, a number, which represents our socket.
Then we need to call `bind(2)` to bind our socket to a network address and a port. After that we need to tell the kernel that our socket is a server socket that will accept new connections, by calling `listen(2)`. So now the kernel forwards incoming connections to us. (This is the main difference between the lifecycles of a server and a client socket.)
Now that our socket is a real server socket and waiting for new incoming connections, we can call `accept(2)`, which accepts connections and returns a new socket. This new socket represents the connection. We can read from it and write to it.
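In Ruby, that lifecycle can be sketched like this (port 0 asks the kernel for any free port, so the sketch doesn’t collide with anything; the `accept` part is shown commented out because it would block until a client connects):

```ruby
require 'socket'

# The lifecycle of a server socket: socket(2), bind(2), listen(2),
# then accept(2) in a loop.
server = Socket.new(:INET, :STREAM)             # socket(2)
addr = Socket.pack_sockaddr_in(0, '127.0.0.1')  # port 0: any free port
server.bind(addr)                               # bind(2)
server.listen(10)                               # listen(2)

puts "listening on port #{server.local_address.ip_port}"

# accept(2) blocks until a client connects and returns a new socket
# representing that connection:
#
#   connection, _client_addr = server.accept
#   connection.write("hello\n")
#   connection.close
```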
But here’s the thing: `accept(2)` is a blocking call. It only returns if the kernel has a new connection for us. A server that doesn’t have too many incoming connections will be blocking for a long time on `accept(2)`. This makes it really difficult to work with multiple sockets. How are you going to accept a connection on one socket if you’re still blocking on another socket that nobody wants to connect to?
This is where `select(2)` comes into play. `select(2)` is a pretty old and famous (maybe infamous) Unix system call for working with file descriptors. It allows us to do multiplexing: we can monitor several file descriptors with `select(2)` and let the kernel notify us as soon as one of them has changed its state. And since sockets are file descriptors too, we can use `select(2)` to work with multiple sockets. Like this:
```ruby
require 'socket'

sock1 = Socket.new(:INET, :STREAM)
addr1 = Socket.pack_sockaddr_in(8888, '0.0.0.0')
sock1.bind(addr1)
sock1.listen(10)

sock2 = Socket.new(:INET, :STREAM)
addr2 = Socket.pack_sockaddr_in(9999, '0.0.0.0')
sock2.bind(addr2)
sock2.listen(10)

5.times do
  fork do
    loop do
      readable, _, _ = IO.select([sock1, sock2])
      connection, _ = readable.first.accept
      puts "[#{Process.pid}] #{connection.read}"
      connection.close
    end
  end
end

Process.wait
```
That’s a 23-line TCP server, listening on two ports, with 5 worker processes accepting connections. Besides missing some minor things like HTTP request parsing, HTTP response writing and error handling it’s pretty much ready to ship.
No, but seriously, this actually does a lot of stuff in just a few lines with the help of system calls.
We create two sockets with `Socket.new`, which somewhere deep down in Ruby calls `socket(2)`. Then we bind the sockets to two different ports, 8888 and 9999 respectively, on the local interface. Afterwards we call `listen(2)` (hidden by the `#listen` method) and tell the kernel to queue up 10 connections at maximum for us to handle.
With our sockets ready to go we call `fork` 5 times, which in turn creates 5 child processes that all run the code in the block. So every child calls `IO.select` (which is the wrapper around `select(2)`) with the two sockets as argument. `IO.select` is going to block and only return if one of the two sockets is readable (on a listening socket that means that there are new connections). And this is exactly why we use `select(2)` here: with `accept(2)` we would block on one socket and miss out if the other socket had a new connection.
`IO.select` returns the readable sockets in an array. We take the first one and call `accept(2)` on it, which is now going to return immediately. Then we just read from the connection, close the connection socket and start our worker loop again.
If we run this and send some messages to our server with netcat like this:
```
$ echo 'foobar1' | nc localhost 9999
$ echo 'foobar2' | nc localhost 9999
$ echo 'foobar3' | nc localhost 8888
$ echo 'foobar4' | nc localhost 8888
$ echo 'foobar5' | nc localhost 9999
```
Then we can see our server accepting the connections and reading from them:
```
$ ruby tcp_sockets_example.rb
[31605] foobar1
[31607] foobar2
[31605] foobar3
[31607] foobar4
[31609] foobar5
```
Each connection handled by a different child process. Load balancing done by the kernel for us, thanks to `select(2)`.
Unicorn, sockets and select
Before the master process calls `fork` to create the worker processes, it calls `socket`, `bind` and `listen` to create one or more listening sockets (yes, you can configure Unicorn to listen on multiple ports!). It also creates the pipes that will be used to communicate with the worker processes.
After forking, the workers, of course, have inherited both the pipe and the listening sockets. Because, after all, sockets and pipes are file descriptors.
The workers then call `select(2)` as part of their `worker_loop` with both the pipe and the sockets as arguments. Now, whenever a connection comes in, one of the workers’ calls to `select(2)` returns and this worker handles the connection by reading the request and passing it to the Rack/Rails application.
And here’s the thing: since the workers call `select(2)` not only with the sockets, but also with the master-to-worker pipe, they’ll never miss a message from the master while waiting for a new connection. And if there is a new connection, they handle it, close it and then read the message from the master process.
That’s a really neat way to do load balancing through the kernel and to guarantee that messages to workers are not lost or delayed too long while the worker process is doing its work.
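Heavily simplified, the shape of such a worker loop might look like this (a sketch with assumed names, not Unicorn’s actual implementation):

```ruby
# A worker that watches both the listening socket and the pipe from
# the master process with a single select(2) call (a sketch).
def worker_loop(listen_socket, master_pipe)
  loop do
    readable, = IO.select([listen_socket, master_pipe])
    readable.each do |io|
      if io.equal?(master_pipe)
        command = io.gets                 # e.g. "QUIT\n" from the master
        return if command.nil? || command.strip == 'QUIT'
      else
        connection, _addr = io.accept
        # ... read the request, call the Rack app, write the response ...
        connection.close
      end
    end
  end
end
```

Because the pipe sits in the same `select` call as the sockets, a `QUIT` from the master wakes a sleeping worker immediately, and a busy worker picks it up right after finishing its current request.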
Signals
Let’s talk about signals. Signals are another way to do IPC under Unix. We can send signals to processes and we can receive them.
```
$ kill -9 8433
```
This sends the signal 9, which is the `KILL` signal, to process 8433. That’s pretty well-known and a lot of people have used this before (probably with sweat running down their face). But did you know that pressing `Ctrl-C` and `Ctrl-Z` in your shell sends signals too?
So what are signals? Most often they are described as software interrupts. If we send a signal to a process, the kernel delivers it for us and makes the process jump to the code that deals with receiving this signal, effectively interrupting the current code flow of the process. Signals are asynchronous — we don’t have to block somewhere to send or receive a signal. And there are a lot of them: the current Linux kernel for example supports around 30 different signals.
Sending signals is pretty useful, and I’d bet we’ve all done it a bunch of times, but what’s really cool is this: we can tell the kernel how we want our process to react to certain signals. That’s called “signal handling”.
We have a few options when it comes to signal handling. We can ignore signals: we can tell the kernel we don’t care about a signal and when the kernel delivers an ignored signal to our process it doesn’t jump to any specific code, but instead does nothing. Ignoring signals has one limitation though: we can’t ignore `SIGKILL` and `SIGSTOP`, since there has to be a way for an administrator to kill and stop a process, no matter what the developer of that process wants it to do.
The second option is to catch a signal, effectively defining a signal handler. If ignoring a signal means “Nope, kernel, don’t care about QUIT.”, then defining a signal action is telling the kernel “Hey, if I receive this signal, please execute this piece of code here”. For example: a lot of Unix programs do some clean-up work (remove temp files, write to a log, kill child processes) when receiving `SIGQUIT`. That’s done by catching the signal and defining an appropriate signal handler that does the clean-up work. Catching signals has the same limitation as ignoring signals: we can’t catch `SIGKILL` and `SIGSTOP`.
We can also let the defaults apply. Each signal has a default action associated with it. E.g. the default action for `SIGQUIT` is to terminate the process and make a core dump. We can leave it as it is, or redefine the signal action by catching it. See `man 3 signal` on OS X or `man 7 signal` on Linux for a list of the default actions associated with each signal.
So, how do we catch a signal? In Ruby it’s pretty simple:
```ruby
# signals.rb
trap(:SIGUSR1) do
  puts "SIGUSR1 received"
end

trap(:SIGQUIT) do
  puts "SIGQUIT received"
end

trap(:SIGKILL) do
  puts "You won't see this"
end

puts "My PID is #{Process.pid}. Send me some signals!"
sleep 100
```
We use `trap` to catch a signal and pass it a block to define a signal action that will be executed as soon as our process receives the signal. In this example, we try to redefine the signal handlers for `SIGUSR1`, `SIGQUIT` and `SIGKILL`. The `sleep` statement gives us time to send the signals to our process.
If we run this and then send signals to our process with the `kill` command like this:
```
$ kill -USR1 31950
$ kill -QUIT 31950
$ kill -KILL 31950
```
Then our process will output the following:
```
$ ruby signals.rb
My PID is 31950. Send me some signals!
SIGUSR1 received
SIGQUIT received
zsh: killed     ruby signals.rb
```
As we can see, the kernel delivered all of the signals to our process. On receiving `SIGUSR1` and `SIGQUIT` it executed the signal handlers, but, as I said before, catching `SIGKILL` proved useless and the kernel killed the process.
You can probably imagine what we can do with signal handlers. One of the most common things to do with custom signal handlers, for example, is to catch `SIGQUIT` to do some clean-up work before exiting. But there are a lot more signals and defining appropriate signal handlers can distinguish well-behaving processes from rude ones. Example: if a child process dies the kernel notifies the parent process by sending a `SIGCHLD`. The default action is to ignore the signal and do nothing, but a well-behaving application would probably wait for the child, clean up after it and write something to a log file.
Unicorn and signals
Unicorn sets up a lot of different signal handlers in the master process, before it calls `fork` and spawns the worker processes.
These signal handlers do a lot of things. Here are a few examples:
- QUIT — Graceful shutdown. The master process waits for the workers to finish their work (the current request), cleans up and only then exits.
- TERM and INT — Immediate shutdown. Workers don’t finish their work.
- USR1 — Reopen the log files. This is mostly used and sent by a log rotation daemon.
- USR2 — Hot-Reload. Start up a new master process with a new version of the application and keep the old master running.
- TTIN/TTOU — Increase/decrease the number of worker processes.
- HUP — Reload the configuration file while running.
- WINCH — Keep the master process running, but gracefully stop the workers.
These signal handlers are like a separate API through which you tell the master and worker processes what to do. And it’s pretty reliable too, considering the fact that signals are essentially asynchronous events and can be sent multiple times. This just screams for race-conditions and locks. So how does Unicorn do it?
Unicorn uses a self-pipe to manage its signal actions. The pipe the master process sets up is this self-pipe, which it will only use internally and not to talk to other processes. It also sets up a queue data structure. After that come the signal handlers. Unicorn catches a lot of signals, as we saw, but each signal handler doesn’t do much. It only pushes the signal’s name into the queue and sends one byte through the self-pipe.
After setting up the signal handlers, spawning worker processes, and so on, the master process goes into its main loop, in which it checks upon the workers regularly and sleeps in between. But it doesn’t just `sleep`, no, the master process actually goes to sleep by calling `select(2)` on the self-pipe, with a timeout as argument. This way it can go to sleep but will be woken up as soon as a signal arrives, since the signal handler just sends a byte through the pipe, which makes the pipe readable (from the master’s perspective) and `select(2)` now returns. After waking up, the master just has to pop a signal off the queue it set up in the beginning and handle the signals one after another. This is of tremendous value if you consider again that signals are asynchronous and you’ll never know what you’re currently executing when a signal arrives, and that they can be sent multiple times — even if you’re currently executing your signal handler code. Using a queue and a self-pipe in this combination makes handling signals a lot saner and easier.
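The mechanism can be sketched in a few lines (a simplification of what Unicorn does; the name `SIG_QUEUE` is borrowed from Unicorn, but the code is not Unicorn’s):

```ruby
# The self-pipe trick, sketched: signal handlers only record the
# signal and write one byte; the main loop sleeps in select(2).
SIG_QUEUE = []
SELF_READ, SELF_WRITE = IO.pipe

%w[QUIT USR1 TTIN].each do |sig|
  trap(sig) do
    SIG_QUEUE << sig
    begin
      SELF_WRITE.write_nonblock('.')  # wake up the sleeping main loop
    rescue IO::WaitWritable
      # Pipe buffer full: a wake-up is already pending, which is enough.
    end
  end
end

# One iteration of the master's main loop: sleep for up to `timeout`
# seconds or until a signal arrives, then return the next queued signal.
def master_sleep(timeout)
  ready, = IO.select([SELF_READ], nil, nil, timeout)
  SELF_READ.read_nonblock(64) if ready  # drain the wake-up bytes
  SIG_QUEUE.shift                       # nil if we just timed out
end
```

The handlers themselves do almost nothing, so it barely matters when or how often they interrupt the main code; all the real work happens sequentially, in the main loop, one queued signal at a time.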
Worker processes, on the other hand, inherit the master’s signal handlers – again: child processes inherit a lot from their parents. But instead of leaving them as they are, the workers redefine (most of) the signal handlers to be no-ops. They get their signals through the pipe which connects them to the master process. If the master process, for example, receives `SIGQUIT`, it writes the name of the signal to each pipe connected to a worker process to gracefully shut them down. The worker processes call `select(2)` on this master-worker pipe and the listening sockets, which means that as soon as they finish their work (or don’t have anything to do) they will read the signal name from the pipe and act upon it. This “signal delivery from master to worker via pipe” mechanism avoids the many problems that can occur if a worker process should receive a signal while currently working on a request.
Magic?
By now we have looked at `fork(2)` and how easy it is to spawn a new process. We saw that we can use pipes pretty easily outside a shell and without any use of the pipe character by calling `pipe(2)` and just working with the two file descriptors as if they were files. We also created sockets, worked with `select(2)`, looked at a pre-forking TCP server in 23 lines of Ruby and had the kernel of our operating system do our load balancing for us. Then we saw that Unicorn has its own API composed of signals and that it’s not that hard to work with signals.
These were just some basic Unix concepts. Trivial on their own, powerful when combined.
So, let’s have a closer look at these features of Unicorn that amazed me so much, that I was sure were created by some wizards with long robes and tall hats, in a basement far, far away, on old rusty PDP-11s.
Let’s see how this “magic” is just Unix.
Preloading
If we put `preload_app true` in the configuration file, Unicorn will “preload” our Rack/Rails application in the master process to spare the worker processes from doing it themselves. As soon as the application is preloaded, spawning off a new worker process is really, really fast, since the workers don’t have to load it anymore.
The question is: how does this work exactly? Let me explain.
Right after Unicorn has evaluated command line options, it builds a lambda called `app`. This lambda contains the instructions needed to load our Rack/Rails application into memory. It loads the `config.ru` file (or uses default settings) and then creates a Rack application with `Rack::Builder`, on which it calls `#to_app`.
So what should come out of the lambda is a Rack application on which we just
need to call #call
to pass it a request and get a response. But since a lambda’s
body is only evaluated when the lambda is called, none of this happens at the
point where the lambda is defined.
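That deferred evaluation is plain Ruby semantics, which a tiny example (unrelated to Unicorn's actual code) shows:

```ruby
# Nothing runs when the lambda is defined...
loaded = false
app = lambda do
  loaded = true                 # imagine: require the whole Rails app here
  ->(env) { [200, {}, ["OK"]] } # a minimal Rack application
end

puts loaded # => false
rack_app = app.call             # ...only the call loads the application
puts loaded # => true
puts rack_app.call({}).first # => 200
```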
Unicorn passes this app
lambda on to the Unicorn::HttpServer
, which
eventually calls fork(2)
to spawn the worker processes. But before it creates
a new process, the HttpServer
checks if we told Unicorn to use preloading. Only if
we did does it call the lambda right there in the master. If we
didn’t, each worker calls the lambda itself after the call to fork(2)
.
Calling the lambda, which hasn’t been called before, now loads our application into memory. Files are being read, objects are created, connections established – everything is somehow getting stored in memory.
And here comes the real trick: since the master loaded the application into
memory, which can take some time if we’re working with a large Rails
application, the worker processes inherit it. Yep, the worker processes inherit
our application. How neat is that? Since workers are created with fork(2)
they already have the whole application in memory as soon as they are created.
Preloading is just deciding whether Unicorn calls the lambda before or after the
call to fork(2)
. And if Unicorn called it before, creating new worker
processes is really fast, since they are basically ready to go right after
creation, except for some callbacks and setup work.
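As a sketch (heavily simplified, with made-up names), the whole preload decision boils down to when the lambda is called relative to fork:

```ruby
# With preloading the lambda runs once in the master, before fork(2);
# without it, every worker would run the lambda itself after fork(2).
preload = true
app = lambda { "the loaded application" } # stands in for loading Rails

loaded_app = app.call if preload # master loads the app once...

pid = fork do
  # ...and each worker inherits it; otherwise it loads its own copy.
  worker_app = loaded_app || app.call
  exit!(worker_app == "the loaded application" ? 0 : 1)
end
Process.wait(pid)
puts $?.exitstatus # => 0
```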
With copy-on-write, which the Ruby VM has been friendly to since version 2.0, this is even faster. The reason is that “inheriting” means the child process gets a copy of the parent’s memory address space. It’s probably not as slow as you imagine, but with copy-on-write nothing is copied upfront: only the memory regions which the child process modifies are actually copied.
And the best part of it is this: the kernel is doing all the work for us. The
kernel answers the call to fork(2)
and the kernel copies the memory. We just need
to decide when to create our objects: before or after the call to fork(2)
.
This comes in really handy when we now look at another great feature of Unicorn.
Scaling workers with signals
Unicorn allows us to increase and decrease the number of its worker processes by sending two signals to the master process:
$ kill -TTIN 93821
$ kill -TTOU 93821
These two lines add and then remove a new worker process. The signals used,
SIGTTIN
and SIGTTOU
, are normally sent by our terminal driver to notify a
process running in the background when it’s trying to read from (SIGTTIN
) or
write to (SIGTTOU
) the controlling terminal. Since Unicorn requires a logfile when running
as a daemon and thus never touches a controlling terminal, this shouldn’t be an issue, which
means that Unicorn is free to redefine the signal actions (the default for both
signals is to stop the process).
It does so by defining signal handlers for SIGTTIN
and SIGTTOU
that, as we
saw, only add the name of the signal to the signal queue and write a byte to
the self-pipe to wake up the master process.
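A toy version of such a handler (queue and pipe names are made up, and the real code does a bit more bookkeeping) could look like this:

```ruby
# The trap block does almost nothing: it records the signal name and
# writes a single byte to the self-pipe to wake the sleeping master.
SIG_QUEUE = []
self_read, self_write = IO.pipe

trap(:TTIN) do
  SIG_QUEUE << :TTIN
  self_write.write_nonblock(".")
end

Process.kill(:TTIN, Process.pid) # pretend an operator ran kill -TTIN
sleep 0.1                        # give the handler a moment to run

IO.select([self_read])           # the master's sleep ends here
self_read.read_nonblock(1)       # drain the wake-up byte
puts SIG_QUEUE.inspect # => [:TTIN]
```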
The master process, as soon as it wakes up from its main-loop sleep, sees the
signals and increases or decreases the internal variable worker_processes
,
which is just an integer. And right before it goes back to sleep, it calls
#maintain_worker_count
, which either spawns a new worker or writes SIGQUIT
to
the pipe connected to the now superfluous worker process to gracefully shut it down.
So let’s say we send SIGTTIN
to Unicorn to increase the number of workers.
What will happen is that the master wakes up (triggered by the write to the
self-pipe), increases worker_processes
and calls #maintain_worker_count
,
which in turn will call another method called #spawn_missing_workers
. Yes,
that’s right. We looked at this method before, it’s the same one that’s used to
spawn the worker processes when booting up. In its entirety it looks like this:
def spawn_missing_workers
  worker_nr = -1
  until (worker_nr += 1) == @worker_processes
    # skip numbers that already belong to a running worker
    WORKERS.value?(worker_nr) and next
    worker = Worker.new(worker_nr)
    before_fork.call(self, worker)
    if pid = fork
      # parent process: remember the new worker under its PID
      WORKERS[pid] = worker
      worker.atfork_parent
    else
      # child process: enter the worker loop and never return
      after_fork_internal
      worker_loop(worker)
      exit
    end
  end
rescue => e
  @logger.error(e) rescue nil
  exit!
end
Again, this is just a loop that calls fork(2)
N times. Now that N is
increased by one, a new worker process will be created. The other calls to
fork
are skipped by checking whether WORKERS
already contains an instance
of Worker
with the same worker_nr
.
Take note of worker_nr
here, it is important. All worker processes have a
worker_nr
by which they are easily identified in the row of spawned
processes.
If we now send SIGTTOU
to the master process, the following is going to
happen. First of all, the master is woken up by a fresh byte on the self-pipe.
Instead of increasing worker_processes
now, it decreases it. And again, it
calls #maintain_worker_count
, which doesn’t jump straight to
#spawn_missing_workers
. Since no worker process is missing,
#maintain_worker_count
now takes care of reducing the number of workers:
def maintain_worker_count
  (off = WORKERS.size - worker_processes) == 0 and return
  off < 0 and return spawn_missing_workers
  WORKERS.each_value { |w| w.nr >= worker_processes and w.soft_kill(:QUIT) }
end
It may not be idiomatic Ruby, but these 3 lines are still fairly easy to
understand. The first line computes the difference between the number of
currently running worker processes and the desired number, and returns if
it’s zero. If the difference
is negative, a new worker will be spawned (which is where the path of SIGTTIN
ends in this method). But since the difference is positive after decreasing
worker_processes
, the master process now takes the workers with a worker_nr
that’s too high and calls soft_kill(:QUIT)
on the worker instance.
This in turn sends the signal name through the pipe to the corresponding worker
process, which will catch that signal through select(2)
and gracefully shut
down.
After this, the master process calls Process.waitpid
(which in turn calls
waitpid(2)
), which returns the PID of dead children (and doesn’t leave them
hanging as zombies). The worker process with this PID now just needs to be
removed from the WORKERS
hash and Unicorn is ready to go again.
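The reaping step can be sketched like this (WORKERS here is my stand-in for Unicorn's bookkeeping hash):

```ruby
# waitpid(2) returns the PID of an exited child, which is then removed
# from the bookkeeping hash so no zombie process is left behind.
WORKERS = {}

pid = fork { exit! 0 } # a worker that shuts down immediately
WORKERS[pid] = :worker

reaped = Process.waitpid # blocks until a child exits
WORKERS.delete(reaped)

puts reaped == pid  # => true
puts WORKERS.empty? # => true
```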
All of this is pretty simple: fork(2)
in a loop, pipes, signal handlers and
keeping track of numbers. Again: it’s the combination of them that makes these Unix
idioms so powerful.
The same can be said for my favorite Unicorn feature.
Hot Reload
This fantastic feature has many names: hot reload, zero downtime deployment, hot swapping and hot deployment. It allows us to deploy a new version of our application, while the old one is still running.
With Unicorn, “hot reload” means that we can spin up a new master process, with new worker processes serving a new version of our application, while the old master process is still running and still handling requests with the old version.
It’s all triggered by sending a simple SIGUSR2
to the master process. But how?
Let’s take a step back and say that our Unicorn master and worker processes are
just humming along. The master process is sleeping, waking up, checking up on
the workers and going back to sleep. The worker processes are handling requests
without a care in the world. Suddenly a SIGUSR2
is sent to the master
process.
Again, the signal handler catches the signal, pushes the signal onto the signal
queue, writes a byte to the self-pipe and returns. The master wakes up from its
main-loop-slumber and sees that it received SIGUSR2
. Straight away it calls
the #reexec
method. It’s a fairly long method
and you don’t have to read through it now. But most of “hot reload” is
contained in it, so let’s walk through it.
The first thing the method does is to check if the master process is already
reexecuting (reexecuting means that a new master process is started by an old
one). If it is, it returns and its job is done. But if not, it writes the
current PID to /path/to/pidfile.pid.oldbin
. .oldbin
stands for “old
binary”. With the PID saved to a file, the master process now calls fork(2)
,
saves the returned PID of the newly created child process (to later check if
it’s already reexecuting…) and returns. The old master process adds “(old)”
to its process name (by changing $0
in Ruby) and is now done with #reexec
.
But since a process created with fork(2)
is executing exactly the same code,
the new child process goes ahead with #reexec
.
Right after the call to fork(2)
the child writes the numbers of the sockets
it’s listening on (remember: sockets are files, files are represented as file
descriptors, which are just numbers) to an environment variable called
UNICORN_FD
as one string, in which the numbers are separated by commas. (Yes,
it keeps track of listening sockets by writing to an environment variable. Take
a deep breath. It’ll make sense in a second.)
Afterwards it makes sure the listening sockets stay open across the coming
execve(2)
by clearing the FD_CLOEXEC
flag on them.
It then closes all the other file descriptors it doesn’t need (e.g.: sockets and files opened by the Rack/Rails application).
With all preparations and cleaning done, the child process now calls execve(2)
.
The execve(2)
system call turns the calling process into a completely
different program. Which program it’s turned into is determined by the
arguments passed to execve(2)
: the path of the program, the arguments and
environment variables. This is not a new process we’re talking about: the new
program has the same process ID, but its complete heap, stack, text and data
segments are replaced by the kernel.
This is how we can spawn new programs on a Unix system and what every Unix
shell does when we try to launch Vim: it calls fork(2)
to create
a child process and then it calls execve(2)
with the path to the Vim
executable. Without the call to execve(2)
we’d end up with a lot of copies of
the original shell process when trying to start programs.
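In Ruby the same fork/exec pattern looks like this (running the short-lived true utility instead of Vim so the example terminates on its own):

```ruby
# fork(2) creates the child, execve(2) replaces it with a new program --
# exactly what a shell does to launch Vim.
pid = fork do
  exec("true") # replaces this child process with the `true` program
  # never reached: exec only returns on failure
end
Process.wait(pid)
puts $?.exitstatus # => 0
```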
That’s also why Unicorn needs to clear the FD_CLOEXEC
flag on the
sockets before it calls execve(2)
. Otherwise the sockets would get closed
when the image of the process is replaced.
Unicorn calls execve(2)
with the original command line arguments it was
started with (it keeps track of them), in effect spawning a fresh Unicorn
master process that’s going to serve a new version of our application. Except
that it’s not completely fresh: the environment variables the old master
process set (UNICORN_FD
) are still accessible by the new master process.
So the new master process boots up and loads the new application code into
memory (preloading!). But before it creates worker processes with fork(2)
, it
checks the UNICORN_FD
environment variable. And it finds the numbers of our
listening sockets! And since file descriptors are just numbers, it can work
with them. It turns them into Ruby IO
objects by calling IO.new
with each
number as an argument and has thereby recovered its listening sockets.
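The round-trip can be sketched in a single process (the environment variable name is real, the rest is simplified; I use TCPServer.for_fd instead of a bare IO.new so we can inspect the recovered socket's address):

```ruby
require "socket"
require "fcntl"

# Old master: clear FD_CLOEXEC so the socket survives execve(2), then
# store the file descriptor number in the environment...
server = TCPServer.new("127.0.0.1", 0)
server.fcntl(Fcntl::F_SETFD, server.fcntl(Fcntl::F_GETFD) & ~Fcntl::FD_CLOEXEC)
ENV["UNICORN_FD"] = server.fileno.to_s

# ...new master: recover a working socket from that number alone,
# since a file descriptor is just an integer.
fd = ENV["UNICORN_FD"].to_i
recovered = TCPServer.for_fd(fd)
puts recovered.local_address.ip_port == server.local_address.ip_port # => true
```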
And now it calls fork(2)
and creates worker processes which inherit these
listening sockets again and can start their select(2)
and accept(2)
dance
again, now handling requests with the new version of our application.
There is no “address already in use” error bubbling up. The new master process inherited these sockets, they are already bound to an address and transformed into listening sockets by the old master process. The new master process and its workers can work with them in the same way the worker processes of the old master process do.
Now there are two sets of master and worker processes running. Both are handling incoming connections on the same sockets.
We can now send SIGQUIT
to the old master process to shut it down and as soon
as it exits the new master process takes over and only our new application
version is being served. And all of this happened without the old worker
processes stopping their work once.
It’s just Unix
All of this is just Unix. The master-worker architecture, the signal handling, the communication through pipes, the preloading, the scaling of workers with signals and the hot reloading of Unicorn. There is no magic involved.
I think that’s the most amazing part about all of this. The combination of
concepts like fork
, pipe
and signals, that are easy to understand on their
own, and leveraging the operating system is where the perceived magic and
ultimately the power of great Unix software like Unicorn comes from.
Why?
You might be thinking: “Why? Why should I care about this low-level stuff? I
build web applications, why should I care about fork
and select
?”
I think there are some really compelling reasons.
The first one is debugging. Have you ever wondered why you shouldn’t open a
database connection (a socket!) before Unicorn calls fork(2)
? Or why you get
a “too many open files” error when you try to make an HTTP request (sockets!)?
Now you know why.
Knowing how your system works on each layer of the stack is immensely helpful when trying to find and eliminate bugs.
The next reason I call the design and architecture reason and boils down to having answers to questions like these: should we use threads or processes? How could these processes talk to each other? What are the limitations? What are the benefits? Will this perform? What’s the alternative?
With some understanding of your operating system and the APIs it offers, it’s far easier to make architectural decisions and design choices when building a system or single components of it.
One more level of abstraction. Someone somewhere at some time said that “it’s always good to know one more level of abstraction beneath the one you’re currently working on” and I totally agree.
I like to think that learning C made me a better Ruby programmer. I suddenly knew what was happening behind the curtains of the Ruby VM. And if I didn’t know, I could make a good guess.
And I think that knowing deeply about the system to which I deploy my (web) application makes me a better developer, for the same reasons.
But the most important reason for me, which is a personal one, is the realization that everything Unicorn does is not magic! No, it’s just Unix and there is no secret ingredient. Which, in turn, means that I could write software like this. I could write a webserver like this! Realizing this is worth a lot.