Why threads can't fork

13 Oct 2014

There is an interesting thread on the Go issue tracker about daemonizing processes. Most of the thread is not about daemonizing processes though, but more about why Go has no Fork() function which you can call directly in your code. The first time I read through it I was wondering and saying to myself: “Yeah, why is there no Fork()? It surely can’t be that hard to implement.” After all you can already call system calls with the syscall package. As I read more and more I realized that the problem is not implementing Fork() per se, but rather implementing Fork() to work safely in a multi-threaded environment, which most Go programs are. So I tried to find out why.

And it turns out that the problem stems from the behaviour of fork(2) itself. Whenever a new child process is created with fork(2) the new process gets a new memory address space but everything in memory is copied from the old process (with copy-on-write that’s not 100% true, but the semantics are the same).

If we call fork(2) in a multi-threaded environment, the thread doing the call becomes the main thread of the new process, and it is the only thread there. All the other threads that ran in the parent process do not exist in the child; from the child's point of view they are dead. And everything they did is left exactly as it was just before the call to fork(2).
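To make this a bit more tangible, here is a minimal sketch in C (not Go, since the whole point is that Go does not expose this). All names in it (`ticks`, `worker`, `worker_is_gone_in_child`) are made up for illustration. A worker thread keeps bumping a counter; after fork(2) the child looks at the counter twice, 50 ms apart, and reports whether it moved. It assumes a POSIX system with lock-free atomics and needs to be linked with `-pthread`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static atomic_ulong ticks;     /* bumped continuously by the worker thread */
static atomic_int stop_worker; /* set by the parent to end the worker */

static void short_pause(long nanoseconds) {
    struct timespec ts = {0, nanoseconds};
    nanosleep(&ts, NULL);
}

static void *worker(void *arg) {
    (void)arg;
    while (!atomic_load(&stop_worker))
        atomic_fetch_add(&ticks, 1);
    return NULL;
}

/* Returns 1 if the counter stayed frozen in the child (no worker there),
 * 0 if it moved, -1 on error. Hypothetical helper, for illustration only. */
int worker_is_gone_in_child(void) {
    pthread_t tid;
    atomic_store(&stop_worker, 0);
    unsigned long start = atomic_load(&ticks);
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    /* Make sure the worker is really running before we fork. */
    while (atomic_load(&ticks) == start)
        short_pause(1000000L);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only the forking thread exists here. */
        unsigned long before = atomic_load(&ticks);
        short_pause(50000000L);
        unsigned long after = atomic_load(&ticks);
        _exit(before == after ? 0 : 1);
    }

    int status = 0, result = -1;
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = (WEXITSTATUS(status) == 0);

    atomic_store(&stop_worker, 1);
    int rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;
    return result;
}
```

In the parent the same counter keeps racing ahead the whole time; in the child it is a snapshot that nobody updates any more.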

Now imagine that these other threads were happily doing their work before the call to fork(2) and a couple of milliseconds later they are dead. What if something these now-dead threads did was not meant to be left exactly as it was?

Let me give you an example. Let’s say our main thread (the one which is going to call fork(2)) was sleeping while we had lots of other threads happily doing some work: allocating memory, writing to it, copying from it, writing to files, writing to a database and so on. They were probably allocating memory with something like malloc(3). Well, it turns out that malloc(3) typically uses a mutex internally to guarantee thread-safety. And exactly this is the problem.

What if one of these threads was using malloc(3) and had acquired the mutex at the exact moment the main thread called fork(2)? In the new child process the lock is still held, by a now-dead thread that will never release it.

The new child process will have no idea whether it’s safe to use malloc(3) or not. In the worst case it will call malloc(3) and block until it acquires the lock, which will never happen, since the thread that is supposed to release it is dead. And this is just malloc(3). Think about all the other possible mutexes and locks in database drivers, file handling libraries, networking libraries and so on.
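The same situation can be reproduced with an ordinary mutex instead of the one hidden inside malloc(3). The following is only a sketch, with invented names (`shared_lock`, `lock_holder`, `child_sees_locked_mutex`): a worker thread takes the lock and holds it while the main thread forks. The child uses `pthread_mutex_trylock()` rather than `pthread_mutex_lock()` so that it can report the problem instead of hanging forever. Strictly speaking, calling pthread functions in the child of a multi-threaded process is outside what POSIX guarantees, so treat this as a demonstration on common platforms, not as portable code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int lock_taken; /* worker -> main: "I hold the lock now" */
static atomic_int may_unlock; /* main -> worker: "you can release it" */

static void short_pause(void) {
    struct timespec ts = {0, 1000000L}; /* 1 ms */
    nanosleep(&ts, NULL);
}

static void *lock_holder(void *arg) {
    (void)arg;
    pthread_mutex_lock(&shared_lock);
    atomic_store(&lock_taken, 1);
    while (!atomic_load(&may_unlock))
        short_pause();
    pthread_mutex_unlock(&shared_lock);
    return NULL;
}

/* Returns 1 if the child found the mutex locked by a thread that does not
 * exist in the child, 0 if not, -1 on error. Hypothetical helper. */
int child_sees_locked_mutex(void) {
    pthread_t tid;
    atomic_store(&lock_taken, 0);
    atomic_store(&may_unlock, 0);
    if (pthread_create(&tid, NULL, lock_holder, NULL) != 0)
        return -1;
    while (!atomic_load(&lock_taken))
        short_pause();

    pid_t pid = fork(); /* the worker holds shared_lock right now */
    if (pid == 0) {
        /* Child: the owner of the lock is gone, the lock state is not.
         * pthread_mutex_lock() would block forever at this point. */
        _exit(pthread_mutex_trylock(&shared_lock) == EBUSY ? 0 : 1);
    }

    int status = 0, result = -1;
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = (WEXITSTATUS(status) == 0);

    atomic_store(&may_unlock, 1);
    int rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;
    return result;
}
```

In the parent the worker eventually releases the lock and life goes on. In the child there is nobody left who could do that.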

In order to call fork(2) in a safe way the calling thread would need to be absolutely sure that all the other threads are in a state in which it is safe to fork, meaning that none of them holds a lock or is in the middle of an operation the child will depend on. And this is hard, especially if you’re going to implement a wrapper around fork(2) in a library and have no idea what’s going to be happening all around you.

If the new child process is going to be turned into a different process with execve(2) the problem is not that big, since the heap, stack and data will be replaced. That’s why there is an os.StartProcess() in Go, which uses fork(2) under the hood (see line 65 here). There is still the problem of open file descriptors, which the new child process will inherit but which were intended to be used only by a now-dead thread. But it’s still possible to close them, since the new child process has direct access to them.
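For comparison, this is roughly what the fork-then-exec pattern looks like when written by hand in C. It is a sketch of the general idea, not of how Go implements os.StartProcess(); the helper name `run_shell` and the choice of `/bin/sh` as the program to run are assumptions made for the example. The important part is how little the child does between fork(2) and execve(2): no allocation, no locking, just the exec call and `_exit()` if it fails.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Runs `command` via /bin/sh in a child process and returns its exit status,
 * or -1 if the child could not be created or did not exit normally.
 * Hypothetical helper, for illustration only. */
int run_shell(const char *command) {
    assert(command != NULL);
    char *const argv[] = {"sh", "-c", (char *)command, NULL};

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: replace heap, stack and data with the new program image.
         * Any locks copied from the parent disappear along with them. */
        execve("/bin/sh", argv, environ);
        _exit(127); /* only reached if execve(2) failed */
    }

    int status = 0;
    pid_t waited;
    do {
        waited = waitpid(pid, &status, 0);
    } while (waited < 0 && errno == EINTR);

    if (waited != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling `run_shell("exit 7")` returns 7, the exit status of the short-lived shell.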

Now you might realize that the title of this post is a lie, since threads can fork. But in practice it’s really hard to pull off, which explains why the Go issue mentioned at the beginning is nearly 5 years old.

There are of course a couple of attempts to provide a solution. [pthread_atfork(3)](http://linux.die.net/man/3/pthread_atfork) allows users to register handlers that are called right before and after fork. But as you can imagine, this can be cumbersome too. Solaris has forkall(2), which does not kill the non-forking threads but keeps them alive and doing exactly what they did before. This behaviour comes with its own share of problems:

if a thread calls forkall(), the parent thread performing I/O to a file is replicated in the child process. Both copies of the thread will continue performing I/O to the same file, one in the parent and one in the child, leading to malfunctions or file corruption.
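Going back to pthread_atfork(3) for a moment, here is a rough sketch of how it is usually put to work for a single lock. Everything named here (`resource_lock`, `busy_worker`, `child_can_lock_after_atfork`) is invented for the example. The prepare handler takes the lock in the forking thread, so no other thread can be holding it at the moment of the fork, and the parent and child handlers release it again on both sides. The cumbersome part is that every lock in every library needs this treatment, in a consistent order.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t resource_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_once_t handlers_once = PTHREAD_ONCE_INIT;
static atomic_int stop_worker;

static void short_pause(void) {
    struct timespec ts = {0, 100000L}; /* 0.1 ms */
    nanosleep(&ts, NULL);
}

/* Runs in the forking thread just before fork(2): wait for the lock. */
static void before_fork(void) { pthread_mutex_lock(&resource_lock); }
/* Runs after fork(2), once in the parent and once in the child. */
static void after_fork(void) { pthread_mutex_unlock(&resource_lock); }

static void install_handlers(void) {
    pthread_atfork(before_fork, after_fork, after_fork);
}

/* Worker that takes and releases the lock over and over. */
static void *busy_worker(void *arg) {
    (void)arg;
    while (!atomic_load(&stop_worker)) {
        pthread_mutex_lock(&resource_lock);
        short_pause();
        pthread_mutex_unlock(&resource_lock);
        short_pause();
    }
    return NULL;
}

/* Returns 1 if the child was able to take the lock, 0 if not, -1 on error.
 * Hypothetical helper, for illustration only. */
int child_can_lock_after_atfork(void) {
    pthread_t tid;
    /* Handlers stay registered for the life of the process, so register once. */
    pthread_once(&handlers_once, install_handlers);
    atomic_store(&stop_worker, 0);
    if (pthread_create(&tid, NULL, busy_worker, NULL) != 0)
        return -1;

    pid_t pid = fork(); /* before_fork() runs first, after_fork() afterwards */
    if (pid == 0)
        _exit(pthread_mutex_trylock(&resource_lock) == 0 ? 0 : 1);

    int status = 0, result = -1;
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = (WEXITSTATUS(status) == 0);

    atomic_store(&stop_worker, 1);
    int rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;
    return result;
}
```

With the handlers in place the child always finds the lock free, no matter what the worker was doing when fork(2) was called.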

To conclude: yes, the title is a lie, and yes, you can fork(2) in a multi-threaded environment, but it is really, really difficult to pull off safely. So let’s just say that threads can’t fork and leave it at that.