Thorsten Ball

Professional Programming: The First 10 Years

2022-05-17T06:32:00+00:00

Last month, April 2022, marked the 10 year anniversary of my start as a professional programmer.

I started programming earlier than that, but hadn’t been paid a salary. As a teenager I built websites and IRC bots and wrote tiny Python scripts. Then I stopped and played guitar for a few years. In my twenties, I rather coincidentally rediscovered how much I enjoy programming when I was asked to build another website and found out how much had changed about the web while I was away (it’s HTML5 now!).

That made me wonder whether programming wouldn’t be a better career choice than continuing to study philosophy at university. Robin answered that question for me by generously offering me a paid internship.

Now it’s been 10 years, which is, to be honest, a significant marker of neither my growth as a programmer nor my career, but realising that it’s been 10 years made me pause and reflect.

The following is a loose, unordered collection of thoughts that come up when I look back on the past 10 years. Things I’ve learned, things I’ve unlearned, things I’ve changed my opinion on, things I never thought I’d believe in and now do.

They’re very much products of the context in which I helped develop software: as an intern for Robin, then as a junior developer for Robin, as a software developer for a small German startup, as a senior software developer for a German startup inside a huge German corporation, and now as a staff engineer for a fully remote, asynchronous US startup. Take that as a disclaimer. I bet if I’d worked in a game studio, a hardware company, and a big tech corporation instead, this text would be very different.

Fearlessness is undervalued

Most of the programmers I look up to and learned from share one trait that is rarely talked about: fearlessness.

They dive into an unknown codebase without fear. They open up the code of a dependency that they suspect is misbehaving without fear. They start working on something without knowing how they’ll finish.

It’s inspiring seeing someone being fearless, but becoming fearless yourself is one of the best learning accelerators I’ve found.

You can’t predict the future; try and you might end up in trouble

We all know this. Of course, we can’t predict the future.

But it took me years to truly take it into account when programming.

In the first third of my career I’d think: we will need this, so let’s build it now.

In the second third: we might need this, so let’s prepare for it.

Now: we don’t know whether we’ll need this, it’s a possibility, sure, and it looks like we might need it, yes, but things change all the time, so let’s build what we know we need right now.

Of course I write code so it’s easy to test

I also write code so it’s easy to read and understand, or easy to delete, or easy to modify, or easy to review. I don’t write code only for the computer to execute.

Nothing really matters, except bringing value to the customer

Type safety, 100% test coverage, the ability to fluently express business logic in code, perfect development tooling, an efficient system that wastes no resources, using the best programming language for the job, an elegant API design, a fast feedback loop, writing great code – these are not the goal.

Here’s the goal: providing value to your customers, by shipping software that solves their problem, repeatedly.

The things above help you do that – faster, cheaper, more efficiently, safer, with greater joy – but they’re not the goal. The goal is to provide value to your customers.

The trap: it’s often easier to write software than to deliver it. But delivering is what it’s all about.

Perfection is unachievable

I’m not sure I ever thought it is, but now I’m certain it is not. Everything is the result of trade-offs.

You will never reach 100% on every axis that you care about. Something has to give. And when you think you did make it perfect, you’ll soon realise that you forgot something.

My aesthetics have changed too. Instead of looking for the beauty that lies in perfection I now think the program that succeeds despite its flaws is beautiful. Look at that little program go, holding the internet together, despite the 17 TODOs in it.

If you can’t connect it to the business, it doesn’t matter

You can refactor a codebase and clean it up significantly, making it easier to understand for everybody and easier to extend, but all of that won’t matter if that codebase gets deleted four months later because the project didn’t help the business.

You can spend weeks adding tracing and observability to all of the code you write, only to realise that nobody will ever look at it, because that code runs three times a day and never causes any problems.

You can tweak and optimize your code to run so efficiently that the company can halve the number of machines required to run it and then see that the costs you saved are nothing in comparison to the salary you were paid while optimizing.

You can spend your time doing fantastic technical work and still waste it.

Figure out what the rule is trying to prevent, then consider the rule optional

If you’d asked me 5 years ago whether TDD, Clean Code, Software Craftsmanship, and other schools of thought are dogmatic, I would’ve said “no! Can’t you see? Clean and good code is important!”

Now I look back at the time when I thought that a rule such as “a method should not be longer than 5 lines” was useful and shake my head.

It’s not about the rules! It’s about the problems these rules are trying to prevent. If you don’t have the problem they’re trying to prevent, or you can prevent it another way, you don’t need the rule.

Write tests that give you confidence that the system works as it should

Don’t worry too much about whether a test is an integration or an end-to-end test, a unit test or a functional test. Don’t fight with others about whether you should test private methods or not. Stop worrying about whether you should hit the database in tests or not.

Instead write tests that tell you the system is working the way it should. Ideally with 3 keystrokes and in less than 1 second.

This one took me a long time, a lot of ultimately useless discussions, and bugs in my code to learn.

Best practices are often based on the assumption that you know what the code should do

If you know exactly what you want to build then best practices and patterns can help you, by giving advice on how to build it.

But if you don’t know yet what the program should do, or what it will look like in four weeks, then some best practices can make things even harder.

Some practices are the best when applied to a rewrite, but the worst when you’re still exploring.

Using other people’s code is not as good as I thought

I started my career writing Ruby and JavaScript, with package managers being available and the question “isn’t there a package that does that?” always hanging in the air.

Common sense dictated: if you can, try to use a library instead of writing it yourself. Reuse code as much as you can. Don’t reinvent the wheel. Don’t copy & paste. That was what I believed for years.

But there are downsides to that. Sometimes writing that one function yourself might actually be better than adding a dependency.

Dependencies aren’t free. You have to keep them up to date. They increase your compile or loading times. They add strange things to your stack traces. And very often they do more than what you need them to do, which means you’re paying for more than you’re getting.

When you’re glueing other people’s code together, there’s a very real danger that the glue is where complexity will accumulate. But glue code is the last place where you want your complexity to live. It hides complexity. What you want is to make complexity as visible as you can, shining a light on it with the hope that it turns into dust and disappears.

Sometimes it’s better to write it yourself than to use other people’s code.

Some companies get it, others don’t. But nobody’s perfect

There is a big difference between developing software for a software company and developing software for a company that employs software developers because it has to. It’s a joy to work for a company in which leadership gets software and how it’s made.

That being said: I don’t think any company has it all figured out. Everybody’s winging it to some degree.

Investing in feedback loops is never wasted effort

I’ve never regretted improving a feedback loop. Faster tests, better test output, faster deploys, turning a manual feedback loop into something that gives me a signal with one keybinding.

Watch out, though: once you’ve seen the light of developing software with a really fast and high-signal feedback loop, you’ll long for it forever.

Always leave something unfinished at the end of the day

A failing test, a compiler error, a half-finished sentence – end your day with one of these and the next morning you can sit down and continue where you left off, skipping “hmm, what should I do today…” entirely.

There’s nothing that gets me started as fast as a failing test that needs to pass.

Perfectionism is a trap

Perfectionism is based on a lie. You’ll never get to the point where you’re done and sit and rest and say “ah, now it’s perfect”. There’ll always be something. You know it, I know it. There’s no perfect (see above). Accept it and ship and continue building.

Aim for 80% and consider the other 20% optional. It’s freeing and gives you room to breathe. You might end up at 99%, who knows?

Sharpen the axe

I’ve gotten a lot out of investing in my tools: Vim, git, shells, the Unix environment, testing frameworks. I truly enjoy spending a Sunday morning with my Vim configuration.

But it’s possible to overdo it and get stuck in the configuration phase, doing endless tinkering. You have to use your tools to get feedback on how to best configure and use them.

Hiring is hard

I’ve done hundreds of interviews now and the most important insight I’ve gained is that hiring is really, really hard. The verdict on an interview has so many random inputs that it makes everything between a Strong Yes and Strong No wobbly.

Often I wish there was a way to find out whether people have the get-shit-done gene.

The most important trait in developers: rolling up their sleeves because it has to get done

Here’s something that all the people I enjoyed working with have in common: they do the work. They know that some tasks aren’t fun or glamorous or interesting. But someone has to do them, so they do them.

Work on a codebase with other people over a longer period of time

Nothing has helped me get better at software engineering as much as working with a group of other people on the same codebase over multiple years.

You’ll see how decisions play out.

You’ll see what ended up mattering and what didn’t.

You’ll see how extensible your code truly is when your colleague tries to modify it 3 years after you wrote it.

You’ll see whether your prediction of “we have 2 of these now, but I’m sure there’ll be 5 in the future” comes true or not and can take the outcome into account when making other predictions.

You’ll regret writing some code and you’ll be happy that you wrote some other code. You’ll learn from reflecting on the difference between the two.

You’ll see tooling break down just because something somewhere changed and you had nothing to do with it but you still have to fix it.

You’ll say “I’ve never had to think about this in 3 years” about some pieces of software and cherish them.

You’ll see what parts of the codebase new colleagues struggle to understand and which parts they immediately get productive in.

You’ll see what the code you wrote looks like 4 years later.

Knowing the full stack

There are few things as motivating to me as hearing “you don’t really need to know how it works…”

Sure, I might not need to, but I wouldn’t do the work I do today if I hadn’t tried to find out how a GC works, or how Unix works, or how multi-threading works, or how a database stores data, or how interpreters and compilers work.

It benefits the work I do, too. I can make better technical decisions by being able to weigh trade-offs more accurately, knowing what goes on under the hood.

Typing can be the bottleneck

I’ve said it before. Don’t let typing be the bottleneck.

Code reviews aren’t waterproof

For the longest time I assumed it’s my fault when a bug made it through one of my code reviews. I missed that! How could I have missed that? It’s so obvious!

Later I found out that it’s not just me: other people miss bugs in code reviews too. In fact, they accept and freely talk about how code reviews aren’t infallible. I was relieved.

It changed how I see code reviews: as something imperfect, something that needs to be combined with other ways to verify the code.

Not every code review is worth the effort

Not every change needs a really thorough review. Sometimes, if the risk is acceptable, it’s fine to drop a quick “LGTM!”. It unblocks your colleagues, keeps momentum and, somehow, builds trust.

Negativity begets negativity

The more you give in to negativity, the more you get. Always much more than you wanted.

It’s viral. It starts with snark, it turns into cynicism, it then morphs into “everything sucks”. Soon after, the question of “why even bother?” starts to attach itself to everything. It ends with people hiding excitement and joy and ideas from you.

Being negative is too easy. At a certain point I realised that pointing at things and saying what’s bad about them and shrugging because, well, didn’t I expect this to be bad (everything’s bad, right?) - that’s easy. Easy to do and easy to mistake for an engineering mindset that can spot deficiencies and worst cases (which it is not).

What’s hard is seeing things for what they could be, what’s beautiful about them. Encouraging ideas even when they’re barely something to talk about. Creating and fostering joy. That’s challenging.

So at some point I decided I’d had enough and tried to do the challenging thing. So far it’s served me well.

Every dial at 100% all the time doesn’t work

I can’t do everything equally well all the time. I can’t write a book and make progress in my career and be a great father and set PRs in the gym and read two books. It won’t work for more than one or two weeks. It’s not sustainable.

Now I let my interests take turns: when I want to make progress on a specific thing, I focus on that for a while and accept that the other things have to go into maintenance mode.

Code has mass

Code has mass. Every additional line of code you don’t need is ballast. It weighs your codebase down, making it harder to steer and change direction if you need to. The less code you need, the better.

Code has to be read, it has to be tested, it has to be kept compatible, it has to stay secure, it has to keep working. Even if it’s not doing any useful work. It doesn’t hurt having it around, does it? Yes, it does. Delete it and move on. If necessary, restore from version control.

The same is true for tests, which I’ve only learned too late.

Programming as a part of my life

Ever since I started as an intern I spent a considerable amount of time outside of work on programming: reading technical books, writing books, working on side projects, writing blog posts, giving talks, traveling to conferences, learning new languages and tools.

That some companies don’t care about your college degree if you can demonstrate that you’re really good at programming was fuel for me for years.

I enjoy spending time on programming outside of work, but not all the time. Some of it feels like work. It takes effort to read some technical books. But some things don’t have to feel good while you’re doing them.

My career would be completely different if I had only programmed and learned about programming at my day job.

Computers are fast

Building web applications made me think that 100ms is fast and that 50ms is really fast. Writing a compiler has taught me that 1ms is an eternity for a modern computer.
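A rough back-of-the-envelope sketch of why that is, assuming a single core running at 3 GHz (an illustrative figure, not a measurement; the `CLOCK_HZ` constant and the `cycles_in` helper are made up for this example). Clock cycles are not the same thing as instructions, but they give a sense of scale:

```python
# Assumption for illustration only: one core ticking at 3 GHz.
CLOCK_HZ = 3_000_000_000


def cycles_in(microseconds: int, clock_hz: int = CLOCK_HZ) -> int:
    """Number of clock cycles that pass in the given number of microseconds."""
    # Integer arithmetic keeps the result exact.
    return clock_hz * microseconds // 1_000_000


for label, microseconds in [("1ms", 1_000), ("50ms", 50_000), ("100ms", 100_000)]:
    print(f"{label}: {cycles_in(microseconds):,} cycles")
```

Under that assumption, 1ms is three million cycles and a "fast" 100ms web response is three hundred million.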

I still love programming very much

Some of what I wrote can be interpreted as me having grown cynical over the years. I mean: nothing matters and perfection is unachievable? Come on.

But it’s the opposite. I still care. I care very much. But I care about fewer things and I still love programming very much.

The context in which we build software

2020-09-15T18:32:00+00:00

I grew up in what I now know people consider a really small town. There wasn’t a lot, but even in that small town we had 2-3 lawyers. And to make a point about technology and how we develop software I want to paint you a picture of these German small-town lawyers with a very broad brush.

They often have their office in their own house, a little sign in front of it says so. Sometimes it’s a separate room, sometimes the downstairs apartment. They help the local Mittelstand companies with their contracts, or they’re specialised in traffic law and can help you when you’ve had a car crash. They probably know everyone in the notary’s office on a first-name basis, use their telephone as the primary means of communication and their suits aren’t tailor-made. There’s a fax number on the business card, since this is Germany.

If you leave this town, drive for 45 minutes and then take a plane for another 45 minutes and 7 hours you end up in New York City, where they also have lawyers. Let me dip my broad brush into some cheap paint I got from movies and TV shows and paint you one of these New York lawyers from one of New York’s big law firms: big office in a skyscraper, expensive suit, expensive apartment, large corporations as clients, lots of money being paid every hour, hundreds of colleagues that worry about all the things a lawyer shouldn’t need to worry about so they can concentrate on high-level decisions.

I’m fully aware that these paintings are not masterpieces. But try to imagine: my small-town lawyer takes the car and the plane and ends up in New York in an office with his New York equivalent. What would they talk about? Probably how different things are. You have how many cases a month? That’s your client? Okay, wow. No, my wife does the bookkeeping, no, it’s absolutely fine. Repeat that: how long have you been working on this single case?

Two more paintings: a portrait of our small-town doctor and one of the head of neurology in Singapore’s General Hospital. Another pair: small-town architect and one of those architects that designs the new conference center in a large city, or its opera house, or a whole district.

Each pair shows the same profession — lawyer, doctor, architect — but there’s not much in the daily lives of the portrayed they have in common. In fact, a day in the life of a small-town lawyer looks probably nothing like the typical day a big law firm lawyer in New York has.

And that is 100% fine, because their goals and problems are different.

If I were to paint you two pictures of software developers, one of a developer working for a big tech company in Silicon Valley and one working for the IT department of a 200 year old publishing house in Germany with 150 employees, they would also look completely different, because here too, the goals and problems are completely different.

But you know what these two software developers do? They go online and they write about their day-to-day and what they’re doing. And they don’t mention their needs and wants and what their goals and problems are and how they’re highly specific to their environment and how that does and should shape their work. They wouldn’t even mention that they only have three colleagues and that they only work on a single software project that’s critical to the survival of the company. Not a single word would they write about the fact that they work in a research team in a research department that has, effectively, no deadlines. And no paying customers.

Yet their equivalent on the other side of the world would read it and say: “we need to do what they’re doing! It’s working for them! Let’s use the technology they are using. The programming language, the database, the deployment system — if they are using it, why shouldn’t we?”

How can you not be romantic about programming?

2020-09-08T18:07:00+00:00

There’s a scene in Moneyball in which Brad Pitt’s character, the manager of the Oakland A’s, is watching a recording of one of his players trying so hard to run fast that he stumbles and falls. Lying on the ground he’s angry at himself, because he doesn’t realize that right before he started his run he hit a home run and scored the game-winning points. Watching the scene, Pitt leans back, smiles a Brad Pitt smile and says: “how can you not be romantic about baseball?”

There are moments in which I ask myself the same thing about programming.

We’re programming computers. We spend large parts of our days writing down instructions for machines. Other parts of the day are spent making sure that we chose the right instructions. Then we talk about those instructions: why and how we picked the ones we picked, which ones we will consider in the future, what those should do and why and how long it will probably take to write those down.

It can sound very serious and dry; a bureaucracy of computer instructions. And yet.

And yet we, the ostensible bureaucrats, talk about magic as something that exists — the good and the bad kind. There are wizards. Instructions are “like a sorcerer’s spells”.

We don’t call them instructions, though, not when talking about what we produce each day anyway. It’s code we write. Emotions are involved. Code, we say, can be: neat, nice, clean, crafted, baroque, minimal, solid, defensive, hacky, a hack, art, a piece of shit, the stupidest thing I’ve ever read, beautiful, like a poem.

Some lines of code are a riddle to anyone but their author and the name code serves as a warning. Other times, strangely, it’s a badge of honor.

Fantastic amounts of code have been written, from beginning to end, by a single person, typing away night after night after night, for years, until one day the code is fed to a machine and, abracadabra, a brightly coloured amusement park appears on screen. Other code has been written, re-written, torn apart and stitched back together across time zones, country borders and decades, not by a single person, but by hundreds or even thousands of different people.

This world of programming is held together by code. Millions and millions of lines of code. Nobody knows how much there is. Some of it is more than 30 years old, some less than a week, and chances are you used parts of both yesterday. There are lines of code floating around on our computers that haven’t been executed by a machine in years and probably won’t be for another lifetime. Others are the golden threads of this world, holding it together at the seams with no more than a dozen people knowing about it. Remove one of these and it all comes crashing down.

If you haven’t been here long enough and try to guess how much there is and how many generations are layered on top of each other — you won’t even come close. But stay around. After a while, more and more, you’ll find yourself in moments of awe, stunned by the size and fragility of it all; the mountains of work and talent and creativity and foresight and intelligence and luck that went into it. And you’ll reach for the word “magic” because you won’t know how else to describe it and then you lean back and smile, wondering how someone could not.

No, typing can be the bottleneck

2020-09-01T08:00:00+00:00

One of the eternal laws of the internet dictates that as soon as one person says they have a new thing that lets them type faster — a keyboard, a keyboard layout, an editor configuration, etc. — somebody else must say: “but typing is not the bottleneck!”

What the second person means is that the first person is wasting their time. They’re optimizing something that’s not slowing them down. Typing is not the bottleneck, because “typing is perhaps 0.5-1% of my programming time”. Programming is thinking, talking to people, planning, researching, is what incarnations of person #2 are saying. And if you think your keyboard is holding you back from thinking, well, you’re wrong.

I get that. I don’t write code the whole day either.

But here’s what I need to write and type besides code: commit messages, pull request descriptions, emails, tickets, comments on tickets, code reviews, documentation, Slack messages, notes, journal entries, RFCs, design documents, requirements.

And being able to type fast and without much effort sure as hell helps. It lets me get back to the thinking.

Effort, that’s the important one, not raw speed. It doesn’t matter if you can type 90 words per minute or 130, but if it takes you effort to type something and, if given the choice, you’d rather not do it, then we have a problem.

Take me on my phone, for example. It takes me a lot of effort to type on it. It’s not only that I’m slow because I can’t use more than two fingers, but every third word is a typo. Or, even more infuriating, it’s the wrong word and I need to correct autocorrect. Or I switched from English to German while typing and my phone doesn’t know what an umlaut is anymore.

If you see me typing on my phone chances are you can also hear me producing an angry growl-like sound.

Which is exactly why I don’t do it a lot. I barely use the note taking apps I have, because I’d rather type on my computer, with a proper keyboard. I often refrain from replying to messages if I don’t have to, because, yep, I’d have to type those replies.

In other words: typing is a bottleneck for me.

And it’s not just me on my phone. Ever had a chat conversation with somebody who wasn’t comfortable typing? Here’s what you get: short messages, acronyms, typos, missing sentences. You can read how they struggled to type. I once had a colleague who had the habit of walking to my desk saying “Ah, before I type all that up, I thought I’d quickly tell you in person” where “all that” was three to four paragraphs, at the most.

And that’s what this is about. If you’re not comfortable typing a lot and you’d rather not write something down then typing is the bottleneck and you need to fix it. Typing is not something you should need to think about.

Because wouldn’t it be a waste of time if you spend 99% of your time lying in a hammock, thinking, but then choose not to write all of your ideas down, because typing is the bottleneck?

But does it help you ship?

2020-08-25T07:30:00+00:00

Whenever I’m not sure whether I’m spending my time on the right thing I ask myself: does it help me ship?

If what I consider working on is not the thing we want to ship itself, but lies in the vast grey area of software projects where I could write code all day long without the user ever noticing, this question helps me decide whether to drop it or invest some time in it.

Let me illustrate.

One imaginary Friday afternoon I notice that we have a few // TODO comments in our codebase. Hmm, I could create a bot that looks for those comments whenever a new commit is pushed. It could use git blame to see who the author is and create a ticket assigned to them, saying that they should fix their TODO in line X in file Y, please. And, cherry on top, when a pull request that touches a TODO is opened, the bot would mark the corresponding ticket as work-in-progress. And when the pull request is merged, the bot closes the ticket. And when a pull request merely changes the TODO: into a TODO(poorsoul): then it assigns the ticket to poorsoul.
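To make the imagined bot a little more concrete, here is a minimal sketch of just its first step: finding the TODO comments and pulling out the optional owner. Everything in it (the `Todo` class, `find_todos`, the regular expression, the sample source) is hypothetical and only follows the `// TODO:` and `// TODO(poorsoul):` shapes mentioned above. The `git blame` lookup and all the ticket handling are left out.

```python
import re
from dataclasses import dataclass
from typing import Iterator, Optional

# Matches "// TODO: text" and "// TODO(owner): text" (hypothetical convention).
TODO_PATTERN = re.compile(r"//\s*TODO(?:\((?P<owner>[^)]+)\))?:\s*(?P<text>.*)")


@dataclass(frozen=True)
class Todo:
    line_number: int       # 1-based line number in the scanned text
    owner: Optional[str]   # name inside TODO(...), or None if absent
    text: str              # the rest of the comment


def find_todos(source: str) -> Iterator[Todo]:
    """Yield one Todo for every line of `source` containing a TODO comment."""
    for line_number, line in enumerate(source.splitlines(), start=1):
        match = TODO_PATTERN.search(line)
        if match:
            yield Todo(line_number, match.group("owner"), match.group("text").strip())


# Made-up sample input, not taken from any real codebase.
example = """\
func main() {
    // TODO: make sure this works
    run()
    // TODO(poorsoul): handle errors
}
"""

for todo in find_todos(example):
    print(todo)
```

The author lookup and the ticket bookkeeping would sit on top of this.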

Sounds pretty good, right? Turn those TODOs into tickets and never lose a TODO again.

The problem is: it’s not free. It looks like it is, because the code is quickly written and it runs as a GitHub action we don’t have to pay for, but it’s not.

It’s another process, another tool, another automated piece in our machinery. Another thing that needs to be fixed when it ultimately breaks down, another bit of automation that works 99% of the time, but starts making funny noises when you slip into the 1% and, say, moved a TODO down five lines by accident and don’t want the bot to close and re-open tickets, kicking off another wave of notifications.

That’s the actual cost of adding that bot.

The question is: do we want to pay it? Does it help me ship? Does it help me ship more? Or does it help me ship faster, or with less friction, more safely?

If our imaginary codebase has more TODOs than test cases, for example, and these TODOs are holding us back from shipping because we can’t make a change without having to ask colleagues what this TODO we just discovered means, then it might be a good idea to add the bot. Even if we don’t intend to fix all of the TODOs, but only to finally get an overview and a peek at the hidden part of the iceberg. It helps us ship.

If the code contains more than one TODO: make sure this works and we can’t ship because changing the code is playing a game of Russian roulette, where every change could kick off an avalanche of bugs, then yes, this bot would probably help us ship.

But what if we’re not held back by TODOs? What if we have a total of 18 of them, and 12 of those have been in the codebase longer than you and I have been at the company, and, generally speaking, our codebase is in an okay state — is the cost worth it?

If what’s holding you back from shipping is, say, getting more customer input, or a brittle release process, or flaky monitoring, or missing tests, then all the bot does is to add noise. It doesn’t help you ship.

What you think is bad about remote work, can, well, actually be good.

2020-05-22T09:30:00+00:00

I’ve been working remotely full-time at Sourcegraph for slightly over a year now and, in the five years before that, had 2-3 home office days a week at flinc and ioki.

There are a lot of different blog posts I could write about remote working: about its upsides and downsides, what works and doesn’t, when it makes sense and when not, what it requires and why I enjoy it.

But here I want to share my thoughts on a single, specific point that often comes up in discussions about remote work: the obvious downsides of remote work. Put in a sentence: “Of course remote work has advantages, but we all agree that it’s a trade-off; it’s better to have a chance to interact with real people, to have face-to-face time and to be able to quickly talk things through in person. Obviously.”

I’m here to tell you that these obvious downsides of remote work can be (and for me are) upsides.

Let’s start with social interactions. Having other people around, as you would in an office, is good. I agree. I like spending time with other people and some of my best friends started out as colleagues.

But here’s the rub: I’m also a sensitive person. That has its advantages (I’m good at “reading a room” and can empathize with others), but to me it can also mean that I’m easily and involuntarily influenced by other people’s mood.

Working remotely, social interactions lost a lot of their negative influence on me. In other words: the less I see my colleagues face-to-face, the less I worry about their face.

Less “oh, they didn’t seem enthused when I pitched them my idea”.

Less “they rolled their eyes in the company meeting when the CEO announced the new strategy, now I’m not so sure anymore about that strategy myself”.

Less “my manager sighed when I mentioned this problem I have. I’m sure it didn’t mean anything, but… maybe it did?”.

Less “I sent them a message to review my code 5 hours ago, I know they’ve checked their email, I saw it, so why didn’t they review my code?”

Less over-analyzing, less being influenced by things that range from “irrelevant to me” to “so random that it’s ridiculous I even think about it again”.

A different playing field

Here’s another angle: when you meet and discuss things in person — as opposed to in written, virtual form — it’s easy for the loudest person in the room to own the discussion.

I myself can be a pretty loud person (if I spot a chance to crack a joke, you can bet I’ll try to use it) and I have trouble not talking over other people, especially when I get excited about something. I try to work on it, but it’s hard to shut off what often feels like a reflex.

But when the main communication channels are asynchronous text and video calls (as would be the case in a remote work setup) the influence of the loudest person in the room wanes. It’s transferred to the best communicators.

Let me illustrate with an anecdote. When I joined Sourcegraph I spent one week in San Francisco for onboarding. Back then we still had an office and weren’t all-remote. But some of my colleagues were already working remotely and I didn’t meet them in that first week, only afterwards through Slack/GitHub/Zoom.

And you know what? I was highly impressed. Incredible technical knowledge, great writing, fantastic communication skills (proactive, mindful of the recipient, always providing enough context for the message to work asynchronously). All of that was clearly visible when I saw their messages and ideas on Slack, their code, their code reviews and when we jumped on calls to pair.

The twist is that when I finally met some of them in person I realized that they’re really shy and quiet and that if we were put together in a meeting room or an open office I never would’ve gotten the same impression I now had of them. I was actually happy that I met them online first.

Face-to-face meetings

We’ve all heard a variation of this: “You just can’t argue with the fact that things are much faster when you can have face-to-face meetings and get everyone around a table.”

A friend of mine said this a few years ago: “The Linux kernel is being developed by thousands of people, all over the globe, through email. Email! And you’re telling me we need to meet for two hours to decide when this button shows up or not?”

Two points here.

First, face-to-face meetings are not inherently better. They can be time wasters just like anything else. They can be inefficient without an agenda and clear goals, they can have the wrong people in them, they can end without any results, without notes, without something to show to others.

Second point: I’d argue that if you often need to get everyone in the same room to discuss and decide on something, you probably have too many people discussing and deciding things.

Why do five people have to sit around a table? Are all of them giving their input? If not and just one person is talking, couldn’t that have been an email? Or a video call where the other participants can just listen? Or just a document that took the writer slightly longer to prepare but the other participants less time to read than it would’ve taken them to attend the meeting?

At Sourcegraph I’ve learned what it means to truly work autonomously and the most important ingredients to that are trust and responsibility. What that enables is that you often don’t need five people in a room to make a decision. You need two, maybe three, and even then you often don’t need a meeting, since these two or three people are often on the same page anyway.

Don’t get me wrong, I don’t hate meetings. I actually roll my eyes when I hear things like “ugh, I wish I had no meetings at all and could just code.” Communication and coordination are important. But are face-to-face meetings really the most efficient way to achieve that? No, I don’t think so.

I think if it’s harder to have face-to-face meetings, as it is in a remote company, you start to work around them and can end up with something that has a lot of upsides: fewer people necessary to make decisions, more decisions being documented, better preparation, clear goals. More trust, more autonomy.

Surprise, surprise: there’s nuance to it

If there’s one overarching point to what I’m writing here, and I’m not sure there is, it could be this: there’s more nuance to all the obvious upsides and downsides of remote and in-office work than tweet-sized insights on the future of work in times of a global pandemic would make you think there is.

How much do we bend to the will of our tools?

2020-02-04T07:30:00+00:00

A few months ago, while looking at some code, a little light bulb that I didn’t even know existed went off in my head: “This was only written in this way, because the tools allow it to be written in this way.” Maybe it was a question mark, not a light bulb.

All of us agree, of course, that, yes, with a sufficiently generous definition of tool, the tools we use when programming influence the programs. Programming languages, type systems, testing frameworks, linters, etc. – they’re all tools, in one sense or another and they all leave their mark.

But that’s not what kept me staring. This was different, this code wasn’t just shaped by the language it’s written in, posture-corrected by a linter. No, this code was written by another type of tool.

There are tools that help you write better programs and then there are tools that help you better write programs: auto-formatters, auto-complete, jump-to-definition, documentation lookup, search. The latter is what engraved the code I was looking at.

And I freely admit, even though it might be shocking: I’m not a code savant, I can’t close my eyes, put my hand on a screen and whisper when code was written with which editor (I sincerely wish I could, but don’t tell my parents I said that). Yet I think it’s possible to spot an auto-formatter’s imprint.

Because when you look at the code you simply realize: there’s no other way. We programmers are too lazy. Only with these tools would we write a program in such shape and form.

Here’s a snippet that’s similar in its peculiarities to the one that got me here, take a look:

  const editableTitle =
  	inEditMode
  		?
  		<form
  			className='editing-form title-editing-form'
  			onSubmit={
  				async evt => {
  					evt.preventDefault();
  					try {
  						const txt = (evt.target as any).text.value;
  						await setTitle(txt);
  						setCurrentTitle(txt);
  					} finally {
  						setEditMode(false);
  					}
  				}
  			}
  		>
  			<textarea name='text' defaultValue={currentTitle}></textarea>
  			<div className='form-actions'>
  				<button className='secondary'
  					onClick={() => setEditMode(false)}>Cancel</button>
  				<input type='submit' value='Update' />
  			</div>
  		</form>
  		:
  		<h2>
  			{currentTitle} (<a href={url}>#{number}</a>)
  		</h2>;

A ternary operator spanning 26 lines, in JSX, covering multiple inline functions, one of them using async/await and try/finally. There is a lot going on.

Now let me make it clear: this is not about this particular piece of code. And it’s not about JavaScript, TypeScript, React, TSX or JSX either. As far as I know most developers that work with these tools recommend against this style. You could replace the snippet with a lot of other code written in completely different languages. This particular piece is not even that bad.

It’s merely an example to illustrate my point: I bet you wouldn’t write your code like this if all you had was nano or Notepad.exe. Yes, I bet that long before you would indent a lone ? for 12, 14, 16 or 40 spaces inside another ternary operator, wrapping an inline function, you’d restructure your code.

“Yeah, and if I had to write it with pen and paper, I would’ve quit a long time ago, dude.” Of course. I hear you. And I don’t want to argue that we should go back to punch cards, but this code and all the tools involved in its creation made me wonder: what if the tools we use to write code make us so much better at writing code that we end up unable to work on it without the tools?

If you write text under a microscope, it’s going to end up so tiny that you would only be able to read it while looking through the microscope. What if these tools shape how we write code to such an extent that the code becomes illegible when we approach it without the tools in hand?

They make writing code so much easier by formatting it, moving it around, creating, suggesting and explaining it, but I wonder: do they also help us when we’re not writing new code? Because arguably the majority of our time working on software is not spent writing it: we’re reading code, trying to understand it, slightly tweaking and editing it.

Or did we end up with the programming version of the Omnipotence paradox, writing code that’s so hard to write that we ourselves cannot read it?

Or what if these writing tools only make writing a certain kind of code easier? It’s often said that the actual act of writing the code is the easiest part (“typing is not the bottleneck”) of the whole thing, as if it’s just the manual work, the typing it up, that comes after we made deliberate, conscious decisions about a design and its trade-offs. But what if there is a feedback loop between our design choices and what our tools would make easy to type, biasing us against solutions that would require more manual typing?

In concrete terms: would our Java code look different if “Create new class” wasn’t bound to a keyboard shortcut, but instead we’d have a “Show me whether this function is pure or not” key (if such functionality were available)? Can we explain the differences in identifier length preferences between language communities by pointing to the availability of reliable auto-complete in one and lack thereof in another?

Or imagine a far more powerful tool chain than the one we have now, one that would allow us to run multiple analysis passes over our code while we’re still writing it: would we start to write longer functions if we had the ability to hide and show their sub-parts depending on the results of a data-flow analysis, revealing only the parts of the function that relate to the identifier under the cursor in the analysis?

How much of our design and architecture thinking is still bound by what’s easy to type? How much do we bend to the will of our tools? And, maybe most importantly, are we even aware of it?

Learn more programming languages, even if you won't use them

2019-04-09T08:30:00+00:00

This article has been translated into Spanish: Por qué debes aprender más lenguajes de programación (incluso si no los vas a utilizar)

Imagine we’ve been handed a task and we’re free to choose the programming language. The assignment involves all sorts of string manipulation: reading strings, splitting strings, trimming, joining and running regular expressions over strings, everything in UTF-8 and, of course, emojis need to work. Which language do we choose? C? Oh, please no.

Another job, this time at a financial institution. We need to do tens of thousands of concurrent calculations. High performance is a hard requirement. Should we use… Ruby? Come on. Next up: a one-off script that renames a bunch of files… written in Java? A web browser… in Python? Programming a controller for a medical device with… C#? Swift? Lua? You get the point.

Different programming languages are good at different things and bad at others. Each one makes certain things easier and in turn others harder. Depending on what we want to do we can save ourselves a lot of work by choosing the language that makes solving the type of problem we’re facing the easiest.

That’s one of the tangible, no-nonsense benefits of learning more languages. You put another tool in your toolbox and when the time comes you’re able to choose the best one. But I would go one step further.

I think it’s valuable to learn new programming languages even if — here it comes — you never take them out of the box.

Languages shape the way we think*, each in their own peculiar way. That’s true for programming languages as well. Each language contains a different mental model, a different perspective for thinking about computation and how to write programs.

Take SQL, for example, and how it shapes your thoughts about the flow and the form of data in your program. Now consider what that would look like in an imperative, object-oriented language like Java, or a functional language like Haskell. Or in C. Imagine what a multi-player game server looks like in Python, in Haskell, in Erlang; streaming and processing terabytes of data in C, in Go, in Clojure; a user interface in Tcl, in Lua, in JavaScript.

Every programming language is a lens through which we can look at the problem we’re trying to solve. Through some of them the problem appears convoluted, exhausting. Through others it doesn’t even look like a problem at all, it looks barely different from any other mundane thing one does in this language.

By learning a new language, even if it stays in your toolbox for all eternity, you gain a new perspective and a different way of thinking about problems. Once you’ve implemented a game server in Erlang, you’re going to see game servers in a different light. After you’ve processed data in a Lisp by thinking of the data as a series of lists that you can mold by sending it through a series of tiny functions that can be composed to form pipelines of functions, you’ll see shadows of this pattern appear everywhere. As soon as you’ve had your first real taste of memory management in C, you’ll start to appreciate what Python, Ruby and Go are doing for you — while seeing the cost of their labour. And if you ever built a UI in JavaScript with React.js, you know that your thinking about UI components shifted, in a fundamental way.

These new perspectives, these ideas and patterns — they linger, they stay with you, even if you end up in another language. And that is powerful enough to keep on learning new languages, because one of the best things that can happen to you when you’re trying to solve a problem is a change of perspective.

* This is known as linguistic relativity or the Sapir–Whorf hypothesis. In the context of this article I support the thesis in certain ways, but you should know that in the scientific community its validity is still very much open for debate. See this article for an introduction to the problems with the thesis.

The Tools I Use To Write Books

2018-09-04T17:30:00+00:00

This article has been translated into Russian: Полезные инструменты для написания книг. Thank you, Vlad!

In the beginning, there is always a single text file, nothing more. It’s called ideas.md or book.md. It contains a list of thoughts and ideas, an outline. Everything else grows from there. It only makes sense that we start by talking about files.

The Files

Both of my books, Writing An Interpreter In Go and Writing A Compiler In Go, are written in GitHub Flavored Markdown (GFM). One file per chapter and all files under version control using git.

I only use a basic set of Markdown features in my texts: headings, emphasis, lists, links, images, quotes. And fenced code blocks. This last one is the most important one to mention here, because every piece of code presented in the books is contained in the Markdown files in the form of fenced code blocks.

Yes, that has all the drawbacks you imagine it to have. While I have syntax highlighting for fenced code blocks, editing them is not as comfortable as if they were their own files. But, most importantly, the code is also duplicated: one version lives in a Markdown file and one (or more) lives in the code folder that comes with the book. If I want to update a snippet of code presented in the book, I have to manually update every copy of it. Yes, cumbersome.

But there is one undeniable advantage to this approach: it works and it works exactly like I want it to. There are quite a few tools out there to embed code in Markdown files but none of them allow me to present a change to a piece of code.

Since we – you, the reader, and me, the writer – work on a single codebase in both books, we often have to extend or modify existing code. To show these changes I comment out the already existing parts of a method and just show what’s been added or changed. Like this:

// compiler/compiler.go

func (c *Compiler) emit(op code.Opcode, operands ...int) int {
  // [...]

  pos := c.addInstruction(ins)
  return pos
}

I don’t know of an existing tool that can do that. They embed either portions of a file or a complete file. And, yes, that file could be a *.diff, but even that would have to be generated separately and beforehand. So I went with fenced code blocks.

And believe me, I was this close to writing my own tool. A preprocessor that would not only allow me to embed auto-generated diffs into Markdown but also to run commands on a set of changes and embed the generated output, too.

What kept me from doing that was a calm voice in my head telling me that I’m here to write a book, not a preprocessor. And since copying code into Markdown files is only cumbersome once you have to go back and edit the code, but actually quite comfortable while writing, I just kept on doing that, ignoring the other voices.

Now I have written two books and zero tools. I consider that a success.

The Pipeline

Of course I do not send plain text files out to readers. Instead, they receive nicely formatted PDF, ePub, Mobi and HTML files, which I create with only a tiny number of tools: pp, pandoc and KindleGen. Together they form a pipeline:

First, the Markdown files are piped through pp, a generic preprocessor for text files that can do a lot of things, but which I only use to replace two variables in the text: the URL of the zipped code folder readers can download and the current version of the book.
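That kind of variable replacement is small enough to sketch. The following is not how pp works internally, and the {{name}} placeholder syntax is made up for this illustration; it only shows the idea of swapping two placeholders, the download URL and the version, for their values before the text moves on to the next tool. The URL and version number are invented, too.

```go
package main

import (
	"fmt"
	"strings"
)

// substitute replaces every {{name}} placeholder in text with its value.
// The {{name}} syntax is an assumption for this sketch, not pp's macro syntax.
func substitute(text string, vars map[string]string) string {
	pairs := make([]string, 0, len(vars)*2)
	for name, value := range vars {
		pairs = append(pairs, "{{"+name+"}}", value)
	}
	// A Replacer scans the text once; replaced values are not scanned again.
	return strings.NewReplacer(pairs...).Replace(text)
}

func main() {
	chapter := "This is version {{version}}. Download the code: {{code_url}}"
	out := substitute(chapter, map[string]string{
		"version":  "1.2",                          // hypothetical book version
		"code_url": "https://example.com/code.zip", // hypothetical URL
	})
	fmt.Println(out)
}
```

Placeholders without a matching variable are left untouched, which makes a forgotten variable easy to spot in the output.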

After that, the resulting Markdown is handed over to pandoc, the most important part of this pipeline.

Here’s the shortest possible description of what Pandoc does: it takes text in one format and outputs it in another format. Markdown goes in, HTML comes out. Or turn it around and put in HTML and get Markdown back. Or feed it Markdown and get DOCX, or ODT, or PDF, or AsciiDoc, or any other of the myriad of supported formats.

In my pipeline, Pandoc takes the Markdown files of the book and, with a little bit of YAML containing metadata, turns them into PDF, HTML and ePub files. The default output is already nice to look at, but I have a custom template for each of these three formats, all of which are based on Pandoc’s default templates.
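To give a rough idea of what such an invocation can look like, here is a sketch that builds the Pandoc arguments for each of the three formats. The file names (metadata.yaml, the templates directory, the chapter files) are hypothetical. The flags themselves (--standalone, --metadata-file, --template, --output) are regular Pandoc options, but this is an illustration under those assumptions, not the actual build script behind the books.

```go
package main

import (
	"fmt"
	"strings"
)

// templates maps an output format to a custom Pandoc template.
// All paths are hypothetical and only serve as an illustration.
var templates = map[string]string{
	"pdf":  "templates/book.latex", // PDF is produced via LaTeX
	"html": "templates/book.html",
	"epub": "templates/book.epub",
}

// pandocArgs returns the arguments for turning the chapters into one
// output file. Pandoc infers the output format from the file extension.
func pandocArgs(format string, chapters []string) []string {
	args := []string{
		"--standalone",
		"--metadata-file=metadata.yaml", // YAML with title, author and so on
		"--template=" + templates[format],
		"--output=book." + format,
	}
	return append(args, chapters...)
}

func main() {
	chapters := []string{"01-introduction.md", "02-lexing.md"}
	for _, format := range []string{"pdf", "html", "epub"} {
		// Print the command instead of running it, so the sketch
		// also works on a machine without Pandoc installed.
		fmt.Println("pandoc", strings.Join(pandocArgs(format, chapters), " "))
	}
}
```

Keeping the three formats in one table means a new output format is one more entry plus one more template, and nothing else changes.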

Since the HTML output is a single file with the CSS in the <head>, it’s easy to style. The same goes for ePub, which is really just a ZIP archive containing HTML files and is probably the one I styled the least, because I think it looks pretty good by default.

PDF generation, though, is done using LaTeX and requires a template written in LaTeX. I’ve stitched mine together from Pandoc’s default template and what Stack Overflow, hours of trial and error and the enlightenment and horror that was “holy shit, did you know LaTeX has its own package manager?” have given me. I like to touch it only when absolutely necessary. In the end, though, that doesn’t matter much.

What comes out looks beautiful to me and Pandoc is, without any doubt and exaggeration, one of the best tools I’ve ever used. It does exactly what it promises to, its documentation is stellar, it’s actively and carefully maintained and never once let me down. If I had to shorten this post to one word, it would be “Pandoc”.

The only thing Pandoc can’t do is produce Mobi files, which is what Amazon uses for their Kindle eBook readers and store. For that, I use Amazon’s own command line tool KindleGen, which turns the ePub produced by Pandoc into a Mobi file. No styling or templates required.

Once the final files fall out of the pipeline I bundle them in a ZIP file, together with a folder that contains all of the code presented in the books. Ready to be published.
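The bundling step is nothing fancy. As a minimal sketch, assuming made-up file names and placeholder contents, this is what packing the generated files and the code folder into one ZIP archive could look like with Go’s standard library. It builds the archive in memory and lists its entries again; it is not the script actually used for the books.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"sort"
)

// bundle packs the given files (archive path -> content) into a ZIP
// archive that is built in memory.
func bundle(files map[string][]byte) ([]byte, error) {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order of entries

	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, name := range names {
		w, err := zw.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := w.Write(files[name]); err != nil {
			return nil, err
		}
	}
	// Close writes the central directory; without it the archive is invalid.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// entries lists the file names stored in a ZIP archive.
func entries(archive []byte) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(zr.File))
	for _, f := range zr.File {
		names = append(names, f.Name)
	}
	return names, nil
}

func main() {
	// File names and contents are placeholders for this sketch.
	archive, err := bundle(map[string][]byte{
		"book.pdf":        []byte("..."),
		"book.epub":       []byte("..."),
		"code/01/main.go": []byte("package main\n"),
	})
	if err != nil {
		panic(err)
	}
	names, err := entries(archive)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```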

Publishing

I self-publish both books in both editions, eBook and paperback. Self-publishing means that I, instead of a publisher, have to take care of selling, printing and distributing the books to readers.

While I could theoretically run my own shop on which I sell the books, I don’t want to. I want to write books, not a web application for selling books, especially not one that involves the handling of taxes for an international audience. So instead I use two services to take care of that for me.

The first one is Gumroad, which I use to sell and distribute the eBook editions. I upload my ZIP, Gumroad accepts payment via PayPal or credit card and then sends the file to the reader – in exchange for a rather small fee. It also takes care of collecting taxes for me and I can set the price without any limitations, refund customers, send out free updates and create promo codes. After nearly two years, I’m still a happy customer and the only two features I’d love to have are more payment methods and pricing per country, so I can set a lower price for readers in India, for example.

The paperback editions are sold, printed on demand and shipped by Amazon Kindle Direct Publishing (KDP). I upload a print-ready cover and PDF version of a book and Amazon turns it into a paperback that you can purchase in seven different Amazon stores. Createspace is what I previously used for that, but after Amazon bought Createspace, they started to move the Createspace functionality over to KDP. By now, I’ve completely switched over and only use KDP. One less tool to worry about, since I was using KDP anyway to publish the Kindle version of the books on the Kindle stores.

For someone like me, a person who starts to sweat when he hears “CMYK”, “RGB” and “you need to change your file” in one sentence, creating print-ready artifacts can be a bit of a hassle, but using LaTeX for the PDF generation comes in quite handy here. In a separate LaTeX template I use with Pandoc I can set the dimensions and margins of the document to exactly what I need and LaTeX takes care of the rest.

Readers can then purchase my books just like any other product on Amazon, including Prime shipping, refunds and all the payment methods accepted by Amazon. The downside of all this is a loss of control for me. I can’t, for example, offer personalized coupon codes nor can I bundle the paperback with the eBook edition.

I still think it’s worth it. When you upload a PDF file on Friday and then hold the paperback version of that file in your hands on Wednesday, you quickly forget about wrestling with color models of PDFs and start to grow convinced that we’re living in the future.

The Most Important Bit

That’s it. That’s the complete journey, from bytes in a text file to ink on paper or a ZIP in your inbox.

But here’s the most important bit, saved for last: none of this matters if you want to write a book. Quite a few people have told me that they want to write a book, but they’re not sure about which tools to use. My advice: all you need to write a book is a program that allows you to write text into a file.

Tools are only important to the process of writing a book in that they should get out of your way. You shouldn’t have to worry about how to put text in a file, only what text. Once you can do that comfortably – you know, with autosaving and the ability to edit effortlessly – keep on doing it. And then, keep doing it. Once you have something you’d be happy to publish, you can start to worry about tools.

The Paperback Edition of Writing A Compiler In Go

2018-08-14T17:30:00+00:00

Well, that certainly went quicker than I planned.

I knew from releasing the paperback edition of Writing An Interpreter In Go that a lot of people still prefer paper over eBooks. So it didn’t come as a big surprise when, right after the release of Writing A Compiler In Go, people started asking me about a paperback edition.

But I replied that before I start working on a paperback edition, I first need to take a break. I’d worked on this book for close to a year and I wanted to sit back and take a big breath. I knew that I’d eventually release a paperback, but that could wait a few weeks, or months even.

As it turns out, I’m pretty bad at taking big breaths when there’s something I can and want to do. So, here we are. Exactly two weeks after the release of the eBook, Writing A Compiler In Go is now available as a paperback:

It’s 18cm wide and 26cm long, exactly like its predecessor. They look good when put next to each other on a shelf. This one is thicker, though, with 338 pages — roughly 60 more than the first one.

The other notable change is that this book doesn’t have full-color but monochrome syntax highlighting. When I released the first paperback edition of Writing An Interpreter In Go I did not yet realize how expensive full-color printing really is. Now I do and know why nobody else does it, which is also why the current paperback edition of Writing An Interpreter In Go is black & white, too.

I’m pretty happy with how it turned out:

The Lost Chapter: A Macro System For Monkey

2017-06-28T17:30:00+00:00

If you don’t care about the Who, Where, When, Why, How and the Why Is It A Lost Chapter? and want to skip to the What: I wrote a new chapter for Writing An Interpreter In Go and you can read it for free at interpreterbook.com/lost. Otherwise, read on…

The pages you are about to read were found amidst the rubble of a collapsed ruin. Wedged between the scratched and battered cases of old machines once called “computers”. Bearing, in faint white and barely readable, the title “Writing An Interpreter In Go. Chapter 5: A Macro System For Monkey.” …

Alright, I’ll admit it: that was a lie. What I want to show you is not really a lost chapter, preserved through the eons, found in the ruins of a long-gone civilization. I just needed a good intro.

You see, I couldn’t sit still. In the first couple of months after publishing Writing An Interpreter In Go I took some time off from Monkey, the programming language we built in the book. “The book is done. Take a breath and play around with something else. After working on it for a year you deserve it”, I told myself, only to grow more anxious by the week about all the features, optimizations and tweaks I could try and add to Monkey. In the end, the temptation of everything Monkey could still be won. I gave in and resumed work on Monkey.

This resulted in two things: a project I’m not ready to talk about yet and a new, additional chapter for Writing An Interpreter In Go called The Lost Chapter: A Macro System For Monkey, which I want to tell you all about.

It started with me getting sidetracked while working on said secret project by discovering how elegant and beautiful macros in Racket are. I guess I just can’t stop myself from uttering an impressed “nice” when hearing about “code that writes code”. Next thing I knew I was digging through various implementations of macros in different languages and getting more and more fascinated. It’s code that writes code! It’s a hand that draws itself! How could I not be fascinated by that?

A few “Huh, interesting…” followed by more “Well, I guess, it wouldn’t be too hard to just…” later I successfully added macros to Monkey. Macros that are able to modify and generate Monkey source code and are evaluated in their own macro expansion phase. A real, Lisp-style macro system. I was elated.

In fact, the whole journey from learning about how macros are implemented and why they’re so powerful to implementing them myself was so mind-blowing and fun that I had to write about it.

At first I thought I was writing a blog post or a tiny addition to the book and gave it the working title “The Lost Appendix”, thinking of a few pages hidden at the end of a book.

It ended up with the title The Lost Chapter: A Macro System For Monkey, because what we have here is not a small addition. It’s a complete chapter, close to 50 pages in PDF format, that shows you how to implement a fully-working macro system for Monkey - step by step, all code shown, fully tested, just like the book. You can think of it as the fifth chapter of Writing An Interpreter In Go, since it seamlessly continues the previous four. It’s just being delivered a few months later than the rest of the book.

But why “The Lost Chapter”? Because a text about macros deserves a touch of mystery, don’t you think? It’s code that writes code, come on! It’s snakes eating their own tail and surgeons operating on themselves! If that isn’t worthy of a title that’s a little bit out there, I don’t know what is.

I also didn’t want to make it an addition to the book itself. On the practical side there’s the hurdle of extending a paperback edition by around 50 pages and not being able to send the update to readers who already bought the paperback. But then there were also, let’s say, “conceptual” considerations.

While I consider learning to build your own programming language a worthwhile endeavor that can teach you a lot of valuable things about programming, I’ll concede that it looks pretty disconnected from the realities of one’s day job. But adding a macro system? Writing code that lets you write code that writes code? That doesn’t just look unrealistic, but rather … Let me put it this way: totally and completely nuts and, oh, incredible fun!

I wanted this chapter to be exactly that: a fun addition to Writing An Interpreter In Go, not quite Monkey canon, but a bizarro expansion pack; a curious and accidental supernova in the same universe.

Oh, and did I mention it’s available for free? Well, it’s available for free. Read it online or download it as PDF/HTML/Mobi/ePub here:

interpreterbook.com/lost

The downloadable version also includes all the runnable, tested code shown in the chapter and the complete Monkey interpreter from Writing An Interpreter In Go.

I hope it’ll get you to utter a “nice”, too.

Writing An Interpreter In Go: The Paperback Edition

2017-02-22T17:45:00+00:00

If you’d asked me only a few months ago if there’ll ever be a printed version of Writing An Interpreter In Go I’d have responded with a “Huh, uummm, well, I don’t know. Maybe. Maybe if I find the time and if there’s any interest.”

As it turned out, to my surprise, quite a few people told me that they’d love to hold a copy of the book in their hands. And I also had some free time on my hands. Alright, let’s do it then, I thought.

But even though I said that time and interest were the only limiting factors, I knew that there couldn’t be a printed version without Monkey - the programming language that we build in the book - having a logo. Yes, I know, I know, that’s not a real requirement, but a little indulgence I wouldn’t deny myself. So I created a 99designs contest and Hazel Anne submitted the winning entry. I love the logo Monkey has now.

A paperback version of a book also needs a full cover, front and back, and so I wrestled with vector images and PDFs and print dimensions and page bleed and spine widths for quite a while. But, in the end, using createspace to print and distribute my book turned out to be much easier than one might think. I was lucky enough to already have had a working Pandoc setup in place and only needed to add one more LaTeX template, the one for the print version.

The result, I think, was worth it:

That’s 260 pages, 18cm wide and 26cm long, with full-color syntax highlighting.

Since the book is printed on-demand by createspace, which is an Amazon company, it’s available for purchase in these Amazon stores:

Or you can just go to interpreterbook.com and click on one of the big, red buttons.

If you appreciate holding a physical copy of a book in your hands more than having a PDF on your hard drive, I hope you enjoy this paperback edition.

Higher Value Tools

2017-02-08T18:00:00+00:00

There are certain tools that provide incredibly high value. Much more so than others. They provide so much value by acting as a multiplier of power and leverage. And I think there’s something they all have in common.

I’m talking about interpreters, compilers and transpilers. Programming languages are the ultimate, universal tools and sit at the bottom of the stack on which a bazillion other tools are built. Some programming languages offer so much power that their creation was the big bang for whole categories of other tools.

But I’m also talking about DSLs, code generators and templating engines. And databases with query languages. And database drivers that make these databases available to programming languages. jQuery and its $('exactly what I want') interface. jq and its query language. Webservers. Editors, IDEs, code analysers and generators.

It seems to me what they all have in common, what is close to their center of power, is parsing. Parsing user input, parsing source code, parsing query expressions, parsing configuration files, parsing network responses. Maybe it’s parsing itself that makes these tools so powerful. I’m not sure.

What I know and what I’m sure about is that without knowledge of parsing you won’t be able to build tools like these. Knowing how to write a parser is like a secret power and once you have it, you realize that you’re now able to solve a whole range of problems you haven’t even considered before. Now you can create higher value tools.

What I didn't do to write a book

2017-01-16T18:00:00+00:00

I wrote my book “Writing An Interpreter In Go” over the course of 11 months. The first four months were spent on building the Monkey programming language and its interpreter. In the following seven months I wrote the book itself and at times it felt like I’d never finish. But I did and now I want to answer a question a few people have asked me: “How?”

What follows is much more of a confession than a precise description of a refined workflow or a secret productivity technique.

I didn’t have a TODO list I didn’t abandon after three weeks. Did I get things done? I did, but I never read Allen’s book. I also didn’t organize my time according to the four quadrants. I didn’t use a bullet journal to keep on top of ideas and tasks, didn’t use a pomodoro timer and didn’t keep a work journal. org-mode? I wish. Unplug, turn off notifications and just use pen and paper? That’s ridiculous, I have a keyboard.

Some tasks and ideas I put in Wunderlist, some in a Trello board and others in a file called “TODO.md”. Occasionally I even came back to each one and moved some things around.

Taking notes wasn’t much more organized. There’s a shell script I built. It’s based on the sound principles of popsicle-sticks-and-duct-tape-engineering and helped me to quickly create text files in a “notes” folder. Other times I used Notes.app. I also had iA Writer on my phone to access my Dropbox folder and directly write random ideas into the book. When I felt like it, I also did this on my computer: write ideas and outlines directly into the files that make up the book.

All of this changed from week to week and month to month. Sometimes from one day to the other.

The only constant in these 11 months was this: I was determined to finish the book, to keep chipping away at it until it’s done. I got up every day at 5:45am and tried to take another step forward, using whatever it takes.

But don’t take this for something it isn’t. It would simply be a lie to say that every morning I sat in front of my computer and got a solid hour of writing done before heading to work.

Sometimes I got up, drank two, three cups of coffee and just browsed the internet for an hour, breaking the chain. Other times I wrote for ten minutes at home and for 30 more on the train. On my best days, I wrote for an hour at home and for the whole train ride. On some days I only wrote down one sentence, more often than not starting with “FIXME:”.

Is there a moral to the story? I’m not sure, maybe it’s this one: productivity tools and techniques can only help, they won’t ever do the work for you.

It’s easy to fall into this trap and think that once the TODO lists are tidy and organized and the best notebook money can buy is sitting on the table, half of the work is already done. Of course, that’s not the case. Just like an expensive guitar won’t make you a great guitar player and the best running shoes won’t get you out of the door every day, productivity techniques won’t finish your project. They might help, but you have to put the work in. You have to keep showing up and keep chipping away at it. No tool will ever do that for you.

A Virtual Brainfuck Machine In Go

2017-01-04T17:00:00+00:00

You’re a programmer and your product manager walks up to your desk, taps you on the shoulder and asks if you have a couple of minutes to spare. She needs to talk to you about something. You sit down together and she has a serious look on her face. Oh boy. Something’s up. “Do you have anything important on your plate right now? I need you to do something for me.” Here it comes… “I need you to write a Brainfuck interpreter for me. A fast one.”

Some people might say that this conversation will never, ever happen. Well, “better be prepared” is what I say.

Brainfuck

Brainfuck is a weird looking programming language and keeps every promise its name makes. Here is “Hello, World!” in Brainfuck:

++++++++[>++++[>++>+++>+++>+<<
<<-]>+>+>->>+[<]<-]>>.>---.+++
++++..+++.>>.<-.<.+++.------.-
-------.>>+.>++.

If you’re now thinking “Heck, I’d use that in production”, let me tell you that Brainfuck was conceived as a fun, teaching language. Its inventor Urban Müller wanted Brainfuck to be a language that’s easily implementable, which makes it the perfect choice for someone who wants to learn more about interpreters or compilers.

I think he reached that goal. Implementing Brainfuck is an eye-opening experience. Even though it’s a tiny language, it’s perfectly well-equipped to illustrate a number of concepts behind programming language implementations.

But before we can build Brainfuck, we need to understand how Brainfuck thinks.

Views Of The World

One thing in which programming languages differ is their model of the world and how they make it accessible to their users.

Take C, for example. Leaving aside the multitude of abstractions that hide in the depth of the kernel and the hardware, when working with C you can peek behind the curtain and see the inner workings of your computer. You are pretty close to the hardware-supported stack and you can allocate and free memory on the heap. If you’re experienced and stare intently enough, you can see the actual machine code when looking at your C code. The same goes for C++.

In Forth you mainly work with a stack. You push, you pop, you swap and drop. Nearly everything you do happens on a stack. In Forth, the stack is the world.

In other languages, these underlying assumptions about the mechanics of the world are abstracted away. Even though the current version of the Ruby Virtual Machine has a stack, you won’t notice. You don’t push and pop, but send messages to objects. The same goes for Java. You have classes that inherit from each other and memory allocation only concerns you in so far as the garbage collector shows up on time.

Then there are some languages that explicitly tell you what their world looks like. Especially intermediate languages, which are not meant to be written by hand, but are representations of end-user languages that are easier for computers to understand and optimize. WebAssembly, for example, represents the commands of a stack-based machine that then gets emulated by a runtime (which will be a browser, most of the time). Java bytecode is a representation of Java code in the world of a stack machine.

Brainfuck Machines

And then there’s Brainfuck. Brainfuck doesn’t just tell you what its view of the world is, no, it smacks you over the head with it.

Brainfuck is based on the assumption that Brainfuck code will be executed by a Brainfuck machine. Just like the PUSH and POP operations in Java bytecode assume that the JVM manages a stack, the + and - in Brainfuck assume that there’s a Brainfuck machine which supports these two instructions.

So what does this Brainfuck machine look like? Not too complicated! It only has a few parts:

Memory: The machine has 30000 memory cells, that can each hold an integer value from 0 to 255 and are initialized to 0 by default. Each cell is addressable by a zero based index, giving us a range of 0 to 29999 as possible indexes.
Data pointer: It “points” to a memory cell, by holding the value of the cell’s index. E.g.: if the value of the data pointer is 3, it points to the fourth memory cell.
Code: The program that’s executed by the machine. It’s made up of single instructions, which we’ll get to in a short while.
Instruction pointer: It points to the instruction in the code that’s to be executed next. E.g.: if the code is ++-++ and the instruction pointer has the value 2 then the next instruction to be executed is -.
Input and output streams: Just like STDIN and STDOUT in Unix systems, these are normally connected to the keyboard and the screen and are used for printing and reading characters.
CPU: It fetches the next instruction from the code and executes it, manipulating the data pointer, instruction pointer, a memory cell or the input/output streams accordingly.

That’s it. Those are all the parts of a complete, working Brainfuck machine that can execute Brainfuck code. So let’s take a closer look at Brainfuck code.

The Instructions

Brainfuck is tiny. It consists of eight different instructions. These instructions can be used to manipulate the state of the Brainfuck machine:

> - Increment the data pointer by 1.
< - Decrement the data pointer by 1.
+ - Increment the value in the current cell (the cell the data pointer is pointing to).
- - Decrement the value in the current cell.
. - Take the integer in the current cell, treat it as an ASCII char and print it on the output stream.
, - Read a character from the input stream, convert it to an integer and save it to the current cell.
[ - This always needs to come with a matching ]. If the current cell contains a zero, set the instruction pointer to the index of the instruction after the matching ].
] - If the current cell does not contain a zero, set the instruction pointer to the index of the instruction after the matching [.

That’s all of it, the complete Brainfuck language.

Even though these instructions look archaic, they’re just identifiers. Replace + with PLUS, - with SUB, . with PRINT and [ with LOOP and suddenly Brainfuck starts to look more like Brain-oh-wow-wait-a-second-I-can-actually-read-that.
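To see that the symbols really are just identifiers, here is a small sketch that rewrites Brainfuck code using readable names. PLUS, SUB, PRINT and LOOP come from the paragraph above; the remaining names (RIGHT, LEFT, READ, END) and the function name mnemonics are made up for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// names maps each Brainfuck instruction to a readable identifier.
// PLUS, SUB, PRINT and LOOP follow the text; the others are assumed names.
var names = map[byte]string{
	'+': "PLUS",
	'-': "SUB",
	'>': "RIGHT",
	'<': "LEFT",
	'.': "PRINT",
	',': "READ",
	'[': "LOOP",
	']': "END",
}

// mnemonics translates Brainfuck code into space-separated identifiers,
// skipping every character that is not one of the eight instructions.
func mnemonics(code string) string {
	words := []string{}
	for i := 0; i < len(code); i++ {
		if name, ok := names[code[i]]; ok {
			words = append(words, name)
		}
	}
	return strings.Join(words, " ")
}

func main() {
	// Prints: PLUS LOOP SUB RIGHT PLUS LEFT END PRINT
	fmt.Println(mnemonics("+[->+<]."))
}
```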

Now that we know what the machine should look like and what it has to do, let’s get started with building it.

Building The Machine

The basic structure will be called - you guessed it - Machine and looks like this:

// machine.go

type Machine struct {
	code string
	ip   int

	memory [30000]int
	dp     int

	input  io.Reader
	output io.Writer
}

func NewMachine(code string, in io.Reader, out io.Writer) *Machine {
	return &Machine{
		code:    code,
		input:   in,
		output:  out,
	}
}

As you can see, everything we’ve talked about is here: the code, the instruction pointer (ip), the memory, the data pointer (dp) and both the input and output streams.

Now we just need a method that can start this Machine and get it to execute code:

// machine.go

func (m *Machine) Execute() {
	for m.ip < len(m.code) {
		ins := m.code[m.ip]

		switch ins {
		case '+':
			m.memory[m.dp]++
		case '-':
			m.memory[m.dp]--
		case '>':
			m.dp++
		case '<':
			m.dp--
		}

		m.ip++
	}
}

Here we step through every instruction in m.code until we reach its end. In order to execute each instruction individually, we have a switch statement that “decodes” the current instruction and manipulates the machine according to which instruction it is.

In the case of + and - we manipulate the current memory cell, incrementing and decrementing its value respectively. The current memory cell is pointed to by the data pointer, m.dp, and we can get to it with m.memory[m.dp]. And in order to change the data pointer itself, we have two case branches for > and <.
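One detail worth noting: the machine description says a cell holds values from 0 to 255, while the Machine struct stores cells as int, which does not wrap around at those bounds. If you want that range enforced, one option is to store cells as byte, since Go's unsigned integer arithmetic wraps on overflow. This is an assumption of the sketch below, not what the Machine in this article does, and the helper name apply is hypothetical.

```go
package main

import "fmt"

// apply executes '+' or '-' on a single byte-sized cell and returns the
// result. Because byte is an unsigned 8-bit type, the value wraps around.
func apply(cell byte, ins byte) byte {
	switch ins {
	case '+':
		cell++
	case '-':
		cell--
	}
	return cell
}

func main() {
	fmt.Println(apply(0, '-'))   // 0 - 1 wraps around to 255
	fmt.Println(apply(255, '+')) // 255 + 1 wraps back to 0
}
```

Whether wrapping matters depends on the Brainfuck program being run; some programs rely on it, others never leave the 0 to 255 range.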

So far, so good. But we’re missing printing and reading, the . and , instructions. In order to implement support for those, we need to make a slight modification: we need to give our Machine a one-byte buffer slice.

// machine.go

type Machine struct {
// [...]
	buf []byte
}

func NewMachine(code string, in io.Reader, out io.Writer) *Machine {
	return &Machine{
// [...]
		buf: make([]byte, 1),
	}
}

With that in place, we can add two new methods called readChar and putChar:

// machine.go

func (m *Machine) readChar() {
	n, err := m.input.Read(m.buf)
	if err != nil {
		panic(err)
	}
	if n != 1 {
		panic("wrong num bytes read")
	}

	m.memory[m.dp] = int(m.buf[0])
}

func (m *Machine) putChar() {
	m.buf[0] = byte(m.memory[m.dp])

	n, err := m.output.Write(m.buf)
	if err != nil {
		panic(err)
	}
	if n != 1 {
		panic("wrong num bytes written")
	}
}

readChar reads one byte from the input, which will be os.Stdin, and then transfers this byte to the current memory cell, m.memory[m.dp]. putChar does the opposite and writes the content of the current memory cell to the output stream, which will be os.Stdout.

It has to be said that instead of doing proper error handling here, we just let the machine blow up by calling panic. That shouldn’t happen, of course, when we plan to use it in production (I dare you), so keep that in mind.
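For comparison, here is a rough sketch of what reading a cell could look like when errors are returned instead of causing a panic. The names readCell and readAll are hypothetical, and treating io.EOF as a normal end of input is an assumption; Brainfuck implementations differ on what , should do when the input runs out.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// readCell reads exactly one byte from in and returns it as a cell value.
// Errors, including io.EOF, are handed to the caller instead of panicking.
func readCell(in io.Reader, buf []byte) (int, error) {
	if _, err := io.ReadFull(in, buf[:1]); err != nil {
		return 0, err
	}
	return int(buf[0]), nil
}

// readAll is a small demo helper: it reads cells from input until io.EOF.
func readAll(input string) ([]int, error) {
	in := strings.NewReader(input)
	buf := make([]byte, 1)
	cells := []int{}
	for {
		v, err := readCell(in, buf)
		if errors.Is(err, io.EOF) {
			return cells, nil // end of input is not treated as a failure here
		}
		if err != nil {
			return cells, err
		}
		cells = append(cells, v)
	}
}

func main() {
	cells, err := readAll("Hi")
	fmt.Println(cells, err) // [72 105] <nil>
}
```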

Using these two methods means adding new case branches to the switch statement in Execute:

// machine.go

func (m *Machine) Execute() {
	for m.ip < len(m.code) {

// [...]
		case ',':
			m.readChar()
		case '.':
			m.putChar()
// [...]

	}
}

And with that, our Brainfuck machine can read and print characters! It’s time to move on to the hairiest part of the implementation.

Looping

Brainfuck’s two control flow instructions are [ and ]. And they’re not quite like loops or other control flow mechanisms in “normal” languages. Expressed in some Go-like dialect of pseudo-code, what they do is this:

switch currentInstruction {
case '[':
  if currentMemoryCellValue() == 0 {
    positionOfMatchingBracket = findMatching("]")
    instructionPointer = positionOfMatchingBracket + 1
  }
case ']':
  if currentMemoryCellValue() != 0 {
    positionOfMatchingBracket = findMatching("[")
    instructionPointer = positionOfMatchingBracket + 1
  }
}

Note the two different conditions of the if-statements. They are the most important bits here, because they give both instructions separate meaning. Here’s an example to see how [ and ] can be used:

+++++   -- Increment current cell to 5
[       -- Execute the following code, if the current cell is not zero
->      -- Decrement current cell, move data pointer to next cell
+<      -- Increment current cell, move data pointer to previous cell
]       -- Repeat loop if current cell is non-zero

This snippet increments the current cell to 5 and then uses [ and ] to add the cell’s value to the next cell, by decrementing and incrementing both cells in a loop. The body of the loop will be executed 5 times until the first cell contains zero.
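Read as ordinary code, the snippet is a while-loop that moves a value from one cell into its neighbour. A rough Go equivalent, with the hypothetical helper moveRight standing in for [->+<], might look like this:

```go
package main

import "fmt"

// moveRight mirrors the Brainfuck loop [->+<]: while the cell at dp is
// non-zero, decrement it and increment its right neighbour.
func moveRight(memory []int, dp int) {
	for memory[dp] != 0 { // the test that [ and ] perform on the current cell
		memory[dp]--   // -
		memory[dp+1]++ // > + <
	}
}

func main() {
	memory := make([]int, 4)
	memory[0] = 5 // +++++
	moveRight(memory, 0)
	fmt.Println(memory) // [0 5 0 0]
}
```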

Of course, implementing the “does the current memory cell hold zero or not?” check is not the problem. Finding the matching brackets is what’s hairy about this, because brackets can be nested. It’s not enough to find the next ] when we encounter a [, no, we need to keep track of every pair of brackets we find. How are we going to do that? With a simple counter! Here is the pseudo-code from above turned into real Go code:

// machine.go

func (m *Machine) Execute() {
	for m.ip < len(m.code) {
		ins := m.code[m.ip]

		switch ins {
// [...]
		case '[':
			if m.memory[m.dp] == 0 {
				depth := 1
				for depth != 0 {
					m.ip++
					switch m.code[m.ip] {
					case '[':
						depth++
					case ']':
						depth--
					}
				}
			}
		case ']':
			if m.memory[m.dp] != 0 {
				depth := 1
				for depth != 0 {
					m.ip--
					switch m.code[m.ip] {
					case ']':
						depth++
					case '[':
						depth--
					}
				}
			}
		}

		m.ip++
	}
}

Let’s take a closer look at the case '[' branch.

Here we check whether the current memory cell’s value is zero and if it is, we try to set the instruction pointer, ip, to the position of the matching ]. In order to do that correctly in the face of nested bracket pairs, we use depth as a counter. With each [ we pass, we increment the counter, and with each ] we decrement it. Since it’s set to 1 initially, we know that we are sitting on our matching ] when depth is 0. And that means that m.ip is set to the correct position. The m.ip++ at the end of the for-loop does the rest and sets the instruction pointer to the instruction right after the matching bracket.

The case ']' branch is the mirrored version, where we walk backwards in the instructions, trying to find the matching [.
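The counter technique can also be pulled out into a standalone helper, which makes it easier to test on its own. The sketch below is not part of the machine in this article; matchingBracket is a hypothetical name, and unlike the inline version it returns -1 for unbalanced code instead of running past the end of the string.

```go
package main

import "fmt"

// matchingBracket returns the index of the bracket matching the one at pos,
// or -1 if pos does not hold a bracket or no match exists.
func matchingBracket(code string, pos int) int {
	if pos < 0 || pos >= len(code) {
		return -1
	}
	var open, closing byte
	var step int
	switch code[pos] {
	case '[':
		open, closing, step = '[', ']', 1 // scan forwards
	case ']':
		open, closing, step = ']', '[', -1 // scan backwards
	default:
		return -1
	}

	depth := 1
	for i := pos + step; i >= 0 && i < len(code); i += step {
		switch code[i] {
		case open:
			depth++ // a nested pair starts
		case closing:
			depth-- // a pair ends
		}
		if depth == 0 {
			return i
		}
	}
	return -1
}

func main() {
	code := "+[-[+]>]<"
	fmt.Println(matchingBracket(code, 1))  // 7: the outer [ matches the last ]
	fmt.Println(matchingBracket(code, 5))  // 3: the inner ] matches the inner [
	fmt.Println(matchingBracket("[[]", 0)) // -1: the first [ is never closed
}
```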

It’s time to flip the power switch on this machine.

Hello World

Here is a small driver that reads in a file and passes it to our Brainfuck machine:

// main.go

package main

import (
	"fmt"
	"io/ioutil"
	"os"
)

func main() {
	fileName := os.Args[1]
	code, err := ioutil.ReadFile(fileName)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: %s\n", err)
		os.Exit(-1)
	}

	m := NewMachine(string(code), os.Stdin, os.Stdout)
	m.Execute()
}

That gives us the possibility to run Brainfuck programs on the command line:

$ cat ./hello_world.b
++++++++[>++++[>++>+++>+++>+<<
<<-]>+>+>->>+[<]<-]>>.>---.+++
++++..+++.>>.<-.<.+++.------.-
-------.>>+.>++.

$ go build -o machine && ./machine ./hello_world.b
Hello World!

It talks! Sweet! Our Brainfuck machine works!

So slow!

I have some good and some bad news. Our product manager said that the Brainfuck interpreter needs to be fast and, sadly, ours isn’t. That’s the bad news.

On my computer, our machine currently takes around 70 seconds to execute mandelbrot.b, a Mandelbrot set fractal viewer written in Brainfuck by Erik Bosman that’s often used as a benchmark for Brainfuck interpreters. That’s slow.

$ go build -o machine && time ./machine ./mandelbrot.b >/dev/null
 ./machine ./mandelbrot.b > /dev/null  68.24s user 0.18s system 99% cpu 1:08.60 total

The good news is that there are a few things we can do to make it faster.

Take a look at the hello_world.b example from above or the mandelbrot.b program. See all those runs of + and -? There are a lot of instructions of the same type right behind each other in Brainfuck programs. And we have to read each one, check which one it is and then execute it.

The overhead of doing this is high. Consider this Brainfuck snippet: +++++. In order to execute it, we need five cycles of “fetch the next instruction”, “what instruction do we have here?” and “execute this!”. That turns into us incrementing the value of the current memory cell by one five times. It would give us a huge performance boost if we could just increase the current cell’s value by five directly.

The other thing that’s slowing us down is the way we handle [ and ]. Every time we stumble upon such a bracket, we go looking for its matching counterpart again. Scan the program, keep track of all the other brackets we pass and then modify the instruction pointer. The longer the program, the longer this will take. If we could do that just once for each bracket and remember the position of its matching counterpart, we wouldn’t need to rescan the program again and again.

And here’s the best of news: we can! We can do all of this before we even start up our Brainfuck machine. We can turn +++++ into something that says “increase by 5”. We can also do the same for -, >, <, ., and ,. And we can find and remember the positions of matching bracket pairs beforehand. All we need to do is create another representation of the original Brainfuck code that can include these optimizations and have our machine execute this instead.
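To make the “increase by 5” idea concrete before building the real thing, here is a small sketch of run-length folding on raw Brainfuck code. The run type and the foldRuns function are hypothetical and exist only for this illustration; brackets are deliberately kept as single entries.

```go
package main

import "fmt"

// run is a hypothetical pair of an instruction character and a repeat count.
type run struct {
	char  byte
	count int
}

// foldRuns collapses consecutive identical instructions into one run.
// Brackets are never folded, and non-instruction characters are skipped.
func foldRuns(code string) []run {
	runs := []run{}
	for i := 0; i < len(code); i++ {
		c := code[i]
		switch c {
		case '+', '-', '>', '<', '.', ',':
			count := 1
			// Consume every directly following copy of the same character.
			for i+1 < len(code) && code[i+1] == c {
				count++
				i++
			}
			runs = append(runs, run{c, count})
		case '[', ']':
			runs = append(runs, run{c, 1})
		}
	}
	return runs
}

func main() {
	// Ten characters of Brainfuck become five runs.
	for _, r := range foldRuns("+++++>>[-]") {
		fmt.Printf("%c x%d\n", r.char, r.count)
	}
}
```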

A New Instruction Set

Up until now we’ve used a string to represent the code that’s to be executed by the Machine. But in order to make optimizations, we need a new instruction set. Here is the Instruction type that makes up the new set:

// instruction.go

type InsType byte

const (
	Plus          InsType = '+'
	Minus         InsType = '-'
	Right         InsType = '>'
	Left          InsType = '<'
	PutChar       InsType = '.'
	ReadChar      InsType = ','
	JumpIfZero    InsType = '['
	JumpIfNotZero InsType = ']'
)

type Instruction struct {
	Type     InsType
	Argument int
}

Each Instruction has a Type and an Argument. The Type can be one of the predefined constants defined at the top, where each constant has a corresponding Brainfuck instruction. The interesting part here is the Argument field. This field allows us to make our instruction set much more dense than the original Brainfuck code. We can put more information in fewer instructions. We’ll use Argument in two ways:

In the case of +, -, ., ,, >, and < the Argument field will contain the number of original Brainfuck instructions this Instruction represents. E.g.: +++++ will be turned into Instruction{Type: Plus, Argument: 5}.
In the case of [ and ] the Argument field will contain the position of the instruction of the matching bracket. E.g.: the Brainfuck snippet [] will be turned into two Instructions: Instruction{Type: JumpIfZero, Argument: 1} and Instruction{Type: JumpIfNotZero, Argument: 0}.

Now that we have our new Instruction type and know how this new instruction set is to be interpreted, we can modify our Machine to do exactly that. The first thing we need to do is to change its definition, so it doesn’t work with a string anymore, but with a slice of *Instruction:

// machine.go

type Machine struct {
	code []*Instruction
	ip   int

	memory [30000]int
	dp     int

	input  io.Reader
	output io.Writer

	buf []byte
}

func NewMachine(instructions []*Instruction, in io.Reader, out io.Writer) *Machine {
	return &Machine{
		code:   instructions,
		input:  in,
		output: out,
		// Same name as before, so readChar and putChar keep working unchanged.
		buf: make([]byte, 1),
	}
}

With that change made, the Execute method of the Machine now also needs to work with this new type of instruction set:

// machine.go

func (m *Machine) Execute() {
	for m.ip < len(m.code) {
		ins := m.code[m.ip]

		switch ins.Type {
		case Plus:
			m.memory[m.dp] += ins.Argument
		case Minus:
			m.memory[m.dp] -= ins.Argument
		case Right:
			m.dp += ins.Argument
		case Left:
			m.dp -= ins.Argument
		case PutChar:
			for i := 0; i < ins.Argument; i++ {
				m.putChar()
			}
		case ReadChar:
			for i := 0; i < ins.Argument; i++ {
				m.readChar()
			}
		case JumpIfZero:
			if m.memory[m.dp] == 0 {
				m.ip = ins.Argument
				continue
			}
		case JumpIfNotZero:
			if m.memory[m.dp] != 0 {
				m.ip = ins.Argument
				continue
			}
		}

		m.ip++
	}
}

That’s a lot cleaner than what we had before, right? And it’s faster, too! Well, I can’t prove it yet, because there’s still a piece missing: something that turns Brainfuck code into a slice of *Instructions.

Compiling Brainfuck

Wikipedia defines a compiler as:

a computer program (or a set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language)

That’s exactly what we need! A program that takes Brainfuck code and turns it into our new “language”, which is made up of our Instructions.

And that’s also a pretty clear definition of requirements, which allows us to define our Compiler:

// compiler.go

type Compiler struct {
	code       string
	codeLength int
	position   int

	instructions []*Instruction
}

func NewCompiler(code string) *Compiler {
	return &Compiler{
		code:         code,
		codeLength:   len(code),
		instructions: []*Instruction{},
	}
}

The Compiler is constructed with the original Brainfuck code as a string and has an empty instructions slice that will be filled. That’s the job of the Compile method:

// compiler.go

func (c *Compiler) Compile() []*Instruction {
	for c.position < c.codeLength {
		current := c.code[c.position]

		switch current {
		case '+':
			c.CompileFoldableInstruction('+', Plus)
		case '-':
			c.CompileFoldableInstruction('-', Minus)
		case '<':
			c.CompileFoldableInstruction('<', Left)
		case '>':
			c.CompileFoldableInstruction('>', Right)
		case '.':
			c.CompileFoldableInstruction('.', PutChar)
		case ',':
			c.CompileFoldableInstruction(',', ReadChar)
		}

		c.position++
	}

	return c.instructions
}

That looks remarkably close to the Execute method of the current and previous versions of our Machine. But there’s a huge difference: whereas the Machine executed the Brainfuck instructions directly, our Compiler now turns them into *Instructions, so they can be executed later. Here is what the CompileFoldableInstruction method does:

// compiler.go

func (c *Compiler) CompileFoldableInstruction(char byte, insType InsType) {
	count := 1

	for c.position < c.codeLength-1 && c.code[c.position+1] == char {
		count++
		c.position++
	}

	c.EmitWithArg(insType, count)
}

func (c *Compiler) EmitWithArg(insType InsType, arg int) int {
	ins := &Instruction{Type: insType, Argument: arg}
	c.instructions = append(c.instructions, ins)
	return len(c.instructions) - 1
}

Together with EmitWithArg the CompileFoldableInstruction method scans through the input code (the Brainfuck string code) to see if the current instruction is followed by other instructions of the same type. If that’s the case, it folds those Brainfuck instructions into one Instruction.

EmitWithArg is a helper method that creates a new *Instruction, adds it to the c.instructions slice of the Compiler and returns the position of this newly created instruction in c.instructions.

Returning the position of the newest instruction is an important detail, because we’re going to need it now. As you may have noticed, we didn’t add support for [ and ] to our Compiler yet. That’s because these are not foldable instructions (e.g.: we cannot turn [[[ into a single instruction), so we need to do something more elaborate for them.

Compiling Loops

We have two loop instructions: [ and ]. And we want to turn them into JumpIfZero and JumpIfNotZero instructions, where the Argument field contains the position of the matching bracket. That is: the position of the matching counterpart Instruction in the final instructions slice.

That’s easier said than done, though. The problem is that when we encounter a [ we don’t know where in the final instructions slice the matching ] instruction will end up. Counting the instructions in between doesn’t work, because it’s possible that those will be folded together in the next compilation step and thus invalidate the position we got through counting.

Then there’s also the problem of remembering the position of the last JumpIfZero instruction, so it can be used as Argument when constructing the matching JumpIfNotZero instruction.

But here’s what we’re going to do, here’s how we’re going to solve these problems. First, we will emit a JumpIfZero instruction for each [ we encounter, with the placeholder value 0 in the Argument field. Later, when we have constructed the matching JumpIfNotZero instruction, we’re going to come back to this instruction and change its Argument to the real value.

In order to later be able to change them, we need to keep track of JumpIfZero instructions. And we’re going to use a stack to do that, implemented with a simple Go slice:

// compiler.go

func (c *Compiler) Compile() []*Instruction {
	loopStack := []int{}

	for c.position < c.codeLength {
		current := c.code[c.position]

		switch current {
		case '[':
			insPos := c.EmitWithArg(JumpIfZero, 0)
			loopStack = append(loopStack, insPos)
// [...]
		}

		c.position++
	}

	return c.instructions
}

loopStack, which acts as a stack onto which we can push elements and later pop them off, is just an empty slice. There’s not much to it. Interesting here is the case branch for the [ instructions. Just like we discussed, we emit a new JumpIfZero instruction with a placeholder Argument. Then comes the important part: we push the position of the new JumpIfZero instruction onto our loopStack.

That, in turn, allows us to correctly handle ] instructions:

// compiler.go

func (c *Compiler) Compile() []*Instruction {
// [...]
	case ']':
		// Pop position of last JumpIfZero ("[") instruction off stack
		openInstruction := loopStack[len(loopStack)-1]
		loopStack = loopStack[:len(loopStack)-1]

		// Emit the new JumpIfNotZero ("]") instruction,
		// with correct position as argument
		closeInstructionPos := c.EmitWithArg(JumpIfNotZero, openInstruction)

		// Patch the old JumpIfZero ("[") instruction with new position
		c.instructions[openInstruction].Argument = closeInstructionPos
// [...]
}

We pop the position of the last JumpIfZero instruction, the opening [, which still holds a placeholder 0 as Argument, off the stack, and use it as the correct Argument for a new JumpIfNotZero instruction.

And since we now have the position of the JumpIfZero instruction, we can access it in c.instructions and change its Argument from 0 to the correct position of the new JumpIfNotZero instruction!
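One thing the Compile method takes for granted is that brackets are balanced: a stray ] would try to pop from an empty loopStack. The following standalone sketch shows the same stack technique with that case checked. jumpTable is a hypothetical helper, and it works on positions in the raw code rather than on folded Instructions.

```go
package main

import (
	"errors"
	"fmt"
)

// jumpTable maps the position of every bracket in code to the position of
// its partner. It reports an error if the brackets are unbalanced.
func jumpTable(code string) (map[int]int, error) {
	jumps := map[int]int{}
	loopStack := []int{}

	for pos := 0; pos < len(code); pos++ {
		switch code[pos] {
		case '[':
			// Push: remember where this loop started.
			loopStack = append(loopStack, pos)
		case ']':
			if len(loopStack) == 0 {
				return nil, errors.New("unmatched ]")
			}
			// Pop the most recent unmatched [ and link both directions.
			open := loopStack[len(loopStack)-1]
			loopStack = loopStack[:len(loopStack)-1]
			jumps[open] = pos
			jumps[pos] = open
		}
	}

	if len(loopStack) != 0 {
		return nil, errors.New("unmatched [")
	}
	return jumps, nil
}

func main() {
	jumps, err := jumpTable("+++[---[+]>>>]<<<")
	fmt.Println(jumps[3], jumps[7], err) // 13 9 <nil>

	_, err = jumpTable("]")
	fmt.Println(err) // unmatched ]
}
```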

Isn’t that neat? Now our Compiler takes this piece of Brainfuck code

+++[---[+]>>>]<<<

And turns it into these Instructions:

[]*Instruction{
  &Instruction{Type: Plus, Argument: 3},
  &Instruction{Type: JumpIfZero, Argument: 7},
  &Instruction{Type: Minus, Argument: 3},
  &Instruction{Type: JumpIfZero, Argument: 5},
  &Instruction{Type: Plus, Argument: 1},
  &Instruction{Type: JumpIfNotZero, Argument: 3},
  &Instruction{Type: Right, Argument: 3},
  &Instruction{Type: JumpIfNotZero, Argument: 1},
  &Instruction{Type: Left, Argument: 3},
}

All that’s left to do now is making use of it.

Starting The Faster Machine

In order to make use of our optimized Machine and our shiny new instruction set, we have to use our Compiler when we read in a file of Brainfuck code:

// main.go

package main

import (
	"fmt"
	"io/ioutil"
	"os"
)

func main() {
	fileName := os.Args[1]
	code, err := ioutil.ReadFile(fileName)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: %s\n", err)
		os.Exit(-1)
	}

	compiler := NewCompiler(string(code))
	instructions := compiler.Compile()

	m := NewMachine(instructions, os.Stdin, os.Stdout)
	m.Execute()
}

That looks a lot like our old driver. But instead of reading in a file and passing its content to our Brainfuck machine, we first compile the original Brainfuck code in the file to our new Instruction set. And these Instructions will then be executed by our Machine.

If we now run this with the mandelbrot.b benchmark we can see that our work paid off: what took 70s before now only takes 13s!

$ go build -o machine && time ./machine ./mandelbrot.b >/dev/null
./machine ./mandelbrot.b > /dev/null 13.43s user 0.04s system 99% cpu 13.496 total

Isn’t that something?

Taking A Closer Look

Yes, we’ve only implemented Brainfuck, a language with no syntax to speak of and only eight different instructions. You might be tempted to call our two Brainfuck machines toys. But let’s take a look at what we actually did.

The first thing we built is an interpreter that acts as a Brainfuck machine. It has all the necessary parts: memory cells, data and instruction pointers, input and output streams. The interpreter effectively tokenizes its input by processing it byte by byte. It then evaluates each token on the fly. It’s not much longer than 100 lines, but has all the essential parts of a fully-grown interpreter.

And then we’ve built a compiler! Sure, it doesn’t output native machine code and it’s really simple, but it’s a compiler nonetheless! It takes Brainfuck code as input and outputs instructions for a machine - our Brainfuck machine. That’s the basic idea behind compilers. We could also change the way our Instructions are stored and passed around, and then we’d realize that our Machine is now a virtual machine and is executing bytecode.
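As a rough illustration of the bytecode remark: a flat byte slice could hold each instruction as one opcode byte followed by a fixed-width argument. The layout below (1 byte for the type, 4 bytes for a big-endian argument) and the names encode, decode and insSize are assumptions made for this sketch, not something the Machine in this article uses. InsType and Instruction are repeated in shortened form so the example compiles on its own.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

type InsType byte

const (
	Plus       InsType = '+'
	JumpIfZero InsType = '['
)

type Instruction struct {
	Type     InsType
	Argument int
}

// insSize is the assumed width of one encoded instruction in bytes.
const insSize = 5

// encode flattens instructions into bytecode: opcode, then a uint32 argument.
func encode(instructions []*Instruction) []byte {
	out := make([]byte, 0, len(instructions)*insSize)
	for _, ins := range instructions {
		var arg [4]byte
		binary.BigEndian.PutUint32(arg[:], uint32(ins.Argument))
		out = append(out, byte(ins.Type))
		out = append(out, arg[:]...)
	}
	return out
}

// decode reads the instruction stored at index i of the bytecode.
func decode(bytecode []byte, i int) Instruction {
	offset := i * insSize
	return Instruction{
		Type:     InsType(bytecode[offset]),
		Argument: int(binary.BigEndian.Uint32(bytecode[offset+1 : offset+insSize])),
	}
}

func main() {
	bytecode := encode([]*Instruction{
		{Type: Plus, Argument: 5},
		{Type: JumpIfZero, Argument: 7},
	})
	fmt.Println(len(bytecode))                // 10: two instructions, five bytes each
	fmt.Println(decode(bytecode, 1).Argument) // 7
}
```

A machine working on this representation would fetch insSize bytes per step instead of following a pointer, which is the essential difference between the slice of *Instruction and bytecode.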

Now, that doesn’t sound like toys, does it? What we built is using the same blueprints a lot of other, mature and production-ready programming languages use. Once you’ve understood how and why they work, you start to recognize them in other languages, too, and in turn understand these languages better.

And that’s why I think implementing Brainfuck can be a rewarding and eye-opening experience.

You can find the complete code, including tests, for both versions of the Brainfuck machine here on GitHub.

Why I Wrote a Book About Interpreters

2016-11-30T17:15:00+00:00

Last week I self-published my first book, “Writing An Interpreter In Go”, which you can get at interpreterbook.com. I want to tell you a little bit about why I chose to write this particular book.

Sometimes I jokingly call the summer of 2015 my “Summer Of Lisp”. But, honestly, I’m only half joking when I say this. It really was a great and Lispy summer programming-wise: I was working through the final chapters of Structure And Interpretation Of Computer Programs (SICP), which I began studying at the beginning of that year, was totally fascinated by Lisp, enamored by Scheme and also starting to learn Clojure by working through the fantastic The Joy Of Clojure.

SICP had an immense impact on me. It’s a wonderful book, full of elegant code and ideas; it hearkens “to a programming life that if true, would be an absolute blast to live in”. The fourth chapter especially made a lasting impression. In this chapter, Abelson and Sussman show the reader how to implement the so-called “meta-circular evaluator” - a Lisp interpreter in Lisp. “Mesmerized” is probably the word I’d use to describe myself while reading this chapter.

The code for the meta-circular evaluator is elegant and simple. Around 400 lines of Scheme, stripped down to the essentials and doing exactly what they are supposed to. It’s a beautiful piece of software. I asked a friend to design a poster for me, containing only the source code for the meta-circular interpreter, beautifully formatted. That poster hung next to my office desk for over a year.

But soon I discovered why it’s only 400 lines. The code presented in the book skips the implementation of an entire component - the parser. Huh. But how does a parser work then? I was stumped. I really wanted to know how that parser works. And I almost never want to skip anything I don’t know yet. I really want to know how things work, at least in a rough sense. Black boxes and skipping things always leave me wanting to dig deeper.

In that same summer I also read Steve Yegge’s “Rich Programmer Food”, in which he argues what a worthwhile goal it is to learn about and to understand compilers. Let me quote my favorite passage:

That’s why you need to learn how [compilers] work. That’s why you, yes you personally, need to write one.

[…]

You’ll be able to fix that dang syntax highlighting.

You’ll be able to write that doc extractor.

You’ll be able to fix the broken indentation in Eclipse.

You won’t have to wait for your tools to catch up.

You might even stop bragging about how smart your tools are, how amazing it is that they can understand your code […]

You’ll be able to jump in and help fix all those problems with your favorite language.

That blog post flipped a switch. Determined, as if this were some kind of weird challenge, I said to a friend of mine: “I’m going to write a compiler”. I believe I was gazing into the distance while saying this. “Alright”, he said, rather unimpressed, “do it.”

Without having taken a compiler course in college or even having a computer science degree I set out to write a compiler. The first goal, I determined, is to get a foot in the door and write an interpreter. Interpreters are closely related to compilers, but easier to understand and to build for beginners. But most importantly, this time there would be no skipping of anything. This interpreter will be built from scratch!

What I found was that a lot of resources for interpreters or compilers are either incredibly heavy on theory or barely scratching the surface. It’s either the dragon book or a blog post about a 50 line Lisp interpreter. The complete theory with code in the appendix or an introduction and overview with black boxes.

Every piece of writing helped though. Slowly but surely I was completing work on my interpreter. The tiny tutorials, the slightly longer blog posts and the heavy compiler books - I could find something useful in all of them.

Nevertheless I was getting frustrated. There needs to be a book, that … One day, I said to the same friend, who earlier so enthusiastically encouraged me to write a compiler:

“You know what… I’d love to write a book about interpreters. A book that shows you everything you need to know to build an interpreter from scratch, including your own lexer, your own parser and your own evaluation step. No skipping of anything!”

Somehow this turned into me giving myself a motivational speech.

“And with tests too!”, I continued, “Yeah! Code and tests front and center! Not like in these other books, where the code is an unreadable mess that you can’t get to compile or run on your system. And you don’t need to be well versed in mathematical notation either! It should be a book any programmer can read and understand.”

It’s entirely possible that I was banging my fist on the table at this point. Calmly, my friend said: “Sounds like a good idea. Do it.”

And here we are, 11 months later, and “Writing An Interpreter In Go” is available to the public. It has around 200 pages and presents the complete and working interpreter for the Monkey programming language, including the lexer, the parser, the evaluator and also including tests. No black boxes, no 3rd party tools and no skipping of anything. Nearly every page contains a piece of code. I’m really proud of this book.

Putting Eval In Go

2016-11-16T17:00:00+00:00

Over the past year I’ve spent a significant amount of time reading through Go’s go packages, the packages used by the Go compiler and other Go tools. But only recently did it occur to me that these are real, public packages. I can actually import and use them! So then I started to wonder what I could do with them when it suddenly struck me: “I can… I can put Eval in Go! Using Go!”

Let me explain. There’s the scanner package, which contains the lexer (or scanner, or tokenizer, …) that turns Go source code into tokens. These tokens are defined in their own package, token. And then there’s the parser, which takes the tokens and builds an AST. The definitions of the AST nodes can be found in the perfectly named ast package. And then there’s also a printer package to print these AST nodes.

In other words: we have all the necessary pieces here to build an Eval function that evaluates Go code. In fact, with these packages we could build a complete Go interpreter in Go. If you’re really interested in doing that, check out the go-interpreter project, which aims to do just that. Instead, let’s start small and write an Eval function that evaluates mathematical Go expressions.

The first thing we need is a driver, a REPL:

package main

import (
	"bufio"
	"fmt"
	"os"
)

const PROMPT = "go>> "

func main() {
	scanner := bufio.NewScanner(os.Stdin)

	for {
		fmt.Printf(PROMPT)
		scanned := scanner.Scan()
		if !scanned {
			return
		}

		line := scanner.Text()
		fmt.Println(line)
	}
}

This allows us to input Go expressions and have them printed back to us:

% go run eval.go
go>> 1 * 2 * 3 * 4
1 * 2 * 3 * 4
go>> 8 / 2 + 3 - 1
8 / 2 + 3 - 1
go>>

So far, so dull.

The next step would be to initialize Go’s scanner with these input lines and turn them into tokens. Luckily, the parser package has a ParseExpr function that does exactly that. It initializes the scanner and reads in the tokens for us. It then parses the tokens and builds an AST. We can use it to parse the input in our REPL:

package main

import (
	"bufio"
	"fmt"
	"go/parser"
	"os"
)

const PROMPT = "go>> "

func main() {
	scanner := bufio.NewScanner(os.Stdin)

	for {
		fmt.Printf(PROMPT)
		scanned := scanner.Scan()
		if !scanned {
			return
		}

		line := scanner.Text()
		exp, err := parser.ParseExpr(line)
		if err != nil {
			fmt.Printf("parsing failed: %s\n", err)
			return
		}
		_ = exp // not used yet; without this line Go rejects the unused variable
	}
}

The result of our call to ParseExpr, exp, is an AST that represents the entered Go expression, without such details as comments, whitespace or semicolons. We can use the printer package to print it. We just have to use token.NewFileSet() to make the printer believe that we got our Go source code from a file:

import (
	"bufio"
	"fmt"
	"go/parser"
	"go/printer"
	"go/token"
	"os"
)

func main() {
// [...]

	for {
// [...]

		exp, err := parser.ParseExpr(line)
		if err != nil {
			fmt.Printf("parsing failed: %s\n", err)
			return
		}

		printer.Fprint(os.Stdout, token.NewFileSet(), exp)
		fmt.Printf("\n")
	}
}

Now would you look at that:

% go run eval.go
go>> 1 * 2 * 3 * 4
1 * 2 * 3 * 4
go>> 5 * 6 * 7 * 8
5 * 6 * 7 * 8

Okay, yes, you’re right. That looks exactly like our “printing back the input” mechanism we had before. But there’s more to it. What we’re actually doing here is parsing the input and pretty-printing the AST produced by the parser. See for yourself:

% go run eval.go
go>> 1           * 2           *       3 * (((5 + 6)))
1 * 2 * 3 * (5 + 6)
go>>

The whitespace has been removed, just like the superfluous parentheses around the last sub-expression. We’ve built our own crude version of gofmt in around 35 lines of Go code:

% go run eval.go
go>> func (name   string) { return name }
func(name string) {
        return name
}
go>>

But we want more than just pretty-printing the AST. We want an Eval function that evaluates mathematical Go expressions. What Eval has to do is to traverse each node in the AST and evaluate it. Granted, this definition is kinda recursive, but that’s perfect, because Eval itself is a recursive function:

import (
	"bufio"
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"os"
	"strconv"
)

func Eval(exp ast.Expr) int {
	switch exp := exp.(type) {
	case *ast.BinaryExpr:
		return EvalBinaryExpr(exp)
	case *ast.BasicLit:
		switch exp.Kind {
		case token.INT:
			i, _ := strconv.Atoi(exp.Value)
			return i
		}
	}

	return 0
}

func EvalBinaryExpr(exp *ast.BinaryExpr) int {
	left := Eval(exp.X)
	right := Eval(exp.Y)

	switch exp.Op {
	case token.ADD:
		return left + right
	case token.SUB:
		return left - right
	case token.MUL:
		return left * right
	case token.QUO:
		return left / right
	}

	return 0
}

As you can see, Eval takes an ast.Expr as argument, which is what we get back from parser.ParseExpr. It then traverses this part of the AST but only stops at *ast.BinaryExpr and *ast.BasicLit nodes. The former is an AST node that represents binary expressions (expressions with one operator and two operands) and the latter represents literals, like the integer literals we used in our REPL.

What Eval has to do in the case of an integer literal is easy. Integer literals evaluate to themselves. If I type 5 into the REPL then 5 is what should come out. Eval only needs to convert the parsed integer literal to a Go int and return it.

The case of *ast.BinaryExpr is more complex. Here Eval has to call itself two times to evaluate the operands of the binary expression. Each operand can be another binary expression or an integer literal. And in order to evaluate the current expression, both operands need to be fully evaluated. Only then, depending on the operator of the expression, is the correct result returned.

All that’s left for us now is to use Eval in our REPL:

func main() {
// [...]

	for {
// [...]

		exp, err := parser.ParseExpr(line)
		if err != nil {
			fmt.Printf("parsing failed: %s\n", err)
			return
		}

		fmt.Printf("%d\n", Eval(exp))
	}
}

Now our REPL can do this:

% go run eval.go
go>> 1 + 2 * 3 + 4 * 5
27
go>> 1000 - 500 - 250 - 125 - 75 - 25
25

We’ve successfully put a working Eval function in Go! And it only took us around 70 lines of code, because we used the public go packages that ship with Go.

Write Stupid Code

2015-10-22T17:45:00+00:00

This post has been translated to Chinese.

In the last couple of months I developed a certain approach to writing code. Whenever I write a new function, class or method I ask myself: “Is this code stupid enough?” If it’s not, it’s not done and I try to make it stupid.

Now, stupid code does not mean “code that doesn’t work”. Stupid code should work exactly like it’s supposed to, but in the most simple, straightforward, “stupid” way possible.

Anyone could write it and anyone reading it should be able to understand it. It shouldn’t make the reader think about the code itself, but about the problem at hand. It shouldn’t be long, it shouldn’t be complex and, most importantly, it shouldn’t try to be clever. It should get the job done and nothing more.

What does stupid code look like? It depends on the problem it’s trying to solve. Take meta-programming, for example, which is often considered complex and “black magic”. Does asking myself “is this code stupid enough?” mean “no meta-programming allowed”? Not necessarily, no. There are certain cases, in which the problem can be solved in the simplest way through meta-programming. But there are a lot more cases in which meta-programming is unnecessary and additional baggage on top of the solution, which gets in the way of understanding what the code is supposed to do.

The goal is to get rid of the baggage, to chip away at it until the most stupid, still working, tests-passing code emerges.

Keep in mind the “stupid” here: “it works” is not good enough. A lot of complex, “look at this clever trick”, overly-abstracted, unreadable code works and makes the tests pass. That’s not what I’m after. It has to be stupid: not clever, not complex, not hard to understand.

Besides “stupid” the resulting code might also be described as “elegant”, “clean” and “simple”. But the “write stupid code” mantra is not as elusive as “write elegant code”, for example, and seems far more achievable, which makes the approach much more valuable to me. And besides that: I find it much more likely to start out with “write stupid code” and end up with an elegant solution than the other way around.

Not every elegant solution is straightforward, but “stupid” ones are, by definition, and can also be elegant.

Unicorn Unix Magic Tricks

2014-11-20T17:45:00+00:00

This post is based on the talk of the same name I gave at the Arrrrcamp conference in Ghent, Belgium on October 2nd, 2014. You can find the slides here and the video recording here.

Unicorn is a webserver written in Ruby for Rails and Rack applications. When I first used it I was amazed. This is magic, I thought. It had to be. Why?

Well, first of all: the master-worker architecture. Unicorn uses one master process to manage a lot of worker processes. When you tell Unicorn to use 16 worker processes it does so, just like that. And now you’re looking at 17 processes when you run ps aux | grep unicorn, each with a different name that shows whether it’s the master process or one of the worker processes, which even have their own number in their process names.

$ pstree | grep unicorn
 \-+= 27185 mrnugget unicorn master -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27210 mrnugget unicorn worker[0] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27211 mrnugget unicorn worker[1] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27212 mrnugget unicorn worker[2] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27213 mrnugget unicorn worker[3] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27214 mrnugget unicorn worker[4] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27215 mrnugget unicorn worker[5] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27216 mrnugget unicorn worker[6] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27217 mrnugget unicorn worker[7] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27218 mrnugget unicorn worker[8] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27219 mrnugget unicorn worker[9] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27220 mrnugget unicorn worker[10] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27221 mrnugget unicorn worker[11] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27222 mrnugget unicorn worker[12] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27223 mrnugget unicorn worker[13] -c simple_unicorn_config.rb -l0.0.0.0:8080
   |--- 27224 mrnugget unicorn worker[14] -c simple_unicorn_config.rb -l0.0.0.0:8080
   \--- 27225 mrnugget unicorn worker[15] -c simple_unicorn_config.rb -l0.0.0.0:8080

How would one build something like this? I had no idea.

And then there’s a feature called “hot reload”, which means that you can tell Unicorn, while it’s running, to spin up a new version of your application. As soon as you do, Unicorn starts a new master process, which is going to serve the new version of your application. All the while the old master process is still running, responding to requests with your old application. Of course, the old master now has “old” in its name. Now, as soon as the new master process is fully booted up, you can send a QUIT signal to the old master process, which will in turn shut down and let the new one take over. And just like that you’ve switched to a new version of your application — without any downtime at all.

Oh, and Unicorn uses a lot more than the QUIT signal! There are tons of signals you can send to it: TTIN to increase the number of workers, TTOU to decrease it, USR1 to rotate the log files, USR2 to perform hot reloading, HUP to re-evaluate the configuration file. I didn’t know half of these signal names and there were even more in Unicorn’s own SIGNALS file.

And then there’s “preloading”: a feature of Unicorn that allows you to spin up new worker processes in less than a second, a fraction of the time it takes to boot up my Rails application. Somehow Unicorn is able to preload my application in memory and make use of that when creating new worker processes. And I had no idea how that works! Not a clue! And as if that wasn’t enough I discovered that Unicorn even has a file called PHILOSOPHY in its repository. Who else has that?! I was sure that there was some black magic going on. Because: how could Unicorn work like it does without magic?

Unix

After my first encounter with Unicorn I learned quite a bit about Unix systems and after a while I came back to Unicorn — still in amazement. But this time I read through the source code and it turns out, that, well, the secret ingredient to Unicorn is not magic but plain, old Unix.

Now, most people know Unix from a “user’s perspective”: the command line, shells, pipes, redirection, the kill command, scripting, text files and so on. But there’s this whole other side of Unix, too, which we could call the “developer’s perspective”. From this side of Unix you can see signal handling, inter-process communication, usage of pipes without the |-character, system calls and a whole lot more.

In what follows we’re going to have a look at Unicorn. We’ll take it apart and see that it’s just using some basic Unix tricks, the ones you can use as a developer, to do its work. The way we’re going to do that is by going through some of these Unix tricks, basic building blocks of every Unix system, and see how they work and how Unicorn uses them.

At the end we’ll go back to the “magic” of the beginning: hot reload, preloading, master-worker architecture. And we will see how these features work and how they are just Unix and not magic.

So let’s get started.

fork(2)

fork is how processes are created. Every process after the first one (with PID 1) was created with fork. So what is it, what is fork?

fork is a system call. Most of the time we can recognize system calls by the 2 behind their name (e.g. fork(2)) which means that we can find documentation about them in section 2 of the Unix manual, nowadays known as “man pages”. So in order to see the documentation for fork(2) you can run man 2 fork on your command line.

But what’s a system call? A way to communicate with the kernel of our operating system. System calls are the API of the kernel, if you will. We tell the kernel to do something for us with system calls: reading, writing, allocating memory, networking, device management.

And fork is the system call that tells the kernel to create a new process. When one process asks the kernel for a new process with fork(2) the kernel splits the process making the call into two. That’s probably where the name comes from: calling fork(2) is a “fork in the road” in the lifetime of a process. As soon as the kernel returns control to the process after handling the system call there now is a parent process and a child process. A parent can have a lot of child processes, but a child process has only one parent process.

And both processes, parent and child, are pretty much the same, right after the creation of the child. That’s because child processes in a Unix system inherit a lot of stuff from their parent processes: the data (the code it’s executing), the stack, the heap, the user id, the working directory, open file descriptors, the connected terminal and a lot more. This can be a burden (which is why copy-on-write is a thing) but also has some neat advantages — as we’ll see later.

So how do we use fork? Since (deep down) making a system call involves putting parameters and the unique identifier of the call in CPU registers (which ones may change depending on the architecture we’re working with) and firing a software interrupt, most programming languages provide wrappers that do all the work and allow us to not worry about which system call is identified by which number.

Ruby is no exception here and allows us to use fork(2) with a method called, well, fork:

# fork.rb

child_pid = fork do
  puts "[child] child_pid: #{child_pid}"
  puts "[child] Process ID: #{Process.pid}"
  puts "[child] Parent Process ID: #{Process.ppid}"
end

Process.wait(child_pid)

puts "[parent] child_pid: #{child_pid}"
puts "[parent] Process ID: #{Process.pid}"

What we’re doing here is calling fork in Ruby and passing it a block. This will create a new process, a child process, which runs everything inside the block and then exits. In the parent process we call Process.wait and pass it the return value of fork, which is the ID of the child process. We need to wait for child processes to exit because otherwise they’d turn into zombie processes. Yep, that’s a valid Unix rule right there: parent processes need to wait for their children to die so they don’t turn into zombies.

When we run this we’ll get this:

$ ruby fork.rb
[child] child_pid:
[child] Process ID: 29715
[child] Parent Process ID: 29695
[parent] child_pid: 29715
[parent] Process ID: 29695

As we can see, the child process has a new process ID and its parent process ID matches the process ID printed in the parent process. And most interestingly child_pid is nil inside the child process but contains a value in the parent process. This is how we can check whether we are in the parent process or the child process. Since the child inherits the data from the parent process, both processes are running the same code right after fork and we can decide which process does what depending on the return value of fork.
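As a small aside on that inheritance: the child does not share the parent's data, it gets its own copy. Here is a minimal sketch of that, where the method name fork_copies_memory and the numbers are made up for illustration. The child changes a variable and reports its value through its exit status, while the parent's variable stays untouched:

```ruby
# Hypothetical helper for illustration: the child starts out with the
# parent's data, but every change it makes only affects its own copy.
def fork_copies_memory
  counter = 1

  child_pid = fork do
    counter += 41     # starts from the inherited value 1
    exit!(counter)    # report the child's copy via the exit status
  end

  # Process.wait2 waits like Process.wait, but also returns the status.
  _, status = Process.wait2(child_pid)

  { parent: counter, child: status.exitstatus }
end

result = fork_copies_memory
puts "parent still sees #{result[:parent]}, child reported #{result[:child]}"
```

The exit status is only used here because it is the simplest channel back to the parent; it can carry nothing but a small number.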

If we put a sleep somewhere inside the block, run it again and use a tool like ps or pstree we’d see something like this:

$ pstree | grep fork
 |   \-+= 29695 mrnugget ruby fork.rb
 |     \--- 29715 mrnugget ruby fork.rb

Two processes, one parent and one child, with different process IDs. Just by calling fork. That’s not too hard, right? And it’s certainly not magic. So how does Unicorn use fork?

Unicorn and fork(2)

When Unicorn boots up it calls the spawn_missing_workers method, which contains this piece of code:

worker_nr = -1
until (worker_nr += 1) == @worker_processes
  WORKERS.value?(worker_nr) and next
  worker = Worker.new(worker_nr)
  before_fork.call(self, worker)
  if pid = fork
    WORKERS[pid] = worker
    worker.atfork_parent
  else
    after_fork_internal
    worker_loop(worker)
    exit
  end
end

So, what happens here? Unicorn calls this method with @worker_processes set to the number of workers we told it to boot up. It then goes into a loop and calls fork that many times. But instead of passing a block to fork, Unicorn checks the return value of fork to see if it’s now executing in the parent or in the child process. Remember: a forked process inherits the data of the parent process! A child process executes the same code as the parent, and we have to check for that in order to have the child do something else.

Passing a block to fork does the same thing under the hood, but explicitly checking the return-value of fork is quite a common idiom in many Unix programs, since the C API doesn’t allow passing blocks around.
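To see the idiom in isolation, here is a stripped-down sketch. It is not Unicorn's code: the method name spawn_workers and the workers hash are made up for illustration, and the "workers" exit right away instead of entering a worker loop.

```ruby
# Illustrative sketch of the "check the return value of fork" idiom.
def spawn_workers(count)
  workers = {}

  count.times do |nr|
    if pid = fork
      # fork returned the child's PID: we are in the parent.
      workers[pid] = nr
    else
      # fork returned nil: we are in the child.
      # A real worker would enter its main loop here.
      exit!(nr)
    end
  end

  # Wait for every child so that none of them lingers as a zombie.
  workers.keys.map do |pid|
    _, status = Process.wait2(pid)
    [workers[pid], status.exitstatus]
  end
end

spawn_workers(3).each do |nr, exit_status|
  puts "worker[#{nr}] exited with status #{exit_status}"
end
```

Each child uses its worker number as its exit status, which is just a cheap way to show that the parent can tell its children apart by PID.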

If fork returned in the parent process, Unicorn saves the newly created worker object in the WORKERS hash constant, using the PID of the newly created child process as the key, calls a callback and starts the loop again.

In the child process another callback is called and then the child goes into its main loop, the worker_loop. If the worker loop should somehow return, the child process exits and is done.

And boom! We’ve now got 16 worker processes humming along, waiting for work in their worker_loop, just by going into a loop, doing some cleanup and calling fork 16 times.

That’s not too hard, is it? So let’s go from fork to another basic Unix feature…

Pipes!

My guess is that most people even vaguely familiar with Unix systems know about pipes and have probably done something like this at one point or another in their lives:

$ grep 'wat' journal.txt | wc -l
84

Pipes are amazing. Pipes are a really simple abstraction that allows us to take the output of one program and pass it as input to another program. Everybody loves pipes and I personally think the pipe character is one of the best features Unix shells have to offer.

But did you know that you can use pipes outside of the shell?

pipe(2)

pipe(2) is a system call with which we can ask the kernel to create a pipe for us. This is exactly what shells are using. And we can use it too, without a shell!

Remember the saying that under Unix “everything is a file”? Well, pipes are files too. One pipe is nothing more than two file descriptors. A file descriptor is a number that points to an entry in the file table maintained by the kernel for each running process. In the case of pipes the two file table entries do not point to files on a disk, but rather to a memory buffer: you write to it through one end of the pipe and read from it through the other.

One of the file descriptors returned by pipe(2) is the read-end and the other one is the write-end. That’s because pipes are half duplex – the data only flows in one direction.
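Before bringing a second process into play, it can help to poke at a pipe within a single process. The following is a minimal sketch; the method name pipe_in_one_process and the messages are made up for illustration. It shows that both ends are plain numbered file descriptors and that Ruby refuses to write to the read-end:

```ruby
# Illustrative only: a pipe used inside a single process.
def pipe_in_one_process
  read_end, write_end = IO.pipe

  # File descriptors are just integers.
  descriptors = [read_end.fileno, write_end.fileno]

  write_end.write("ping")
  write_end.close              # closing the write-end signals end-of-file
  message = read_end.read      # reads everything up to end-of-file

  # Data only flows in one direction: writing to the read-end fails.
  wrong_direction_refused =
    begin
      read_end.write("pong")
      false
    rescue IOError
      true
    end

  read_end.close
  { descriptors: descriptors, message: message, refused: wrong_direction_refused }
end

p pipe_in_one_process
```

The exact descriptor numbers depend on what else the process has open, so they will differ from run to run and system to system.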

Outside of the shell pipes are heavily used for inter-process communication. One process writes to one end, and another process reads from the other end. How? Remember that a child process inherits a lot of stuff from its parent process? That includes file descriptors! And since pipes are just file descriptors, child processes inherit them. If we open a pipe with pipe(2) in a parent process and then call fork(2), both the parent and the child process have access to the same file descriptors of the pipe.

# pipe.rb

read_end, write_end = IO.pipe

fork do
  read_end.close

  write_end.write('Hello from your child!')
  write_end.close
end

write_end.close

Process.wait

message = read_end.read
read_end.close

puts "Received from child: '#{message}'"

In Ruby we can use IO.pipe, which is a wrapper around the pipe(2) system call, just like fork is a wrapper around fork(2), to create a pipe.

And in this example we create a pipe with IO.pipe and then create the child process with fork. Since just after the call to fork both processes have both pipe file descriptors we need to close the end of the pipe we’re not going to need. In the child process that’s the read-end and in the parent it’s the write-end.

We then write something to the pipe in the child, close the write-end and exit. The parent closes the write-end, waits for the child to exit and then reads the message the child wrote to the pipe. To clean up it closes the read-end. If we run this we get exactly what we expected:

$ ruby pipe.rb
Received from child: 'Hello from your child!'

That’s pretty amazing, isn’t it? Just a few lines of code and we created two processes that talk to each other! By the way, this is the exact same concept a shell uses to make the pipe-character work. It creates a pipe, it forks (once for each process on one side of the pipe) then uses another system call (dup2) to turn the write-end of the pipe into STDOUT and the read-end into STDIN respectively and then executes different programs which are now connected through a pipe.
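Here is a rough sketch of what a shell does for something like echo "hello pipes" | tr a-z A-Z. It is an approximation under a few assumptions: the method name is made up, IO#reopen stands in for the dup2 call, a second pipe exists only so we can capture the result, and echo and tr are assumed to be available on the system.

```ruby
# Illustrative sketch of a two-program pipeline, built by hand.
def shell_like_pipeline
  $stdout.flush                      # don't let children inherit buffered output
  read_end, write_end = IO.pipe      # connects the two programs
  out_read, out_write = IO.pipe      # only here so we can capture the result

  producer = fork do
    $stdout.reopen(write_end)        # the pipe's write-end becomes STDOUT
    [read_end, write_end, out_read, out_write].each(&:close)
    exec("echo", "hello pipes")
  end

  consumer = fork do
    $stdin.reopen(read_end)          # the pipe's read-end becomes STDIN
    $stdout.reopen(out_write)
    [read_end, write_end, out_read, out_write].each(&:close)
    exec("tr", "a-z", "A-Z")
  end

  # The parent must close its copies, or the readers never see end-of-file.
  [read_end, write_end, out_write].each(&:close)
  output = out_read.read
  out_read.close
  [producer, consumer].each { |pid| Process.wait(pid) }
  output
end

puts shell_like_pipeline
```

Neither echo nor tr knows anything about the pipe. They simply write to STDOUT and read from STDIN, which is what makes this arrangement so flexible.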

So how does Unicorn make use of pipes?

Unicorn and pipe(2)

Unicorn uses pipes a lot.

First of all, there is a pipe between each worker process and the master process, with which they communicate. The master process writes commands to the pipe (something like QUIT) and the child process then reads the commands and acts upon them. Communication between the master and its worker processes through pipes.

Then there’s another pipe the master process only uses internally and not for IPC, but for signal handling. It’s called the “self-pipe” and we’ll have a closer look at that one later.

And then there’s the ready_pipe Unicorn uses, which is actually quite an amazing trick. See, if you want to daemonize a process under Unix, you need to call fork(2) two times (and do some other things) so the process is completely detached from the controlling terminal and the shell thinks the process is done and gives you a new prompt.

What Unicorn does when you tell it to run as a daemon is to create a pipe, called the ready_pipe. It then calls fork(2) two times, creating a grand child process. The grand child process inherited the pipe, of course, and as soon as it’s fully booted up and everything looks good, it writes to this pipe that it’s okay for the grand parent to quit. The grand parent, which waited for a message from the grand child, reads this and then exits.

This allows Unicorn to wait for the grand child to boot up while still having a controlling terminal to which it can write error messages should something go wrong between the first call to fork(2) and booting up the HTTP server in the grand child. Only if everything worked does the grand child turn into a real daemon process. Process synchronization through pipes.

That does come pretty close to being magic, yep, but this is just a really clever use of fork(2) and pipe(2).
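The handshake can be sketched in a few lines. This is not Unicorn's implementation: the method name boot_with_ready_pipe and the "ready" message are assumptions for illustration, and the would-be daemon exits immediately instead of serving requests.

```ruby
# Illustrative sketch of a double fork synchronized through a pipe.
def boot_with_ready_pipe
  ready_read, ready_write = IO.pipe

  first_child = fork do
    ready_read.close
    Process.setsid               # start a new session, away from the terminal

    fork do                      # second fork: the grand child
      # ... boot the application; errors could still be reported here ...
      ready_write.write("ready")
      ready_write.close
      exit!(0)                   # a real daemon would enter its main loop instead
    end

    exit!(0)                     # the first child is only a stepping stone
  end

  ready_write.close              # the grand parent only reads
  Process.wait(first_child)
  message = ready_read.read      # blocks until the grand child has reported in
  ready_read.close
  message
end

puts "grand child says: #{boot_with_ready_pipe}"
```

If the grand child died before writing anything, the read would return an empty string as soon as the last write-end is closed, which is how the grand parent could tell that booting failed.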

sockets & select(2)

At the heart of everything that has to do with networking under Unix are sockets. You want to read a website? You need to open a socket first. Send something to the logserver? Open a socket. Wait for incoming connections? Open a socket. Sockets are, simply put, endpoints between computers (or processes!) talking to each other.

There are a ton of different sockets: TCP sockets, UDP sockets, SCTP sockets, Unix domain sockets, raw sockets, datagram sockets, and so on. But there is one thing they all have in common: they are files. Yes, “everything is a file” and that includes sockets. Just like a pipe, a socket is a file descriptor that you can read from and write to just like a file. The sockets API for reading and writing is deep down the same as the file API.

So, let’s say we are writing a server. How do we use sockets for that? The basic lifecycle of a server socket looks like this:

First we ask the kernel for a socket with the socket(2) system call. We specify the family of the socket (IPv4, IPv6, local), the type (stream, datagram) and the protocol (TCP, UDP, …). The kernel then returns a file descriptor, a number, which represents our socket.

Then we need to call bind(2), to bind our socket to a network address and a port. After that we need to tell the kernel that our socket is a server socket that will accept new connections, by calling listen(2). So now the kernel forwards incoming connections to us. (This is the main difference between the lifecycles of a server and a client socket.)

Now that our socket is a real server socket and waiting for new incoming connections we can call accept(2), which accepts connections and returns a new socket. This new socket represents the connection. We can read from it and write to it.
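The lifecycle above can be sketched in a few lines of Ruby. This is only a minimal sketch: accept_one_message is a made-up helper name, port 0 asks the kernel to pick any free port, and the client lives in a thread of the same process just so that accept(2) has something to return.

```ruby
require 'socket'

# Walks through socket(2), bind(2), listen(2) and accept(2) exactly once
# and returns whatever the single client sent.
def accept_one_message
  server = Socket.new(:INET, :STREAM)                    # socket(2)
  server.bind(Socket.pack_sockaddr_in(0, '127.0.0.1'))   # bind(2), port 0 = any free port
  server.listen(5)                                       # listen(2)
  port = server.local_address.ip_port

  # A client in a thread, only there so that accept has a connection to return.
  client = Thread.new do
    sock = TCPSocket.new('127.0.0.1', port)
    sock.write('hello')
    sock.close
  end

  connection, _client_addr = server.accept               # accept(2), blocks until a client connects
  message = connection.read                              # reads until the client closes its end
  connection.close
  client.join
  server.close
  message
end

puts accept_one_message
```

The socket returned by accept is a different object from the listening socket: the first one represents the connection, the second one keeps waiting for new connections.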

But here’s the thing: accept(2) is a blocking call. It only returns if the kernel has a new connection for us. A server that doesn’t have too many incoming connections will be blocking for a long time on accept(2). This makes it really difficult to work with multiple sockets. How are you going to accept a connection on one socket if you’re still blocking on another socket that nobody wants to connect to?

This is where select(2) comes into play.

select(2) is a pretty old and famous (maybe infamous) Unix system call for working with file descriptors. It allows us to do multiplexing: we can monitor several file descriptors with select(2) and let the kernel notify us as soon as one of them has changed its state. And since sockets are file descriptors too, we can use select(2) to work with multiple sockets. Like this:

sock1 = Socket.new(:INET, :STREAM)
addr1 = Socket.pack_sockaddr_in(8888, '0.0.0.0')
sock1.bind(addr1)
sock1.listen(10)

sock2 = Socket.new(:INET, :STREAM)
addr2 = Socket.pack_sockaddr_in(9999, '0.0.0.0')
sock2.bind(addr2)
sock2.listen(10)

5.times do
  fork do
    loop do
      readable, _, _ = IO.select([sock1, sock2])

      connection, _ = readable.first.accept
      puts "[#{Process.pid}] #{connection.read}"
      connection.close
    end
  end
end

Process.wait

That’s a 23-line TCP server, listening on two ports, with 5 worker processes accepting connections. Besides missing some minor things like HTTP request parsing, HTTP response writing and error handling it’s pretty much ready to ship.

No, but seriously, this actually does a lot of stuff in just a few lines with the help of system calls.

We create two sockets with Socket.new, which somewhere deep down in Ruby calls socket(2). Then we bind the sockets to two different ports, 8888 and 9999 respectively, on all local interfaces (that’s what the address 0.0.0.0 means). Afterwards we call listen(2) (hidden by the #listen method) and tell the kernel to queue up 10 connections at maximum for us to handle.

With our sockets ready to go we call fork 5 times, which in turn creates 5 child processes that all run the code in the block. So every child calls IO.select (which is the wrapper around select(2)) with the two sockets as argument. IO.select is going to block and only return if one of the two sockets is readable (on a listening socket that means that there are new connections). And this is exactly why we use select(2) here: with accept(2) we would block on one socket and miss out if the other socket had a new connection.

IO.select returns the readable sockets in an array. We take the first one and call accept(2) on it, which is now going to return immediately. Then we just read from the connection, close the connection socket and start our worker loop again.

If we run this and send some messages to our server with netcat like this:

$ echo 'foobar1' | nc localhost 9999
$ echo 'foobar2' | nc localhost 9999
$ echo 'foobar3' | nc localhost 8888
$ echo 'foobar4' | nc localhost 8888
$ echo 'foobar5' | nc localhost 9999

Then we can see our server accepting the connections and reading from them:

$ ruby tcp_sockets_example.rb
[31605] foobar1
[31607] foobar2
[31605] foobar3
[31607] foobar4
[31609] foobar5

The connections are spread across several child processes. Load balancing done by the kernel for us, thanks to select(2).

Unicorn, sockets and select

Before the master process calls fork to create the worker processes, it calls socket, bind and listen to create one or more listening sockets (yes, you can configure Unicorn to listen on multiple ports!). It also creates the pipes that will be used to communicate with the worker processes.

After forking, the workers, of course, have inherited both the pipe and the listening sockets. Because, after all, sockets and pipes are file descriptors.

The workers then call select(2) as part of their worker_loop with both the pipe and the sockets as arguments. Now, whenever a connection comes in, one of the workers’ call to select(2) returns and this worker handles the connection by reading the request and passing it to the Rack/Rails application.

And here’s the thing: since the workers call select(2) not only with the sockets, but also with the master-to-worker pipe, they’ll never miss a message from the master while waiting for a new connection. And if there is a new connection, they handle it, close it and then read the message from the master process.

That’s a really neat way to do load balancing through the kernel and to guarantee that messages to workers are not lost or delayed too long while the worker process is doing its work.
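To make the idea of waiting on a pipe and a socket at the same time a bit more concrete, here is a small sketch. It is not Unicorn’s worker_loop: first_readable is a hypothetical helper, and the “master” is simply the same process writing to its own pipe.

```ruby
require 'socket'

# Waits on a listening socket and a pipe at once and reports which of the
# two became readable first (or :timeout if neither did).
def first_readable(listener, pipe_reader, timeout = 1)
  readable, = IO.select([listener, pipe_reader], nil, nil, timeout)
  return :timeout if readable.nil?
  readable.first.equal?(pipe_reader) ? :pipe : :socket
end

listener = TCPServer.new('127.0.0.1', 0)   # port 0: let the kernel pick a free port
reader, writer = IO.pipe

writer.write('QUIT')                       # a message from the "master"
puts first_readable(listener, reader)      # no connection is pending, so the pipe wins
```

A single call to select(2) covers both kinds of events, which is why a worker that is idle and waiting for connections still notices a message from the master right away.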

Signals

Let’s talk about signals. Signals are another way to do IPC under Unix. We can send signals to processes and we can receive them.

$ kill -9 8433

This sends the signal 9, which is the KILL signal, to process 8433. That’s pretty well-known and a lot of people have used this before (probably with sweat running down their face). But did you know that pressing Ctrl-C and Ctrl-Z in your shell sends signals too?

So what are signals? Most often they are described as software interrupts. If we send a signal to the process, the kernel delivers it for us and makes the process jump to the code that deals with receiving this signal, effectively interrupting the current code flow of the process. Signals are asynchronous — we don’t have to block somewhere to send or receive a signal. And there are a lot of them: the current Linux kernel for example supports around 30 different signals.

Sending signals is pretty good, and I’d bet we’ve all done it a bunch of times, but what’s really cool is this: we can tell the kernel how we want our process to react to certain signals. That’s called “signal handling”.

We have a few options when it comes to signal handling. We can ignore signals: we can tell the kernel we don’t care about a signal and when the kernel delivers an ignored signal to our process it doesn’t jump to any specific code, but instead does nothing. Ignoring signals has one limitation though: we can’t ignore SIGKILL and SIGSTOP, since there has to be a way for an administrator to kill and stop a process, no matter what the developer of that process wants it to do.
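In Ruby, ignoring a signal is a one-liner with trap and the special 'IGNORE' command. The following is a minimal sketch with a made-up helper name (survive_ignored_signal): it ignores SIGUSR1, sends that signal to itself and carries on. Without the trap, the default action for SIGUSR1 would terminate the process.

```ruby
# Ignore SIGUSR1, send it to ourselves and show that nothing happens.
def survive_ignored_signal
  previous = trap(:USR1, 'IGNORE')       # tell the kernel we don't care about USR1
  Process.kill(:USR1, Process.pid)       # would terminate us with the default action
  sleep 0.1
  :still_alive
ensure
  trap(:USR1, previous || 'DEFAULT')     # restore whatever was configured before
end

puts survive_ignored_signal
```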

The second option is to catch a signal, effectively defining a signal handler. If ignoring a signal means “Nope, kernel, don’t care about QUIT.” then defining a signal action is telling the kernel “Hey, if I receive this signal, please execute this piece of code here”. For example: a lot of Unix programs do some clean-up work (remove temp files, write to a log, kill child processes) when receiving SIGQUIT. That’s done by catching the signal and defining an appropriate signal handler that does the clean-up work. Catching signals has the same limitation that ignoring signals has: we can’t catch SIGKILL and SIGSTOP.

We can also let the defaults apply. Each signal has a default action associated with it. E.g. the default action for SIGQUIT is to terminate the process and make a core dump. We can leave it as it is, or redefine the signal action by catching it. See man 3 signal on OS X or man 7 signal on Linux for a list of the default actions associated with each signal.

So, how do we catch a signal? In Ruby it’s pretty simple:

# signals.rb

trap(:SIGUSR1) do
  puts "SIGUSR1 received"
end

trap(:SIGQUIT) do
  puts "SIGQUIT received"
end

trap(:SIGKILL) do
  puts "You won't see this"
end

puts "My PID is #{Process.pid}. Send me some signals!"

sleep 100

We use trap to catch a signal and pass it a block to define a signal action that will be executed as soon as our process receives the signal. In this example, we try to redefine the signal handler for SIGUSR1, SIGQUIT and SIGKILL. The sleep statement gives us time to send the signals to our process.

If we run this and then send signals to our process with the kill command like this:

$ kill -USR1 31950
$ kill -QUIT 31950
$ kill -KILL 31950

Then our process will output the following:

$ ruby signals.rb
My PID is 31950. Send me some signals!
SIGUSR1 received
SIGQUIT received
zsh: killed     ruby signals.rb

As we can see, the kernel delivered all of the signals to our process. On receiving SIGUSR1 and SIGQUIT it executed the signal handlers, but, as I said before, catching SIGKILL proved useless and the kernel killed the process.

You can probably imagine what we can do with signal handlers. One of the most common things to do with custom signal handlers, for example, is to catch SIGQUIT to do some clean-up work before exiting. But there are a lot more signals and defining appropriate signal handlers can distinguish well-behaving processes from rude ones. Example: if a child process dies the kernel notifies the parent process by sending a SIGCHLD. The default action is to ignore the signal and do nothing, but a well-behaving application would probably wait for the child, clean up after it and write something to a log file.

Unicorn and signals

Unicorn sets up a lot of different signal handlers in the master process, before it calls fork and spawns the worker processes. These signal handlers do a lot of things. Here are a few examples:

QUIT — Graceful shutdown. The master process waits for the workers to finish their work (the current request), cleans up and only then exits.
TERM and INT — Immediate shutdown. Workers don’t finish their work.
USR1 — Reopen the log files. This is mostly used and sent by a log rotation daemon.
USR2 — Hot-Reload. Start up a new master process with a new version of the application and keep the old master running.
TTIN/TTOU — Increase/decrease the number of worker processes.
HUP — Reload the configuration file while running.
WINCH — Keep the master process running, but gracefully stop the workers.

These signal handlers are like a separate API through which you tell the master and worker processes what to do. And it’s pretty reliable too, considering the fact that signals are essentially asynchronous events and can be sent multiple times. This just screams for race-conditions and locks. So how does Unicorn do it?

Unicorn uses a self-pipe to manage its signal actions. The pipe the master process sets up is this self-pipe, which it will only use internally and not to talk to other processes. It also sets up a queue data structure. After that come the signal handlers. Unicorn catches a lot of signals, as we saw, but each signal handler doesn’t do much. It only pushes the signal’s name into the queue and sends one byte through the self-pipe.

After setting up the signal handlers, spawning worker processes, and so on, the master process goes into its main loop, in which it checks upon the workers regularly and sleeps in between. But it doesn’t just sleep, no, the master process actually goes to sleep by calling select(2) on the self-pipe, with a timeout as argument. This way it can go to sleep but will be woken up as soon as a signal arrives: the signal handler sends a byte through the pipe, which makes the pipe readable (from the master’s perspective), and select(2) returns. After waking up, the master just has to pop the signals off the queue it set up in the beginning and handle them one after another. This is of tremendous value if you consider again that signals are asynchronous, that you’ll never know what you’re currently executing when a signal arrives, and that they can be sent multiple times, even while you’re currently executing your signal handler code. Using a queue and a self-pipe in this combination makes handling signals a lot saner and easier.
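The queue plus self-pipe combination can be sketched like this. It is a simplified illustration and not Unicorn’s code: SIGNAL_QUEUE, SELF_READER, SELF_WRITER and master_sleep are names made up for this sketch, and only USR1 and USR2 are trapped.

```ruby
SIGNAL_QUEUE = []
SELF_READER, SELF_WRITER = IO.pipe

# The handlers do as little as possible: remember the signal, wake up the loop.
[:USR1, :USR2].each do |name|
  trap(name) do
    SIGNAL_QUEUE << name
    SELF_WRITER.write_nonblock('.')
  end
end

# Sleeps until a handler writes to the self-pipe or the timeout passes,
# then returns the next queued signal name (nil if there is none).
def master_sleep(timeout)
  ready = IO.select([SELF_READER], nil, nil, timeout)
  SELF_READER.read_nonblock(16) if ready   # drain the wake-up bytes
  SIGNAL_QUEUE.shift
end

Process.kill(:USR1, Process.pid)
puts master_sleep(1).inspect               # wakes up early and prints :USR1
```

The actual work for a signal happens in the main loop, after master_sleep returns, and not inside the handler. That’s what keeps the handlers tiny and safe to interrupt.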

Worker processes, on the other hand, inherit the master’s signal handlers – again: child processes inherit a lot from their parents. But instead of leaving them as they are, the workers redefine (most of) the signal handlers to be no-ops. They get their signals through the pipe which connects them to the master process. If the master process, for example, receives SIGQUIT it writes the name of the signal to each pipe connected to a worker process to gracefully shut them down. The worker processes call select(2) on this master-worker pipe and the listening sockets, which means that as soon as they finish their work (or don’t have anything to do) they will read the signal name from the pipe and act upon it. This “signal delivery from master to worker via pipe”-mechanism avoids the many problems that can occur if a worker process should receive a signal while currently working on a request.
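Here is a rough sketch of that “signal through a pipe” idea, with one worker and one command. The helper name run_worker_until_quit and the newline-terminated 'QUIT' message are assumptions made for this sketch; they are not Unicorn’s actual wire format.

```ruby
# Forks a worker that waits on a pipe for a command from the master and
# exits cleanly once it reads "QUIT". Returns the worker's exit status.
def run_worker_until_quit
  reader, writer = IO.pipe

  pid = fork do
    writer.close                          # the worker only reads
    loop do
      ready, = IO.select([reader], nil, nil, 1)
      next unless ready                   # timeout: nothing from the master yet
      command = reader.gets
      exit!(0) if command.nil? || command.chomp == 'QUIT'
    end
  end

  reader.close                            # the master only writes
  writer.puts('QUIT')                     # a message instead of a real kill(2)
  _, status = Process.waitpid2(pid)
  writer.close
  status.exitstatus
end

puts run_worker_until_quit                # prints 0
```

Since the worker only looks at the pipe when it comes back to select(2), the command can never interrupt it in the middle of handling a request.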

Magic?

By now we have looked at fork(2) and how easy it is to spawn a new process. We saw that we can use pipes pretty easily outside a shell and without any use of the pipe character by calling pipe(2) and just working with the two file descriptors as if they were files. We also created sockets, worked with select(2), looked at a pre-forking TCP server in 23 lines of Ruby and had the kernel of our operating system do our load balancing for us. Then we saw that Unicorn has its own API composed of signals and that it’s not that hard to work with signals.

These were just some basic Unix concepts. Trivial on their own, powerful when combined.

So, let’s have a closer look at these features of Unicorn that amazed me so much, that I was sure were created by some wizards with long robes and tall hats, in a basement far, far away, on old rusty PDP-11s.

Let’s see how this “magic” is just Unix.

Preloading

If we put preload_app true in the configuration file, Unicorn will “preload” our Rack/Rails application in the master process to spare the worker processes from doing it themselves. As soon as the application is preloaded, spawning off a new worker process is really, really fast, since the workers don’t have to load it anymore.

The question is: how does this work exactly? Let me explain.

Right after Unicorn has evaluated command line options, it builds a lambda called app. This lambda contains the instructions needed to load our Rack/Rails application into memory. It loads the config.ru file (or uses default settings) and then creates a Rack application with Rack::Builder, on which it calls #to_app.

So what should come out of the lambda is a Rack application in which we just need to call #call to pass it a request and get a response. But since lambdas are evaluated only as soon as they are called, this doesn’t happen when the lambda is defined.

Unicorn passes this app lambda on to the Unicorn::HttpServer, which eventually calls fork(2) to spawn the worker processes. But before it creates a new process, the HttpServer checks if we told Unicorn to use preloading. If we did, it calls the lambda right then, before forking. If we didn’t, the workers each call the lambda after the call to fork(2).

Calling the lambda, which hasn’t been called before, now loads our application into memory. Files are being read, objects are created, connections established – everything is somehow getting stored in memory.

And here comes the real trick: since the master loaded the application into memory, which can take some time if we’re working with a large Rails application, the worker processes inherit it. Yep, the worker processes inherit our application. How neat is that? Since workers are created with fork(2) they already have the whole application in memory as soon as they are created. Preloading is just deciding whether Unicorn calls a lambda before or after the call to fork(2). And if Unicorn called it before, creating new worker processes is really fast, since they are basically ready to go right after creation, except for some callbacks and setup work.

With copy-on-write, which works in the Ruby VM since 2.x, this is even faster. The reason is that “inheriting” involves copying from the parent’s to the child’s memory address space. It’s probably not as slow as you imagine, but with copy-on-write only the memory regions which the child process wants to modify are copied.

And the best part of it is this: the kernel is doing all the work for us. The kernel answers the call to fork(2) and the kernel copies the memory. We just need to decide when to create our objects: before or after the call to fork(2).
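The “before or after fork(2)” decision fits into a tiny sketch. APP_LOADER stands in for the real app lambda and loader_pid_seen_by_worker is a made-up helper; the “application” here is just a hash that remembers which process built it.

```ruby
# A stand-in for the expensive lambda that loads the application.
APP_LOADER = lambda do
  sleep 0.05                               # pretend this is a big Rails app booting
  { loaded_by: Process.pid }
end

# Forks one worker and returns [pid of the process that ran the loader, worker pid].
def loader_pid_seen_by_worker(preload)
  app = APP_LOADER.call if preload         # preloading: call the lambda before fork
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    app ||= APP_LOADER.call                # no preloading: every worker loads it itself
    writer.puts(app[:loaded_by])
    exit!(0)
  end

  writer.close
  loaded_by = reader.gets.to_i
  Process.waitpid(pid)
  [loaded_by, pid]
end

loaded_by, _worker = loader_pid_seen_by_worker(true)
puts loaded_by == Process.pid              # true: the master loaded it, the worker inherited it

loaded_by, worker = loader_pid_seen_by_worker(false)
puts loaded_by == worker                   # true: the worker had to load it on its own
```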

This comes in really handy when we now look at another great feature of Unicorn.

Scaling workers with signals

Unicorn allows us to increase and decrease the number of its worker processes by sending two signals to the master process:

$ kill -TTIN 93821
$ kill -TTOU 93821

These two lines add and then remove a new worker process. The signals used, SIGTTIN and SIGTTOU, are normally sent by our terminal driver to notify a process running in the background when it’s trying to read from (SIGTTIN) or write to (SIGTTOU) the controlling terminal. Since Unicorn doesn’t allow not using a logfile when running as a daemon, this shouldn’t be an issue, which means that Unicorn is free to redefine the signal actions (the default for both signals is to stop the process).

It does so by defining signal handlers for SIGTTIN and SIGTTOU that, as we saw, only add the name of the signal to the signal queue and write a byte to the self-pipe to wake up the master process.

The master process, as soon as it wakes up from its main-loop sleep, sees the signals and increases or decreases the internal variable worker_processes, which is just an integer. And right before it goes back to sleep, it calls #maintain_worker_count, which either spawns a new worker or writes SIGQUIT to the pipe connected to the now superfluous worker process to gracefully shut it down.

So let’s say we send SIGTTIN to Unicorn to increase the number of workers. What will happen is that the master wakes up (triggered by the write to the self-pipe), increases worker_processes and calls #maintain_worker_count, which in turn will call another method called #spawn_missing_workers. Yes, that’s right. We looked at this method before, it’s the same one that’s used to spawn the worker processes when booting up. In its entirety it looks like this:

def spawn_missing_workers
  worker_nr = -1
  until (worker_nr += 1) == @worker_processes
    WORKERS.value?(worker_nr) and next
    worker = Worker.new(worker_nr)
    before_fork.call(self, worker)
    if pid = fork
      WORKERS[pid] = worker
      worker.atfork_parent
    else
      after_fork_internal
      worker_loop(worker)
      exit
    end
  end
  rescue => e
    @logger.error(e) rescue nil
    exit!
end

Again, this is just a loop that calls fork(2) N times. Now that N is increased by one, a new worker process will be created. The other calls to fork are skipped by checking whether WORKERS already contains an instance of Worker with the same worker_nr.

Take note of worker_nr here, it is important. All worker processes have a worker_nr by which they are easily identified in the row of spawned processes.

If we now send SIGTTOU to the master process, the following is going to happen. First of all, the master is woken up by a fresh byte on the self-pipe. Instead of increasing worker_processes now, it decreases it. And again, it calls #maintain_worker_count, which doesn’t jump straight to #spawn_missing_workers. Since no worker process is missing, #maintain_worker_count now takes care of reducing the number of workers:

def maintain_worker_count
  (off = WORKERS.size - worker_processes) == 0 and return
  off < 0 and return spawn_missing_workers
  WORKERS.each_value { |w| w.nr >= worker_processes and w.soft_kill(:QUIT) }
end

It may not be idiomatic Ruby, but these 3 lines are still fairly easy to understand. The first line computes the difference between the number of currently running worker processes and the desired number of workers, and returns if it’s zero. If the difference is negative, a new worker will be spawned (which is where the path of SIGTTIN ends in this method). But since the difference is positive after decreasing worker_processes, the master process now takes the workers with a worker_nr that’s too high and calls soft_kill(:QUIT) on the worker instance.

This in turn sends the signal name through the pipe to the corresponding worker process, which will catch that signal through select(2) and gracefully shut down.

After this, the master process calls Process.waitpid (which in turn calls waitpid(2)), which returns the PID of dead children (and doesn’t leave them hanging as zombies). The worker process with this PID now just needs to be removed from the WORKERS hash and Unicorn is ready to go again.
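Reaping dead children without blocking on the ones that are still alive could look roughly like this. It’s a sketch with a hypothetical helper name (reap_dead_children) and without the bookkeeping of a WORKERS hash.

```ruby
# Reaps every child that has already exited, without blocking on children
# that are still running. Returns the PIDs that were reaped.
def reap_dead_children
  reaped = []
  loop do
    pid = Process.waitpid(-1, Process::WNOHANG)   # -1: any child, WNOHANG: don't block
    break if pid.nil?                             # children exist, but none has exited
    reaped << pid
  end
  reaped
rescue Errno::ECHILD
  reaped                                          # no children left at all
end

child = fork { exit!(0) }
sleep 0.2                                         # give the child time to exit
puts "started #{child}, reaped #{reap_dead_children.inspect}"
```

Every PID that comes back from this method is one that can be removed from the bookkeeping, and none of them is left behind as a zombie.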

All of this is pretty simple: fork(2) in a loop, pipes, signal handlers and keeping track of numbers. Again: it’s the combination that makes these Unix idioms so powerful.

The same can be said for my favorite Unicorn feature.

Hot Reload

This fantastic feature has many names: hot reload, zero downtime deployment, hot swapping and hot deployment. It allows us to deploy a new version of our application, while the old one is still running.

With Unicorn, “hot reload” means that we can spin up a new master process, with new worker processes serving a new version of our application, while the old master process is still running and still handling requests with the old version.

It’s all triggered by sending a simple SIGUSR2 to the master process. But how?

Let’s take a step back and say that our Unicorn master and worker processes are just humming along. The master process is sleeping, waking up, checking up on the workers and going back to sleep. The worker processes are handling requests without a care in the world. Suddenly a SIGUSR2 is sent to the master process.

Again, the signal handler catches the signal, pushes the signal onto the signal queue, writes a byte to the self-pipe and returns. The master wakes up from its main-loop-slumber and sees that it received SIGUSR2. Straight away it calls the #reexec method. It’s a fairly long method and you don’t have to read through it now. But most of “hot reload” is contained in it, so let’s walk through it.

The first thing the method does is check whether the master process is already reexecuting (reexecuting means that a new master process is started by an old one). If it is, it returns and its job is done. But if not, it writes the current PID to /path/to/pidfile.pid.oldbin. .oldbin stands for “old binary”. With the PID saved to a file, the master process now calls fork(2), saves the returned PID of the newly created child process (to later check if it’s already reexecuting…) and returns. The old master process adds “(old)” to its process name (by changing $0 in Ruby) and is now done with #reexec. But since a process created with fork(2) is executing exactly the same code, the new child process goes ahead with #reexec.

Right after the call to fork(2) the child writes the numbers of the sockets it’s listening on (remember: sockets are files, files are represented as file descriptors, which are just numbers) to an environment variable called UNICORN_FD as one string, in which the numbers are separated by commas. (Yes, it keeps track of listening sockets by writing to an environment variable. Take a deep breath. It’ll make sense in a second.)

Afterwards it modifies the listening sockets so they stay open by setting the FD_CLOEXEC flag on them to false.

It then closes all the other file descriptors it doesn’t need (e.g.: sockets and files opened by the Rack/Rails application).

With all preparations and cleaning done, the child process now calls execve(2).

The execve(2) system call turns the calling process into a completely different program. Which program it’s turned into is determined by the arguments passed to execve(2): the path of the program, the arguments and environment variables. This is not a new process we’re talking about: the new program has the same process ID, but its complete heap, stack, text and data segments are replaced by the kernel.

This is how we can spawn new programs on a Unix system and what every Unix shell does when we try to launch Vim: it calls fork(2) to create a child process and then it calls execve(2) with the path to the Vim executable. Without the call to execve(2) we’d end up with a lot of copies of the original shell process when trying to start programs.

That’s also why Unicorn needs to set the FD_CLOEXEC flag to false on the sockets before it calls execve(2). Otherwise the sockets would get closed when the program running in the process is being replaced.
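The fork-then-exec pattern a shell uses to start a program can be sketched in a few lines. launch is a hypothetical helper name, and the sketch assumes an echo executable can be found on the PATH.

```ruby
# What a shell does to launch a program, minus everything else a shell does.
def launch(*command)
  pid = fork do
    exec(*command)                    # replaces the child with the given program
  end
  _, status = Process.waitpid2(pid)   # the parent waits, like a shell does for a foreground job
  status
end

status = launch('echo', 'hello from the child')
puts "child exited with #{status.exitstatus}"
```

The parent never turns into echo. Only the forked child does, and it keeps the PID that fork(2) gave it.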

Unicorn calls execve(2) with the original command line arguments it was started with (it keeps track of them), in effect spawning a fresh Unicorn master process that’s going to serve a new version of our application. Except that it’s not completely fresh: the environment variables the old master process set (UNICORN_FD) are still accessible by the new master process.

So the new master process boots up and loads the new application code into memory (preloading!). But before it creates worker processes with fork(2), it checks the UNICORN_FD environment variable. And it finds the numbers of our listening sockets! And since file descriptors are just numbers, it can work with them. It turns them into Ruby IO objects by calling IO.new with each number as an argument and has thereby recovered its listening sockets.

And now it calls fork(2) and creates worker processes which inherit these listening sockets again and can start their select(2) and accept(2) dance again, now handling requests with the new version of our application.

There is no “address already in use” error bubbling up. The new master process inherited these sockets, they are already bound to an address and transformed into listening sockets by the old master process. The new master process and its workers can work with them in the same way the worker processes of the old master process do.
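Handing a listening socket to a freshly exec’d program through an environment variable can be sketched like this. It is an illustration under a few assumptions and not Unicorn’s #reexec: SKETCH_LISTENER_FD is a made-up variable name (the text above says Unicorn uses UNICORN_FD), the “new master” is a three-line Ruby script passed with -e, and it only reports the port of the socket it recovered.

```ruby
require 'socket'
require 'rbconfig'

FD_VAR = 'SKETCH_LISTENER_FD'   # hypothetical name, only for this sketch

# Binds a socket, forks, execs a fresh Ruby in the child and lets that new
# program recover the socket from its number. Returns [bound port, recovered port].
def reexec_and_report_port
  listener = TCPServer.new('127.0.0.1', 0)
  port = listener.addr[1]
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    listener.close_on_exec = false          # clear FD_CLOEXEC so the socket survives exec
    ENV[FD_VAR] = listener.fileno.to_s      # file descriptors are just numbers
    new_program = <<~RUBY
      require 'socket'
      sock = TCPServer.for_fd(Integer(ENV.fetch('#{FD_VAR}')))
      puts sock.addr[1]
    RUBY
    # Keep the listener open under the same number and send stdout to the pipe.
    exec(RbConfig.ruby, '-e', new_program, listener.fileno => listener, :out => writer)
  end

  writer.close
  recovered_port = reader.gets.to_i
  Process.waitpid(pid)
  listener.close
  [port, recovered_port]
end

bound, recovered = reexec_and_report_port
puts bound == recovered
```

The new program never calls bind or listen. It only turns a number back into a socket object, which is why there is nothing that could collide with the address that is already in use.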

Now there are two sets of master and worker processes running. Both are handling incoming connections on the same sockets.

We can now send SIGQUIT to the old master process to shut it down and as soon as it exits the new master process takes over and only our new application version is being served. And all of this happened without the old worker processes stopping their work once.

It’s just Unix

All of this is just Unix. The master-worker architecture, the signal handling, the communication through pipes, the preloading, the scaling of workers with signals and the hot reloading of Unicorn. There is no magic involved.

I think that’s the most amazing part about all of this. The combination of concepts like fork, pipe and signals, that are easy to understand on their own, and leveraging the operating system is where the perceived magic and ultimately the power of great Unix software like Unicorn comes from.

Why?

You might be thinking: “Why? Why should I care about this low-level stuff? I build web applications, why should I care about fork and select?”

I think there are some really compelling reasons.

The first one is debugging. Have you ever wondered why you shouldn’t open a database connection (a socket!) before Unicorn calls fork(2)? Or why you get a “too many open files” error when you try to make an HTTP request (sockets!)? Now you know why.

Knowing how your system works on each layer of the stack is immensely helpful when trying to find and eliminate bugs.

The next reason, which I call the design and architecture reason, boils down to having answers to questions like these: should we use threads or processes? How could these processes talk to each other? What are the limitations? What are the benefits? Will this perform? What’s the alternative?

With some understanding of your operating system and the APIs it offers, it’s far easier to make architectural decisions and design choices when building a system or single components of it.

One more level of abstraction. Someone somewhere at some time said that “it’s always good to know one more level of abstraction beneath the one you’re currently working on” and I totally agree.

I like to think that learning C made me a better Ruby programmer. I suddenly knew what was happening behind the curtains of the Ruby VM. And if I didn’t know, I could make a good guess.

And I think that knowing deeply about the system to which I deploy my (web) application makes me a better developer, for the same reasons.

But the most important reason for me, which is a personal one, is the realization that everything Unicorn does is not magic! No, it’s just Unix and there is no secret ingredient. Which, in turn, means that I could write software like this. I could write a webserver like this! Realizing this is worth a lot.

Why threads can't fork

2014-10-13T17:40:00+00:00

There is an interesting thread on the Go issue tracker about daemonizing processes. Most of the thread is not about daemonizing processes though, but more about why Go has no Fork() function which you can call directly in your code. The first time I read through it I was wondering and saying to myself: “Yeah, why is there no Fork()? It surely can’t be that hard to implement.” After all you can already call system calls with the syscall package. As I read more and more I realized that the problem is not implementing Fork() per se, but rather implementing Fork() to work safely in a multi-threaded environment, which most Go programs are. So I tried to find out why.

And it turns out that the problem stems from the behaviour of fork(2) itself. Whenever a new child process is created with fork(2) the new process gets a new memory address space but everything in memory is copied from the old process (with copy-on-write that’s not 100% true, but the semantics are the same).

If we call fork(2) in a multi-threaded environment the thread doing the call is now the main-thread in the new process and all the other threads, which ran in the parent process, are dead. And everything they did was left exactly as it was just before the call to fork(2).
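Ruby shows the same behaviour, which makes for a quick experiment. In this sketch (thread_counts_across_fork is a made-up helper name) we start a few sleeping threads, fork, and count the threads on both sides of the fork.

```ruby
# Starts some background threads, forks and counts the threads on both sides.
# Returns [number of threads in the parent, number of threads in the child].
def thread_counts_across_fork(extra_threads = 3)
  workers = Array.new(extra_threads) { Thread.new { sleep } }
  in_parent = Thread.list.size
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    writer.puts(Thread.list.size)      # only the thread that called fork is left
    exit!(0)
  end

  writer.close
  in_child = reader.gets.to_i
  Process.waitpid(pid)
  workers.each(&:kill)
  [in_parent, in_child]
end

in_parent, in_child = thread_counts_across_fork
puts "threads in parent: #{in_parent}, threads in child: #{in_child}"
```

The child ends up with a single thread, no matter how many were running in the parent at the time of the fork.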

Now imagine that these other threads were happily doing their work before the call to fork(2) and a couple of milliseconds later they are dead. What if something these now-dead threads did was not meant to be left exactly as it was?

Let me give you an example. Let’s say our main thread (the one which is going to call fork(2)) was sleeping while we had lots of other threads happily doing some work. Allocating memory, writing to it, copying from it, writing to files, writing to a database and so on. They were probably allocating memory with something like malloc(3). Well, it turns out that malloc(3) uses a mutex internally to guarantee thread-safety. And exactly this is the problem.

What if one of these threads was using malloc(3) and has acquired the lock of the mutex in the exact same moment that the main-thread called fork(2)? In the new child process the lock is still held - by a now-dead thread, who will never return it.

The new child process will have no idea if it’s safe to use malloc(3) or not. In the worst case it will call malloc(3) and block until it acquires the lock, which will never happen, since the thread who’s supposed to return it is dead. And this is just malloc(3). Think about all the other possible mutexes and locks in database drivers, file handling libraries, networking libraries and so on.

In order to call fork(2) in a safe way the calling thread would need to be absolutely sure that all the other threads are safe to fork too. And this is hard, especially if you’re going to implement a wrapper around fork(2) in a library and have no idea what’s going to be happening all around you.

If the new child process is going to be turned into a different process with execve(2) the problem is not that big, since the heap, stack and data will be replaced. That’s why there is an os.StartProcess() in Go, which uses fork(2) under the hood (see line 65 here). There is still the problem of open file descriptors, which the new child process will inherit but which were intended to be used only by a now-dead thread. But it’s still possible to close them, since the new child process has direct access to them.

Now you might realize that the title of this post is a lie, since threads can fork. But in practice it’s really hard to pull off, which explains why the Go issue mentioned at the beginning is nearly 5 years old.

There are of course a couple of attempts to provide a solution. pthread_atfork(3) (http://linux.die.net/man/3/pthread_atfork) allows users to register handlers in threads to be called right before and after fork. But as you can imagine, this can be cumbersome too. Solaris has forkall(2), which does not kill the non-forking threads but keeps them alive and doing exactly what they did before. This behaviour comes with its own share of problems:

if a thread calls forkall(), the parent thread performing I/O to a file is replicated in the child process. Both copies of the thread will continue performing I/O to the same file, one in the parent and one in the child, leading to malfunctions or file corruption.

To conclude: yes, the title is a lie, and yes, you can fork(2) in a multi-threaded environment, but it is really, really difficult to pull off safely. So let’s just say that threads can’t fork and leave it at that.