Last month, April 2022, marked the 10-year anniversary of my start as a professional programmer.
I started programming earlier than that, but hadn’t been paid a salary. As a teenager I built websites and IRC bots and wrote tiny Python scripts. Then I stopped and played guitar for a few years. In my twenties, I rather coincidentally rediscovered how much I enjoy programming when I was asked to build another website and found out how much had changed about the web while I was away (it’s HTML5 now!).
That made me wonder whether programming might be a better career choice than continuing to study philosophy at university. Robin answered that question for me by generously offering me a paid internship.
Now it’s been 10 years, which is, to be honest, not a significant marker of my growth as a programmer or of my career, but realising that it’s been 10 years made me pause and reflect.
The following is a loose, unordered collection of thoughts that come up when I look back on the past 10 years. Things I’ve learned, things I’ve unlearned, things I’ve changed my opinion on, things I never thought I’d believe in and now do.
They’re very much products of the context in which I helped develop software: as an intern for Robin, then as a junior developer for Robin, as a software developer for a small German startup, as a senior software developer for a German startup inside a huge German corporation, and now as a staff engineer for a fully remote, asynchronous US startup. Take that as a disclaimer. I bet if I’d worked in a game studio, a hardware company, and a big tech corporation instead, this text would be very different.
Most of the programmers I look up to and learned from share one trait that is rarely talked about: fearlessness.
They dive into an unknown codebase without fear. They open up the code of a dependency that they suspect is misbehaving without fear. They start working on something without knowing how they’ll finish.
It’s inspiring to see someone be fearless, but becoming fearless yourself is one of the best learning accelerators I’ve found.
We can’t predict the future. We all know this, of course.
But it took me years to truly take it into account when programming.
In the first third of my career I’d think: we will need this, so let’s build it now.
In the second third: we might need this, so let’s prepare for it.
Now: we don’t know whether we’ll need this, it’s a possibility, sure, and it looks like we might need it, yes, but things change all the time, so let’s build what we know we need right now.
I also write code so it’s easy to read and understand, or easy to delete, or easy to modify, or easy to review. I don’t write code only for the computer to execute.
Type safety, 100% test coverage, the ability to fluently express business logic in code, perfect development tooling, an efficient system that wastes no resources, using the best programming language for the job, an elegant API design, a fast feedback loop, writing great code – these are not the goal.
Here’s the goal: providing value to your customers, by shipping software that solves their problem, repeatedly.
The things above help you do that – faster, cheaper, more efficiently, safer, with greater joy – but they’re not the goal. The goal is to provide value to your customers.
The trap: it’s often easier to write software than to deliver it. But delivering is what it’s all about.
There is no perfect. I’m not sure I ever thought there was, but now I’m certain there isn’t. Everything is the result of trade-offs.
You will never reach 100% on every axis you care about. Something has to give. And when you think you’ve made it perfect, you’ll soon realise that you forgot something.
My aesthetics have changed too. Instead of looking for the beauty that lies in perfection I now think the program that succeeds despite its flaws is beautiful. Look at that little program go, holding the internet together, despite the 17 TODOs in it.
You can refactor a codebase and clean it up significantly, making it easier to understand for everybody and easier to extend, but all of that won’t matter if that codebase gets deleted four months later because the project didn’t help the business.
You can spend weeks adding tracing and observability to all of the code you write, only to realise that nobody will ever look at it, because that code runs three times a day and never causes any problems.
You can tweak and optimize your code to run so efficiently that the company can halve the number of machines required to run it and then see that the costs you saved are nothing in comparison to the salary you were paid while optimizing.
You can spend your time doing fantastic technical work and still waste it.
If you’d asked me 5 years ago whether TDD, Clean Code, Software Craftsmanship, and other schools of thought are dogmatic, I would’ve said “no! Can’t you see? Clean and good code is important!”
Now I look back at the time when I thought that a rule such as “a method should not be longer than 5 lines” was useful and shake my head.
It’s not about the rules! It’s about the problems these rules are trying to prevent. If you don’t have the problem they’re trying to prevent, or you can prevent it another way, you don’t need the rule.
Don’t worry too much about whether a test is an integration or an end-to-end test, a unit test or a functional test. Don’t fight with others about whether you should test private methods or not. Stop worrying about whether you should hit the database in tests or not.
Instead write tests that tell you the system is working the way it should. Ideally with 3 keystrokes and in less than 1 second.
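To make that concrete, here’s the kind of test I mean, sketched in Go with an invented Cart type. Whether it counts as a unit test or a functional test doesn’t matter; it fails loudly the moment the system stops doing what it should:

package cart

import "testing"

// Cart is invented for this sketch; the point is the test below.
type Cart struct{ prices []int }

func (c *Cart) Add(priceInCents int) { c.prices = append(c.prices, priceInCents) }

func (c *Cart) Total() int {
	total := 0
	for _, p := range c.prices {
		total += p
	}
	return total
}

// Unit test? Functional test? Doesn't matter. It tells us the system
// works the way it should, and it runs in milliseconds.
func TestCartTotal(t *testing.T) {
	c := &Cart{}
	c.Add(499)
	c.Add(1250)

	if got, want := c.Total(), 1749; got != want {
		t.Fatalf("Total() = %d, want %d", got, want)
	}
}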
This one took me a long time, a lot of ultimately useless discussions, and bugs in my code to learn.
If you know exactly what you want to build then best practices and patterns can help you, by giving advice on how to build it.
But if you don’t know yet what the program should do, or what it will look like in four weeks, then some best practices can make things even harder.
Some practices are at their best when applied to a rewrite, but at their worst when you’re still exploring.
I started my career writing Ruby and JavaScript, with package managers being available and the question “isn’t there a package that does that?” always hanging in the air.
Common sense dictated: if you can, try to use a library instead of writing it yourself. Reuse code as much as you can. Don’t reinvent the wheel. Don’t copy & paste. That was what I believed for years.
But there are downsides to that. Sometimes writing that one function yourself might actually be better than adding a dependency.
Dependencies aren’t free. You have to keep them up to date. They increase your compile or loading times. They add strange things to your stack traces. And very often they do more than what you need them to do, which means you’re paying for more than you’re getting.
When you’re gluing other people’s code together, there’s a very real danger that the glue is where complexity accumulates. And glue code is the last place you want your complexity to live, because glue code hides complexity. What you want is to make complexity as visible as you can, shining a light on it in the hope that it turns into dust and disappears.
Sometimes it’s better to write it yourself than to use other people’s code.
There is a big difference between developing software for a software company and developing software for a company that employs software developers because it has to. It’s a joy to work for a company in which leadership gets software and how it’s made.
That being said: I don’t think any company has it all figured out. Everybody’s winging it to some degree.
I’ve never regretted improving a feedback loop. Faster tests, better test output, faster deploys, turning a manual feedback loop into something that gives me a signal with one keybinding.
Watch out, though: once you’ve seen the light of developing software with a really fast and high-signal feedback loop, you’ll long for it forever.
A failing test, a compiler error, a half-finished sentence – end your day with one of these and the next morning you can sit down and continue where you left off, skipping “hmm, what should I do today…” entirely.
There’s nothing that gets me started as fast as a failing test that needs to pass.
Perfectionism is based on a lie. You’ll never get to the point where you’re done and sit and rest and say “ah, now it’s perfect”. There’ll always be something. You know it, I know it. There’s no perfect (see above). Accept it and ship and continue building.
Aim for 80% and consider the other 20% optional. It’s freeing and gives you room to breathe. You might end up at 99%, who knows?
I’ve gotten a lot out of investing in my tools: Vim, git, shells, the Unix environment, testing frameworks. I truly enjoy spending a Sunday morning with my Vim configuration.
But it’s possible to overdo it and get stuck in the configuration phase, doing endless tinkering. You have to use your tools to get feedback on how to best configure and use them.
I’ve done hundreds of interviews now and the most important insight I’ve gained is that hiring is really, really hard. The verdict on an interview has so many random inputs that it makes everything between a Strong Yes and Strong No wobbly.
Often I wish there was a way to find out whether people have the get-shit-done gene.
Here’s something that all the people I enjoyed working with have in common: they do the work. They know that some tasks aren’t fun or glamorous or interesting. But someone has to do them, so they do them.
Nothing has helped me get better at software engineering as much as working with a group of other people on the same codebase over multiple years.
You’ll see how decisions play out.
You’ll see what ended up mattering and what didn’t.
You’ll see how extensible your code truly is when your colleague tries to modify it 3 years after you wrote it.
You’ll see whether your prediction of “we have 2 of these now, but I’m sure there’ll be 5 in the future” will come true or not and can take the outcome into account when doing other predictions.
You’ll regret writing some code and you’ll be happy that you wrote some other code. You’ll learn from reflecting on the difference between the two.
You’ll see tooling break down just because something somewhere changed and you had nothing to do with it but you still have to fix it.
You’ll say “I’ve never had to think about this in 3 years” about some pieces of software and cherish them.
You’ll see what parts of the codebase new colleagues struggle to understand and which parts they immediately get productive in.
You’ll see what the code you wrote looks like 4 years later.
There are few things as motivating to me as hearing “you don’t really need to know how it works…”
Sure, I might not need to, but I wouldn’t do the work I do today if I hadn’t tried to find out how a GC works, or how Unix works, or how multi-threading works, or how a database stores data, or how interpreters and compilers work.
It benefits the work I do, too. I can make better technical decisions by being able to weigh trade-offs more accurately, knowing what goes on under the hood.
I’ve said it before. Don’t let typing be the bottleneck.
For the longest time I assumed it was my fault when a bug made it through one of my code reviews. I missed that! How could I have missed that? It’s so obvious!
Later I found out that it’s not just me: other people miss bugs in code reviews too. In fact, they accept and freely talk about how code reviews aren’t infallible. I was relieved.
It changed how I see code reviews: as something imperfect, something that needs to be combined with other ways to verify the code.
Not every piece of code needs a really thorough review. Sometimes, if the risk is acceptable, it’s fine to drop a quick “LGTM!”. It unblocks your colleagues, keeps momentum and, somehow, builds trust.
The more you give in to negativity, the more you get. Always much more than you wanted.
It’s viral. It starts with snark, it turns into cynicism, it then morphs into “everything sucks”. Soon after, the question of “why even bother?” starts to attach itself to everything. It ends with people hiding excitement and joy and ideas from you.
Being negative is too easy. At a certain point I realised that pointing at things and saying what’s bad about them and shrugging because, well, didn’t I expect this to be bad (everything’s bad, right?) - that’s easy. Easy to do and easy to mistake for an engineering mindset that can spot deficiencies and worst cases (which it is not).
What’s hard is seeing things for what they could be, what’s beautiful about them. Encouraging ideas even when they’re barely something to talk about. Creating and fostering joy. That’s challenging.
So at some point I decided I had enough and tried to do the challenging thing. So far it’s served me well.
I can’t do everything equally well all the time. I can’t write a book and make progress in my career and be a great father and set PRs in the gym and read two books. It won’t work for more than one or two weeks. It’s not sustainable.
Now I let my interests take turns: when I want to make progress on a specific thing, I focus on that for a while and accept that the other things have to go into maintenance mode.
Code has mass. Every additional line of code you don’t need is ballast. It weighs your codebase down, making it harder to steer and change direction if you need to. The less code you need, the better.
Code has to be read, it has to be tested, it has to be kept compatible, it has to stay secure, it has to keep working. Even if it’s not doing any useful work. It doesn’t hurt having it around, does it? Yes, it does. Delete it and move on. If necessary, restore from version control.
The same is true for tests, something I learned too late.
Ever since I started as an intern I’ve spent a considerable amount of time outside of work on programming: reading technical books, writing books, working on side projects, writing blog posts, giving talks, traveling to conferences, learning new languages and tools.
The fact that some companies don’t care about your college degree if you can demonstrate that you’re really good at programming was fuel for me for years.
I enjoy spending time on programming outside of work, but not all the time. Some of it feels like work. It takes effort to read some technical books. But some things don’t have to feel good while you’re doing them.
My career would be completely different if I had only programmed and learned about programming at my day job.
Building web applications made me think that 100ms is fast and that 50ms is really fast. Writing a compiler has taught me that 1ms is an eternity for a modern computer.
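If “eternity” sounds like an exaggeration, here’s a back-of-the-envelope probe, a rough sketch in Go rather than a rigorous benchmark, with results that vary from machine to machine:

package main

import (
	"fmt"
	"time"
)

func main() {
	// Count how many simple additions fit into one millisecond.
	deadline := time.Now().Add(time.Millisecond)
	var sum, count uint64
	for time.Now().Before(deadline) {
		// Batch the work so the clock call doesn't dominate the loop.
		for i := uint64(0); i < 10000; i++ {
			sum += i
		}
		count += 10000
	}
	fmt.Printf("~%d additions in 1ms (sum: %d)\n", count, sum)
}

On modern hardware the counter easily lands in the millions. That’s a lot of work for a single millisecond.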
Some of what I wrote can be interpreted as me having grown cynical over the years. I mean: nothing matters and perfection is unachievable? Come on.
But it’s the opposite. I still care. I care very much. But I care about fewer things and I still love programming very much.
They often have their office in their own house; a little sign in front of it says so. Sometimes it’s a separate room, sometimes the downstairs apartment. They help the local Mittelstand companies with their contracts, or they’re specialised in traffic law and can help you when you’ve had a car crash. They know possibly everyone in the notary’s office on a first-name basis, use their telephone as the primary means of communication and their suits aren’t tailor-made. There’s a fax number on the business card, since this is Germany.
If you leave this town, drive for 45 minutes and then take a plane for another 45 minutes and 7 hours you end up in New York City, where they also have lawyers. Let me dip my broad brush into some cheap paint I got from movies and TV shows and paint you one of these New York lawyers from one of New York’s big law firms: big office in a skyscraper, expensive suit, expensive apartment, large corporations as clients, lots of money being paid every hour, hundreds of colleagues that worry about all the things a lawyer shouldn’t need to worry about so they can concentrate on high-level decisions.
I’m fully aware that these paintings are not masterpieces. But try to imagine: my small-town lawyer takes the car and the plane and ends up in New York in an office with his New York equivalent — what would they talk about? Probably how different things are. You have how many cases a month? That’s your client? Okay, wow. No, my wife does the bookkeeping, no, it’s absolutely fine. Repeat that: how long have you been working on this single case?
Two more paintings: a portrait of our small-town doctor and one of the head of neurology in Singapore’s General Hospital. Another pair: small-town architect and one of those architects that designs the new conference center in a large city, or its opera house, or a whole district.
Each pair shows the same profession — lawyer, doctor, architect — but there’s not much the portrayed have in common in their daily lives. In fact, a day in the life of a small-town lawyer probably looks nothing like the typical day of a big law firm lawyer in New York.
And that is 100% fine, because their goals and problems are different.
If I were to paint you two pictures of software developers, one of a developer working for a big tech company in Silicon Valley and one working for the IT department of a 200 year old publishing house in Germany with 150 employees, they would also look completely different, because here too, the goals and problems are completely different.
But you know what these two software developers do? They go online and they write about their day-to-day and what they’re doing. And they don’t mention their needs and wants and what their goals and problems are and how they’re highly specific to their environment and how that does and should shape their work. They wouldn’t even mention that they only have three colleagues and that they only work on a single software project that’s critical to the survival of the company. Not a single word would they write about the fact that they work in a research team in a research department that has, effectively, no deadlines. And no paying customers.
Yet their equivalent on the other side of the world would read it and say: “we need to do what they’re doing! It’s working for them! Let’s use the technology they are using. The programming language, the database, the deployment system — if they are using it, why shouldn’t we?”
There are moments in which I ask myself the same thing about programming.
We’re programming computers. We spend large parts of our days writing down instructions for machines. Other parts of the day are spent making sure that we chose the right instructions. Then we talk about those instructions: why and how we picked the ones we picked, which ones we will consider in the future, what those should do and why and how long it will probably take to write those down.
It can sound very serious and dry; a bureaucracy of computer instructions. And yet.
And yet we, the ostensible bureaucrats, talk about magic as something that exists — the good and the bad kind. There are wizards. Instructions are “like a sorcerer’s spells”.
We don’t call them instructions, though, not when talking about what we produce each day anyway. It’s code we write. Emotions are involved. Code, we say, can be: neat, nice, clean, crafted, baroque, minimal, solid, defensive, hacky, a hack, art, a piece of shit, the stupidest thing I’ve ever read, beautiful, like a poem.
Some lines of code are a riddle to anyone but their author, and the name “code” serves as a warning. Other times, strangely, it’s a badge of honor.
Fantastic amounts of code have been written, from beginning to end, by a single person, typing away night after night after night, for years, until one day the code is fed to a machine and, abracadabra, a brightly coloured amusement park appears on screen. Other code has been written, re-written, torn apart and stitched back together across time zones, country borders and decades, not by a single person, but by hundreds or even thousands of different people.
This world of programming is held together by code. Millions and millions of lines of code. Nobody knows how much there is. Some of it is more than 30 years old, some less than a week, and chances are you used parts of both yesterday. There are lines of code floating around on our computers that haven’t been executed by a machine in years and probably won’t be for another lifetime. Others are the golden threads of this world, holding it together at the seams with no more than a dozen people knowing about it. Remove one of these and it all comes crashing down.
If you haven’t been here long enough and try to guess how much there is and how many generations are layered on top of each other — you won’t even come close. But stay around. After a while, more and more, you’ll find yourself in moments of awe, stunned by the size and fragility of it all; the mountains of work and talent and creativity and foresight and intelligence and luck that went into it. And you’ll reach for the word “magic” because you won’t know how else to describe it, and then you’ll lean back and smile, wondering how someone could not.
When someone says “typing is not the bottleneck”, what they mean is that the person optimizing their typing is wasting their time. They’re optimizing something that’s not slowing them down. Typing is not the bottleneck, because “typing is perhaps 0.5-1% of my programming time”. Programming is thinking, talking to people, planning, researching, they say. And if you think your keyboard is holding you back from thinking, well, you’re wrong.
I get that. I don’t write code the whole day either.
But here’s what I need to write and type besides code: commit messages, pull request descriptions, emails, tickets, comments on tickets, code reviews, documentation, Slack messages, notes, journal entries, RFCs, design documents, requirements.
And being able to type fast and without much effort sure as hell helps. It lets me get back to the thinking.
Effort, that’s the important one, not raw speed. It doesn’t matter whether you can type 90 words per minute or 130, but if it takes you effort to type something and, given the choice, you’d rather not do it, then we have a problem.
Take me on my phone, for example. It takes me a lot of effort to type on it. It’s not only that I’m slow because I can’t use more than two fingers, but every third word is a typo. Or, even more infuriating, it’s the wrong word and I need to correct autocorrect. Or I switched from English to German while typing and my phone doesn’t know what an umlaut is anymore.
If you see me typing on my phone chances are you can also hear me producing an angry growl-like sound.
Which is exactly why I don’t do it a lot. I barely use the note taking apps I have, because I’d rather type on my computer, with a proper keyboard. I often refrain from replying to messages if I don’t have to, because, yep, I’d have to type those replies.
In other words: typing is a bottleneck for me.
And it’s not just me on my phone. Ever had a chat conversation with somebody who wasn’t comfortable typing? Here’s what you get: short messages, acronyms, typos, missing sentences. You can read how they struggled to type. I once had a colleague who had the habit of walking to my desk and saying “Ah, before I type all that up, I thought I’d quickly tell you in person”, where “all that” was three to four paragraphs, at the most.
And that’s what this is about. If you’re not comfortable typing a lot and you’d rather not write something down then typing is the bottleneck and you need to fix it. Typing is not something you should need to think about.
Because wouldn’t it be a waste of time if you spend 99% of your time lying in a hammock, thinking, but then choose not to write all of your ideas down, because typing is the bottleneck?
If what I’m considering working on is not the thing we want to ship itself, but lies in the vast grey area of software projects where I could write code all day long without the user ever noticing, a single question helps me decide whether to drop it or invest some time in it: does it help me ship?
Let me illustrate.
One imaginary Friday afternoon I notice that we have a few // TODO comments in our codebase. Hmm, I could create a bot that looks for those comments whenever a new commit is pushed. It could use git blame to see who the author is and create a ticket assigned to them, saying that they should fix their TODO in line X in file Y, please. And, cherry on top, when a pull request that touches a TODO is opened, the bot would mark the corresponding ticket as work-in-progress. And when the pull request is merged, the bot closes the ticket. And when a pull request merely changes the TODO: into a TODO(poorsoul): then it assigns the ticket to poorsoul.
Sounds pretty good, right? Turn those TODOs into tickets and never lose a TODO again.
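And the core of it would be quickly written. Here’s a rough sketch in Go of just the scanning part; ticket creation, the GitHub Action wiring and the pull request hooks are left out, and the details are invented for illustration:

package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// Find TODO comments in a file and ask git blame who wrote them.
func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: todobot <file>")
		os.Exit(1)
	}
	path := os.Args[1]

	file, err := os.Open(path)
	if err != nil {
		panic(err)
	}
	defer file.Close()

	scanner := bufio.NewScanner(file)
	lineNum := 0
	for scanner.Scan() {
		lineNum++
		if !strings.Contains(scanner.Text(), "TODO") {
			continue
		}
		// git blame --porcelain prints an "author " line for the range.
		out, err := exec.Command(
			"git", "blame", "--porcelain",
			"-L", fmt.Sprintf("%d,%d", lineNum, lineNum), path,
		).Output()
		if err != nil {
			panic(err)
		}
		for _, l := range strings.Split(string(out), "\n") {
			if strings.HasPrefix(l, "author ") {
				fmt.Printf("%s:%d TODO by %s\n", path, lineNum, strings.TrimPrefix(l, "author "))
			}
		}
	}
}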
The problem is: it’s not free. It looks like it is, because the code is quickly written and it runs as a GitHub action we don’t have to pay for, but it’s not.
It’s another process, another tool, another automated piece in our machinery. Another thing that needs to be fixed when it ultimately breaks down, another bit of automation that works 99% of the time, but starts making funny noises when you slip into the 1% and, say, move a TODO down five lines by accident and don’t want the bot to close and re-open tickets, kicking off another wave of notifications.
That’s the actual cost of adding that bot.
The question is: do we want to pay it? Does it help me ship? Does it help me ship more? Or does it help me ship faster, or with less friction, or more safely?
If our imaginary codebase has more TODOs than test cases, for example, and these TODOs are holding us back from shipping because we can’t make a change without having to ask colleagues what this TODO we just discovered means, then it might be a good idea to add the bot. Even if we don’t intend to fix all of the TODOs, but only to finally get an overview and a peek at the hidden part of the iceberg. It helps us ship.
If the code contains more than one “TODO: make sure this works” and we can’t ship because changing the code is playing a game of Russian roulette, where every change could kick off an avalanche of bugs, then yes, this bot would probably help us ship.
But what if we’re not held back by TODOs? What if we have a total of 18 of them, and 12 of those have been in the codebase longer than you and I have been at the company, and, generally speaking, our codebase is in an okay state — is the cost worth it?
If what’s holding you back from shipping is, say, getting more customer input, or a brittle release process, or flaky monitoring, or missing tests, then all the bot does is add noise. It doesn’t help you ship.
There are a lot of different blog posts I could write about remote working: about its upsides and downsides, what works and doesn’t, when it makes sense and when not, what it requires and why I enjoy it.
But here I want to share my thoughts on a single, specific point that often comes up in discussions about remote work: the obvious downsides of remote work. Put in a sentence: “Of course remote work has advantages, but we all agree that it’s a trade-off; it’s better to have a chance to interact with real people, to have face-to-face time and to be able to quickly talk things through in person. Obviously.”
I’m here to tell you that these obvious downsides of remote work can be (and for me are) upsides.
Let’s start with social interactions. Having other people around, as you would in an office, is good. I agree. I like spending time with other people and some of my best friends started out as colleagues.
But here’s the rub: I’m also a sensitive person. That has its advantages (I’m good at “reading a room” and can empathize with others), but to me it can also mean that I’m easily and involuntarily influenced by other people’s mood.
Working remotely, social interactions lost a lot of their negative influence on me. In other words: the less I see my colleagues face-to-face, the less I worry about their face.
Less “oh, they didn’t seem enthused when I pitched them my idea”.
Less “they rolled their eyes in the company meeting when the CEO announced the new strategy, now I’m not so sure anymore about that strategy myself”.
Less “my manager sighed when I mentioned this problem I have. I’m sure it didn’t mean anything, but… maybe it did?”.
Less “I sent them a message to review my code 5 hours ago, I know they’ve checked their email, I saw it, so why didn’t they review my code?”
Less over-analyzing, less being influenced by things that range from “irrelevant to me” to “so random that it’s ridiculous I even think about it again”.
Here’s another angle: when you meet and discuss things in person — as opposed to in written, virtual form — it’s easy for the loudest person in the room to own the discussion.
I myself can be a pretty loud person (if I spot a chance to crack a joke, you can bet I’ll try to use it) and I have trouble not talking over other people, especially when I get excited about something. I try to work on it, but it’s hard to shut off what often feels like a reflex.
But when the main communication channels are asynchronous text and video calls (as would be the case in a remote work setup) the influence of the loudest person in the room wanes. It’s transferred to the best communicators.
Let me illustrate with an anecdote. When I joined Sourcegraph I spent one week in San Francisco for onboarding. Back then we still had an office and weren’t all-remote. But some of my colleagues were already working remotely and I didn’t meet them in that first week, only afterwards through Slack/GitHub/Zoom.
And you know what? I was highly impressed. Incredible technical knowledge, great writing, fantastic communication skills (proactive, mindful of the recipient, always providing enough context for the message to work asynchronously). All of that was clearly visible when I saw their messages and ideas on Slack, their code, their code reviews and when we jumped on calls to pair.
The twist is that when I finally met some of them in person I realized that they’re really shy and quiet and that if we were put together in a meeting room or an open office I never would’ve gotten the same impression I now had of them. I was actually happy that I met them online first.
We’ve all heard a variation of this: “You just can’t argue that things are much faster when you can have face-to-face meetings and get everyone around a table.”
A friend of mine said this a few years ago: “The Linux kernel is being developed by thousands of people, all over the globe, through email. Email! And you’re telling me we need to meet for two hours to decide when this button shows up or not?”
Two points here.
First, face-to-face meetings are not inherently better. They can be time wasters just like anything else. They can be inefficient without an agenda and clear goals, they can have the wrong people in them, they can end without any results, without notes, without something to show to others.
Second point: I’d argue that if you often need to get everyone in the same room to discuss and decide on something, you probably have too many people discussing and deciding things.
Why do five people have to sit around a table? Are all of them giving their input? If not and just one person is talking, couldn’t that have been an email? Or a video call where the other participants can just listen? Or just a document that took the writer slightly longer to prepare but the other participants less time to read than it would’ve taken them to attend the meeting?
At Sourcegraph I’ve learned what it means to truly work autonomously and the most important ingredients to that are trust and responsibility. What that enables is that you often don’t need five people in a room to make a decision. You need two, maybe three, and even then you often don’t need a meeting, since these two or three people are often on the same page anyway.
Don’t get me wrong, I don’t hate meetings. I actually roll my eyes when I hear things like “ugh, I wish I had no meetings at all and could just code.” Communication and coordination are important. But are face-to-face meetings really the most efficient way to achieve that? No, I don’t think so.
I think if it’s harder to have face-to-face meetings, as it is in a remote company, you start to work around them and can end up with something that has a lot of upsides: fewer people necessary to make decisions, more decisions being documented, better preparation, clearer goals. More trust, more autonomy.
If there’s one overarching point to what I’m writing here, and I’m not sure there is, it could be this: there’s more nuance to all the obvious upsides and downsides of remote and in-office work than tweet-sized insights on the future of work in times of a global pandemic would make you think there are.
All of us agree, of course, that, yes, with a sufficiently generous definition of tool, the tools we use when programming influence the programs. Programming languages, type systems, testing frameworks, linters, etc. – they’re all tools, in one sense or another, and they all leave their mark.
But that’s not what kept me staring. This was different: this code wasn’t just shaped by the language it’s written in, posture-corrected by a linter. No, this code was written by another type of tool.
There are tools that help you write better programs and then there are tools that help you better write programs: auto-formatters, auto-complete, jump-to-definition, documentation lookup, search. The latter kind is what left its engravings in the code I was looking at.
And I freely admit, even though it might be shocking: I’m not a code savant, I can’t close my eyes, put my hand on a screen and whisper when code was written with which editor (I sincerely wish I could, but don’t tell my parents I said that). Yet I think it’s possible to spot an auto-formatter’s imprint.
Because when you look at the code you simply realize: there’s no other way. We programmers are too lazy. Only with these tools would we write a program in such shape and form.
Here’s a snippet that’s similar in its peculiarities to the one that got me here, take a look:
const editableTitle =
inEditMode
?
<form
className='editing-form title-editing-form'
onSubmit={
async evt => {
evt.preventDefault();
try {
const txt = (evt.target as any).text.value;
await setTitle(txt);
setCurrentTitle(txt);
} finally {
setEditMode(false);
}
}
}
>
<textarea name='text' defaultValue={currentTitle}></textarea>
<div className='form-actions'>
<button className='secondary'
onClick={() => setEditMode(false)}>Cancel</button>
<input type='submit' value='Update' />
</div>
</form>
:
<h2>
{currentTitle} (<a href={url}>#{number}</a>)
</h2>;
A ternary operator spanning 26 lines, in JSX, covering multiple inline functions, one of them using async/await and try/finally. There is a lot going on.
Now let me make it clear: this is not about this particular piece of code. And it’s not about JavaScript, TypeScript, React, TSX or JSX either. As far as I know most developers that work with these tools recommend against this style. You could replace the snippet with a lot of other code written in completely different languages. This particular piece is not even that bad.
It’s merely an example to illustrate my point: I bet you wouldn’t write your code like this if all you had was nano or Notepad.exe. Yes, I bet that long before you would indent a lone ? by 12, 14, 16 or 40 spaces inside another ternary operator, wrapping an inline function, you’d restructure your code.
“Yeah, and if I had to write it with pen and paper, I would’ve quit a long time ago, dude.” Of course. I hear you. And I don’t want to argue that we should go back to punch cards, but this code and all the tools involved in its creation made me wonder: what if the tools we use to write code make us so much better at writing code that we end up unable to work on it without the tools?
If you write text under a microscope, it’s going to end up so tiny that you would only be able to read it while looking through the microscope. What if these tools shape how we write code to such an extent that the code becomes illegible when we approach it without the tools in hand?
They make writing code so much easier by formatting it, moving it around, creating, suggesting and explaining it, but I wonder: do they also help us when we’re not writing new code? Because arguably the majority of our time working on software is not spent writing it: we’re reading code, trying to understand it, slightly tweaking and editing it.
Or did we end up with the programming version of the Omnipotence paradox, writing code that’s so hard to write that we ourselves cannot read it?
Or what if these writing tools only make writing a certain kind of code easier? It’s often said that the actual act of writing the code is the easiest part of the whole thing (“typing is not the bottleneck”), as if it’s just the manual work, the typing it up, that comes after we’ve made deliberate, conscious decisions about a design and its trade-offs. But what if there is a feedback loop between our design choices and what our tools make easy to type, biasing us against solutions that would require more manual typing?
In concrete terms: would our Java code look different if “Create new class” wasn’t bound to a keyboard shortcut, but instead we’d have a “Show me whether this function is pure or not” key (if such functionality were available)? Can we explain the differences in identifier length preferences between language communities by pointing to the availability of reliable auto-complete in one and lack thereof in another?
Or imagine a far more powerful tool chain than the one we have now, one that would allow us to run multiple analysis passes over our code while we’re still writing it: would we start to write longer functions if we had the ability to hide and show their sub-parts depending on the results of a data-flow analysis, revealing only the parts of the function that relate to the identifier under the cursor in the analysis?
How much of our design and architecture thinking is still bound by what’s easy to type? How much do we bend to the will of our tools? And, maybe most importantly, are we even aware of it?
Imagine we’ve been handed a task and we’re free to choose the programming language. The assignment involves all sorts of string manipulation: reading strings, splitting strings, trimming, joining and running regular expressions over strings, everything in UTF-8 and, of course, emojis need to work. Which language do we choose? C? Oh, please no.
Another job, this time at a financial institution. We need to do tens of thousands of concurrent calculations. High performance is a hard requirement. Should we use… Ruby? Come on. Next up: a one-off script that renames a bunch of files… written in Java? A web browser… in Python? Programming a controller for a medical device with… C#? Swift? Lua? You get the point.
Different programming languages are good at different things and bad at others. Each one makes certain things easier and in turn others harder. Depending on what we want to do we can save ourselves a lot of work by choosing the language that makes solving the type of problem we’re facing the easiest.
That’s one of the tangible, no-nonsense benefits of learning more languages. You put another tool in your toolbox and when the time comes you’re able to choose the best one. But I would go one step further.
I think it’s valuable to learn new programming languages even if — here it comes — you never take them out of the box.
Languages shape the way we think*, each in their own peculiar way. That’s true for programming languages as well. Each language contains a different mental model, a different perspective for thinking about computation and how to write programs.
Take SQL, for example, and how it shapes your thoughts about the flow and the form of data in your program. Now consider what that would look like in an imperative, object-oriented language like Java, or a functional language like Haskell. Or in C. Imagine what a multi-player game server looks like in Python, in Haskell, in Erlang; streaming and processing terabytes of data in C, in Go, in Clojure; a user interface in Tcl, in Lua, in JavaScript.
Every programming language is a lens through which we can look at the problem we’re trying to solve. Through some of them the problem appears convoluted, exhausting. Through others it doesn’t even look like a problem at all, it looks barely different from any other mundane thing one does in this language.
By learning a new language, even if it stays in your toolbox for all eternity, you gain a new perspective and a different way of thinking about problems. Once you’ve implemented a game server in Erlang, you’re going to see game servers in a different light. After you’ve processed data in a Lisp by thinking of the data as a series of lists that you can mold by sending it through a series of tiny functions that can be composed to form pipelines of functions, you’ll see shadows of this pattern appear everywhere. As soon as you’ve had your first real taste of memory management in C, you’ll start to appreciate what Python, Ruby and Go are doing for you — while seeing the cost of their labour. And if you ever built a UI in JavaScript with React.js, you know that your thinking about UI components shifted in a fundamental way.
These new perspectives, these ideas and patterns — they linger, they stay with you, even if you end up in another language. And that is powerful enough to keep on learning new languages, because one of the best things that can happen to you when you’re trying to solve a problem is a change of perspective.
* This is known as linguistic relativity or the Sapir–Whorf hypothesis. In the context of this article I support the thesis in certain ways, but you should know that in the scientific community its validity is still very much open for debate. See this article for an introduction to the problems with the thesis.
In the beginning, there is always a single text file, nothing more. It’s called ideas.md or book.md. It contains a list of thoughts and ideas, an outline. Everything else grows from there. It only makes sense that we start by talking about files.
Both of my books, Writing An Interpreter In Go and Writing A Compiler In Go, are written in GitHub Flavored Markdown (GFM). One file per chapter and all files under version control using git.
I only use a basic set of Markdown features in my texts: headings, emphasis, lists, links, images, quotes. And fenced code blocks. This last one is the most important one to mention here, because every piece of code presented in the books is contained in the Markdown files in the form of fenced code blocks.
Yes, that has all the drawbacks you imagine it to have. While I have syntax highlighting for fenced code blocks, editing them is not as comfortable as if they were their own files. But, most importantly, the code is also duplicated: one version lives in a Markdown file and one (or more) lives in the code folder that comes with the book. If I want to update a snippet of code presented in the book, I have to manually update every copy of it. Yes, cumbersome.
But there is one undeniable advantage to this approach: it works and it works exactly like I want it to. There are quite a few tools out there to embed code in Markdown files but none of them allow me to present a change to a piece of code.
Since we – you, the reader, and me, the writer – work on a single codebase in both books, we often have to extend or modify existing code. To show these changes I comment out the already existing parts of a method and just show what’s been added or changed. Like this:
// compiler/compiler.go
func (c *Compiler) emit(op code.Opcode, operands ...int) int {
// [...]
pos := c.addInstruction(ins)
return pos
}
I don’t know of an existing tool that can do that. They either embed portions of or a complete file. And, yes, that file could be a *.diff, but even that would have to be generated separately and beforehand. So I went with fenced code blocks.
And believe me, I was this close to writing my own tool. A preprocessor that would not only allow me to embed auto-generated diffs into Markdown but also to run commands on a set of changes and embed the generated output, too.
What kept me from doing that was a calm voice in my head telling me that I’m here to write a book, not a preprocessor. And since copying code into Markdown files is only cumbersome once you have to go back and edit the code, but actually quite comfortable while writing, I just kept on doing that, ignoring the other voices.
Now I have written two books and zero tools. I consider that a success.
Of course I do not send plain text files out to readers. Instead, they receive nicely formatted PDF, ePub, Mobi and HTML files, which I create with only a tiny number of tools: pp, pandoc and KindleGen. Together they form a pipeline:
First, the Markdown files are piped through pp, a generic preprocessor for text files that can do a lot of things, but which I only use to replace two variables in the text: the URL of the zipped code folder readers can download and the current version of the book.
After that, the resulting Markdown is handed over to pandoc, the most important part of this pipeline.
Here’s the shortest possible description of what Pandoc does: it takes text in one format and outputs it in another format. Markdown goes in, HTML comes out. Or turn it around and put in HTML and get Markdown back. Or feed it Markdown and get DOCX, or ODT, or PDF, or AsciiDoc, or any other of the myriad of supported formats.
In my pipeline, Pandoc takes the Markdown files of the book and, with a little bit of YAML containing meta data, turns them into PDF, HTML and ePub files. The default output is already nice to look at, but I have a custom template for each of these three formats, all of which are based on Pandoc’s default templates.
Since the HTML output is a single file with CSS in the <head> it’s easy to style. The same goes for ePub, which is really just a ZIP archive containing HTML files and is probably the one I styled the least, because I think it looks pretty good by default.
PDF generation, though, is done using LaTeX and requires a template written in LaTeX. I’ve stitched mine together from Pandoc’s default template and what Stack Overflow, hours of trial and error and the enlightenment and horror that was “holy shit, did you know LaTeX has its own package manager?” have given me. I like to touch it only when absolutely necessary. In the end, though, that doesn’t matter much.
What comes out looks beautiful to me and Pandoc is, without any doubt or exaggeration, one of the best tools I’ve ever used. It does exactly what it promises, its documentation is stellar, it’s actively and carefully maintained and has never once let me down. If I had to shorten this post to one word, it would be “Pandoc”.
The only thing Pandoc can’t do is produce Mobi files, which is what Amazon uses for their Kindle eBook readers and store. For that, I use Amazon’s own command line tool KindleGen, which turns the ePub produced by Pandoc into a Mobi file. No styling or templates required.
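Strung together, the whole pipeline is little more than a handful of commands run in sequence. Here’s a sketch of it as a small Go program; the file names, flags and template names are illustrative stand-ins, not my actual invocations:

package main

import (
	"os"
	"os/exec"
)

// run executes a command and aborts the build if it fails.
func run(name string, args ...string) {
	cmd := exec.Command(name, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}

func main() {
	// Step 1: pp replaces variables in the Markdown and writes to stdout.
	preprocessed, err := exec.Command("pp", "book.md").Output()
	if err != nil {
		panic(err)
	}
	if err := os.WriteFile("book.pp.md", preprocessed, 0o644); err != nil {
		panic(err)
	}

	// Step 2: pandoc turns the preprocessed Markdown into the main formats.
	run("pandoc", "book.pp.md", "-o", "book.pdf", "--template", "book.tex")
	run("pandoc", "book.pp.md", "-o", "book.html", "--template", "book.html.tpl")
	run("pandoc", "book.pp.md", "-o", "book.epub", "--template", "book.epub.tpl")

	// Step 3: KindleGen converts the ePub into a Mobi file.
	run("kindlegen", "book.epub")
}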
Once the final files fall out of the pipeline I bundle them in a ZIP file, together with a folder that contains all of the code presented in the books. Ready to be published.
I self-publish both books in both editions, eBook and paperback. Self-publishing means that instead of a publisher I have to take care of selling, printing and distributing the books to readers.
While I could theoretically run my own shop on which I sell the books, I don’t want to. I want to write books, not a web application for selling books, especially not one that involves the handling of taxes for an international audience. So instead I use two services to take care of that for me.
The first one is Gumroad, which I use to sell and distribute the eBook editions. I upload my ZIP, Gumroad accepts payment via PayPal or credit card and then sends the file to the reader – in exchange for a rather small fee. It also takes care of collecting taxes for me and I can set the price without any limitations, refund customers, send out free updates and create promo codes. After nearly two years, I’m still a happy customer and the only two features I’d love to have are more payment methods and pricing per country, so I can set a lower price for readers in India, for example.
The paperback editions are sold, printed on demand and shipped by Amazon Kindle Direct Publishing (KDP). I upload a print-ready cover and PDF version of a book and Amazon turns it into a paperback that you can purchase in seven different Amazon stores. Createspace is what I previously used for that, but after Amazon bought Createspace, they started to move the Createspace functionality over to KDP. By now, I’ve completely switched over and only use KDP. One less tool to worry about, since I was using KDP anyway to publish the Kindle version of the books on the Kindle stores.
For someone like me, a person who starts to sweat when he hears “CMYK”, “RGB” and “you need to change your file” in one sentence, creating print-ready artifacts can be a bit of a hassle, but using LaTeX for the PDF generation comes in quite handy here. In a separate LaTeX template I use with Pandoc I can set the dimensions and margins of the document to exactly what I need and LaTeX takes care of the rest.
Readers can then purchase my books just like any other product on Amazon, including Prime shipping, refunds and all the payment methods accepted by Amazon. The downside of all this is a loss of control for me. I can’t, for example, offer personalized coupon codes nor can I bundle the paperback with the eBook edition.
I still think it’s worth it. When you upload a PDF file on Friday and then hold the paperback version of that file in your hands on Wednesday, you quickly forget about wrestling with color models of PDFs and start to grow convinced that we’re living in the future.
That’s it. That’s the complete journey, from bytes in a text file to ink on paper or a ZIP in your inbox.
But here’s the most important bit, saved for last: none of this matters if you want to write a book. Quite a few people have told me that they want to write a book, but they’re not sure about which tools to use. My advice: all you need to write a book is a program that allows you to write text into a file.
Tools are only important to the process of writing a book in that they should get out of your way. You shouldn’t have to worry about how to put text in a file, only what text. Once you can do that comfortably – you know, with autosaving and the ability to edit effortlessly – keep on doing it. And then, keep doing it. Once you have something you’d be happy to publish, you can start to worry about tools.
]]>I knew from releasing the paperback edition of Writing An Interpreter In Go that a lot of people still prefer paper over eBooks. So it didn’t come as a big surprise when, right after the release of Writing A Compiler In Go, people started asking me about a paperback edition.
But I replied that before starting work on a paperback edition, I first needed to take a break. I’d worked on this book for close to a year and wanted to sit back and take a big breath. I knew I’d eventually release a paperback, but that could wait a few weeks, or even months.
As it turns out, I’m pretty bad at taking big breaths when there’s something I can and want to do. So, here we are. Exactly two weeks after the release of the eBook, Writing A Compiler In Go is now available as a paperback:
It’s 18cm wide and 26cm long, exactly like its predecessor. They look good when put next to each other on a shelf. This one is thicker, though, with 338 pages — roughly 60 more than the first one.
The other notable change is that this book has monochrome instead of full-color syntax highlighting. When I released the first paperback edition of Writing An Interpreter In Go I did not yet realize how expensive full-color printing really is. Now I do, and I know why nobody else does it, which is also why the current paperback edition of Writing An Interpreter In Go is black & white, too.
I’m pretty happy with how it turned out:
The pages you are about to read were found amidst the rubble of a collapsed ruin. Wedged between the scratched and battered cases of old machines once called “computers”. Bearing, in faint white and barely readable, the title “Writing An Interpreter In Go. Chapter 5: A Macro System For Monkey.” …
Alright, I’ll admit it: that was a lie. What I want to show you is not really a lost chapter, preserved through the eons, found in the ruins of a long-gone civilization. I just needed a good intro.
You see, I couldn’t sit still. In the first couple of months after publishing Writing An Interpreter In Go I took some time off from Monkey, the programming language we built in the book. “The book is done. Take a breath and play around with something else. After working on it for a year you deserve it”, I told myself, only to grow more anxious by the week about all the features, optimizations and tweaks I could try and add to Monkey. In the end, the temptation of everything Monkey could still be won out. I gave in and restarted work on Monkey.
This resulted in two things: a project I’m not ready to talk about yet and a new, additional chapter for Writing An Interpreter In Go called The Lost Chapter: A Macro System For Monkey, which I want to tell you all about.
It started with me getting sidetracked from said secret project by discovering how elegant and beautiful macros in Racket are. I guess I just can’t stop myself from uttering an impressed “nice” when hearing about “code that writes code”. Next thing I knew, I was digging through various implementations of macros in different languages and getting more and more fascinated. It’s code that writes code! It’s a hand that draws itself! How could I not be fascinated by that?
A few “Huh, interesting…” followed by more “Well, I guess, it wouldn’t be too hard to just…” later I successfully added macros to Monkey. Macros that are able to modify and generate Monkey source code and are evaluated in their own macro expansion phase. A real, Lisp-style macro system. I was elated.
In fact, the whole journey from learning about how macros are implemented and why they’re so powerful to implementing them myself was so mind-blowing and fun that I had to write about it.
At first I thought I was writing a blog post or a tiny addition to the book and gave it the working title “The Lost Appendix”, thinking of a few pages hidden at the end of a book.
It ended up with the title The Lost Chapter: A Macro System For Monkey, because what we have here is not a small addition. It’s a complete chapter, close to 50 pages in PDF format, that shows you how to implement a fully-working macro system for Monkey - step by step, all code shown, fully tested, just like the book. You can think of it as the fifth chapter of Writing An Interpreter In Go, since it seamlessly continues the previous four. It’s just being delivered a few months later than the rest of the book.
But why “The Lost Chapter”? Because a text about macros deserves a touch of mystery, don’t you think? It’s code that writes code, come on! It’s snakes eating their own tail and surgeons operating on themselves! If that isn’t worthy of a title that’s a little bit out there, I don’t know what is.
I also didn’t want to make it an addition to the book itself. On the practical side there’s the hurdle of extending a paperback edition by around 50 pages and not being able to send the update to readers who already bought the paperback. But then there were also, let’s say, “conceptual” considerations.
While I consider learning to build your own programming language a worthwhile endeavor that can teach you a lot of valuable things about programming, I’ll concede that it looks pretty disconnected from the realities of one’s day job. But adding a macro system? Writing code that lets you write code that writes code? That doesn’t just look unrealistic, but rather … Let me put it this way: totally and completely nuts and, oh, incredible fun!
I wanted this chapter to be exactly that: a fun addition to Writing An Interpreter In Go, not quite Monkey canon, but a bizarro expansion pack; a curious and accidental supernova in the same universe.
Oh, and did I mention it’s available for free? Well, it’s available for free. Read it online or download it as PDF/HTML/Mobi/ePub here:
The downloadable version also includes all the runnable, tested code shown in the chapter and the complete Monkey interpreter from Writing An Interpreter In Go.
I hope it’ll get you to utter a “nice”, too.
As it turned out, to my surprise, quite a few people told me that they’d love to hold a copy of the book in their hands. And I also had some free time on my hands. Alright, let’s do it then, I thought.
But even though I said that time and interest were the only limiting factors, I knew that there couldn’t be a printed version without Monkey - the programming language that we build in the book - having a logo. Yes, I know, I know, that’s not a real requirement, but a little indulgence I wouldn’t deny myself. So I created a 99designs contest and Hazel Anne submitted the winning entry. I love the logo Monkey has now.
A paperback version of a book also needs a full cover, front and back, and so I wrestled with vector images and PDFs and print dimensions and page bleed and spine widths for quite a while. But, in the end, using createspace to print and distribute my book turned out to be much easier than one might think. I was lucky enough to already have had a working Pandoc setup in place and only needed to add one more LaTeX template, the one for the print version.
The result, I think, was worth it:
That’s 260 pages, 18cm wide and 26cm tall, with full-color syntax highlighting.
Since the book is printed on-demand by createspace, which is an Amazon company, it’s available for purchase in these Amazon stores:
Or you can just go to interpreterbook.com and click on one of the big, red buttons.
If you appreciate holding a physical copy of a book in your hands more than having a PDF on your hard drive, I hope you enjoy this paperback edition.
I’m talking about interpreters, compilers and transpilers. Programming languages are the ultimate, universal tools and sit at the bottom of the stack on which a bazillion other tools are built. Some programming languages offer so much power that their creation was the big bang for whole categories of other tools.
But I’m also talking about DSLs, code generators and templating engines. And databases with query languages. And database drivers that make these databases available to programming languages. jQuery and its $('exactly what I want') interface. jq and its query language. Webservers. Editors, IDEs, code analysers and generators.
It seems to me what they all have in common, what is close to their center of power, is parsing. Parsing user input, parsing source code, parsing query expressions, parsing configuration files, parsing network responses. Maybe it’s parsing itself that makes these tools so powerful. I’m not sure.
What I know and what I’m sure about is that without knowledge of parsing you won’t be able to build tools like these. Knowing how to write a parser is like a secret power and once you have it, you realize that you’re now able to solve a whole range of problems you haven’t even considered before. Now you can create higher-value tools.
What follows is much more of a confession than a precise description of a refined workflow or a secret productivity technique.
I didn’t have a TODO list I didn’t abandon after three weeks. Did I get things done? I did, but I never read Allen’s book. I also didn’t organize my time according to the four quadrants. I didn’t use a bullet journal to keep on top of ideas and tasks, didn’t use a pomodoro timer and didn’t keep a work journal. org-mode? I wish. Unplug, turn off notifications and just use pen and paper? That’s ridiculous, I have a keyboard.
Some tasks and ideas I put in Wunderlist, some in a Trello board and others in a file called “TODO.md”. Occasionally I even came back to each one and moved some things around.
Taking notes wasn’t much more organized. There’s a shell script I built. It’s based on the sound principles of popsicle-sticks-and-duct-tape-engineering and helped me to quickly create text files in a “notes” folder. Other times I used Notes.app. I also had iA Writer on my phone to access my Dropbox folder and directly write random ideas into the book. When I felt like it, I also did this on my computer: write ideas and outlines directly into the files that make up the book.
All of this changed from week to week and month to month. Sometimes from one day to the other.
The only constant in these 11 months was this: I was determined to finish the book, to keep chipping away at it until it was done. I got up every day at 5:45am and tried to take another step forward, using whatever it took.
But don’t take this for something it isn’t. It would simply be a lie to say that every morning I sat in front of my computer and got a solid hour of writing done before heading to work.
Sometimes I got up, drank two, three cups of coffee and just browsed the internet for an hour, breaking the chain. Other times I wrote for ten minutes at home and for 30 more on the train. On my best days, I wrote for an hour at home and for the whole train ride. On some days I only wrote down one sentence, more often than not starting with “FIXME:”.
Is there a moral to the story? I’m not sure, maybe it’s this one: productivity tools and techniques can only help, they won’t ever do the work for you.
It’s easy to fall into this trap and think that once the TODO lists are tidy and organized and the best notebook money can buy is sitting on the table, half of the work is already done. Of course, that’s not the case. Just like an expensive guitar won’t make you a great guitar player and the best running shoes won’t get you out of the door every day, productivity techniques won’t finish your project. They might help, but you have to put the work in. You have to keep showing up and keep chipping away at it. No tool will ever do that for you.
Some people might say that this conversation will never, ever happen. Well, “better be prepared” is what I say.
Brainfuck is a weird looking programming language and keeps every promise its name makes. Here is “Hello, World!” in Brainfuck:
++++++++[>++++[>++>+++>+++>+<<
<<-]>+>+>->>+[<]<-]>>.>---.+++
++++..+++.>>.<-.<.+++.------.-
-------.>>+.>++.
If you’re now thinking “Heck, I’d use that in production”, let me tell you that Brainfuck was conceived as a fun teaching language. Its inventor Urban Müller wanted Brainfuck to be a language that’s easily implementable and thus make it the perfect choice for someone who wants to learn more about interpreters or compilers.
I think he reached that goal. Implementing Brainfuck is an eye-opening experience. Even though it’s a tiny language, it’s perfectly well-equipped to illustrate a number of concepts behind programming language implementations.
But before we can build Brainfuck, we need to understand how Brainfuck thinks.
One thing in which programming languages differ is their model of the world and how they make it accessible to their users.
Take C, for example. Leaving aside the multitude of abstractions that hide in the depth of the kernel and the hardware, when working with C you can peek behind the curtain and see the inner workings of your computer. You are pretty close to the hardware-supported stack and you can allocate and free memory on the heap. If you’re experienced and stare intently enough, you can see the actual machine code when looking at your C code. The same goes for C++.
In Forth you mainly work with a stack. You push, you pop, you swap and drop. Nearly everything you do happens on a stack. In Forth, the stack is the world.
In other languages, these underlying assumptions about the mechanics of the world are abstracted away. Even though the current version of the Ruby Virtual Machine has a stack, you won’t notice. You don’t push and pop, but send messages to objects. The same goes for Java. You have classes that inherit from each other and memory allocation only concerns you in so far as the garbage collector shows up on time.
Then there are some languages that explicitly tell you what their world looks like. Especially intermediate languages, which are not meant to be written by hand, but are representations of end-user languages that are easier for computers to understand and optimize. WebAssembly, for example, represents the commands of a stack-based machine that is then emulated by a runtime (which will be a browser, most of the time). Java bytecode is a representation of Java code in the world of a stack machine.
And then there’s Brainfuck. Brainfuck doesn’t just tell you what its view of the world is, no, it smacks you over the head with it.
Brainfuck is based on the assumption that Brainfuck code will be executed by a Brainfuck machine. Just like the PUSH and POP operations in Java bytecode assume that the JVM manages a stack, the + and - in Brainfuck assume that there’s a Brainfuck machine which supports these two instructions.
So what does this Brainfuck machine look like? Not too complicated! It only has a few parts:
Memory: The machine has 30000 memory cells that can each hold an integer value from 0 to 255 and are initialized to 0 by default. Each cell is addressable by a zero-based index, giving us a range of 0 to 29999 as possible indexes.
Data pointer: It “points” to a memory cell by holding the value of the cell’s index. E.g.: if the value of the data pointer is 3, it points to the fourth memory cell.
Code: The program that’s executed by the machine. It’s made up of single instructions, which we’ll get to in a short while.
Instruction pointer: It points to the instruction in the code that’s to be executed next. E.g.: if the code is ++-++ and the instruction pointer has the value 2, then the next instruction to be executed is -.
Input and output streams: Just like STDIN and STDOUT in Unix systems, these are normally connected to the keyboard and the screen and are used for printing and reading characters.
CPU: It fetches the next instruction from the code and executes it, manipulating the data pointer, instruction pointer, a memory cell or the input/output streams accordingly.
That’s it. Those are all the parts of a complete, working Brainfuck machine that can execute Brainfuck code. So let’s take a closer look at Brainfuck code.
Brainfuck is tiny. It consists of eight different instructions. These instructions can be used to manipulate the state of the Brainfuck machine:
> - Increment the data pointer by 1.
< - Decrement the data pointer by 1.
+ - Increment the value in the current cell (the cell the data pointer is pointing to).
- - Decrement the value in the current cell.
. - Take the integer in the current cell, treat it as an ASCII char and print it on the output stream.
, - Read a character from the input stream, convert it to an integer and save it to the current cell.
[ - This always needs to come with a matching ]. If the current cell contains a zero, set the instruction pointer to the index of the instruction after the matching ].
] - If the current cell does not contain a zero, set the instruction pointer to the index of the instruction after the matching [.

That’s all of it, the complete Brainfuck language.
Even though these instructions look archaic, they’re just identifiers. Replace + with PLUS, - with SUB, . with PRINT and [ with LOOP and suddenly Brainfuck starts to look more like Brain-oh-wow-wait-a-second-I-can-actually-read-that.
Now that we know what the machine should look like and what it has to do, let’s get started with building it.
The basic structure will be called - you guessed it - Machine and looks like this:
// machine.go

type Machine struct {
    code   string
    ip     int
    memory [30000]int
    dp     int
    input  io.Reader
    output io.Writer
}

func NewMachine(code string, in io.Reader, out io.Writer) *Machine {
    return &Machine{
        code:   code,
        input:  in,
        output: out,
    }
}
As you can see, everything we’ve talked about is here: the code, the instruction pointer (ip), the memory, the data pointer (dp) and both the input and output streams.
Now we just need a method that can start this Machine and get it to execute code:
// machine.go

func (m *Machine) Execute() {
    for m.ip < len(m.code) {
        ins := m.code[m.ip]

        switch ins {
        case '+':
            m.memory[m.dp]++
        case '-':
            m.memory[m.dp]--
        case '>':
            m.dp++
        case '<':
            m.dp--
        }

        m.ip++
    }
}
Here we step through every instruction in m.code until we reach its end. In order to execute each instruction individually, we have a switch statement that “decodes” the current instruction and manipulates the machine according to which instruction it is.

In the case of + and - we manipulate the current memory cell, incrementing and decrementing its value respectively. The current memory cell is pointed to by the data pointer, m.dp, and we can get to it with m.memory[m.dp]. And in order to change the data pointer itself, we have two case branches for > and <.
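To convince ourselves that this works, we can write a little test. Here’s a minimal sketch (the test file and the test case are mine, not part of the original post), assuming the Machine above lives in package main:

// machine_test.go - a hypothetical test for the Machine above
package main

import (
    "os"
    "testing"
)

func TestIncrementAndMove(t *testing.T) {
    // "+++>++" increments cell 0 three times, moves the data
    // pointer right and increments cell 1 twice.
    m := NewMachine("+++>++", os.Stdin, os.Stdout)
    m.Execute()

    if m.memory[0] != 3 {
        t.Errorf("expected cell 0 to be 3, got %d", m.memory[0])
    }
    if m.memory[1] != 2 {
        t.Errorf("expected cell 1 to be 2, got %d", m.memory[1])
    }
}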
So far, so good. But we’re missing printing and reading, the . and , instructions. In order to implement support for those, we need to make a slight modification: we need to give our Machine a one-byte buffer slice.
// machine.go

type Machine struct {
    // [...]
    buf []byte
}

func NewMachine(code string, in io.Reader, out io.Writer) *Machine {
    return &Machine{
        // [...]
        buf: make([]byte, 1),
    }
}
With that in place, we can add two new methods called readChar and putChar:
// machine.go

func (m *Machine) readChar() {
    n, err := m.input.Read(m.buf)
    if err != nil {
        panic(err)
    }
    if n != 1 {
        panic("wrong num bytes read")
    }

    m.memory[m.dp] = int(m.buf[0])
}

func (m *Machine) putChar() {
    m.buf[0] = byte(m.memory[m.dp])

    n, err := m.output.Write(m.buf)
    if err != nil {
        panic(err)
    }
    if n != 1 {
        panic("wrong num bytes written")
    }
}
readChar reads one byte from the input, which will be os.Stdin, and then transfers this byte to the current memory cell, m.memory[m.dp]. putChar does the opposite and writes the content of the current memory cell to the output stream, which will be os.Stdout.
It has to be said that instead of doing proper error handling here, we just let the machine blow up by calling panic. That shouldn’t happen, of course, when we plan to use it in production (I dare you), so keep that in mind.
Using these two methods means adding new case branches to the switch statement in Execute:
// machine.go

func (m *Machine) Execute() {
    for m.ip < len(m.code) {
        // [...]
        case ',':
            m.readChar()
        case '.':
            m.putChar()
        // [...]
    }
}
And with that, our Brainfuck machine can read and print characters! It’s time to move on to the hairiest part of the implementation.
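Before we move on, though, we can quickly check that reading and printing actually work. Since the input and output streams are just an io.Reader and an io.Writer, a test sketch (again mine, not from the original post) can use in-memory buffers instead of a real terminal:

// machine_io_test.go - a hypothetical test for readChar and putChar
package main

import (
    "bytes"
    "strings"
    "testing"
)

func TestReadAndPutChar(t *testing.T) {
    in := strings.NewReader("A")
    var out bytes.Buffer

    // ",." reads one character from the input stream into the
    // current cell and then writes that cell back to the output.
    m := NewMachine(",.", in, &out)
    m.Execute()

    if out.String() != "A" {
        t.Errorf("expected output %q, got %q", "A", out.String())
    }
}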
Brainfuck’s two control flow instructions are [ and ]. And they’re not quite like loops or other control flow mechanisms in “normal” languages. Expressed in some Go-like dialect of pseudo-code, what they do is this:
switch currentInstruction {
case '[':
    if currentMemoryCellValue() == 0 {
        positionOfMatchingBracket = findMatching("]")
        instructionPointer = positionOfMatchingBracket + 1
    }
case ']':
    if currentMemoryCellValue() != 0 {
        positionOfMatchingBracket = findMatching("[")
        instructionPointer = positionOfMatchingBracket + 1
    }
}
Note the two different conditions of the if-statements. They are the most important bits here, because they give both instructions separate meaning.
Here’s an example to see how [ and ] can be used:
+++++ -- Increment current cell to 5
[ -- Execute the following code, if the current cell is not zero
-> -- Decrement current cell, move data pointer to next cell
+< -- Increment current cell, move data pointer to previous cell
] -- Repeat loop if current cell is non-zero
This snippet increments the current cell to 5 and then uses [ and ] to add the cell’s value to the next cell, by decrementing and incrementing both cells in a loop. The body of the loop will be executed 5 times, until the first cell contains zero.
Of course, implementing the “does the current memory cell hold zero or not?” check is not the problem. Finding the matching brackets is what’s hairy about this, because brackets can be nested. It’s not enough to find the next ] when we encounter a [, no, we need to keep track of every pair of brackets we find.

How are we going to do that? With a simple counter! Here is the pseudo-code from above turned into real Go code:
// machine.go

func (m *Machine) Execute() {
    for m.ip < len(m.code) {
        ins := m.code[m.ip]

        switch ins {
        // [...]
        case '[':
            if m.memory[m.dp] == 0 {
                depth := 1
                for depth != 0 {
                    m.ip++
                    switch m.code[m.ip] {
                    case '[':
                        depth++
                    case ']':
                        depth--
                    }
                }
            }
        case ']':
            if m.memory[m.dp] != 0 {
                depth := 1
                for depth != 0 {
                    m.ip--
                    switch m.code[m.ip] {
                    case ']':
                        depth++
                    case '[':
                        depth--
                    }
                }
            }
        }

        m.ip++
    }
}
Let’s take a closer look at the case '[' branch.

Here we check whether the current memory cell’s value is zero and if it is, we try to set the instruction pointer, ip, to the position of the matching ]. In order to do that correctly in the face of nested bracket pairs, we use depth as a counter. With each [ we pass, we increment the counter, and with each ] we decrement it. Since it’s set to 1 initially, we know that we are sitting on our matching ] when depth is 0. And that means that m.ip is set to the correct position. The m.ip++ at the end of the for-loop does the rest and sets the instruction pointer to the instruction right after the matching bracket.
The case ']' branch is the mirrored version, where we walk backwards in the instructions, trying to find the matching [.
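Here, too, a small test can back up the explanation. This sketch (mine, not from the post) runs the add-two-cells loop from above and checks the memory afterwards:

// machine_loop_test.go - a hypothetical test for the bracket handling
package main

import (
    "os"
    "testing"
)

func TestLoopMovesValue(t *testing.T) {
    // "+++++[->+<]" increments cell 0 to 5 and then moves
    // the value over to cell 1, one decrement at a time.
    m := NewMachine("+++++[->+<]", os.Stdin, os.Stdout)
    m.Execute()

    if m.memory[0] != 0 {
        t.Errorf("expected cell 0 to be 0, got %d", m.memory[0])
    }
    if m.memory[1] != 5 {
        t.Errorf("expected cell 1 to be 5, got %d", m.memory[1])
    }
}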
It’s time to flip the power switch on this machine.
Here is a small driver that reads in a file and passes it to our Brainfuck machine:
// main.go

package main

import (
    "fmt"
    "io/ioutil"
    "os"
)

func main() {
    fileName := os.Args[1]

    code, err := ioutil.ReadFile(fileName)
    if err != nil {
        fmt.Fprintf(os.Stderr, "error: %s\n", err)
        os.Exit(-1)
    }

    m := NewMachine(string(code), os.Stdin, os.Stdout)
    m.Execute()
}
That lets us run Brainfuck programs on the command line:
$ cat ./hello_world.b
++++++++[>++++[>++>+++>+++>+<<
<<-]>+>+>->>+[<]<-]>>.>---.+++
++++..+++.>>.<-.<.+++.------.-
-------.>>+.>++.
$ go build -o machine && ./machine ./hello_world.b
Hello World!
It talks! Sweet! Our Brainfuck machine works!
I have some good and some bad news. Our product manager said that the Brainfuck interpreter needs to be fast and, sadly, ours isn’t. That’s the bad news.
On my computer, our machine currently takes around 70 seconds to execute mandelbrot.b, a mandelbrot set fractal viewer written in Brainfuck by Erik Bosman, that’s often used as a benchmark for Brainfuck interpreters. That’s slow.
$ go build -o machine && time ./machine ./mandelbrot.b >/dev/null
./machine ./mandelbrot.b > /dev/null 68.24s user 0.18s system 99% cpu 1:08.60 total
The good news is that there are a few things we can do to make it faster.
Take a look at the hello_world.b example from above or the mandelbrot.b program. See all those runs of + and -? There are a lot of instructions of the same type right behind each other in Brainfuck programs. And we have to read each one, check which one it is and then execute it.
The overhead of doing this is high. Consider this Brainfuck snippet: +++++. In order to execute it, we need five cycles of “fetch the next instruction”, “what instruction do we have here?” and “execute this!”. That turns into us incrementing the value of the current memory cell by one five times. It would give us a huge performance boost if we could just increase the current cell’s value by five directly.
The other thing that’s slowing us down is the way we handle [ and ]. Every time we stumble upon such a bracket, we go looking for its matching counterpart again. Scan the program, keep track of all the other brackets we pass and then modify the instruction pointer. The longer the program, the longer this will take. If we could do that just once for each bracket and remember the position of its matching counterpart, we wouldn’t need to rescan the program again and again.
And here’s the best news: we can! We can do all of this before we even start up our Brainfuck machine. We can turn +++++ into something that says “increase by 5”. We can also do the same for -, >, <, ., and ,. And we can find and remember the positions of matching bracket pairs beforehand. All we need to do is create another representation of the original Brainfuck code that can include these optimizations and have our machine execute this instead.
Up until now we’ve used a string to represent the code that’s to be executed by the Machine. But in order to make optimizations, we need a new instruction set. Here is the Instruction type that makes up the new set:
// instruction.go

type InsType byte

const (
    Plus          InsType = '+'
    Minus         InsType = '-'
    Right         InsType = '>'
    Left          InsType = '<'
    PutChar       InsType = '.'
    ReadChar      InsType = ','
    JumpIfZero    InsType = '['
    JumpIfNotZero InsType = ']'
)

type Instruction struct {
    Type     InsType
    Argument int
}
Each Instruction has a Type and an Argument. The Type can be one of the predefined constants defined at the top, where each constant has a corresponding Brainfuck instruction. The interesting part here is the Argument field. This field allows us to make our instruction set much more dense than the original Brainfuck code. We can put more information in fewer instructions. We’ll use Argument in two ways:
In the case of +, -, ., ,, >, and < the Argument field will contain the number of original Brainfuck instructions this Instruction represents. E.g.: +++++ will be turned into Instruction{Type: Plus, Argument: 5}.

In the case of [ and ] the Argument field will contain the position of the instruction of the matching bracket. E.g.: the Brainfuck snippet [] will be turned into two Instructions: Instruction{Type: JumpIfZero, Argument: 1} and Instruction{Type: JumpIfNotZero, Argument: 0}.
Now that we have our new Instruction type and know how this new instruction set is to be interpreted, we can modify our Machine to do exactly that. The first thing we need to do is change its definition, so it doesn’t work with a string anymore, but with a slice of *Instruction:
// machine.go

type Machine struct {
    code    []*Instruction
    ip      int
    memory  [30000]int
    dp      int
    input   io.Reader
    output  io.Writer
    readBuf []byte
}

func NewMachine(instructions []*Instruction, in io.Reader, out io.Writer) *Machine {
    return &Machine{
        code:    instructions,
        input:   in,
        output:  out,
        readBuf: make([]byte, 1),
    }
}
With that change made, the Execute method of the Machine now also needs to work with this new type of instruction set:
// machine.go

func (m *Machine) Execute() {
    for m.ip < len(m.code) {
        ins := m.code[m.ip]

        switch ins.Type {
        case Plus:
            m.memory[m.dp] += ins.Argument
        case Minus:
            m.memory[m.dp] -= ins.Argument
        case Right:
            m.dp += ins.Argument
        case Left:
            m.dp -= ins.Argument
        case PutChar:
            for i := 0; i < ins.Argument; i++ {
                m.putChar()
            }
        case ReadChar:
            for i := 0; i < ins.Argument; i++ {
                m.readChar()
            }
        case JumpIfZero:
            if m.memory[m.dp] == 0 {
                m.ip = ins.Argument
                continue
            }
        case JumpIfNotZero:
            if m.memory[m.dp] != 0 {
                m.ip = ins.Argument
                continue
            }
        }

        m.ip++
    }
}
That’s a lot cleaner than what we had before, right? And it’s faster, too! Well, I can’t prove it yet, because there’s still a piece missing: something that turns Brainfuck code into a slice of *Instructions.
Wikipedia defines a compiler as:
a computer program (or a set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language)
That’s exactly what we need! A program that takes Brainfuck code and turns it into our new “language”, which is made up of our Instructions.

And that’s also a pretty clear definition of requirements, which allows us to define our Compiler:
// compiler.go

type Compiler struct {
    code         string
    codeLength   int
    position     int
    instructions []*Instruction
}

func NewCompiler(code string) *Compiler {
    return &Compiler{
        code:         code,
        codeLength:   len(code),
        instructions: []*Instruction{},
    }
}
The Compiler is constructed with the original Brainfuck code as a string and has an empty instructions slice that will be filled. That’s the job of the Compile method:
// compiler.go

func (c *Compiler) Compile() []*Instruction {
    for c.position < c.codeLength {
        current := c.code[c.position]

        switch current {
        case '+':
            c.CompileFoldableInstruction('+', Plus)
        case '-':
            c.CompileFoldableInstruction('-', Minus)
        case '<':
            c.CompileFoldableInstruction('<', Left)
        case '>':
            c.CompileFoldableInstruction('>', Right)
        case '.':
            c.CompileFoldableInstruction('.', PutChar)
        case ',':
            c.CompileFoldableInstruction(',', ReadChar)
        }

        c.position++
    }

    return c.instructions
}
That looks remarkably close to the Execute method of the current and previous versions of our Machine. But there’s a huge difference: whereas the Machine executed the Brainfuck instructions directly, our Compiler now turns them into *Instructions, so they can be executed later. Here is what the CompileFoldableInstruction method does:
// compiler.go

func (c *Compiler) CompileFoldableInstruction(char byte, insType InsType) {
    count := 1

    for c.position < c.codeLength-1 && c.code[c.position+1] == char {
        count++
        c.position++
    }

    c.EmitWithArg(insType, count)
}

func (c *Compiler) EmitWithArg(insType InsType, arg int) int {
    ins := &Instruction{Type: insType, Argument: arg}
    c.instructions = append(c.instructions, ins)
    return len(c.instructions) - 1
}
Together with EmitWithArg, the CompileFoldableInstruction method scans through the input code (the Brainfuck string code) to see if the current instruction is followed by other instructions of the same type. If that’s the case, it folds those Brainfuck instructions into one Instruction.
EmitWithArg is a helper method that creates a new *Instruction, adds it to the c.instructions slice of the Compiler and returns the position of this newly created instruction in c.instructions.
Returning the position of the newest instruction is an important detail, because we’re going to need it now. As you may have noticed, we didn’t add support for [ and ] to our Compiler yet. That’s because these are not foldable instructions (e.g.: we cannot turn [[[ into a single instruction), but need us to do something more elaborate.
We have two loop instructions: [ and ]. And we want to turn them into JumpIfZero and JumpIfNotZero instructions, where the Argument field contains the position of the matching bracket. That is: the position of the matching counterpart Instruction in the final instructions slice.
That’s easier said than done, though. The problem is that when we encounter a [ we don’t know where in the final instructions slice the matching ] instruction will end up. Counting the instructions in between doesn’t work, because it’s possible that those will be folded together in the next compilation step and thus invalidate the position we got through counting.
Then there’s also the problem of remembering the position of the last JumpIfZero instruction, so it can be used as Argument when constructing the matching JumpIfNotZero instruction.
But here’s what we’re going to do, here’s how we’re going to solve these problems. First, we will emit a JumpIfZero instruction for each [ we encounter, with the placeholder value 0 in the Argument field. Later, when we have constructed the matching JumpIfNotZero instruction, we’re going to come back to this instruction and change its Argument to the real value.

In order to later be able to change them, we need to keep track of JumpIfZero instructions. And we’re going to use a stack to do that, implemented with a simple Go slice:
// compiler.go

func (c *Compiler) Compile() []*Instruction {
    loopStack := []int{}

    for c.position < c.codeLength {
        current := c.code[c.position]

        switch current {
        case '[':
            insPos := c.EmitWithArg(JumpIfZero, 0)
            loopStack = append(loopStack, insPos)
        // [...]
        }

        c.position++
    }

    return c.instructions
}
loopStack, which acts as a stack onto which we can push elements and later pop them off, is just an empty slice. There’s not much to it. Interesting here is the case branch for the [ instructions. Just like we discussed, we emit a new JumpIfZero instruction with a placeholder Argument. Then comes the important part: we push the position of the new JumpIfZero instruction onto our loopStack.
That, in turn, allows us to correctly handle ] instructions:
// compiler.go

func (c *Compiler) Compile() []*Instruction {
    // [...]
        case ']':
            // Pop position of last JumpIfZero ("[") instruction off stack
            openInstruction := loopStack[len(loopStack)-1]
            loopStack = loopStack[:len(loopStack)-1]

            // Emit the new JumpIfNotZero ("]") instruction,
            // with correct position as argument
            closeInstructionPos := c.EmitWithArg(JumpIfNotZero, openInstruction)

            // Patch the old JumpIfZero ("[") instruction with new position
            c.instructions[openInstruction].Argument = closeInstructionPos
    // [...]
}
We pop the position of the last JumpIfZero instruction, the opening [, which still holds a placeholder 0 as Argument, off the stack, and use it as the correct Argument for a new JumpIfNotZero instruction.
And since we now have the position of the JumpIfZero instruction, we can access it in c.instructions and change its Argument from 0 to the correct position of the new JumpIfNotZero instruction!
Isn’t that neat? Now our Compiler takes this piece of Brainfuck code

+++[---[+]>>>]<<<

And turns it into these Instructions:
[]*Instruction{
    &Instruction{Type: Plus, Argument: 3},
    &Instruction{Type: JumpIfZero, Argument: 7},
    &Instruction{Type: Minus, Argument: 3},
    &Instruction{Type: JumpIfZero, Argument: 5},
    &Instruction{Type: Plus, Argument: 1},
    &Instruction{Type: JumpIfNotZero, Argument: 3},
    &Instruction{Type: Right, Argument: 3},
    &Instruction{Type: JumpIfNotZero, Argument: 1},
    &Instruction{Type: Left, Argument: 3},
}
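If you want to check the folding and the back-patching in one go, a test like this sketch (my own, not from the original post) does the trick:

// compiler_test.go - a hypothetical test for the Compiler above
package main

import "testing"

func TestCompileFoldsAndPatchesBrackets(t *testing.T) {
    instructions := NewCompiler("+++++[-]").Compile()

    expected := []*Instruction{
        &Instruction{Type: Plus, Argument: 5},
        &Instruction{Type: JumpIfZero, Argument: 3},
        &Instruction{Type: Minus, Argument: 1},
        &Instruction{Type: JumpIfNotZero, Argument: 1},
    }

    if len(instructions) != len(expected) {
        t.Fatalf("wrong number of instructions. expected=%d, got=%d",
            len(expected), len(instructions))
    }

    for i, ins := range expected {
        if *instructions[i] != *ins {
            t.Errorf("wrong instruction at %d. expected=%+v, got=%+v",
                i, *ins, *instructions[i])
        }
    }
}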
All that’s left to do now is making use of it.
In order to make use of our optimized Machine and our shiny new instruction set, we have to use our Compiler when we read in a file of Brainfuck code:
// main.go

package main

import (
    "fmt"
    "io/ioutil"
    "os"
)

func main() {
    fileName := os.Args[1]

    code, err := ioutil.ReadFile(fileName)
    if err != nil {
        fmt.Fprintf(os.Stderr, "error: %s\n", err)
        os.Exit(-1)
    }

    compiler := NewCompiler(string(code))
    instructions := compiler.Compile()

    m := NewMachine(instructions, os.Stdin, os.Stdout)
    m.Execute()
}
That looks a lot like our old driver. But instead of reading in a file and passing its content to our Brainfuck machine, we first compile the original Brainfuck code in the file to our new Instruction set. And these Instructions will then be executed by our Machine.
If we now run this with the mandelbrot.b benchmark we can see that our work paid off: what took 70s before now only takes 13s!
$ go build -o machine && time ./machine ./mandelbrot.b >/dev/null
./machine ./mandelbrot.b > /dev/null 13.43s user 0.04s system 99% cpu 13.496 total
Isn’t that something?
Yes, we’ve only implemented Brainfuck, a language with no syntax to speak of and only eight different instructions. You might be tempted to call our two Brainfuck machines toys. But let’s take a look at what we actually did.
The first thing we built is an interpreter that acts as a Brainfuck machine. It has all the necessary parts: memory cells, data and instruction pointers, input and output streams. The interpreter effectively tokenizes its input by processing it byte by byte. It then evaluates each token on the fly. It’s not much longer than 100 lines, but has all the essential parts of a fully-grown interpreter.
And then we’ve built a compiler! Sure, it doesn’t output native machine code and it’s really simple, but it’s a compiler nonetheless! It takes Brainfuck code as input and outputs instructions for a machine - our Brainfuck machine. That’s the basic idea behind compilers. We could also change the way our Instructions are stored and passed around, and then we’d realize that our Machine is now a virtual machine and is executing bytecode.
Now, that doesn’t sound like toys, does it? What we built is using the same blueprints a lot of other, mature and production-ready programming languages use. Once you’ve understood how and why they work, you start to recognize them in other languages, too, and in turn understand these languages better.
And that’s why I think implementing Brainfuck can be a rewarding and eye-opening experience.
You can find the complete code, including tests, for both versions of the Brainfuck machine here on GitHub.
Sometimes I jokingly call the summer of 2015 my “Summer Of Lisp”. But, honestly, I’m only half joking when I say this. It really was a great and Lispy summer programming-wise: I was working through the final chapters of Structure And Interpretation Of Computer Programs (SICP), which I began studying at the beginning of that year, was totally fascinated by Lisp, enamored by Scheme and also starting to learn Clojure by working through the fantastic The Joy Of Clojure.
SICP had an immense impact on me. It’s a wonderful book, full of elegant code and ideas; it hearkens “to a programming life that if true, would be an absolute blast to live in”. Especially the fourth chapter made a lasting impression. In this chapter, Abelson and Sussman show the reader how to implement the so-called “meta-circular evaluator” - a Lisp interpreter in Lisp. “Mesmerized” is probably the word I’d use to describe myself while reading this chapter.
The code for the meta-circular evaluator is elegant and simple. Around 400 lines of Scheme, stripped down to the essentials and doing exactly what they are supposed to. It’s a beautiful piece of software. I asked a friend to design a poster for me, containing only the source code for the meta-circular interpreter, beautifully formatted. That poster hung next to my office desk for over a year.
But soon I discovered why it’s only 400 lines. The code presented in the book skips the implementation of an entire component - the parser. Huh. But how does a parser work then? I was stumped. I really wanted to know how that parser works. And I almost never want to skip anything I don’t know yet. I really want to know how things work, at least in a rough sense. Black boxes and skipping things always leave me wanting to dig deeper.
In that same summer I also read Steve Yegge’s “Rich Programmer Food”, in which he argues what a worthwhile goal it is to learn about and to understand compilers. Let me quote my favorite passage:
That’s why you need to learn how [compilers] work. That’s why you, yes you personally, need to write one.
[…]
You’ll be able to fix that dang syntax highlighting.
You’ll be able to write that doc extractor.
You’ll be able to fix the broken indentation in Eclipse.
You won’t have to wait for your tools to catch up.
You might even stop bragging about how smart your tools are, how amazing it is that they can understand your code […]
You’ll be able to jump in and help fix all those problems with your favorite language.
That blog post flipped a switch. Determined, as if someone had issued some kind of weird challenge, I said to a friend of mine: “I’m going to write a compiler”. I believe I was gazing into the distance while saying this. “Alright”, he said, rather unimpressed, “do it.”
Without having taken a compiler course in college or even having a computer science degree I set out to write a compiler. The first goal, I determined, was to get a foot in the door and write an interpreter. Interpreters are closely related to compilers, but easier to understand and to build for beginners. But most importantly, this time there would be no skipping of anything. This interpreter would be built from scratch!
What I found was that a lot of resources for interpreters or compilers are either incredibly heavy on theory or barely scratching the surface. It’s either the dragon book or a blog post about a 50 line Lisp interpreter. The complete theory with code in the appendix or an introduction and overview with black boxes.
Every piece of writing helped though. Slowly but surely I was completing work on my interpreter. The tiny tutorials, the slightly longer blog posts and the heavy compiler books - I could find something useful in all of them.
Nevertheless I was getting frustrated. There needs to be a book that… One day, I said to the same friend who had earlier so enthusiastically encouraged me to write a compiler:
“You know what… I’d love to write a book about interpreters. A book that shows you everything you need to know to build an interpreter from scratch, including your own lexer, your own parser and your own evaluation step. No skipping of anything!”
Somehow this turned into me giving myself a motivational speech.
“And with tests too!”, I continued, “Yeah! Code and tests front and center! Not like in these other books, where the code is an unreadable mess that you can’t get to compile or run on your system. And you don’t need to be well versed in mathematical notation either! It should be a book any programmer can read and understand.”
It’s entirely possible that I was banging my fist on the table at this point. Calmly, my friend said: “Sounds like a good idea. Do it.”
And here we are, 11 months later, and “Writing An Interpreter In Go” is available to the public. It has around 200 pages and presents the complete and working interpreter for the Monkey programming language, including the lexer, the parser, the evaluator and also including tests. No black boxes, no 3rd party tools and no skipping of anything. Nearly every page contains a piece of code. I’m really proud of this book.
“Eval in Go! Using Go!”
Let me explain. There’s the scanner package, which contains the lexer (or scanner, or tokenizer, …) that turns Go source code into tokens. These tokens are defined in their own package, token. And then there’s the parser, which takes the tokens and builds an AST. The definitions of the AST nodes can be found in the perfectly named ast package. And then there’s also a printer package to print these AST nodes.
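Just to get a feel for the first of these packages, here’s a small sketch (mine, not from the original text) that uses go/scanner and go/token to print the tokens of a one-line expression:

package main

import (
    "fmt"
    "go/scanner"
    "go/token"
)

func main() {
    src := []byte("1 * 2 * 3 * 4")

    // The scanner needs a token.File to report positions.
    fset := token.NewFileSet()
    file := fset.AddFile("", fset.Base(), len(src))

    var s scanner.Scanner
    s.Init(file, src, nil, 0)

    for {
        pos, tok, lit := s.Scan()
        if tok == token.EOF {
            break
        }
        fmt.Printf("%s\t%s\t%q\n", fset.Position(pos), tok, lit)
    }
}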
In other words: we have all the necessary pieces here to build an Eval function that evaluates Go code. In fact, with these packages we could build a complete Go interpreter in Go. If you’re really interested in doing that, check out the go-interpreter project, which aims to do just that. Instead, let’s start small and write an Eval function that evaluates mathematical Go expressions.
The first thing we need is a driver, a REPL:
package main

import (
    "bufio"
    "fmt"
    "os"
)

const PROMPT = "go>> "

func main() {
    scanner := bufio.NewScanner(os.Stdin)

    for {
        fmt.Printf(PROMPT)

        scanned := scanner.Scan()
        if !scanned {
            return
        }

        line := scanner.Text()
        fmt.Println(line)
    }
}
This allows us to input Go expressions and have them printed back to us:
% go run eval.go
go>> 1 * 2 * 3 * 4
1 * 2 * 3 * 4
go>> 8 / 2 + 3 - 1
8 / 2 + 3 - 1
go>>
So far, so dull.
The next step would be to initialize Go’s scanner with these input lines and turn them into tokens. Luckily, the parser package has a ParseExpr function that does exactly that. It initializes the scanner and reads in the tokens for us. It then parses the tokens and builds an AST. We can use it to parse the input in our REPL:
package main

import (
    "bufio"
    "fmt"
    "go/parser"
    "os"
)

const PROMPT = "go>> "

func main() {
    scanner := bufio.NewScanner(os.Stdin)

    for {
        fmt.Printf(PROMPT)

        scanned := scanner.Scan()
        if !scanned {
            return
        }

        line := scanner.Text()

        exp, err := parser.ParseExpr(line)
        if err != nil {
            fmt.Printf("parsing failed: %s\n", err)
            return
        }
        _ = exp // we'll put exp to use in a second
    }
}
The result of our call to ParseExpr, exp, is an AST that represents the entered Go expression, without such details as comments, whitespace or semicolons. We can use the printer package to print it. We just have to use token.NewFileSet() to make the printer believe that we got our Go source code from a file:
import (
    "bufio"
    "fmt"
    "go/parser"
    "go/printer"
    "go/token"
    "os"
)

func main() {
    // [...]
    for {
        // [...]
        exp, err := parser.ParseExpr(line)
        if err != nil {
            fmt.Printf("parsing failed: %s\n", err)
            return
        }

        printer.Fprint(os.Stdout, token.NewFileSet(), exp)
        fmt.Printf("\n")
    }
}
Now would you look at that:
% go run eval.go
go>> 1 * 2 * 3 * 4
1 * 2 * 3 * 4
go>> 5 * 6 * 7 * 8
5 * 6 * 7 * 8
Okay, yes, you’re right. That looks exactly like our “printing back the input” mechanism we had before. But there’s more to it. What we’re actually doing here is parsing the input and pretty-printing the AST produced by the parser. See for yourself:
% go run eval.go
go>> 1 * 2 * 3 * (((5 + 6)))
1 * 2 * 3 * (5 + 6)
go>>
The whitespace has been removed, just like the superfluous parentheses around the last sub-expression. We’ve built our own crude version of gofmt in around 35 lines of Go code:
% go run eval.go
go>> func (name string) { return name }
func(name string) {
return name
}
go>>
But we want more than just pretty-printing the AST. We want an Eval function that evaluates mathematical Go expressions. What Eval has to do is traverse each node in the AST and evaluate it. Granted, this definition is kinda recursive, but that’s perfect, because Eval itself is a recursive function:
import (
    "bufio"
    "fmt"
    "go/ast"
    "go/parser"
    "go/token"
    "os"
    "strconv"
)

func Eval(exp ast.Expr) int {
    switch exp := exp.(type) {
    case *ast.BinaryExpr:
        return EvalBinaryExpr(exp)
    case *ast.BasicLit:
        switch exp.Kind {
        case token.INT:
            i, _ := strconv.Atoi(exp.Value)
            return i
        }
    }

    return 0
}

func EvalBinaryExpr(exp *ast.BinaryExpr) int {
    left := Eval(exp.X)
    right := Eval(exp.Y)

    switch exp.Op {
    case token.ADD:
        return left + right
    case token.SUB:
        return left - right
    case token.MUL:
        return left * right
    case token.QUO:
        return left / right
    }

    return 0
}
As you can see, Eval takes an ast.Expr as argument, which is what we get back from parser.ParseExpr. It then traverses this part of the AST but only stops at *ast.BinaryExpr and *ast.BasicLit nodes. The former is an AST node that represents binary expressions (expressions with one operator and two operands) and the latter represents literals, like the integer literals we used in our REPL.
What Eval has to do in the case of an integer literal is easy. Integer literals evaluate to themselves. If I type 5 into the REPL then 5 is what should come out. Eval only needs to convert the parsed integer literal to a Go int and return it.
The case of *ast.BinaryExpr is more complex. Here Eval has to call itself two times to evaluate the operands of the binary expression. Each operand can be another binary expression or an integer literal. And in order to evaluate the current expression, both operands need to be fully evaluated. Only then, depending on the operator of the expression, is the correct result returned.
All that’s left for us now is to use Eval in our REPL:
func main() {
    // [...]
    for {
        // [...]
        exp, err := parser.ParseExpr(line)
        if err != nil {
            fmt.Printf("parsing failed: %s\n", err)
            return
        }

        fmt.Printf("%d\n", Eval(exp))
    }
}
Now our REPL can do this:
% go run eval.go
go>> 1 + 2 * 3 + 4 * 5
27
go>> 1000 - 500 - 250 - 125 - 75 - 25
25
We’ve successfully written a working Eval function in Go! And it only took us around 70 lines of code, because we used Go’s internal compiler tools.
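One caveat before we wrap up: as written, Eval returns 0 for every node it doesn’t recognize, and that includes parenthesized expressions, which the parser wraps in their own *ast.ParenExpr node. Try entering (1 + 2) * 3 and you’ll get 0 back. Handling them is a one-case addition. Here’s Eval again with that case added (my extension, reusing the imports from above; it’s not part of the original version):

func Eval(exp ast.Expr) int {
    switch exp := exp.(type) {
    case *ast.ParenExpr:
        // Unwrap expressions like (1 + 2) and evaluate what's inside.
        return Eval(exp.X)
    case *ast.BinaryExpr:
        return EvalBinaryExpr(exp)
    case *ast.BasicLit:
        switch exp.Kind {
        case token.INT:
            i, _ := strconv.Atoi(exp.Value)
            return i
        }
    }

    return 0
}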
In the last couple of months I developed a certain approach to writing code. Whenever I write a new function, class or method I ask myself: “Is this code stupid enough?” If it’s not, it’s not done and I try to make it stupid.
Now, stupid code does not mean “code that doesn’t work”. Stupid code should work exactly like it’s supposed to, but in the most simple, straightforward, “stupid” way possible.
Anyone could write it and anyone reading it should be able to understand it. It shouldn’t make the reader think about the code itself, but about the problem at hand. It shouldn’t be long, it shouldn’t be complex and, most importantly, it shouldn’t try to be clever. It should get the job done and nothing more.
What does stupid code look like? It depends on the problem it’s trying to solve. Take meta-programming, for example, which is often considered complex and “black magic”. Does asking myself “is this code stupid enough?” mean “no meta-programming allowed”? Not necessarily, no. There are certain cases, in which the problem can be solved in the simplest way through meta-programming. But there are a lot more cases in which meta-programming is unnecessary and additional baggage on top of the solution, which gets in the way of understanding what the code is supposed to do.
The goal is to get rid of the baggage, to chip away at it until the most stupid, still working, tests-passing code emerges.
Keep in mind the “stupid” here: “it works” is not good enough. A lot of complex, “look at this clever trick”, overly-abstracted, unreadable code works and makes the tests pass. That’s not what I’m after. It has to be stupid: not clever, not complex, not hard to understand.
Besides “stupid” the resulting code might also be described as “elegant”, “clean” and “simple”. But the “write stupid code” mantra is not as elusive as “write elegant code”, for example, and seems far more achievable, which makes the approach much more valuable to me. And besides that: I find it much more likely to start out with “write stupid code” and end up with an elegant solution than the other way around.
Not every elegant solution is straightforward, but “stupid” ones are, by definition, and can also be elegant.
Unicorn is a webserver written in Ruby for Rails and Rack applications. When I first used it I was amazed. This is magic, I thought. It had to be. Why?
Well, first of all: the master-worker architecture. Unicorn uses one master process to manage a lot of worker processes. When you tell Unicorn to use 16 worker processes it does so, just like that. And now you’re looking at 17 processes when you run ps aux | grep unicorn — each with a different name, showing whether it’s the master process or one of the worker processes, which even have their own number in their process names.
$ pstree | grep unicorn
\-+= 27185 mrnugget unicorn master -c simple_unicorn_config.rb -l0.0.0.0:8080
|--- 27210 mrnugget unicorn worker[0] -c simple_unicorn_config.rb -l0.0.0.0:8080
|--- 27211 mrnugget unicorn worker[1] -c simple_unicorn_config.rb -l0.0.0.0:8080
|--- 27212 mrnugget unicorn worker[2] -c simple_unicorn_config.rb -l0.0.0.0:8080
|--- 27213 mrnugget unicorn worker[3] -c simple_unicorn_config.rb -l0.0.0.0:8080
|--- 27214 mrnugget unicorn worker[4] -c simple_unicorn_config.rb -l0.0.0.0:8080
|--- 27215 mrnugget unicorn worker[5] -c simple_unicorn_config.rb -l0.0.0.0:8080
|--- 27216 mrnugget unicorn worker[6] -c simple_unicorn_config.rb -l0.0.0.0:8080
|--- 27217 mrnugget unicorn worker[7] -c simple_unicorn_config.rb -l0.0.0.0:8080
|--- 27218 mrnugget unicorn worker[8] -c simple_unicorn_config.rb -l0.0.0.0:8080
|--- 27219 mrnugget unicorn worker[9] -c simple_unicorn_config.rb -l0.0.0.0:8080
|--- 27220 mrnugget unicorn worker[10] -c simple_unicorn_config.rb -l0.0.0.0:8080
|--- 27221 mrnugget unicorn worker[11] -c simple_unicorn_config.rb -l0.0.0.0:8080
|--- 27222 mrnugget unicorn worker[12] -c simple_unicorn_config.rb -l0.0.0.0:8080
|--- 27223 mrnugget unicorn worker[13] -c simple_unicorn_config.rb -l0.0.0.0:8080
|--- 27224 mrnugget unicorn worker[14] -c simple_unicorn_config.rb -l0.0.0.0:8080
\--- 27225 mrnugget unicorn worker[15] -c simple_unicorn_config.rb -l0.0.0.0:8080
How would one build something like this? I had no idea.
And then there’s a feature called “hot reload”, which means that you can tell Unicorn, while it’s running, to spin up a new version of your application. As soon as you do, Unicorn starts a new master process, which is going to serve the new version of your application. All the while the old master process is still running, responding to requests with your old application. Of course, the old master now has “old” in its name. Now, as soon as the new master process is fully booted up, you can send a QUIT signal to the old master process, which will in turn shut down and let the new one take over. And just like that you’ve switched to a new version of your application — without any downtime at all.
Oh, and Unicorn uses a lot more than the QUIT signal! There are tons of signals you can send to it: TTIN to increase the number of workers, TTOU to decrease it, USR1 to rotate the log files, USR2 to perform hot reloading, HUP to re-evaluate the configuration file. I didn’t know half of these signal names and there were even more in Unicorn’s own SIGNALS file.
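There’s nothing Ruby-specific about trapping these signals, by the way. As a rough illustration in Go (the language used elsewhere in these posts; the snippet is mine, not Unicorn’s), subscribing to a few of them takes only a handful of lines:

package main

import (
    "fmt"
    "os"
    "os/signal"
    "syscall"
)

func main() {
    sigs := make(chan os.Signal, 1)

    // Subscribe to some of the signals Unicorn reacts to.
    signal.Notify(sigs, syscall.SIGTTIN, syscall.SIGTTOU,
        syscall.SIGUSR1, syscall.SIGUSR2, syscall.SIGHUP)

    for sig := range sigs {
        fmt.Println("received:", sig)
    }
}

Start it, run kill -TTIN with its process ID from another terminal and watch it print the signal name.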
And then there’s “preloading”: a feature of Unicorn that allows you to spin up new worker processes in less than a second, a fraction of the time it takes to boot up my Rails application. Somehow Unicorn is able to preload my application in memory and make use of that when creating new worker processes. And I had no idea how that works! Not a clue! And as if that wasn’t enough I discovered that Unicorn even has a file called PHILOSOPHY in its repository. Who else has that?! I was sure that there was some black magic going on. Because: how could Unicorn work like it does without magic?
After my first encounter with Unicorn I learned quite a bit about Unix systems and after a while I came back to Unicorn — still in amazement. But this time I read through the source code and it turns out that, well, the secret ingredient to Unicorn is not magic but plain, old Unix.
Now, most people know Unix from a “user’s perspective”: the command line, shells, pipes, redirection, the kill command, scripting, text files and so on. But there’s this whole other side of Unix, too, which we could call the “developer’s perspective”. From this side of Unix you can see signal handling, inter-process communication, usage of pipes without the |-character, system calls and a whole lot more.
In what follows we’re going to have a look at Unicorn. We’ll take it apart and see that it’s just using some basic Unix tricks, the ones you can use as a developer, to do its work. The way we’re going to do that is by going through some of these Unix tricks, basic building blocks of every Unix system, and see how they work and how Unicorn uses them.
At the end we’ll go back to the “magic” of the beginning: hot reload, preloading, master-worker architecture. And we will see how these features work and how they are just Unix and not magic.
So let’s get started.
fork is how processes are created. Every process after the first one (with PID 1) was created with fork. So what is it, what is fork?
fork is a system call. Most of the time we can recognize system calls by the 2 behind their name (e.g. fork(2)), which means that we can find documentation about them in section 2 of the Unix manual, nowadays known as “man pages”. So in order to see the documentation for fork(2) you can run man 2 fork on your command line.
But what’s a system call? A way to communicate with the kernel of our operating system. System calls are the API of the kernel, if you will. We tell the kernel to do something for us with system calls: reading, writing, allocating memory, networking, device management.
And fork is the system call that tells the kernel to create a new process. When one process asks the kernel for a new process with fork(2) the kernel splits the process making the call into two. That’s probably where the name comes from: calling fork(2) is a “fork in the road” in the lifetime of a process. As soon as the kernel returns control to the process after handling the system call, there now is a parent process and a child process. A parent can have a lot of child processes, but a child process only has one parent process.
And both processes, parent and child, are pretty much the same, right after the creation of the child. That’s because child processes in a Unix system inherit a lot of stuff from their parent processes: the data (the code it’s executing), the stack, the heap, the user id, the working directory, open file descriptors, the connected terminal and a lot more. This can be a burden (which is why copy-on-write is a thing) but also has some neat advantages — as we’ll see later.
So how do we use fork? Since (deep down) making a system call involves putting parameters and the unique identifier of the call in CPU registers (which ones may change depending on the architecture we’re working with) and firing a software interrupt, most programming languages provide wrappers that do all the work and allow us to not worry about which system call is identified by which number.
Ruby is no exception here and allows us to use fork(2) with a method called, well, fork:
# fork.rb

child_pid = fork do
  puts "[child] child_pid: #{child_pid}"
  puts "[child] Process ID: #{Process.pid}"
  puts "[child] Parent Process ID: #{Process.ppid}"
end

Process.wait(child_pid)

puts "[parent] child_pid: #{child_pid}"
puts "[parent] Process ID: #{Process.pid}"
What we’re doing here is calling fork in Ruby and passing it a block. This will create a new process, a child process, and run everything inside the block in the new process and then exit. In the parent process we call Process.wait and pass it the return value of fork, which is the ID of the child process. We also need to wait for child processes to exit because otherwise they’d turn into zombie processes. Yep, that’s a valid Unix rule right there: parent processes need to wait for their children to die so they don’t turn into zombies.
When we run this, here’s what we get:
$ ruby fork.rb
[child] child_pid:
[child] Process ID: 29715
[child] Parent Process ID: 29695
[parent] child_pid: 29715
[parent] Process ID: 29695
As we can see, the child process has a new process ID and its parent process ID matches the process ID printed in the parent process. And most interestingly, child_pid is nil inside the child process but contains a value in the parent process. This is how we can check whether we are in the parent process or the child process. Since the child inherits the data from the parent process, both processes are running the same code right after fork and we can decide which process does what depending on the return value of fork.
If we put a sleep somewhere inside the block, run it again and use a tool like ps or pstree, we’d see something like this:
$ pstree | grep fork
| \-+= 29695 mrnugget ruby fork.rb
| \--- 29715 mrnugget ruby fork.rb
Two processes, one parent and one child, with different process IDs. Just by calling fork. That’s not too hard, right? And it’s certainly not magic. So how does Unicorn use fork?
When Unicorn boots up it calls the spawn_missing_workers method, which contains this piece of code:
worker_nr = -1
until (worker_nr += 1) == @worker_processes
  WORKERS.value?(worker_nr) and next
  worker = Worker.new(worker_nr)
  before_fork.call(self, worker)

  if pid = fork
    WORKERS[pid] = worker
    worker.atfork_parent
  else
    after_fork_internal
    worker_loop(worker)
    exit
  end
end
So, what happens here? Unicorn calls this method with @worker_processes set to the number of workers we told it to boot up. It then goes into a loop and calls fork that many times. But instead of passing a block to fork, Unicorn checks the return value of fork to see if it’s now executing in the parent or in the child process. Remember: a forked process inherits the data of the parent process! A child process executes the same code as the parent, and we have to check for that in order to have the child do something else. Passing a block to fork does the same thing under the hood, but explicitly checking the return value of fork is quite a common idiom in many Unix programs, since the C API doesn’t allow passing blocks around.
If fork returned in the parent process, Unicorn saves the newly created worker object with the PID of the newly created child process in the WORKERS hash constant, calls a callback and starts the loop again.
In the child process another callback is called and then the child goes into its main loop, the worker_loop. If the worker loop should somehow return, the child process exits and is done.
And boom! We’ve now got 16 worker processes humming along, waiting for work in their worker_loop, just by going into a loop, doing some cleanup and calling fork 16 times.
That’s not too hard, is it? So let’s go from fork to another basic Unix feature…
My guess is that most people even vaguely familiar with Unix systems know about pipes and have probably done something like this at one point or another in their lives:
$ grep 'wat' journal.txt | wc -l
84
Pipes are amazing. Pipes are a really simple abstraction that allows us to take the output of one program and pass it as input to another program. Everybody loves pipes and I personally think the pipe character is one of the best features Unix shells have to offer.
But did you know that you can use pipes outside of the shell?
pipe(2) is a system call with which we can ask the kernel to create a pipe for us. This is exactly what shells are using. And we can use it too, without a shell!
Remember the saying that under Unix “everything is a file”? Well, pipes are files too. One pipe is nothing more than two file descriptors. A file descriptor is a number that points to an entry in the file table maintained by the kernel for each running process. In the case of pipes the two file table entries do not point to files on a disk, but rather to a memory buffer to which you can write and from which you can read with both ends of the pipe.
One of the file descriptors returned by pipe(2) is the read-end and the other one is the write-end. That’s because pipes are half duplex: the data only flows in one direction.
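To see a pipe in action without any processes involved, here’s a tiny sketch (illustrative code, not from Unicorn): we write into one end and read it back out of the other, with the kernel’s buffer sitting in between.

# pipe_basics.rb
read_end, write_end = IO.pipe

write_end.write('through the kernel buffer')
write_end.close     # closing the write-end signals end-of-input to readers

puts read_end.read  # => "through the kernel buffer"
read_end.close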
Outside of the shell pipes are heavily used for inter-process communication. One process writes to one end, and another process reads from the other end. How? Remember that a child process inherits a lot of stuff from its parent process? That includes file descriptors! And since pipes are just file descriptors, child processes inherit them. If we open a pipe with pipe(2) in a parent process and then call fork(2), both the parent and the child process have access to the same file descriptors of the pipe.
# pipe.rb
read_end, write_end = IO.pipe

fork do
  read_end.close
  write_end.write('Hello from your child!')
  write_end.close
end

write_end.close
Process.wait

message = read_end.read
read_end.close

puts "Received from child: '#{message}'"
In Ruby we can use IO.pipe, which is a wrapper around the pipe(2) system call, just like fork is a wrapper around fork(2), to create a pipe.
In this example we create a pipe with IO.pipe and then create the child process with fork. Since right after the call to fork both processes have both pipe file descriptors, we need to close the end of the pipe we’re not going to need. In the child process that’s the read-end and in the parent it’s the write-end.
We then write something to the pipe in the child, close the write-end and exit. The parent closes the write-end, waits for the child to exit and then reads the message the child wrote to the pipe. To clean up it closes the read-end. If we run this we get exactly what we expected:
$ ruby pipe.rb
Received from child: 'Hello from your child!'
That’s pretty amazing, isn’t it? Just a few lines of code and we created two processes that talk to each other! By the way, this is the exact same concept a shell uses to make the pipe character work. It creates a pipe, it forks (once for each process on one side of the pipe), then uses another system call (dup2) to turn the write-end of the pipe into STDOUT and the read-end into STDIN respectively, and then executes different programs which are now connected through a pipe.
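We can even rebuild that mechanism ourselves. Here’s a sketch of roughly what a shell does for ls | wc -l, using IO.pipe, fork, IO#reopen (which gives us dup2(2) semantics in Ruby) and exec. The details are simplified, but the structure is the real one:

# pipeline.rb
read_end, write_end = IO.pipe

fork do                      # child 1 becomes `ls`
  read_end.close
  $stdout.reopen(write_end)  # dup2: STDOUT now points into the pipe
  write_end.close
  exec 'ls'
end

fork do                      # child 2 becomes `wc -l`
  write_end.close
  $stdin.reopen(read_end)    # dup2: STDIN now points to the pipe
  read_end.close
  exec 'wc', '-l'
end

read_end.close               # the parent needs neither end
write_end.close
Process.waitall              # wait for both children, like a shell does

Closing the unused ends matters here: wc only sees end-of-input once every write-end of the pipe is closed.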
So how does Unicorn make use of pipes?
Unicorn uses pipes a lot.
First of all, there is a pipe between each worker process and the master process, with which they communicate. The master process writes commands to the pipe (something like QUIT) and the child process then reads the commands and acts upon them. Communication between the master and its worker processes through pipes.
Then there’s another pipe the master process only uses internally and not for IPC, but for signal handling. It’s called the “self-pipe” and we’ll have a closer look at that one later.
And then there’s the ready_pipe Unicorn uses, which is actually quite an amazing trick. See, if you want to daemonize a process under Unix, you need to call fork(2) two times (and do some other things) so the process is completely detached from the controlling terminal and the shell thinks the process is done and gives you a new prompt.
What Unicorn does when you tell it to run as a daemon is to create a pipe, called the ready_pipe. It then calls fork(2) two times, creating a grandchild process. The grandchild process inherited the pipe, of course, and as soon as it’s fully booted up and everything looks good, it writes to this pipe that it’s okay for the grandparent to quit. The grandparent, which waited for a message from the grandchild, reads this and then exits.
This allows Unicorn to wait for the grandchild to boot up while still having a controlling terminal to which it can write error messages should something go wrong between the first call to fork(2) and booting up the HTTP server in the grandchild. Only if everything worked does the grandchild turn into a real daemon process. Process synchronization through pipes.
That does come pretty close to being magic, yep, but this is just a really clever use of fork(2) and pipe(2).
At the heart of everything that has to do with networking under Unix are sockets. You want to read a website? You need to open a socket first. Send something to the logserver? Open a socket. Wait for incoming connections? Open a socket. Sockets are, simply put, endpoints between computers (or processes!) talking to each other.
There are a ton of different sockets: TCP sockets, UDP sockets, SCTP sockets, Unix domain sockets, raw sockets, datagram sockets, and so on. But there is one thing they all have in common: they are files. Yes, “everything is a file” and that includes sockets. Just like a pipe, a socket is a file descriptor, which you can read from and write to just like a file. The sockets API for reading and writing is deep down the same as the file API.
So, let’s say we are writing a server. How do we use sockets for that? The basic lifecycle of a server socket looks like this:
First we ask the kernel for a socket with the socket(2) system call. We specify the family of the socket (IPv4, IPv6, local), the type (stream, datagram) and the protocol (TCP, UDP, …). The kernel then returns a file descriptor, a number, which represents our socket.
Then we need to call bind(2) to bind our socket to a network address and a port. After that we need to tell the kernel that our socket is a server socket that will accept new connections, by calling listen(2). Now the kernel forwards incoming connections to us. (This is the main difference between the lifecycles of a server and a client socket.)
Now that our socket is a real server socket waiting for new incoming connections, we can call accept(2), which accepts connections and returns a new socket. This new socket represents the connection. We can read from it and write to it.
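In Ruby the whole lifecycle fits in a few lines. A minimal sketch (the port number is made up):

# server_lifecycle.rb
require 'socket'

server = Socket.new(:INET, :STREAM)                    # socket(2)
server.bind(Socket.pack_sockaddr_in(8080, '0.0.0.0'))  # bind(2)
server.listen(10)                                      # listen(2)

connection, _addr = server.accept                      # accept(2) blocks...
connection.write("hello\n")                            # ...until a client connects
connection.close
server.close

Run it and connect with nc localhost 8080 to see the accept call return.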
But here’s the thing: accept(2) is a blocking call. It only returns when the kernel has a new connection for us. A server that doesn’t have too many incoming connections will block for a long time on accept(2). This makes it really difficult to work with multiple sockets. How are you going to accept a connection on one socket if you’re still blocking on another socket that nobody wants to connect to?
This is where select(2) comes into play. select(2) is a pretty old and famous (maybe infamous) Unix system call for working with file descriptors. It allows us to do multiplexing: we can monitor several file descriptors with select(2) and let the kernel notify us as soon as one of them changes its state. And since sockets are file descriptors too, we can use select(2) to work with multiple sockets. Like this:
require 'socket'

sock1 = Socket.new(:INET, :STREAM)
addr1 = Socket.pack_sockaddr_in(8888, '0.0.0.0')
sock1.bind(addr1)
sock1.listen(10)

sock2 = Socket.new(:INET, :STREAM)
addr2 = Socket.pack_sockaddr_in(9999, '0.0.0.0')
sock2.bind(addr2)
sock2.listen(10)

5.times do
  fork do
    loop do
      readable, _, _ = IO.select([sock1, sock2])
      connection, _ = readable.first.accept
      puts "[#{Process.pid}] #{connection.read}"
      connection.close
    end
  end
end

Process.wait
That’s a 23-line TCP server, listening on two ports, with 5 worker processes accepting connections. Besides missing some minor things like HTTP request parsing, HTTP response writing and error handling, it’s pretty much ready to ship.
No, but seriously, this actually does a lot of stuff in just a few lines with the help of system calls.
We create two sockets with Socket.new, which somewhere deep down in Ruby calls socket(2). Then we bind the sockets to two different ports, 8888 and 9999 respectively, on the local interface. Afterwards we call listen(2) (hidden by the #listen method) and tell the kernel to queue up at most 10 connections for us to handle.
With our sockets ready to go we call fork 5 times, which in turn creates 5 child processes that all run the code in the block. So every child calls IO.select (which is the wrapper around select(2)) with the two sockets as arguments. IO.select is going to block and only return when one of the two sockets is readable (on a listening socket that means there are new connections). And this is exactly why we use select(2) here: with accept(2) we would block on one socket and miss out if the other socket had a new connection.
IO.select returns the readable sockets in an array. We take the first one and call accept(2) on it, which is now going to return immediately. Then we just read from the connection, close the connection socket and start our worker loop again.
If we run this and send some messages to our server with netcat like this:
$ echo 'foobar1' | nc localhost 9999
$ echo 'foobar2' | nc localhost 9999
$ echo 'foobar3' | nc localhost 8888
$ echo 'foobar4' | nc localhost 8888
$ echo 'foobar5' | nc localhost 9999
Then we can see our server accepting the connections and reading from them:
$ ruby tcp_sockets_example.rb
[31605] foobar1
[31607] foobar2
[31605] foobar3
[31607] foobar4
[31609] foobar5
Each connection handled by a different child process. Load balancing done by the kernel for us, thanks to select(2).
So how does Unicorn use sockets? Before the master process calls fork to create the worker processes, it calls socket, bind and listen to create one or more listening sockets (yes, you can configure Unicorn to listen on multiple ports!). It also creates the pipes that will be used to communicate with the worker processes.
After forking, the workers, of course, have inherited both the pipe and the listening sockets. Because, after all, sockets and pipes are file descriptors.
The workers then call select(2) as part of their worker_loop with both the pipe and the sockets as arguments. Now, whenever a connection comes in, one worker’s call to select(2) returns and this worker handles the connection by reading the request and passing it to the Rack/Rails application.
And here’s the thing: since the workers call select(2) not only with the sockets, but also with the master-to-worker pipe, they’ll never miss a message from the master while waiting for a new connection. And if there is a new connection, they handle it, close it and then read the message from the master process.
That’s a really neat way to do load balancing through the kernel and to guarantee that messages to workers are not lost or delayed too long while the worker process is doing its work.
Let’s talk about signals. Signals are another way to do IPC under Unix. We can send signals to processes and we can receive them.
$ kill -9 8433
This sends the signal 9, which is the KILL
signal, to process 8433. That’s
pretty well-known and a lot of people have used this before (probably with
sweat running down their face). But did you know that pressing Ctrl-C
and
Ctrl-Z
in your shell sends signals too?
So what are signals? Most often they are described as software interrupts. If we send a signal to the process, the kernel delivers it for us and makes the process jump to the code that deals with receiving this signal, effectively interrupting the current code flow of the process. Signals are asynchronous — we don’t have to block somewhere to send or receive a signal. And there are a lot of them: the current Linux kernel for example supports around 30 different signals.
Sending signals is easy enough, and I’d bet we’ve all done it a bunch of times, but what’s really cool is this: we can tell the kernel how we want our process to react to certain signals. That’s called “signal handling”.
We have a few options when it comes to signal handling. We can ignore signals: we can tell the kernel we don’t care about a signal, and when the kernel delivers an ignored signal to our process it doesn’t jump to any specific code, but instead does nothing. Ignoring signals has one limitation though: we can’t ignore SIGKILL and SIGSTOP, since there has to be a way for an administrator to kill and stop a process, no matter what the developer of that process wants it to do.
The second option is to catch a signal, effectively defining a signal handler. If ignoring a signal means “Nope, kernel, don’t care about QUIT.”, then defining a signal action is telling the kernel “Hey, if I receive this signal, please execute this piece of code here”. For example: a lot of Unix programs do some clean-up work (remove temp files, write to a log, kill child processes) when receiving SIGQUIT. That’s done by catching the signal and defining an appropriate signal handler that does the clean-up work. Catching signals has the same limitation as ignoring them: we can’t catch SIGKILL and SIGSTOP.
We can also let the defaults apply. Each signal has a default action associated with it. E.g. the default action for SIGQUIT is to terminate the process and make a core dump. We can leave it as it is, or redefine the signal action by catching it. See man 3 signal on OS X or man 7 signal on Linux for a list of the default actions associated with each signal.
So, how do we catch a signal? In Ruby it’s pretty simple:
# signals.rb
trap(:SIGUSR1) do
  puts "SIGUSR1 received"
end

trap(:SIGQUIT) do
  puts "SIGQUIT received"
end

trap(:SIGKILL) do
  puts "You won't see this"
end

puts "My PID is #{Process.pid}. Send me some signals!"

sleep 100
We use trap to catch a signal and pass it a block to define a signal action that will be executed as soon as our process receives the signal. In this example, we try to redefine the signal handlers for SIGUSR1, SIGQUIT and SIGKILL. The sleep statement gives us time to send the signals to our process.
If we run this and then send signals to our process with the kill command like this:
$ kill -USR1 31950
$ kill -QUIT 31950
$ kill -KILL 31950
Then our process will output the following:
$ ruby signals.rb
My PID is 31950. Send me some signals!
SIGUSR1 received
SIGQUIT received
zsh: killed ruby signals.rb
As we can see, the kernel delivered all of the signals to our process. On receiving SIGUSR1 and SIGQUIT it executed the signal handlers, but, as I said before, catching SIGKILL proved useless and the kernel killed the process.
You can probably imagine what we can do with signal handlers. One of the most common things to do with custom signal handlers, for example, is to catch SIGQUIT to do some clean-up work before exiting. But there are a lot more signals and defining appropriate signal handlers can distinguish well-behaving processes from rude ones. Example: if a child process dies the kernel notifies the parent process by sending a SIGCHLD. The default action is to ignore the signal and do nothing, but a well-behaving application would probably wait for the child, clean up after it and write something to a log file.
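In Ruby, such a well-behaving parent could look like this (a small sketch; the reaping pattern is standard, the output is just for illustration):

# sigchld.rb
trap(:SIGCHLD) do
  begin
    # WNOHANG: collect every child that has already exited, without blocking
    while pid = Process.wait(-1, Process::WNOHANG)
      puts "reaped child #{pid}"
    end
  rescue Errno::ECHILD
    # no children left to wait for
  end
end

3.times { fork { sleep rand(2) } }
sleep 3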
Unicorn sets up a lot of different signal handlers in the master process, before it calls fork and spawns the worker processes. These signal handlers do a lot of things. Here are a few examples: SIGQUIT tells the master to gracefully shut down its workers and itself, SIGTTIN and SIGTTOU increase and decrease the number of worker processes, and SIGUSR2 triggers a hot reload. We’ll look at all of these below.
These signal handlers are like a separate API through which you tell the master and worker processes what to do. And it’s pretty reliable too, considering the fact that signals are essentially asynchronous events and can be sent multiple times. This just screams for race-conditions and locks. So how does Unicorn do it?
Unicorn uses a self-pipe to manage its signal actions. The pipe the master process sets up is this self-pipe, which it will only use internally and not to talk to other processes. It also sets up a queue data structure. After that come the signal handlers. Unicorn catches a lot of signals, as we saw, but each signal handler doesn’t do much. It only pushes the signal’s name into the queue and sends one byte through the self-pipe.
After setting up the signal handlers, spawning worker processes, and so on, the master process goes into its main loop, in which it checks up on the workers regularly and sleeps in between. But it doesn’t just sleep, no, the master process actually goes to sleep by calling select(2) on the self-pipe, with a timeout as argument. This way it can go to sleep but will be woken up as soon as a signal arrives, since the signal handler just sent a byte through the pipe, turning it into a readable pipe (from the master’s perspective), and select(2) now returns. After waking up, the master just has to pop signals off the queue it set up in the beginning and handle them one after another. This is of tremendous value if you consider again that signals are asynchronous, that you never know what you’re currently executing when a signal arrives, and that they can be sent multiple times, even while you’re executing your signal handler code. Using a queue and a self-pipe in this combination makes handling signals a lot saner and easier.
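Stripped of all the bookkeeping, the pattern looks roughly like this (a sketch under simplified assumptions, not Unicorn’s actual code):

# self_pipe.rb
SIG_QUEUE = []
read_end, write_end = IO.pipe

[:USR1, :TTIN, :TTOU].each do |sig|
  trap(sig) do
    SIG_QUEUE << sig        # remember which signal arrived...
    write_end.write('.')    # ...and make the self-pipe readable
  end
end

loop do
  # sleep for up to 10 seconds, but wake up immediately on a signal
  ready, = IO.select([read_end], nil, nil, 10)
  if ready
    read_end.read_nonblock(16)                  # drain the wake-up bytes
    puts "handling #{SIG_QUEUE.shift}" until SIG_QUEUE.empty?
  else
    puts 'timeout: checking up on the workers'  # periodic housekeeping
  end
end

The handlers themselves do almost nothing, which is exactly the point: the real work happens in the main loop, at a predictable place in the code.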
Worker processes, on the other hand, inherit the master’s signal handlers – again: child processes inherit a lot from their parents. But instead of leaving them as they are, the workers redefine (most of) the signal handlers to be no-ops. They get their signals through the pipe which connects them to the master process. If the master process, for example, receives SIGQUIT, it writes the name of the signal to each pipe connected to a worker process to gracefully shut them down. The worker processes call select(2) on this master-worker pipe and the listening sockets, which means that as soon as they finish their work (or don’t have anything to do) they will read the signal name from the pipe and act upon it. This “signal delivery from master to worker via pipe” mechanism avoids the many problems that can occur if a worker process should receive a signal while working on a request.
By now we have looked at fork(2) and how easy it is to spawn a new process. We saw that we can use pipes pretty easily outside a shell and without any use of the pipe character by calling pipe(2) and just working with the two file descriptors as if they were files. We also created sockets, worked with select(2), looked at a pre-forking TCP server in 23 lines of Ruby and had the kernel of our operating system do our load balancing for us. Then we saw that Unicorn has its own API composed of signals and that it’s not that hard to work with signals.
These were just some basic Unix concepts. Trivial on their own, powerful when combined.
So, let’s have a closer look at these features of Unicorn that amazed me so much, that I was sure were created by some wizards with long robes and tall hats, in a basement far, far away, on old rusty PDP-11s.
Let’s see how this “magic” is just Unix.
If we put preload_app true in the configuration file, Unicorn will “preload” our Rack/Rails application in the master process to spare the worker processes from doing it themselves. As soon as the application is preloaded, spawning a new worker process is really, really fast, since the workers don’t have to load it anymore.
The question is: how does this work exactly? Let me explain.
Right after Unicorn has evaluated the command line options, it builds a lambda called app. This lambda contains the instructions needed to load our Rack/Rails application into memory. It loads the config.ru file (or uses default settings) and then creates a Rack application with Rack::Builder, on which it calls #to_app. So what should come out of the lambda is a Rack application on which we just need to call #call to pass it a request and get a response. But since a lambda’s body is only evaluated when the lambda is called, none of this happens when the lambda is defined.
Unicorn passes this app lambda on to the Unicorn::HttpServer, which eventually calls fork(2) to spawn the worker processes. But before it creates a new process, the HttpServer checks whether we told Unicorn to use preloading. Only if we did does it call the lambda. If we didn’t, the workers each call the lambda themselves after the call to fork(2).
Calling the lambda, which hasn’t been called before, now loads our application into memory. Files are being read, objects are created, connections established – everything is somehow getting stored in memory.
And here comes the real trick: since the master loaded the application into memory, which can take some time if we’re working with a large Rails application, the worker processes inherit it. Yep, the worker processes inherit our application. How neat is that? Since workers are created with fork(2), they already have the whole application in memory as soon as they are created. Preloading is just deciding whether Unicorn calls the lambda before or after the call to fork(2). And if Unicorn called it before, creating new worker processes is really fast, since they are basically ready to go right after creation, except for some callbacks and setup work.
With copy-on-write, which works in the Ruby VM since 2.x, this is even faster. The reason: “inheriting” means copying the parent’s memory address space into the child’s. It’s probably not as slow as you imagine, but with copy-on-write only the memory regions the child process actually modifies get copied.
And the best part of it is this: the kernel is doing all the work for us. The kernel answers the call to fork(2) and the kernel copies the memory. We just need to decide when to create our objects: before or after the call to fork(2).
This comes in really handy when we now look at another great feature of Unicorn.
Unicorn allows us to increase and decrease the number of its worker processes by sending two signals to the master process:
$ kill -TTIN 93821
$ kill -TTOU 93821
These two lines add and then remove a worker process. The signals used, SIGTTIN and SIGTTOU, are normally sent by the terminal driver to notify a process running in the background when it tries to read from (SIGTTIN) or write to (SIGTTOU) the controlling terminal. Since Unicorn doesn’t allow running as a daemon without a logfile, a daemonized Unicorn never touches the controlling terminal, which means it is free to redefine the signal actions (the default for both signals is to stop the process).
It does so by defining signal handlers for SIGTTIN and SIGTTOU that, as we saw, only add the name of the signal to the signal queue and write a byte to the self-pipe to wake up the master process.
The master process, as soon as it wakes up from its main-loop sleep, sees the signals and increases or decreases the internal variable worker_processes, which is just an integer. And right before it goes back to sleep, it calls #maintain_worker_count, which either spawns a new worker or writes SIGQUIT to the pipe connected to the now superfluous worker process to gracefully shut it down.
So let’s say we send SIGTTIN to Unicorn to increase the number of workers. What will happen is that the master wakes up (triggered by the write to the self-pipe), increases worker_processes and calls #maintain_worker_count, which in turn calls another method called #spawn_missing_workers. Yes, that’s right. We looked at this method before; it’s the same one that’s used to spawn the worker processes when booting up. In its entirety it looks like this:
def spawn_missing_workers
  worker_nr = -1
  until (worker_nr += 1) == @worker_processes
    WORKERS.value?(worker_nr) and next
    worker = Worker.new(worker_nr)
    before_fork.call(self, worker)
    if pid = fork
      WORKERS[pid] = worker
      worker.atfork_parent
    else
      after_fork_internal
      worker_loop(worker)
      exit
    end
  end
rescue => e
  @logger.error(e) rescue nil
  exit!
end
Again, this is just a loop that calls fork(2) N times. Now that N is increased by one, a new worker process will be created. The other calls to fork are skipped by checking whether WORKERS already contains an instance of Worker with the same worker_nr.
Take note of worker_nr here, it is important. All worker processes have a worker_nr by which they are easily identified in the row of spawned processes.
If we now send SIGTTOU to the master process, the following is going to happen. First of all, the master is woken up by a fresh byte on the self-pipe. Instead of increasing worker_processes now, it decreases it. And again, it calls #maintain_worker_count, which this time doesn’t jump straight to #spawn_missing_workers. Since no worker process is missing, #maintain_worker_count now takes care of reducing the number of workers:
def maintain_worker_count
  (off = WORKERS.size - worker_processes) == 0 and return
  off < 0 and return spawn_missing_workers
  WORKERS.each_value { |w| w.nr >= worker_processes and w.soft_kill(:QUIT) }
end
It may not be idiomatic Ruby, but these 3 lines are still fairly easy to understand. The first line computes the difference between the number of currently running worker processes and the configured worker_processes, and returns if it’s zero. If the difference is negative, a new worker will be spawned (which is where the path of SIGTTIN ends in this method). But since the difference is positive after decreasing worker_processes, the master process now takes the workers with a worker_nr that’s too high and calls soft_kill(:QUIT) on the worker instance.
This in turn sends the signal name through the pipe to the corresponding worker process, which will catch that signal through select(2) and gracefully shut down.
After this, the master process calls Process.waitpid (which in turn calls waitpid(2)), which returns the PID of dead children (and doesn’t leave them hanging around as zombies). The worker process with this PID now just needs to be removed from the WORKERS hash and Unicorn is ready to go again.
All of this is pretty simple: fork(2) in a loop, pipes, signal handlers and keeping track of numbers. Again: it’s the combination that makes these Unix idioms so powerful.
The same can be said for my favorite Unicorn feature.
This fantastic feature has many names: hot reload, zero downtime deployment, hot swapping and hot deployment. It allows us to deploy a new version of our application, while the old one is still running.
With Unicorn, “hot reload” means that we can spin up a new master process, with new worker processes serving a new version of our application, while the old master process is still running and still handling requests with the old version.
It’s all triggered by sending a simple SIGUSR2 to the master process. But how?
Let’s take a step back and say that our Unicorn master and worker processes are just humming along. The master process is sleeping, waking up, checking up on the workers and going back to sleep. The worker processes are handling requests without a care in the world. Suddenly a SIGUSR2 is sent to the master process.
Again, the signal handler catches the signal, pushes it onto the signal queue, writes a byte to the self-pipe and returns. The master wakes up from its main-loop slumber and sees that it received SIGUSR2. Straight away it calls the #reexec method. It’s a fairly long method and you don’t have to read through it now. But most of “hot reload” is contained in it, so let’s walk through it.
The first thing the method does is check whether the master process is already reexecuting (reexecuting means that a new master process is being started by an old one). If it is, it returns and its job is done. But if not, it writes the current PID to /path/to/pidfile.pid.oldbin. .oldbin stands for “old binary”. With the PID saved to a file, the master process now calls fork(2), saves the returned PID of the newly created child process (to later check if it’s already reexecuting…) and returns. The old master process adds “(old)” to its process name (by changing $0 in Ruby) and is now done with #reexec.
But since a process created with fork(2) executes exactly the same code, the new child process goes ahead with #reexec.
Right after the call to fork(2), the child writes the numbers of the sockets it’s listening on (remember: sockets are files, files are represented as file descriptors, which are just numbers) to an environment variable called UNICORN_FD as one string, in which the numbers are separated by commas. (Yes, it keeps track of listening sockets by writing to an environment variable. Take a deep breath. It’ll make sense in a second.)
Afterwards it modifies the listening sockets so they stay open, by setting the FD_CLOEXEC flag on them to false.
It then closes all the other file descriptors it doesn’t need (e.g.: sockets and files opened by the Rack/Rails application).
With all preparations and cleaning done, the child process now calls execve(2). The execve(2) system call turns the calling process into a completely different program. Which program it’s turned into is determined by the arguments passed to execve(2): the path of the program, the arguments and environment variables. This is not a new process we’re talking about: the new program has the same process ID, but its complete heap, stack, text and data segments are replaced by the kernel.
This is how we can spawn new programs on a Unix system and what every Unix shell does when we try to launch Vim: it calls fork(2) to create a child process and then it calls execve(2) with the path to the Vim executable. Without the call to execve(2) we’d end up with a lot of copies of the original shell process when trying to start programs.
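In Ruby the fork-then-exec dance looks like this (a tiny sketch, using ls instead of Vim):

# fork_exec.rb
pid = fork do
  exec 'ls', '-l'        # execve(2): this process *becomes* ls
  # exec never returns unless it failed
end

Process.wait(pid)        # the shell waits before printing a new prompt
puts 'done'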
That’s also why Unicorn needs to set the FD_CLOEXEC flag to false on the sockets before it calls execve(2). Otherwise the sockets would get closed when the image of the process is replaced.
Unicorn calls execve(2) with the original command line arguments it was started with (it keeps track of them), in effect spawning a fresh Unicorn master process that’s going to serve a new version of our application. Except that it’s not completely fresh: the environment variables the old master process set (UNICORN_FD) are still accessible by the new master process.
So the new master process boots up and loads the new application code into memory (preloading!). But before it creates worker processes with fork(2), it checks the UNICORN_FD environment variable. And it finds the numbers of our listening sockets! And since file descriptors are just numbers, it can work with them. It turns them into Ruby IO objects by calling IO.new with each number as an argument and has thereby recovered its listening sockets.
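We can demonstrate that handoff within a single process. A sketch (illustrative; it uses TCPServer.for_fd to wrap the descriptor number, one of several ways to turn a number back into a usable Ruby object, and a made-up port):

# fd_handoff.rb
require 'socket'

server = TCPServer.new('0.0.0.0', 8080)    # bound and listening
ENV['UNICORN_FD'] = server.fileno.to_s     # remember nothing but the number
# ...in Unicorn, execve(2) would happen at this point...

fd = ENV['UNICORN_FD'].to_i
recovered = TCPServer.for_fd(fd)           # a working listening socket again
puts "listening again on fd #{recovered.fileno}"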
And now it calls fork(2) and creates worker processes which inherit these listening sockets and can start their select(2) and accept(2) dance again, now handling requests with the new version of our application.
There is no “address already in use” error bubbling up. The new master process inherited these sockets, they are already bound to an address and transformed into listening sockets by the old master process. The new master process and its workers can work with them in the same way the worker processes of the old master process do.
Now there are two sets of master and worker processes running. Both are handling incoming connections on the same sockets.
We can now send SIGQUIT to the old master process to shut it down, and as soon as it exits the new master process takes over and only our new application version is being served. And all of this happened without the old worker processes ever stopping their work.
All of this is just Unix. The master-worker architecture, the signal handling, the communication through pipes, the preloading, the scaling of workers with signals and the hot reloading of Unicorn. There is no magic involved.
I think that’s the most amazing part about all of this. The combination of concepts like fork, pipe and signals, which are easy to understand on their own, and leveraging the operating system is where the perceived magic and ultimately the power of great Unix software like Unicorn comes from.
You might be thinking: “Why? Why should I care about this low-level stuff? I build web applications, why should I care about fork and select?”
I think there are some really compelling reasons.
The first one is debugging. Have you ever wondered why you shouldn’t open a database connection (a socket!) before Unicorn calls fork(2)? Or why you get a “too many open files” error when you try to make an HTTP request (sockets!)? Now you know why.
Knowing how your system works on each layer of the stack is immensely helpful when trying to find and eliminate bugs.
The next reason, which I call the design-and-architecture reason, boils down to having answers to questions like these: should we use threads or processes? How could these processes talk to each other? What are the limitations? What are the benefits? Will this perform? What’s the alternative?
With some understanding of your operating system and the APIs it offers, it’s far easier to make architectural decisions and design choices when building a system or single components of it.
One more level of abstraction. Someone somewhere at some time said that “it’s always good to know one more level of abstraction beneath the one you’re currently working on” and I totally agree.
I like to think that learning C made me a better Ruby programmer. I suddenly knew what was happening behind the curtains of the Ruby VM. And if I didn’t know, I could make a good guess.
And I think that knowing deeply about the system to which I deploy my (web) application makes me a better developer, for the same reasons.
But the most important reason for me, which is a personal one, is the realization that everything Unicorn does is not magic! No, it’s just Unix and there is no secret ingredient. Which, in turn, means that I could write software like this. I could write a webserver like this! Realizing this is worth a lot.
In Go there is no Fork() function which you can call directly in your code. The first time I read through the Go issue discussing this, I wondered and said to myself: “Yeah, why is there no Fork()? It surely can’t be that hard to implement.” After all, you can already call system calls with the syscall package. As I read more and more I realized that the problem is not implementing Fork() per se, but rather implementing Fork() so that it works safely in a multi-threaded environment, which most Go programs are. So I tried to find out why.
And it turns out that the problem stems from the behaviour of fork(2) itself. Whenever a new child process is created with fork(2), the new process gets a new memory address space but everything in memory is copied from the old process (with copy-on-write that’s not 100% true, but the semantics are the same). If we call fork(2) in a multi-threaded environment, the thread doing the call is now the main thread in the new process, and all the other threads, which ran in the parent process, are dead in the child. And everything they did was left exactly as it was just before the call to fork(2).
Now imagine that these other threads were happily doing their work before the call to fork(2) and a couple of milliseconds later they are dead. What if something these now-dead threads did was not meant to be left exactly as it was?
Let me give you an example. Let’s say our main thread (the one which is going to call fork(2)) was sleeping while we had lots of other threads happily doing some work: allocating memory, writing to it, copying from it, writing to files, writing to a database and so on. They were probably allocating memory with something like malloc(3). Well, it turns out that malloc(3) uses a mutex internally to guarantee thread-safety. And exactly this is the problem.
What if one of these threads was using malloc(3) and had acquired the lock of the mutex in the exact moment the main thread called fork(2)? In the new child process the lock is still held by a now-dead thread that will never release it.
The new child process has no idea whether it’s safe to use malloc(3) or not. In the worst case it will call malloc(3) and block until it acquires the lock, which will never happen, since the thread that’s supposed to release it is dead. And this is just malloc(3). Think about all the other possible mutexes and locks in database drivers, file handling libraries, networking libraries and so on.
In order to call fork(2) in a safe way, the calling thread would need to be absolutely sure that all the other threads are in a state that’s safe to fork: no locks held, nothing half-done. And this is hard, especially if you’re going to implement a wrapper around fork(2) in a library and have no idea what’s going to be happening all around you.
If the new child process is going to be turned into a different process with execve(2), the problem is not that big, since the heap, stack and data will be replaced. That’s why there is an os.StartProcess() in Go, which uses fork(2) under the hood (see line 65 here). There is still the problem of open file descriptors, which the new child process will inherit but which were intended to be used only by a now-dead thread. But it’s still possible to close them, since the new child process has direct access to them.
Now you might realize that the title of this post is a lie, since threads can fork. But in practice it’s really hard to pull off, which explains why the Go issue mentioned at the beginning is nearly 5 years old.
There are of course a couple of attempts to provide a solution.
pthread_atfork(3) (http://linux.die.net/man/3/pthread_atfork) allows users to register handlers in threads to be called right before and after fork. But as you can imagine, this can be cumbersome too. Solaris has forkall(2), which does not kill the non-forking threads but keeps them alive, doing exactly what they did before. This behaviour comes with its own share of problems: if a thread calls forkall() while another thread in the parent is performing I/O to a file, that thread is replicated in the child process. Both copies of the thread will continue performing I/O to the same file, one in the parent and one in the child, leading to malfunctions or file corruption.
To conclude: yes, the title is a lie, and yes, you can fork(2) in a multi-threaded environment, but it is really, really difficult to pull off safely. So let’s just say that threads can’t fork and leave it at that.