Thorsten Ball

The Tools I Use To Write Books

04 Sep 2018

In the beginning, there is always a single text file, nothing more. It’s called ideas.md or book.md. It contains a list of thoughts and ideas, an outline. Everything else grows from there. It only makes sense that we start by talking about files.

The Files

Both of my books, Writing An Interpreter In Go and Writing A Compiler In Go, are written in GitHub Flavored Markdown (GFM). One file per chapter and all files under version control using git.

I only use a basic set of Markdown features in my texts: headings, emphasis, lists, links, images, quotes. And fenced code blocks. This last one is the most important one to mention here, because every piece of code presented in the books is contained in the Markdown files in the form of fenced code blocks.

Yes, that has all the drawbacks you imagine it to have. While I have syntax highlighting for fenced code blocks, editing them is not as comfortable as if they were their own files. But, most importantly, the code is also duplicated: one version lives in a Markdown file and one (or more) lives in the code folder that comes with the book. If I want to update a snippet of code presented in the book, I have to manually update every copy of it. Yes, cumbersome.

But there is one undeniable advantage to this approach: it works and it works exactly like I want it to. There are quite a few tools out there to embed code in Markdown files but none of them allow me to present a change to a piece of code.

Since we – you, the reader, and me, the writer – work on a single codebase in both books, we often have to extend or modify existing code. To show these changes I comment out the already existing parts of a method and just show what’s been added or changed. Like this:

// compiler/compiler.go

func (c *Compiler) emit(op code.Opcode, operands ...int) int {
  // [...]

  pos := c.addInstruction(ins)
  return pos
}

I don’t know of an existing tool that can do that. They embed either portions of a file or a complete file. And, yes, that file could be a *.diff, but even that would have to be generated separately and beforehand. So I went with fenced code blocks.

And believe me, I was this close to writing my own tool. A preprocessor that would not only allow me to embed auto-generated diffs into Markdown but also to run commands on a set of changes and embed the generated output, too.

What kept me from doing that was a calm voice in my head telling me that I’m here to write a book, not a preprocessor. And since copying code into Markdown files is only cumbersome once you have to go back and edit the code, but actually quite comfortable while writing, I just kept on doing that, ignoring the other voices.

Now I have written two books and zero tools, which I consider a success.

The Pipeline

Of course I do not send plain text files out to readers. Instead, they receive nicely formatted PDF, ePub, Mobi and HTML files, which I create with only a tiny number of tools: pp, pandoc and KindleGen. Together they form a pipeline:

First, the Markdown files are piped through pp, a generic preprocessor for text files that can do a lot of things, but which I only use to replace two variables in the text: the URL of the zipped code folder readers can download and the current version of the book.
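To make the substitution step concrete, here is a minimal Go sketch of what replacing those two variables amounts to. It is not how pp works and it does not use pp's macro syntax: the `{{CODE_URL}}` and `{{VERSION}}` placeholders, the URL and the version string are all made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// fillPlaceholders replaces the two hypothetical placeholders with
// concrete values. The placeholder syntax is an assumption made for
// this sketch, not the syntax pp uses.
func fillPlaceholders(markdown, codeURL, version string) string {
	r := strings.NewReplacer(
		"{{CODE_URL}}", codeURL,
		"{{VERSION}}", version,
	)
	return r.Replace(markdown)
}

func main() {
	src := "Download the code at {{CODE_URL}} (book version {{VERSION}})."
	fmt.Println(fillPlaceholders(src, "https://example.com/code.zip", "1.0"))
}
```

Everything else in the Markdown passes through unchanged, which is all this stage of the pipeline needs to do.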

After that, the resulting Markdown is handed over to pandoc, the most important part of this pipeline.

Here’s the shortest possible description of what Pandoc does: it takes text in one format and outputs it in another format. Markdown goes in, HTML comes out. Or turn it around and put in HTML and get Markdown back. Or feed it Markdown and get DOCX, or ODT, or PDF, or AsciiDoc, or any other of the myriad of supported formats.

In my pipeline, Pandoc takes the Markdown files of the book and, with a little bit of YAML containing metadata, turns them into PDF, HTML and ePub files. The default output is already nice to look at, but I have a custom template for each of these three formats, all of which are based on Pandoc’s default templates.

Since the HTML output is a single file with CSS in the <head> it’s easy to style. The same goes for ePub, which is really just a ZIP archive containing HTML files and is probably the one I styled the least, because I think it looks pretty good by default.

PDF generation, though, is done using LaTeX and requires a template written in LaTeX. I’ve stitched mine together from Pandoc’s default template and what Stack Overflow, hours of trial and error and the enlightenment and horror that was “holy shit, did you know LaTeX has its own package manager?” have given me. I like to touch it only when absolutely necessary. In the end, though, that doesn’t matter much.

What comes out looks beautiful to me and Pandoc is, without any doubt and exaggeration, one of the best tools I’ve ever used. It does exactly what it promises to, its documentation is stellar, it’s actively and carefully maintained and it has never once let me down. If I had to shorten this post to one word, it would be “Pandoc”.

The only thing Pandoc can’t do is produce Mobi files, which is what Amazon uses for their Kindle eBook readers and store. For that, I use Amazon’s own command line tool KindleGen, which turns the ePub produced by Pandoc into a Mobi file. No styling or templates required.
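The pipeline described above can be sketched as a list of commands. The Go program below only builds and prints the command lines; it does not run them. Every file name, template path and variable name is hypothetical, and the exact flags the author uses are not given in this post. The pandoc and kindlegen options shown are ones those tools document, while the `-D` option for pp is an assumption that should be checked against pp's own documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// step is one external command in the pipeline: program name plus arguments.
type step struct {
	name string
	args []string
}

// String renders the step roughly the way it would be typed in a shell.
func (s step) String() string {
	return strings.Join(append([]string{s.name}, s.args...), " ")
}

// buildSteps returns the commands for one build of the book.
// All file names, paths and variable names are hypothetical.
func buildSteps(version, codeURL string) []step {
	steps := []step{{
		// Assumption: pp defines variables with -D and its output is
		// saved as book.md before the pandoc steps run.
		name: "pp",
		args: []string{"-Dversion=" + version, "-Dcodeurl=" + codeURL, "chapters.md"},
	}}

	// One pandoc run per output format, each with its own template.
	outputs := []struct{ file, template string }{
		{"book.pdf", "templates/book.latex"},
		{"book.html", "templates/book.html"},
		{"book.epub", "templates/book.epub"},
	}
	for _, o := range outputs {
		steps = append(steps, step{
			name: "pandoc",
			args: []string{
				"--from=gfm",
				"--metadata-file=metadata.yaml",
				"--template=" + o.template,
				"-o", o.file,
				"book.md",
			},
		})
	}

	// KindleGen turns the ePub produced by pandoc into a Mobi file.
	steps = append(steps, step{name: "kindlegen", args: []string{"book.epub", "-o", "book.mobi"}})
	return steps
}

func main() {
	for _, s := range buildSteps("1.0", "https://example.com/code.zip") {
		fmt.Println(s)
	}
}
```

Keeping the steps as plain data makes the order of the pipeline easy to see: one preprocessing pass, three Pandoc runs, one KindleGen run.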

Once the final files fall out of the pipeline I bundle them in a ZIP file, together with a folder that contains all of the code presented in the books. Ready to be published.
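As a rough illustration of the bundling step, the following Go sketch writes a few entries into a ZIP archive using the standard library. It works on in-memory placeholder contents rather than real files, and the entry names are hypothetical; the post does not say which tool is actually used to create the ZIP.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
)

// entry is one file to place in the archive.
type entry struct {
	name string
	data []byte
}

// bundle writes the entries into an in-memory ZIP archive, in order,
// and returns the archive bytes.
func bundle(entries []entry) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, e := range entries {
		w, err := zw.Create(e.name)
		if err != nil {
			return nil, err
		}
		if _, err := w.Write(e.data); err != nil {
			return nil, err
		}
	}
	// Close writes the central directory; the archive is invalid without it.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// names lists the entry names of a ZIP archive held in memory.
func names(archive []byte) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, err
	}
	var out []string
	for _, f := range zr.File {
		out = append(out, f.Name)
	}
	return out, nil
}

func main() {
	// Placeholder contents stand in for the real book files and code folder.
	archive, err := bundle([]entry{
		{"book.pdf", []byte("pdf bytes")},
		{"book.epub", []byte("epub bytes")},
		{"book.mobi", []byte("mobi bytes")},
		{"book.html", []byte("<html></html>")},
		{"code/01/main.go", []byte("package main\n")},
	})
	if err != nil {
		panic(err)
	}
	list, err := names(archive)
	if err != nil {
		panic(err)
	}
	fmt.Println(list)
}
```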

Publishing

I self-publish both books in both editions, eBook and paperback. Self-publishing means that instead of a publisher I have to take care of selling, printing and distributing the books to readers.

While I could theoretically run my own shop on which I sell the books, I don’t want to. I want to write books, not a web application for selling books, especially not one that involves the handling of taxes for an international audience. So instead I use two services to take care of that for me.

The first one is Gumroad, which I use to sell and distribute the eBook editions. I upload my ZIP, Gumroad accepts payment via PayPal or credit card and then sends the file to the reader – in exchange for a rather small fee. It also takes care of collecting taxes for me and I can set the price without any limitations, refund customers, send out free updates and create promo codes. After nearly two years, I’m still a happy customer and the only two features I’d love to have are more payment methods and pricing per country, so I can set a lower price for readers in India, for example.

The paperback editions are sold, printed on demand and shipped by Amazon Kindle Direct Publishing (KDP). I upload a print-ready cover and PDF version of a book and Amazon turns it into a paperback that you can purchase in seven different Amazon stores. Createspace is what I previously used for that, but after Amazon bought Createspace, they started to move the Createspace functionality over to KDP. By now, I’ve completely switched over and only use KDP. One less tool to worry about, since I was using KDP anyway to publish the Kindle version of the books on the Kindle stores.

For someone like me, a person who starts to sweat when he hears “CMYK”, “RGB” and “you need to change your file” in one sentence, creating print-ready artifacts can be a bit of a hassle, but using LaTeX for the PDF generation comes in quite handy here. In a separate LaTeX template I use with Pandoc I can set the dimensions and margins of the document to exactly what I need and LaTeX takes care of the rest.

Readers can then purchase my books just like any other product on Amazon, including Prime shipping, refunds and all the payment methods accepted by Amazon. The downside of all this is a loss of control for me. I can’t, for example, offer personalized coupon codes, nor can I bundle the paperback with the eBook edition.

I still think it’s worth it. When you upload a PDF file on Friday and then hold the paperback version of that file in your hands on Wednesday, you quickly forget about wrestling with color models of PDFs and start to grow convinced that we’re living in the future.

The Most Important Bit

That’s it. That’s the complete journey, from bytes in a text file to ink on paper or a ZIP in your inbox.

But here’s the most important bit, saved for last: none of this matters if you want to write a book. Quite a few people have told me that they want to write a book, but they’re not sure about which tools to use. My advice: all you need to write a book is a program that allows you to write text into a file.

Tools are only important to the process of writing a book in that they should get out of your way. You shouldn’t have to worry about how to put text in a file, only what text. Once you can do that comfortably – you know, with autosaving and the ability to edit effortlessly – keep on doing it. And then, keep doing it. Once you have something you’d be happy to publish, you can start to worry about tools.

Follow me on twitter: @thorstenball. Or send me an email to me@thorstenball.com. Or check out my books at interpreterbook.com and compilerbook.com.
