Breeze: A Compilers-and-Interpreters Rabbit Hole
What happens when an OT-cyber engineer reads too many books about parsers
I am a cybersecurity engineer by trade. I spend my days worrying about PLCs and network segmentation, pondering when SOCI will require us to meet MIL2 or SP2, and fretting about how many firewalls we need between our corporate network and the DCS (hint: lots!). The closest my day job has come to language design lately is helping a fellow engineer work out a regex for a Cribl route filter.
And yet, for the last several months I’ve been steadily working my way through a short but formidable reading list:
- Writing a C Compiler — Nora Sandler
- Crafting Interpreters — Robert Nystrom
- Writing an Interpreter in Go — Thorsten Ball
- Writing a Compiler in Go — Thorsten Ball
These are the books. If you’ve poked around this corner of the internet, you already know
them. Sandler takes you from return 2; to a genuine x86-64 compiler in C, one
excruciatingly well-argued chapter at a time. Nystrom builds two complete implementations
of the same small language (“Lox”): a tree-walking interpreter and a bytecode VM, and
somehow makes both feel like the most natural thing in the world. Ball’s two “Writing an
Interpreter/Compiler in Go” books are the Go-flavoured siblings — shorter, tighter, great
for a weekend each.
I don’t think there’s a better curriculum for self-teaching compilers out there. Each book scratches a different itch, and together they give you the full-stack tour: lexical analysis, parsing, ASTs, tree-walking evaluation, bytecode, stack machines, register allocation, the lot.
Reading those books put me in that dangerous headspace where everything you look at starts to resemble a tokenizer.
So I built a thing
Let me say the important part up front: what I’m about to describe is a learning exercise, not a language I’m promoting. The world does not need another language. The world, arguably, needs fewer languages. I built this to find out, concretely and with my own hands, which of the ideas in those books I actually understood versus just thought I understood. Nothing sharpens the distinction like trying to generate working code from an AST at one in the morning.
The project is called breeze. It’s a
CoffeeScript-flavoured language that compiles to Lua 5.1. It lives in a single
roughly-1,200-line Lua file. There are no dependencies. You run it with lua breeze.lua file.bz and it spits out Lua. That is the whole trick.
Why CoffeeScript flavour? Why Lua target?
The CoffeeScript-ness was a deliberate self-imposed constraint. CoffeeScript has, for better and worse, a very opinionated syntax: significant whitespace, arrow functions, string interpolation, implicit returns, classes, list comprehensions, postfix conditionals. That’s a dense cluster of features that each have to be handled by the lexer, the parser, or the code generator — sometimes all three. If I picked something simpler, I’d skip the hard parts. If I picked something wildly different, I’d spend three weekends on parser engineering before writing a single interesting line of breeze code.
The Lua target was partly pragmatism and partly homage. Lua 5.1 has a beautifully
small surface area — no classes, no exceptions, no pattern matching, no list
comprehensions — so every CoffeeScript feature I wanted to support translated into a
genuine code-generation problem rather than a one-line pass-through. Want classes?
You have to implement them with tables and metatables. Want try/catch? You wrap it
in pcall. Want safe navigation (?.)? You emit a careful little IIFE that
short-circuits on nil. Each feature is a tiny puzzle, and that’s exactly what I wanted
to practise. On top of all that, Lua was made in Brazil, just like me!!!
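The try/catch case is the easiest of those puzzles to picture. Here is a hedged sketch of the kind of Lua a wrapped block can lower onto (illustrative shape only, not breeze's literal output):

```lua
-- a `try ... catch err` block can lower onto pcall roughly like this
-- (illustrative; breeze's actual emitted code may differ in detail)
local function risky_operation()
  error("disk full")            -- stand-in for the protected body
end

local ok, err = pcall(risky_operation)
if not ok then
  -- the catch branch sees the error value pcall returned
  print("caught: " .. tostring(err))
end
```

The shape generalises: the try body becomes a function handed to `pcall`, and the catch body becomes the `if not ok` branch.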
Plus — and this matters for the day job — Lua is everywhere in the security world: Wireshark dissectors, Nmap scripting engine, nDPI, Suricata, Redis, World of Warcraft, most game engines worth modding, and half the microcontrollers on your bench. Any tool that lets you write cleaner Lua has a real place in that ecosystem, even if “that tool” is a weekend project for one guy in Perth.
A quick tour
Here’s roughly what the language looks like. All of these compile to plain Lua 5.1; nothing below requires a runtime.
Arrow functions with implicit returns and string interpolation. The two CoffeeScript staples I reach for most:
greet = (name, greeting = "G'day") -> "#{greeting}, #{name}!"
print greet("Mabel")
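For a sense of the round trip, here is a plausible Lua 5.1 translation of those two lines. The exact names and shape are illustrative, not breeze's literal output:

```lua
-- default parameters become a nil check; interpolation becomes
-- plain concatenation (a sketch of the compiled form)
local greet = function(name, greeting)
  if greeting == nil then greeting = "G'day" end   -- default parameter
  return greeting .. ", " .. name .. "!"           -- interpolation
end
print(greet("Mabel"))  -- G'day, Mabel!
```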
Classes with inheritance and fat-arrow self binding. The fat arrow (=>) was the
feature I most enjoyed getting right — it desugars into (function(self, ...) return function(...) ... end end)(self) style shenanigans to preserve self inside callbacks:
class Timer
  constructor: (@label) ->
    @elapsed = 0

  tick: =>
    @elapsed += 1
    print "#{@label}: #{@elapsed}s"
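For what that buys you, here is a hand-written sketch of the table-and-metatable shape a class like Timer can compile down to. The names are illustrative, not breeze's exact output:

```lua
-- the classic Lua class pattern: a table as the class, __index for
-- method lookup (illustrative; breeze's output differs in detail)
local Timer = {}
Timer.__index = Timer

function Timer.new(label)
  local self = setmetatable({}, Timer)
  self.label = label          -- the `(@label) ->` constructor shorthand
  self.elapsed = 0
  -- fat arrow: close over `self` so the method survives being detached
  self.tick = function()
    self.elapsed = self.elapsed + 1
  end
  return self
end

local t = Timer.new("boot")
local cb = t.tick   -- pass it around as a bare callback
cb()
cb()
print(t.elapsed)    -- 2
```

The point of the closure is the last three lines: `cb` still sees the right `self` even though it was called without one.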
Safe navigation and null coalescing. Two features Lua programmers have been
faking with and/or chains for decades:
city = user?.profile?.address?.city
label = device.name ?? "unnamed"
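A sketch of what the nil-short-circuiting IIFE behind `?.` looks like, and why `??` is not just `or` (illustrative shapes, not breeze's literal output):

```lua
-- safe navigation: each link is checked before the next index,
-- so a missing table anywhere yields nil instead of an error
local user = { profile = { address = nil } }

local city = (function()
  local v = user
  if v == nil then return nil end
  v = v.profile
  if v == nil then return nil end
  v = v.address
  if v == nil then return nil end
  return v.city
end)()
-- city is nil: the chain stopped at the missing `address`

-- `??` falls back only on nil; Lua's `or` would also discard `false`
local device = { name = false }
local label = device.name
if label == nil then label = "unnamed" end
-- label is false, where `device.name or "unnamed"` would be "unnamed"
```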
Pipeline operator. Steal good ideas shamelessly:
result = raw_logs
  |> filter((row) -> row.severity == "alert")
  |> map((row) -> row.src_ip)
  |> unique()
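The pipeline is pure syntax: each stage's left-hand value becomes the first argument of the next call, so the chain nests inside-out. A sketch with toy helpers (the first-argument placement is my assumption about the desugaring):

```lua
-- toy stand-ins for the pipeline stages
local function filter(t, pred)
  local out = {}
  for _, v in ipairs(t) do
    if pred(v) then out[#out + 1] = v end
  end
  return out
end

local function map(t, fn)
  local out = {}
  for i, v in ipairs(t) do out[i] = fn(v) end
  return out
end

local raw_logs = {
  { severity = "alert", src_ip = "10.0.0.1" },
  { severity = "info",  src_ip = "10.0.0.2" },
  { severity = "alert", src_ip = "10.0.0.1" },
}

-- `raw_logs |> filter(...) |> map(...)` nests as:
local result = map(
  filter(raw_logs, function(row) return row.severity == "alert" end),
  function(row) return row.src_ip end
)
-- result holds the two alert src_ips, before `unique()` deduplicates
```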
List comprehensions with ranges. Oh, how I missed these in plain Lua:
evens = [x for x in 1..100 if x % 2 == 0]
grid = [{x: i, y: j} for i in 1..5 for j in 1..5]
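A plausible lowering of the `evens` comprehension: the `1..100` range becomes a numeric `for`, and the `if` clause becomes a guard inside it (illustrative shape):

```lua
-- accumulate matching values into a fresh table
local evens = {}
for x = 1, 100 do
  if x % 2 == 0 then
    evens[#evens + 1] = x
  end
end
-- the nested `grid` case simply stacks one loop per `for` clause
```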
Postfix conditionals. Reads like English. Compiles to a boring if block:
print("warning: disk nearly full") if disk.free_pct < 10
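And the boring `if` block it becomes, with a sample table so the sketch runs on its own (the value is illustrative, not from the post):

```lua
local disk = { free_pct = 7 }   -- illustrative sample value
if disk.free_pct < 10 then
  print("warning: disk nearly full")
end
```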
That’s not a complete tour — the full reference is in the repo — but it’s a fair sample of the syntactic space. Each of those lines compiles to Lua that would be perfectly legible to anyone who already writes Lua; the breeze version is just shorter and (to my eye at least) easier to read six months later.
A look under the hood
The compiler is structured the way the textbooks tell you to structure compilers:
- Lexer (Breeze.lex) — turns source text into tokens. The interesting parts are significant-indentation tracking (emitting synthetic INDENT and DEDENT tokens when the indent level changes) and handling #{expr} interpolation inside strings, which is genuinely fiddly because the interpolation can itself contain strings.
- Parser (Breeze.parse) — recursive-descent over the token stream, building an AST of plain Lua tables tagged with a kind field.
- Code generator (Breeze.compile) — walks the AST and emits Lua source. Tracks local-variable scope so the right things get local-declared. Emits IIFEs for constructs that don't have a direct Lua analogue.
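The INDENT/DEDENT bookkeeping is worth seeing concretely. A simplified, stand-alone version of the idea (not breeze's actual lexer code): keep a stack of open indent levels; a deeper line pushes a level and emits INDENT, a shallower line pops levels and emits one DEDENT per block closed.

```lua
-- simplified indent tracker: feed it each line's leading-space count,
-- get back the synthetic tokens to emit (illustrative, not Breeze.lex)
local function make_tracker()
  local stack = { 0 }                 -- indent levels currently open
  return function(indent)
    local tokens = {}
    if indent > stack[#stack] then
      stack[#stack + 1] = indent      -- deeper: open a block
      tokens[#tokens + 1] = "INDENT"
    else
      while indent < stack[#stack] do
        stack[#stack] = nil           -- shallower: close blocks
        tokens[#tokens + 1] = "DEDENT"
      end
    end
    return tokens
  end
end

local track = make_tracker()
track(0)                 -- no tokens
track(4)                 -- { "INDENT" }
track(8)                 -- { "INDENT" }
local toks = track(0)    -- { "DEDENT", "DEDENT" }
```

A real lexer also has to reject inconsistent dedents (a level that was never opened), which is exactly the kind of case the sketch hides.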
Here’s a representative slice of breeze.lua — the bit that handles the fat arrow,
compressed a little for readability. This is the kind of thing you find yourself writing
at 11pm when you swore you were going to bed at 10:
-- compile a function literal; `bind_self` is true for `=>`, false for `->`
local function emit_function(node, state)
  local params = node.params or {}
  if node.bind_self then
    table.insert(params, 1, "self")
  end
  local body = emit_block(node.body, state)
  -- last expression of an arrow function becomes an implicit return
  body = add_implicit_return(body)
  return string.format(
    "function(%s)\n%s\nend",
    table.concat(params, ", "),
    body
  )
end
The pattern that recurs all through the compiler is: pick a desugaring, emit Lua that
behaves identically, document the transform. I wrote a separate
REFERENCE.md for the
language partly to have real documentation and partly because writing reference docs
forces you to notice your own inconsistencies.
Where I’d actually use it
The use cases I keep circling back to aren’t web apps — they’re the places Lua already lives and where cleaner syntax would genuinely earn its keep:
- Wireshark dissectors for industrial protocols (Modbus, DNP3, OPC UA over some transports). The standard Lua dissector template is verbose in a way that really isn’t helping anyone understand what the dissector is doing. Significant whitespace plus fat-arrow callbacks makes the “read a field, build a tree item, advance the offset” loop read the way it feels in your head.
- Nmap NSE scripts. Same argument. The lift isn’t large but the readability win over time adds up.
- ESP32 / NodeMCU firmware. I have a pile of half-finished microcontroller projects where the Lua side is mostly “set up a pin, wait for an event, send an MQTT message”. The boilerplate-to-logic ratio drops nicely under breeze.
I don’t realistically expect anyone else to pick this up — and I’d be a bit suspicious if they did, because it’s a one-person project I mainly built to learn. But I do genuinely use it on my own bench, and it has earned its keep there.
On doing this with an AI pair-programmer
A short meta-note, because it’s relevant to how this got built. I wrote breeze with Claude Code as a constant pair-programming partner. The book reading was mine; the compiler design decisions were mine; the language-feature choices were mine. But the act of typing out a recursive-descent parser and catching the five hundredth off-by-one error in the indent tracker is a much better experience when there’s a patient agent next to you willing to read your tokenizer state-machine out loud and point out the case you missed.
The books taught me the concepts. Claude helped me ship an artefact that let me prove I actually held the concepts in my head. Those are different skills, and I’m grateful to have both.
What I actually learned (as opposed to what I thought I’d learn)
A few things surprised me, in rough order of how surprised I was:
- Lexers are mostly boring until they aren't. Ninety percent of a lexer is a boring state machine. The last ten percent — string interpolation, significant whitespace, telling # as a comment from # as the length operator from # in interpolation — is all the interesting thinking, and it arrives late.
- Parsers feel powerful, code generators feel humbling. Writing the parser felt like wielding a very sharp tool. Writing the code generator felt like being handed a toddler and told "now translate everything this toddler wants into a formal language the toddler has never heard of." That's the real craft.
- CoffeeScript's design was more principled than I remembered. When you actually try to implement its corners — the implicit return rules, the distinction between -> and =>, the way @foo binds in constructor parameters — you notice how much thought went into each of them. My own additions (the pipeline operator, safe navigation, null coalescing) feel grafted on by comparison, which is fair: they are grafted on.
- Error messages are the real product. I haven't got nearly as far into this as I'd like. A teaching language with bad error messages is a worse teaching language than a production language with good ones. Next iteration.
Closing
If you’re in the same headspace — professional day job, curious evenings, a handful of compiler books on your desk — I can’t recommend the project highly enough. Not my project; the project of picking any small language and trying to make it run. Breeze is just the one I happened to pick because I’d spent too many hours squinting at verbose Lua dissectors.
If you want to poke at it, the code is at github.com/rdapaz/breeze. It’s MIT-licensed. It’s full of comments written at 1am. It has no stars and that’s genuinely fine.
You probably know a better CoffeeScript-to-Lua compiler (MoonScript, I am looking right at you!). No matter, though: I wanted to write mine anyway, for the learning.