Whence '\n'?
computers · programming

If you run just foo, the following justfile will write a single byte 0x0A to a file named bar:

x := "\n"

foo:
  printf '{{x}}' > bar

Let's find out where that 0x0A byte comes from.

just is written in Rust, and the just parser has a function called cook_string, which transforms a just string token containing escape sequences into a UTF-8 string.

The code is here.

With some irrelevant details elided, it looks like this:

for c in text.chars() {
  match state {
    …
    State::Backslash => {
      match c {
        'n' => cooked.push('\n'),
        …
      }
      …
    }
    …
  }
}
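To make the shape of that loop concrete, here is a minimal, self-contained sketch of a two-state escape cooker. It is not just's actual cook_string: the function name cook, the two-variant State enum, the handful of escapes it accepts, and the String error type are all assumptions made for illustration.

```rust
/// Hypothetical, simplified stand-in for a string "cooker".
/// This is NOT just's real cook_string; names and error handling are assumed.
#[derive(Clone, Copy)]
enum State {
    Initial,
    Backslash,
}

/// Replace escape sequences in `text` with the characters they denote.
/// Returns an error message for an unknown escape or a trailing backslash.
fn cook(text: &str) -> Result<String, String> {
    let mut cooked = String::new();
    let mut state = State::Initial;

    for c in text.chars() {
        match state {
            State::Initial => {
                if c == '\\' {
                    // Remember that the next character is part of an escape.
                    state = State::Backslash;
                } else {
                    cooked.push(c);
                }
            }
            State::Backslash => {
                match c {
                    // The newline itself comes from rustc evaluating '\n'.
                    'n' => cooked.push('\n'),
                    't' => cooked.push('\t'),
                    '\\' => cooked.push('\\'),
                    other => return Err(format!("unknown escape: \\{other}")),
                }
                state = State::Initial;
            }
        }
    }

    match state {
        State::Initial => Ok(cooked),
        State::Backslash => Err("trailing backslash".to_string()),
    }
}
```

In this sketch, cooking the two-character input made of a backslash followed by n yields a one-byte string containing 0x0A, but nothing in the source says 10 anywhere. The value is whatever the compiler decides '\n' means.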

So just asks rustc to insert the result of evaluating the Rust '\n' character escape. Let's take a look at how rustc handles '\n'.

rustc's escape code handling is in the lexer, in a function called scan_escape, which is here.

With some details removed:

let res: char = match chars.next().ok_or(EscapeError::LoneSlash)? {
    …
    'n' => '\n',
    …
};

rustc is written in Rust and compiles itself, so somehow rustc is delegating to rustc to figure out what '\n' means, which seems odd, to say the least, and we still haven't seen the naked 0x0A byte we're looking for.

rustc wasn't always written in Rust though. Before it was self-hosted, early versions were written in OCaml.

GitHub has old versions of the OCaml version of rustc, which handled character escapes in the lexer here.

and char_escape = parse
  …
  | 'n' { end_char (Char.code '\n') lexbuf }
  …

So rustc asks the OCaml compiler to insert the result of evaluating the OCaml character escape '\n'. Which is totally reasonable, but still not a 0x0A in sight.

Going one step deeper, let's look at the OCaml lexer here.

And finally, some clarity:

let char_for_backslash = function
    'n' -> '\010'
  …

When the OCaml compiler sees \n, it inserts the result of evaluating the OCaml character escape \010, which is a decimal character escape, and since 0x0A is 10, we finally have our byte value.
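If you come from C, '\010' looks like an octal escape, which would be 8, so the decimal reading deserves a quick check. The sketch below is a hypothetical Rust helper (decimal_escape is a name I made up, and it is not how the OCaml lexer is written) that decodes the three digits of such an escape as base 10. I'm assuming an escape needs exactly three digits and must fit in a byte.

```rust
/// Decode the digits of an OCaml-style decimal escape, e.g. the "010" in '\010'.
/// Hypothetical helper for illustration; the real lexer is written in OCaml.
fn decimal_escape(digits: &str) -> Option<u8> {
    // Assumption: the escape consists of exactly three ASCII digits.
    if digits.len() != 3 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Parse as base 10 (not base 8). Values above 255 do not fit in a u8,
    // so the parse fails and we return None.
    digits.parse::<u8>().ok()
}
```

Read this way, decimal_escape("010") is Some(10), the same value as '\n' as u8 in Rust, whereas reading the same digits as octal would give 8.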

So when you have a \n character escape in your justfile, the just binary contains a 0x0A byte in some form, which it will then write to your final string.

That 0x0A byte was put there by rustc, which contained its own 0x0A byte somewhere in the binary, which was stuffed there by its rustc progenitor.

rustc is currently at version 1.81.0, so this has happened at least 81 times since rustc 1.0 was first released, and probably many more times than that before 1.0, with rustcs furtively smuggling 0x0A bytes from one to the other, all the way back to when it was written in OCaml. There, finally, the first 0x0A byte was stuffed into a rustc binary by the OCaml compiler, which evaluated it from a decimal character escape '\010'.

This post was inspired by another post about exactly the same thing. I couldn't find it when I looked for it, so I wrote this. All credit to the original author for noticing how interesting this rabbit hole is.