If you run just foo, the following justfile will write a single byte, 0x0A, to a file named bar:
x := "\n"

foo:
    printf '{{x}}' > bar
Let's find out where that 0x0A byte comes from.
just is written in Rust, and the just parser has a function called cook_string, which transforms a just string token containing escape sequences into a UTF-8 string.
The code is here.
With some irrelevant details elided, it looks like this:
for c in text.chars() {
    match state {
        …
        State::Backslash => {
            match c {
                'n' => cooked.push('\n'),
                …
            }
            …
        }
        …
    }
}
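To make the elided parts concrete, here is a minimal sketch of the same state-machine idea. It is not just's actual code: the function name cook, the two State variants, the set of supported escapes, and the error strings are all assumptions chosen for illustration.

```rust
/// Parser state for a toy unescaper (hypothetical, not just's real type).
enum State {
    Initial,
    Backslash,
}

/// Turn the raw text of a string token into its cooked form.
/// Returns an error message for an unknown or dangling escape.
fn cook(text: &str) -> Result<String, String> {
    let mut cooked = String::new();
    let mut state = State::Initial;

    for c in text.chars() {
        match state {
            State::Initial => {
                if c == '\\' {
                    // Remember that the next character is part of an escape.
                    state = State::Backslash;
                } else {
                    cooked.push(c);
                }
            }
            State::Backslash => {
                match c {
                    // The value of '\n' here is supplied by rustc.
                    'n' => cooked.push('\n'),
                    't' => cooked.push('\t'),
                    '\\' => cooked.push('\\'),
                    '"' => cooked.push('"'),
                    other => return Err(format!("unknown escape: \\{}", other)),
                }
                state = State::Initial;
            }
        }
    }

    match state {
        State::Initial => Ok(cooked),
        State::Backslash => Err("dangling backslash".to_string()),
    }
}
```

Calling cook on the four characters a, backslash, n, b yields a three-byte string whose middle byte is 0x0A, yet nothing in this source spells that number out.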
So just asks rustc to insert the result of evaluating the Rust '\n' character escape. Let's take a look at how rustc handles '\n'.
rustc's escape code handling is in the lexer, in a function called scan_escape, which is here.
With some details removed:
let res: char = match chars.next().ok_or(EscapeError::LoneSlash)? {
    …
    'n' => '\n',
    …
};
rustc is written in Rust and compiles itself, so somehow rustc is delegating to rustc to figure out what '\n' means, which seems odd, to say the least, and we still haven't seen the naked 0x0A byte we're looking for.
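The circularity is easier to see side by side. The sketch below is not taken from rustc; both function names are hypothetical. The first function states the mapping the way the lexer does, with an escape on the right-hand side, and the second spells the value out as a number.

```rust
/// Map the character after a backslash to the character it denotes,
/// written the self-referential way: the right-hand side is itself an
/// escape, so its value comes from whichever compiler builds this code.
fn unescape_self_referential(c: char) -> Option<char> {
    match c {
        'n' => Some('\n'),
        _ => None,
    }
}

/// The same mapping with the byte value written out explicitly.
fn unescape_numeric(c: char) -> Option<char> {
    match c {
        'n' => Some(char::from(0x0A_u8)),
        _ => None,
    }
}
```

Both functions return the same character for 'n', but only the second one says in its own source text which byte that is.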
rustc
wasn't always written in Rust though. Before it was self-hosted, early
versions were written in OCaml.
GitHub has old revisions of the OCaml implementation of rustc, which handled character escapes in the lexer here.
and char_escape = parse
    …
  | 'n' { end_char (Char.code '\n') lexbuf }
    …
So rustc asks the OCaml compiler to insert the result of evaluating the OCaml character escape '\n'. Which is totally reasonable, but still not a 0x0A in sight.
Going one step deeper, let's look at the OCaml lexer here.
And finally, some clarity:
let char_for_backslash = function
    'n' -> '\010'
    …
When the OCaml compiler sees \n, it inserts the result of evaluating the OCaml character escape \010, which is a decimal character escape, and since 0x0A is 10, we finally have our byte value.
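The arithmetic behind that last step is small enough to sketch. The function below is a hypothetical helper, not part of the OCaml lexer; it assumes the escape has already been stripped down to its three decimal digits.

```rust
/// Evaluate the three digits of an OCaml-style decimal escape such as \010.
/// Returns None if the input is not exactly three ASCII digits or if the
/// value does not fit in a byte.
fn decimal_escape(digits: &str) -> Option<u8> {
    if digits.len() != 3 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Base 10: "010" is 0*100 + 1*10 + 0 = 10, which is 0x0A.
    digits.parse::<u8>().ok()
}
```

Passing "010" gives Some(10), the same value as 0x0A, while "256" is rejected because it does not fit in a single byte.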
So when you have a \n character escape in your justfile, the just binary contains a 0x0A byte in some form, which it will then write to your final string.
That 0x0A byte was put there by rustc, which contained its own 0x0A byte somewhere in the binary, which was stuffed there by its rustc progenitor.
rustc is currently at version 1.81.0, so this has happened at least 81 times since rustc 1.0 was first released, and probably many more times than that before 1.0, with rustcs furtively smuggling 0x0A bytes from one to the other, all the way back to when it was written in OCaml, when finally the first 0x0A byte was stuffed into a rustc binary by the OCaml compiler, which evaluated it from a decimal character escape '\010'.
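The hand-off can be modeled with a toy. Everything in this sketch is hypothetical (ToyCompiler, seed, build_next, bootstrap), and a real compiler binary obviously carries far more than one byte; the point is only that the number appears in the seed and nowhere else.

```rust
/// A toy "compiler binary": all it knows is which byte \n stands for.
struct ToyCompiler {
    newline: u8,
}

impl ToyCompiler {
    /// The seed generation: the value is written numerically, in the
    /// spirit of OCaml's decimal escape '\010'.
    fn seed() -> Self {
        ToyCompiler { newline: 10 }
    }

    /// Build the next generation. The "source" only says 'n' => '\n',
    /// so the new binary inherits whatever byte the builder holds.
    fn build_next(&self) -> Self {
        ToyCompiler { newline: self.newline }
    }
}

/// Run the chain for the given number of generations after the seed.
fn bootstrap(generations: usize) -> ToyCompiler {
    let mut compiler = ToyCompiler::seed();
    for _ in 0..generations {
        compiler = compiler.build_next();
    }
    compiler
}
```

However many generations are run, bootstrap(n).newline is still 0x0A, handed down unchanged from the seed.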
This post was inspired by another post about exactly the same thing. I couldn't find it when I looked for it, so I wrote this. All credit to the original author for noticing how interesting this rabbit hole is.