One of the things that I personally struggled with when learning Rust was how to organize large programs with multiple modules.
In this post, I'll explain how I organize the codebase of just, a command runner that I wrote. just was the first large program I wrote in Rust, and its organization has gone through many iterations as I discovered what worked for me and what didn't.
There are some things that could use improvement, and many of the choices I made are somewhat strange, so definitely don't take the whole project as a normative example of how to write Rust.
Overview
Users mostly interact with just by running the main binary from the command line. However, the crate actually consists of an executable target, in src/main.rs, and a library target, in src/lib.rs.
The main function in main.rs is a thin wrapper that calls the run function in src/run.rs.
The reason that just is split into an executable target and a library target is that there is a fuzz tester in fuzz, and a regression testing framework at janus, and both of these use testing functions exposed by the library.
Submodule Organization
I prefer to keep my module tree flat, so you'll notice that all my source files are directly under src. I find that this makes it easy to remember what's where, since I don't need to remember what subdirectory each source file is in.
I use a fuzzy file searcher, fzf, to switch
between files, so having a ton of files in my src
directory doesn't bother
me. If I used a tree-based file viewer in my editor, I might prefer to group
modules into directories by topic.
Common Use Statements
I prefer to group all my use statements together in a single file called src/common.rs. Then, at the top of every other file, I include them all with use crate::common::*;.
I think this is somewhat uncommon, and most people prefer to put use
statements at the top of every file, with just those things used in that
particular file.
Both approaches are totally reasonable. I find that grouping use statements
into a single file saves a lot of duplication at the top of every file, and
makes it easy to start a new file, since you can just write use crate::common::*;
and have everything you need in scope.
This does require that I pick unique names for everything that I want to put in common.rs, but I haven't found that to be particularly burdensome.
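A minimal sketch of the pattern, with an illustrative lexer module standing in for a real one:

```rust
mod lexer {
    pub struct Lexer;

    impl Lexer {
        // Stand-in for a real lexer: just counts characters.
        pub fn lex(src: &str) -> usize {
            src.len()
        }
    }
}

mod common {
    // One re-export per single-definition module.
    pub(crate) use crate::lexer::Lexer;
}

mod parser {
    // Every other module starts with this one line.
    use crate::common::*;

    pub fn token_count(src: &str) -> usize {
        Lexer::lex(src)
    }
}

fn main() {
    println!("{}", parser::token_count("a b c"));
}
```

The cost is that everything re-exported from common.rs shares one flat namespace, which is where the unique-name requirement comes from.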
Submodule Names and Contents
Most modules contain a single definition, either a function, trait, struct, or enum, that is used in the rest of the codebase. These modules are all named after the public definition, and that definition is exported in common.rs, so use crate::common::*; will bring that definition into scope without needing to qualify it with the module name.
As an example, just's lexer is called Lexer, and is in src/lexer.rs. In common.rs, it is exported with pub(crate) use lexer::Lexer;.
Since modules are always named after their sole export, it's pretty easy to figure out where something comes from just from its name. Since Lexer is a type from just, and not a dependency, it's probably defined in lexer.rs.
A few modules, like src/keyword.rs, contain more than one thing. For modules like that, common.rs just exports the module itself, with pub(crate) use crate::keyword;, and the module name is used when referring to the definitions inside keyword, like keyword::EXPORT.
If a name is from a dependency, then you can see where it comes from in common.rs.
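A minimal sketch of that module-level re-export (the constants here are illustrative):

```rust
mod keyword {
    // A module with several definitions, so it gets re-exported wholesale.
    pub const EXPORT: &str = "export";
    pub const SET: &str = "set";
}

mod common {
    // Re-export the module itself rather than its contents.
    pub(crate) use crate::keyword;
}

mod analyzer {
    use crate::common::*;

    // Callers qualify the definitions with the module name.
    pub fn is_keyword(word: &str) -> bool {
        word == keyword::EXPORT || word == keyword::SET
    }
}

fn main() {
    println!("{}", analyzer::is_keyword("export"));
}
```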
Error Handling
For testing purposes, I like to use error enums, instead of Box<dyn Error>, or
equivalent. I find that this makes it easy to write tests that look for
specific errors. Also, just
has detailed error messages, and this lets me
separate error message formatting from error value generation.
There are two main kinds of errors: CompilationError, in src/compilation_error.rs, and RuntimeError, in runtime_error.rs.
As you can guess, CompilationError is for problems related to compilation, e.g. lexing, parsing, and analyzing a justfile, and RuntimeError is for problems that occur when running a justfile, e.g. I/O errors and command execution errors.
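A minimal sketch of the error-enum style, with hypothetical variants (the real enums have many more variants and carry source-location information):

```rust
use std::fmt;

// Hypothetical variants, for illustration only.
#[derive(Debug, PartialEq)]
pub enum CompilationError {
    UnknownDependency { recipe: String, unknown: String },
    CircularDependency { recipe: String },
}

impl fmt::Display for CompilationError {
    // Message formatting lives here, separate from the code that
    // constructs the error values.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CompilationError::UnknownDependency { recipe, unknown } => {
                write!(f, "Recipe `{recipe}` has unknown dependency `{unknown}`")
            }
            CompilationError::CircularDependency { recipe } => {
                write!(f, "Recipe `{recipe}` depends on itself")
            }
        }
    }
}

fn main() {
    let error = CompilationError::CircularDependency { recipe: "build".into() };
    println!("{error}");
}
```

Tests can then match the exact variant, e.g. assert!(matches!(error, CompilationError::CircularDependency { .. })), instead of string-matching a boxed error's message.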
I currently don't use any of the many error-handling helper crates, but in other projects I use snafu, and if I were to rewrite just, I would definitely use snafu.
Clippy
I use clippy, the animate paperclip/Rust linter, to automatically check the codebase for issues.
Clippy has many lints that are either pedantic, or that restrict things that are totally reasonable in many contexts. I like a lot of these, but I didn't want to go through all of them and decide which lints to enable, so I turned them all on, even the annoying ones, and I just disable the lints I don't like as I encounter them.
You can see this at the top of src/lib.rs.
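The pattern looks something like this (an illustrative sketch, not the exact lint list in just's src/lib.rs):

```rust
// Top of src/lib.rs — illustrative, not the actual list.
// Turn on the broad lint groups, annoying lints included...
#![warn(clippy::all, clippy::pedantic)]
// ...then opt back out of individual lints as they get in the way.
#![allow(clippy::too_many_lines)]
```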
How does it work?
just does a lot of stuff! This makes it hard to give a concise overview of how everything works, but I'll do my best!
Run
The run function is pretty short, so definitely check it out. It's in src/run.rs. It does some setup, like initializing Windows terminal color support and logging, then parses the command line arguments.
Configuration
just calls the parsed command line arguments a Config. The command line arguments are parsed with the venerable clap, and then stored in a Config struct, which is passed around the rest of the program. Everything related to setting up the clap parser and parsing the command line arguments is in src/config.rs.
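just uses clap for the actual parsing, but the overall shape, an argument list in and a Config out, can be sketched with a stdlib-only stand-in (the fields and flags here are hypothetical):

```rust
#[derive(Debug, PartialEq)]
pub struct Config {
    pub verbose: bool,
    pub justfile: Option<String>,
    pub arguments: Vec<String>,
}

impl Config {
    // Hand-rolled stand-in for the clap-based parser in src/config.rs.
    pub fn from_iter<I: IntoIterator<Item = String>>(args: I) -> Config {
        let mut config = Config { verbose: false, justfile: None, arguments: Vec::new() };
        let mut iter = args.into_iter();
        while let Some(arg) = iter.next() {
            match arg.as_str() {
                "--verbose" => config.verbose = true,
                "--justfile" => config.justfile = iter.next(),
                // Anything else is an argument for the subcommand.
                _ => config.arguments.push(arg),
            }
        }
        config
    }
}

fn main() {
    let config = Config::from_iter(["--verbose", "build"].map(String::from));
    println!("{config:?}");
}
```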
Subcommand Running
just has a few distinct modes it can run in, e.g. actually running a recipe in a justfile, listing the recipes in a justfile, or evaluating the variables in a justfile. These are called subcommands, and you can see the different subcommands in the Subcommand enum in src/subcommand.rs.
Once a config is parsed, the function run_subcommand in config.rs handles executing the correct subcommand.
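A sketch of that dispatch, with abbreviated, hypothetical variants:

```rust
// Hypothetical, abbreviated variants — the real enum has more.
#[derive(Debug)]
pub enum Subcommand {
    Run { arguments: Vec<String> },
    List,
    Evaluate,
}

// Stand-in for run_subcommand: each variant gets its own code path.
pub fn run_subcommand(subcommand: &Subcommand) -> String {
    match subcommand {
        Subcommand::Run { arguments } => format!("run: {}", arguments.join(" ")),
        Subcommand::List => "list recipes".to_string(),
        Subcommand::Evaluate => "evaluate variables".to_string(),
    }
}

fn main() {
    let subcommand = Subcommand::Run { arguments: vec!["build".to_string()] };
    println!("{}", run_subcommand(&subcommand));
}
```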
For the rest of this post, I'll cover Subcommand::Run, which is the subcommand responsible for actually running a justfile.
Compilation
The justfile source is read and the compiler is invoked in the run_subcommand function in config.rs. The Compiler is defined in src/compiler.rs, and has a single short method that calls the lexer, the parser, and the analyzer.
There's no particular reason for having a Compiler struct, since it doesn't have any fields, so it's really just for organization. I would be totally fine with having a module src/compile.rs, and just exporting a single compile function from that module.
Lexing
The first step of compilation is to split the source text into tokens, which is done by the Lexer in src/lexer.rs. The lexer looks a lot like a recursive descent parser. It has a bunch of different methods, and those methods call each other to produce the different tokens. The entry point to the lexer is Lexer::lex.
Lexer is relatively well-commented, so please take a look if you're interested!
Tokens
The lexer produces a Vec of Tokens. The Token type is in src/token.rs. Each Token contains a TokenKind, defined in src/token_kind.rs. A Token contains a reference to the source program, as well as information about the offset, length, line, and column of the token. A TokenKind tells you what kind of token it actually is.
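To make the shape concrete, here is a toy single-line lexer in the same style: a Token that references the source and records its span, a TokenKind that names the kind, and a Lexer whose methods produce tokens. This is entirely illustrative; the real lexer tracks lines and columns properly and has many more token kinds.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TokenKind {
    Identifier,
    Colon,
    Eof,
}

#[derive(Debug)]
pub struct Token<'src> {
    pub src: &'src str, // reference to the source program
    pub offset: usize,
    pub length: usize,
    pub line: usize,
    pub column: usize,
    pub kind: TokenKind,
}

impl<'src> Token<'src> {
    pub fn lexeme(&self) -> &'src str {
        &self.src[self.offset..self.offset + self.length]
    }
}

pub struct Lexer<'src> {
    src: &'src str,
    offset: usize,
}

impl<'src> Lexer<'src> {
    // Entry point, mirroring Lexer::lex.
    pub fn lex(src: &'src str) -> Vec<Token<'src>> {
        let mut lexer = Lexer { src, offset: 0 };
        let mut tokens = Vec::new();
        loop {
            let token = lexer.next_token();
            let done = token.kind == TokenKind::Eof;
            tokens.push(token);
            if done {
                break;
            }
        }
        tokens
    }

    fn next_token(&mut self) -> Token<'src> {
        // Skip spaces.
        while self.src[self.offset..].starts_with(' ') {
            self.offset += 1;
        }
        let start = self.offset;
        let rest = &self.src[start..];
        let (kind, length) = if rest.is_empty() {
            (TokenKind::Eof, 0)
        } else if rest.starts_with(':') {
            (TokenKind::Colon, 1)
        } else {
            let end = rest.find(|c: char| c == ':' || c == ' ').unwrap_or(rest.len());
            (TokenKind::Identifier, end)
        };
        self.offset += length;
        // Single-line sources only, so line is always 0 and column == offset.
        Token { src: self.src, offset: start, length, line: 0, column: start, kind }
    }
}

fn main() {
    for token in Lexer::lex("build: deps") {
        println!("{:?} {:?}", token.kind, token.lexeme());
    }
}
```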
Parsing
The Tokens produced by the Lexer are passed to a Parser, defined in src/parser.rs, and the main entry point is Parser::parse.
The parser is a recursive descent parser that walks over the tokens, figuring out what kind of construct it's parsing as it goes.
Modules
The output of the parser is a Module, defined in src/module.rs. A Module represents a successful parse, but has not been fully validated. just does a lot of static analysis, like resolving names, inter-recipe dependencies, and inter-variable dependencies, so not every Module is valid.

You can think of a Module as being like an AST that hasn't yet been statically analyzed for correctness. Inside a Module are Items (src/item.rs), which contain the different source constructs, like Alias, Assignment, UnresolvedRecipe, and Set.
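A rough sketch of those shapes, with abbreviated, hypothetical fields (the real definitions carry much more information):

```rust
// Abbreviated, hypothetical fields — for illustration only.
#[derive(Debug)]
#[allow(dead_code)]
pub enum Item {
    Alias { name: String, target: String },
    Assignment { name: String, value: String },
    UnresolvedRecipe { name: String, dependencies: Vec<String> },
    Set { setting: String, value: String },
}

// A Module is a successful parse: a list of items, not yet validated.
#[derive(Debug, Default)]
pub struct Module {
    pub items: Vec<Item>,
}

fn main() {
    let module = Module {
        items: vec![Item::UnresolvedRecipe {
            name: "build".into(),
            dependencies: vec!["deps".into()],
        }],
    };
    println!("{}", module.items.len());
}
```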
Analysis
The next phase of compilation is analysis, performed by the Analyzer, defined in src/analyzer.rs. The Analyzer makes sure that all references to recipes and variables can be resolved, and that there are no circular dependencies.
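The kind of check involved can be sketched like this, treating a justfile as a map from recipe names to dependency lists (a hypothetical shape, not the actual implementation):

```rust
use std::collections::{BTreeMap, BTreeSet};

pub fn analyze(recipes: &BTreeMap<&str, Vec<&str>>) -> Result<(), String> {
    // Every dependency must name a recipe that exists.
    for (name, dependencies) in recipes {
        for dependency in dependencies {
            if !recipes.contains_key(dependency) {
                return Err(format!("recipe `{name}` has unknown dependency `{dependency}`"));
            }
        }
    }
    // No recipe may depend on itself, directly or transitively.
    for name in recipes.keys() {
        let mut stack = vec![*name];
        let mut visited = BTreeSet::new();
        while let Some(current) = stack.pop() {
            for &dependency in &recipes[current] {
                if dependency == *name {
                    return Err(format!("recipe `{name}` has circular dependency"));
                }
                if visited.insert(dependency) {
                    stack.push(dependency);
                }
            }
        }
    }
    Ok(())
}

fn main() {
    let mut recipes = BTreeMap::new();
    recipes.insert("build", vec!["deps"]);
    recipes.insert("deps", vec![]);
    println!("{:?}", analyze(&recipes));
}
```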
Justfile
The output of the Analyzer is a Justfile, defined in src/justfile.rs. A Justfile represents a parsed and analyzed justfile. It contains all the recipes, variables, and expressions, all resolved and ready to run. It is the totus porcus, as it were.
Running
A justfile is run with Justfile::run, which takes a Config; a Search, with information about where the justfile and the working directory are; variable overrides passed on the command line; and a list of arguments.
The arguments are parsed into recipes and arguments to those recipes, and finally those recipes are run with Justfile::run_recipe, which actually executes each recipe and all of its dependencies.
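A toy model of that recursion — run dependencies first, then the recipe itself, skipping anything that has already run (illustrative only; the real code executes command lines and handles arguments and errors):

```rust
use std::collections::BTreeMap;

pub fn run_recipe<'a>(
    recipes: &BTreeMap<&'a str, Vec<&'a str>>,
    name: &'a str,
    ran: &mut Vec<&'a str>,
) {
    // Each recipe runs at most once.
    if ran.contains(&name) {
        return;
    }
    // Dependencies run before the recipe that needs them.
    for &dependency in &recipes[name] {
        run_recipe(recipes, dependency, ran);
    }
    ran.push(name); // stand-in for actually executing the recipe's commands
}

fn main() {
    let mut recipes = BTreeMap::new();
    recipes.insert("build", vec!["deps"]);
    recipes.insert("deps", vec![]);
    let mut ran = Vec::new();
    run_recipe(&recipes, "build", &mut ran);
    println!("{ran:?}");
}
```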
Testing
just takes files, parses commands out of those files, and then runs them. I am acutely aware of how this might go wrong, and would feel real bad if just somehow got confused and ran a command that nuked someone's hard drive.

Because of this, I go pretty crazy with testing. There are four kinds of tests: unit tests, integration tests, fuzz testing, and ecosystem-wide regression testing.
Unit Testing
Unit tests are spread around the codebase, in submodules named tests. Each tests submodule contains tests for whatever's in the containing module.
I'm not strict about covering everything with unit tests, but I do cover everything with integration tests. If something doesn't seem to be tested in a unit test, it's probably tested in an integration test.
Integration Testing
Integration tests are in the tests subdirectory, roughly organized by topic. The vast majority are in tests/integration.rs. Each of these tests does a full run of the just binary, supplying standard input, arguments, and a justfile, and checking that standard output, standard error, and the exit code are correct.
Fuzz Testing
Fuzz testing was contributed to just by @RadicalZephyr, and is located in the fuzz directory. It generates random strings and feeds them to the parser. (NOT THE RUNNER, DEFINITELY NOT THE RUNNER.) If the parser succeeds or returns an error, that's a successful run. If the fuzzer is able to trigger a panic, then it's found a bug that needs to be fixed.
Regression Testing
Since a lot of people have written a lot of justfiles, I want to make sure I don't break them when I update just.
To do this, I wrote a tool called janus. Janus is inspired by Rust's crater.

Janus downloads all the justfiles that it can find on GitHub, and then compares how two versions of just compile those justfiles. The two versions of just are usually the latest release, and a new version with a big, scary change.

Janus compiles all the justfiles with both versions, and then compares the results. Ideally, every valid justfile parses into the same Justfile with both versions, and every justfile with an error produces the same error.
Wrapping Up
That's everything I can think of! Ultimately, much of how you organize your Rust programs comes down to personal preference, so just start mashing the keyboard, see what works, and iterate on whatever doesn't.
glhf!