One of the things that I personally struggled with when learning Rust was how to organize large programs with multiple modules.
In this post, I'll explain how I organize the codebase of just, a command runner that I wrote. just was the first large program I wrote in Rust, and its organization has gone through many iterations as I discovered what worked for me and what didn't.
There are some things that could use improvement, and many of the choices I made are somewhat strange, so definitely don't take the whole project as a normative example of how to write Rust.
Overview
Users mostly interact with just by running the main binary from the command line. However, the crate actually consists of an executable target, in src/main.rs, and a library target, in src/lib.rs.
The main function in main.rs is a thin wrapper that calls the run function in src/run.rs.
The reason that just is split into an executable target and a library target is that there is a fuzz tester in fuzz, and a regression testing framework at janus, and both of these use testing functions exposed by the library.
Submodule Organization
I prefer to keep my module tree flat, so you'll notice that all my source files are directly under src. I find that this makes it easy to remember what's where, since I don't need to remember what subdirectory each source file is in.
I use a fuzzy file searcher, fzf, to switch
between files, so having a ton of files in my src
directory doesn't bother
me. If I used a tree-based file viewer in my editor, I might prefer to group
modules into directories by topic.
Common Use Statements
I prefer to group all my use statements together in a single file called src/common.rs. Then, at the top of every other file, I include them all with use crate::common::*;.
I think this is somewhat uncommon, and most people prefer to put use
statements at the top of every file, with just those things used in that
particular file.
Both approaches are totally reasonable. I find that grouping use statements
into a single file saves a lot of duplication at the top of every file, and
makes it easy to start a new file, since you can just write use crate::common::*;
and have everything you need in scope.
This does require that I pick unique names for everything that I want to put in common.rs, but I haven't found that to be particularly burdensome.
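A minimal sketch of the pattern, with an illustrative lexer module standing in for a real one:

```rust
mod lexer {
    pub struct Lexer;

    impl Lexer {
        // Stand-in for a real lexer: just counts characters.
        pub fn lex(src: &str) -> usize {
            src.len()
        }
    }
}

mod common {
    // One re-export per single-definition module.
    pub(crate) use crate::lexer::Lexer;
}

mod parser {
    // Every other module starts with this one line.
    use crate::common::*;

    pub fn token_count(src: &str) -> usize {
        Lexer::lex(src)
    }
}

fn main() {
    println!("{}", parser::token_count("a b c"));
}
```

The cost is that everything re-exported from common.rs shares one flat namespace, which is where the unique-name requirement comes from.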
Submodule Names and Contents
Most modules contain a single definition, either a function, trait, struct, or enum, that is used in the rest of the codebase. These modules are all named after the public definition, and that definition is exported in common.rs, so use crate::common::*; will bring that definition into scope without needing to qualify it with the module name.
As an example, just's lexer is called Lexer, and is in src/lexer.rs. In common.rs, it is exported with pub(crate) use lexer::Lexer;.
Since modules are always named after their sole export, it's pretty easy to figure out where something comes from just from its name. Since Lexer is a type from just, and not a dependency, it's probably defined in lexer.rs.
A few modules, like src/keyword.rs, contain more than one thing. For modules like that, common.rs just exports the module itself, with pub(crate) use crate::keyword;, and the module name is used when referring to the definitions inside keyword, like keyword::EXPORT.
If a name is from a dependency, then you can see where it comes from in common.rs.
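A minimal sketch of that module-level re-export (the constants here are illustrative):

```rust
mod keyword {
    // A module with several definitions, so it gets re-exported wholesale.
    pub const EXPORT: &str = "export";
    pub const SET: &str = "set";
}

mod common {
    // Re-export the module itself rather than its contents.
    pub(crate) use crate::keyword;
}

mod analyzer {
    use crate::common::*;

    // Callers qualify the definitions with the module name.
    pub fn is_keyword(word: &str) -> bool {
        word == keyword::EXPORT || word == keyword::SET
    }
}

fn main() {
    println!("{}", analyzer::is_keyword("export"));
}
```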
Error Handling
For testing purposes, I like to use error enums, instead of Box<dyn Error>, or
equivalent. I find that this makes it easy to write tests that look for
specific errors. Also, just
has detailed error messages, and this lets me
separate error message formatting from error value generation.
There are two main kinds of errors: CompilationError, in src/compilation_error.rs, and RuntimeError, in runtime_error.rs.
As you can guess, CompilationError is for problems related to compilation, e.g. lexing, parsing, and analyzing a justfile, and RuntimeError is for problems that occur when running a justfile, e.g. I/O errors and command execution errors.
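A minimal sketch of the error-enum style, with hypothetical variants (the real enums have many more variants and carry source-location information):

```rust
use std::fmt;

// Hypothetical variants, for illustration only.
#[derive(Debug, PartialEq)]
pub enum CompilationError {
    UnknownDependency { recipe: String, unknown: String },
    CircularDependency { recipe: String },
}

impl fmt::Display for CompilationError {
    // Message formatting lives here, separate from the code that
    // constructs the error values.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CompilationError::UnknownDependency { recipe, unknown } => {
                write!(f, "Recipe `{recipe}` has unknown dependency `{unknown}`")
            }
            CompilationError::CircularDependency { recipe } => {
                write!(f, "Recipe `{recipe}` depends on itself")
            }
        }
    }
}

fn main() {
    let error = CompilationError::CircularDependency { recipe: "build".into() };
    println!("{error}");
}
```

Tests can then match the exact variant, e.g. assert!(matches!(error, CompilationError::CircularDependency { .. })), instead of string-matching a boxed error's message.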
I currently don't use any of the many error-handling helper crates, but in other projects I use snafu, and if I were to rewrite just, I would definitely use snafu.
Clippy
I use clippy, the animate paperclip/Rust linter, to automatically check the codebase for issues.
Clippy has many lints that are either pedantic, or that restrict things that are totally reasonable in many contexts. I like a lot of these, but I didn't want to go through all of them and decide which lints to enable, so I turned them all on, even the annoying ones, and I just disable the lints I don't like as I encounter them.
You can see this at the top of src/lib.rs.
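The pattern looks something like this (an illustrative sketch, not the exact lint list in just's src/lib.rs):

```rust
// Top of src/lib.rs — illustrative, not the actual list.
// Turn on the broad lint groups, annoying lints included...
#![warn(clippy::all, clippy::pedantic)]
// ...then opt back out of individual lints as they get in the way.
#![allow(clippy::too_many_lines)]
```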
How does it work?
just does a lot of stuff! This makes it hard to give a concise overview of how everything works, but I'll do my best!
Run
The run function is pretty short, so definitely check it out. It's in src/run.rs. It does some setup, like initializing Windows terminal color support and logging, then parses the command line arguments.
Configuration
just calls the parsed command line arguments a Config. The command line arguments are parsed with the venerable clap, and then stored in a Config struct, which is passed around the rest of the program. Everything related to setting up the clap parser and parsing the command line arguments is in src/config.rs.
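just uses clap for the actual parsing, but the overall shape, an argument list in and a Config out, can be sketched with a stdlib-only stand-in (the fields and flags here are hypothetical):

```rust
#[derive(Debug, PartialEq)]
pub struct Config {
    pub verbose: bool,
    pub justfile: Option<String>,
    pub arguments: Vec<String>,
}

impl Config {
    // Hand-rolled stand-in for the clap-based parser in src/config.rs.
    pub fn from_iter<I: IntoIterator<Item = String>>(args: I) -> Config {
        let mut config = Config { verbose: false, justfile: None, arguments: Vec::new() };
        let mut iter = args.into_iter();
        while let Some(arg) = iter.next() {
            match arg.as_str() {
                "--verbose" => config.verbose = true,
                "--justfile" => config.justfile = iter.next(),
                // Anything else is an argument for the subcommand.
                _ => config.arguments.push(arg),
            }
        }
        config
    }
}

fn main() {
    let config = Config::from_iter(["--verbose", "build"].map(String::from));
    println!("{config:?}");
}
```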
Subcommand Running
just has a few distinct modes it can run in, e.g. actually running a recipe in a justfile, listing the recipes in a justfile, or evaluating the variables in a justfile. These are called subcommands, and you can see the different subcommands in the Subcommand enum in src/subcommand.rs.
Once a config is parsed, the function run_subcommand in config.rs handles executing the correct subcommand.
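A sketch of that dispatch, with abbreviated, hypothetical variants:

```rust
// Hypothetical, abbreviated variants — the real enum has more.
#[derive(Debug)]
pub enum Subcommand {
    Run { arguments: Vec<String> },
    List,
    Evaluate,
}

// Stand-in for run_subcommand: each variant gets its own code path.
pub fn run_subcommand(subcommand: &Subcommand) -> String {
    match subcommand {
        Subcommand::Run { arguments } => format!("run: {}", arguments.join(" ")),
        Subcommand::List => "list recipes".to_string(),
        Subcommand::Evaluate => "evaluate variables".to_string(),
    }
}

fn main() {
    let subcommand = Subcommand::Run { arguments: vec!["build".to_string()] };
    println!("{}", run_subcommand(&subcommand));
}
```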
For the rest of this post, I'll cover Subcommand::Run, which is the subcommand responsible for actually running a justfile.
Compilation
The justfile source is read and the compiler is invoked in the run_subcommand function in config.rs. The Compiler is defined in src/compiler.rs, and has a single short method that calls the lexer, the parser, and the analyzer.
There's no particular reason for having a Compiler struct, since it doesn't have any fields, so it's really just for organization. I would be totally fine with having a module src/compile.rs, and just exporting a single compile function from that module.
Lexing
The first step of compilation is to split the source text into tokens, which is done by the Lexer in src/lexer.rs. The lexer looks a lot like a recursive descent parser. It has a bunch of different methods, and those methods call each other to produce the different tokens. The entry point to the lexer is Lexer::lex.
Lexer is relatively well-commented, so please take a look if you're interested!
Tokens
The lexer produces a Vec of Tokens. The Token type is in src/token.rs. Each Token contains a TokenKind, defined in src/token_kind.rs. A Token contains a reference to the source program, as well as information about the offset, length, line, and column of the token. A TokenKind tells you what kind of token it actually is.
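To make the shape concrete, here is a toy single-line lexer in the same style: a Token that references the source and records its span, a TokenKind that names the kind, and a Lexer whose methods produce tokens. This is entirely illustrative; the real lexer tracks lines and columns properly and has many more token kinds.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TokenKind {
    Identifier,
    Colon,
    Eof,
}

#[derive(Debug)]
pub struct Token<'src> {
    pub src: &'src str, // reference to the source program
    pub offset: usize,
    pub length: usize,
    pub line: usize,
    pub column: usize,
    pub kind: TokenKind,
}

impl<'src> Token<'src> {
    pub fn lexeme(&self) -> &'src str {
        &self.src[self.offset..self.offset + self.length]
    }
}

pub struct Lexer<'src> {
    src: &'src str,
    offset: usize,
}

impl<'src> Lexer<'src> {
    // Entry point, mirroring Lexer::lex.
    pub fn lex(src: &'src str) -> Vec<Token<'src>> {
        let mut lexer = Lexer { src, offset: 0 };
        let mut tokens = Vec::new();
        loop {
            let token = lexer.next_token();
            let done = token.kind == TokenKind::Eof;
            tokens.push(token);
            if done {
                break;
            }
        }
        tokens
    }

    fn next_token(&mut self) -> Token<'src> {
        // Skip spaces.
        while self.src[self.offset..].starts_with(' ') {
            self.offset += 1;
        }
        let start = self.offset;
        let rest = &self.src[start..];
        let (kind, length) = if rest.is_empty() {
            (TokenKind::Eof, 0)
        } else if rest.starts_with(':') {
            (TokenKind::Colon, 1)
        } else {
            let end = rest.find(|c: char| c == ':' || c == ' ').unwrap_or(rest.len());
            (TokenKind::Identifier, end)
        };
        self.offset += length;
        // Single-line sources only, so line is always 0 and column == offset.
        Token { src: self.src, offset: start, length, line: 0, column: start, kind }
    }
}

fn main() {
    for token in Lexer::lex("build: deps") {
        println!("{:?} {:?}", token.kind, token.lexeme());
    }
}
```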
Parsing
The Tokens produced by the Lexer are passed to a Parser, defined in src/parser.rs, and the main entry point is Parser::parse.
The parser is a recursive descent parser that walks over the tokens, figuring out what kind of construct it's parsing as it goes.
Modules
The output of the parser is a Module, defined in src/module.rs. A Module represents a successful parse, but has not been fully validated. just does a lot of static analysis, like resolving names, inter-recipe dependencies, and inter-variable dependencies, so not every Module is valid.

You can think of a Module as being like an AST that hasn't yet been statically analyzed for correctness. Inside a Module are Items (src/item.rs), which contain the different source constructs, like Alias, Assignment, UnresolvedRecipe, and Set.
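A rough sketch of those shapes, with abbreviated, hypothetical fields (the real definitions carry much more information):

```rust
// Abbreviated, hypothetical fields — for illustration only.
#[derive(Debug)]
#[allow(dead_code)]
pub enum Item {
    Alias { name: String, target: String },
    Assignment { name: String, value: String },
    UnresolvedRecipe { name: String, dependencies: Vec<String> },
    Set { setting: String, value: String },
}

// A Module is a successful parse: a list of items, not yet validated.
#[derive(Debug, Default)]
pub struct Module {
    pub items: Vec<Item>,
}

fn main() {
    let module = Module {
        items: vec![Item::UnresolvedRecipe {
            name: "build".into(),
            dependencies: vec!["deps".into()],
        }],
    };
    println!("{}", module.items.len());
}
```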
Analysis
The next phase of compilation is analysis, performed by the Analyzer, defined in src/analyzer.rs. The Analyzer makes sure that all references to recipes and variables can be resolved, and that there are no circular dependencies.
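The kind of check involved can be sketched like this, treating a justfile as a map from recipe names to dependency lists (a hypothetical shape, not the actual implementation):

```rust
use std::collections::{BTreeMap, BTreeSet};

pub fn analyze(recipes: &BTreeMap<&str, Vec<&str>>) -> Result<(), String> {
    // Every dependency must name a recipe that exists.
    for (name, dependencies) in recipes {
        for dependency in dependencies {
            if !recipes.contains_key(dependency) {
                return Err(format!("recipe `{name}` has unknown dependency `{dependency}`"));
            }
        }
    }
    // No recipe may depend on itself, directly or transitively.
    for name in recipes.keys() {
        let mut stack = vec![*name];
        let mut visited = BTreeSet::new();
        while let Some(current) = stack.pop() {
            for &dependency in &recipes[current] {
                if dependency == *name {
                    return Err(format!("recipe `{name}` has circular dependency"));
                }
                if visited.insert(dependency) {
                    stack.push(dependency);
                }
            }
        }
    }
    Ok(())
}

fn main() {
    let mut recipes = BTreeMap::new();
    recipes.insert("build", vec!["deps"]);
    recipes.insert("deps", vec![]);
    println!("{:?}", analyze(&recipes));
}
```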
Justfile
The output of the Analyzer is a Justfile, defined in src/justfile.rs. A Justfile represents a parsed and analyzed justfile. It contains all the recipes, variables, and expressions, all resolved and ready to run. It is the totus porcus, as it were.
Running
A justfile is run with Justfile::run, which takes a Config; a Search, with information about where the justfile and the working directory are; variable overrides passed on the command line; and a list of arguments.
The arguments are parsed into recipes and arguments to those recipes, and finally those recipes are run with Justfile::run_recipe, which actually executes each recipe and all of its dependencies.
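A toy model of that recursion — run dependencies first, then the recipe itself, skipping anything that has already run (illustrative only; the real code executes command lines and handles arguments and errors):

```rust
use std::collections::BTreeMap;

pub fn run_recipe<'a>(
    recipes: &BTreeMap<&'a str, Vec<&'a str>>,
    name: &'a str,
    ran: &mut Vec<&'a str>,
) {
    // Each recipe runs at most once.
    if ran.contains(&name) {
        return;
    }
    // Dependencies run before the recipe that needs them.
    for &dependency in &recipes[name] {
        run_recipe(recipes, dependency, ran);
    }
    ran.push(name); // stand-in for actually executing the recipe's commands
}

fn main() {
    let mut recipes = BTreeMap::new();
    recipes.insert("build", vec!["deps"]);
    recipes.insert("deps", vec![]);
    let mut ran = Vec::new();
    run_recipe(&recipes, "build", &mut ran);
    println!("{ran:?}");
}
```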
Testing
just takes files, parses commands out of those files, and then runs them. I am acutely aware of how this might go wrong, and would feel real bad if just somehow got confused and ran a command that nuked someone's hard drive.

Because of this, I go pretty crazy with testing. There are four kinds of tests: unit tests, integration tests, fuzz testing, and ecosystem-wide regression testing.
Unit Testing
Unit tests are spread around the codebase, in submodules named tests. Each tests submodule contains tests for whatever's in the containing module.
I'm not strict about covering everything with unit tests, but I do cover everything with integration tests. If something doesn't seem to be tested in a unit test, it's probably tested in an integration test.
Integration Testing
Integration tests are in the tests subdirectory, roughly organized by topic. The vast majority are in tests/integration.rs. Each of these tests does a full run of the just binary, supplying standard input, arguments, and a justfile, and checking that standard output, standard error, and the exit code are correct.
Fuzz Testing
Fuzz testing was contributed to just by @RadicalZephyr, and is located in the fuzz directory. It generates random strings and feeds them to the parser. (NOT THE RUNNER, DEFINITELY NOT THE RUNNER.) If the parser succeeds or returns an error, that's a successful run. If the fuzzer is able to trigger a panic, then it's found a bug that needs to be fixed.
Regression Testing
Since a lot of people have written a lot of justfiles, I want to make sure I don't break them when I update just.
To do this, I wrote a tool called janus. Janus is inspired by Rust's crater.

Janus downloads all the justfiles that it can find on GitHub, and then compares how two versions of just compile those justfiles. The two versions of just are usually the latest release, and a new version with a big, scary change.

Janus compiles all the justfiles with both versions, and then compares the results. Ideally, every valid justfile parses into the same Justfile with both versions, and every justfile with an error produces the same error.
Wrapping Up
That's everything I can think of! Ultimately, much of how you organize your Rust programs comes down to personal preference, so just start mashing the keyboard, see what works, and iterate on whatever doesn't.
glhf!