Introduction

Bolero is a fuzzing and property testing front-end framework for Rust.

Wikipedia describes fuzzing as follows:

Fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks.

bolero's goal is to make implementing high-quality tests as painless and approachable as possible.

CLI Installation

bolero provides a CLI program to execute tests, cargo-bolero. It can be installed globally with cargo:

$ cargo install cargo-bolero -f

Linux Installation

cargo-bolero needs a couple of libraries installed in order to compile. If these libraries aren't available, the requirement can be relaxed by executing cargo install cargo-bolero --no-default-features -f

Debian/Ubuntu

$ sudo apt install binutils-dev libunwind-dev

Nix

$ nix-shell -p libbfd libunwind libopcodes

Library Installation

bolero is on crates.io and can be added to a project's dev dependencies like so:

$ cargo add --dev bolero

Then, create the fuzz profile in Cargo.toml. For workspaces, Cargo requires profiles to be defined in the workspace root Cargo.toml. (Note that LTO is not well supported for the fuzzing profile.)

[profile.fuzz]
inherits = "dev"
opt-level = 3
incremental = false
codegen-units = 1

If you forget to add the profile, you will get the following error:

error: profile `fuzz` is not defined

Structured Test Generation

If your crate wishes to implement structured test generation on public data structures, bolero-generator can be added to the main dependencies:

$ cargo add bolero-generator

The derive attribute can now be used:


#[derive(Debug, bolero_generator::TypeGenerator)]
pub struct Coord3d {
    x: u64,
    y: u64,
    z: u64,
}
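
The derived generator can then drive a property test inside the same crate. Here's a minimal sketch (the property itself is a placeholder):

use bolero::check;

#[test]
fn coord3d_test() {
    check!()
        .with_type::<Coord3d>()
        .for_each(|coord| {
            // replace this with real property checks on the generated value
            let _ = (coord.x, coord.y, coord.z);
        });
}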

Features

Bolero has several features that make testing easy:

Corpus Replay

After executing a test target, a corpus is generated. A corpus is a set of inputs that trigger unique codepaths. This corpus can now be executed using the standard cargo test command. The corpus should either be committed to the project repository or stored in and restored from external storage, like S3.
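
For test targets created with cargo bolero new, the corpus and any crash-triggering inputs are typically stored alongside the target's source, in a layout like the following (the exact paths may vary between bolero versions):

tests/
  my_test_target/
    main.rs     # the test target
    corpus/     # inputs that trigger unique codepaths
    crashes/    # inputs that caused a failure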

$ cargo test

     Running target/debug/deps/my_test_target-9b2c2acee51634e0

running 1007 tests
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...........................................................
test result: ok. 1007 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Input Shrinking

bolero supports input shrinking for all of the provided testing engines.

What is it?

From PropEr Testing:

Shrinking is the mechanism by which a property-based testing framework can be told how to simplify failure cases enough to let it figure out exactly what the minimal reproducible case is.

Sometimes the input required to find a failure can be fairly large or complex. Finding the initial failing case may have required hundreds of attempts, and it may contain vast amounts of irrelevant information. The framework will then attempt to reduce that data set through shrinking. It generally does so by transforming all the generators used and trying to bring them back towards their own zero point.

Example

Let's suppose we're testing a MySet data structure:

use bolero::{check, generator::*};
use my_set::MySet;

#[derive(Debug, TypeGenerator)]
enum Operation {
    Insert(u64),
    Remove(u64),
    Clear,
}

fn main() {
    check!()
        .with_type::<Vec<Operation>>()
        .for_each(|operations| {
            let mut set = MySet::new();

            for operation in operations.iter() {
                match operation {
                    Operation::Insert(value) => {
                        set.insert(value);
                    }
                    Operation::Remove(value) => {
                        set.remove(value);
                    }
                    Operation::Clear => {
                        set.clear();
                    }
                }
            }
        })
}

Assume there's a hypothetical scenario in which adding 16 elements to MySet causes a panic. Without shrinking, the randomly-generated inputs can be difficult to interpret:

======================== Test Failure ========================

Input:
[
    Insert(9693583160302274182),
    Insert(15890536247564076678),
    Clear,
    Insert(15914819332679195868),
    Insert(9717884937115065564),
    Insert(15914645609842007260),
    Insert(18446738425238052060),
    Remove(15912577393975689024),
    Insert(15338377272073444572),
    Insert(15914838024242123988),
    Insert(11228695243045262548),
    Insert(11212726789901900955),
    Insert(11212726789901884315),
    Insert(11212726789901884316),
    Insert(11212726789901884317),
    Insert(11212726789901884318),
    Insert(11212726789901884319),
    Insert(11212726789901884311),
    Insert(11212726789901884312),
    Insert(11212726789901884313),
    Insert(11212726789901884314),
    Insert(9727500806739001343),
    Insert(9693583160302274182),
    Insert(9693583160302274182),
    Insert(5714873654208093419),
    Remove(5714873654208057167),
    Remove(16717362667219255119),
    Insert(9726366166670698728),
    Insert(9727642152175306374),
    Remove(18446181099529437184),
    Insert(18446744073709551615),
    Insert(15336116641675083775),
    Remove(11212726789901884372),
    Insert(11212726789901884315),
    Insert(11212666079612935067)
]

Error:
panicked at 'internal assertion', src/lib.rs:16:17

After shrinking the input, it becomes more obvious how to trigger the bug:

======================== Test Failure ========================

Input:
[
    Insert(0),
    Insert(1),
    Insert(2),
    Insert(3),
    Insert(4),
    Insert(5),
    Insert(6),
    Insert(7),
    Insert(8),
    Insert(9),
    Insert(10),
    Insert(11),
    Insert(12),
    Insert(13),
    Insert(14),
    Insert(15),
]

Error:
panicked at 'internal assertion', src/lib.rs:16:17

Works on Rust Stable

bolero does not require nightly to execute test targets:

# does not require nightly
$ cargo bolero test my_test_target --sanitizer NONE

Sanitizer support

Using a sanitizer increases the number of edge cases caught by the test, so the preference should be towards using one. Unfortunately, sanitizers require Rust nightly to compile.

cargo-bolero will automatically use cargo +nightly to execute the test target:

# uses nightly, even if we're using stable by default
$ cargo bolero test --sanitizer address my_test_target

If a specific version of nightly is required, the --toolchain argument can be used:

$ cargo bolero test --sanitizer address --toolchain nightly-2020-01-01 my_test_target

Structured Testing

In addition to generating random byte slices, bolero supports generating well-formed types, with the bolero-generator crate.

Operation Example

Let's suppose we've implemented a MySet data structure. It has 3 operations:

  • insert(value) - inserts a value into the set
  • remove(value) - removes a value from the set
  • clear() - removes all values from the set

The operations can easily be modeled as an enum:


use bolero::generator::TypeGenerator;

#[derive(Debug, TypeGenerator)]
enum Operation {
    Insert(u64),
    Remove(u64),
    Clear,
}

Note that we've added TypeGenerator to the list of derives. This enables bolero to generate random values for Operation. We can combine that with a Vec<Operation> and get a list of operations to perform on our MySet data structure.

use bolero::{check, generator::*};
use my_set::MySet;

#[derive(Debug, TypeGenerator)]
enum Operation {
    Insert(u64),
    Remove(u64),
    Clear,
}

fn main() {
    check!()
        .with_type::<Vec<Operation>>()
        .for_each(|operations| {
            let mut set = MySet::new();

            for operation in operations.iter() {
                match operation {
                    Operation::Insert(value) => {
                        set.insert(value);
                    }
                    Operation::Remove(value) => {
                        set.remove(value);
                    }
                    Operation::Clear => {
                        set.clear();
                    }
                }
            }
        })
}

This basic test will make sure we don't panic on any list of operations. We can take it a step further by using a test oracle to make sure the behavior of MySet is actually correct. Here we'll use HashSet from the standard library:

use bolero::{check, generator::*};
use my_set::MySet;
use std::collections::HashSet;

#[derive(Debug, TypeGenerator)]
enum Operation {
    Insert(u64),
    Remove(u64),
    Clear,
}

fn main() {
    check!()
        .with_type::<Vec<Operation>>()
        .for_each(|operations| {
            let mut set = MySet::new();
            let mut oracle = HashSet::new();

            for operation in operations.iter() {
                match operation {
                    Operation::Insert(value) => {
                        set.insert(value);
                        oracle.insert(value);
                    }
                    Operation::Remove(value) => {
                        set.remove(value);
                        oracle.remove(value);
                    }
                    Operation::Clear => {
                        set.clear();
                        oracle.clear();
                    }
                }
            }

            // HashSet iteration order is unspecified, so compare contents
            // directly rather than relying on iteration order (this assumes
            // MySet exposes len and contains)
            assert_eq!(set.len(), oracle.len());
            for value in oracle.iter() {
                assert!(set.contains(value));
            }
        })
}

Unified Interface

Using the interface provided by bolero, a single test target can execute under several different engines.

LibFuzzer

LibFuzzer is an in-process, coverage-guided, evolutionary fuzzing engine.

LibFuzzer is linked with the library under test, and feeds fuzzed inputs to the library via a specific fuzzing entrypoint (aka “target function”); the fuzzer then tracks which areas of the code are reached, and generates mutations on the corpus of input data in order to maximize the code coverage.

The libfuzzer engine can be selected like so:

$ cargo bolero test --engine libfuzzer my_test_target

Currently, it is also the default engine:

# will use --engine libfuzzer
$ cargo bolero test my_test_target

AFL

American fuzzy lop is a security-oriented fuzzer that employs a novel type of compile-time instrumentation and genetic algorithms to automatically discover clean, interesting test cases that trigger new internal states in the targeted binary. This substantially improves the functional coverage for the fuzzed code. The compact synthesized corpora produced by the tool are also useful for seeding other, more labor- or resource-intensive testing regimes down the road.

The afl engine can be selected like so:

$ cargo bolero test --engine afl my_test_target

Honggfuzz

Honggfuzz is a security-oriented fuzzer with powerful analysis options. It supports evolutionary, feedback-driven fuzzing based on code coverage (both software- and hardware-based).

The honggfuzz engine can be selected like so:

$ cargo bolero test --engine honggfuzz my_test_target

Kani

Kani is an open-source verification tool that uses automated reasoning to analyze Rust programs. Kani is particularly useful for verifying unsafe code in Rust, where many of Rust's usual guarantees are no longer checked by the compiler. Some example properties you can prove with Kani include memory safety properties (e.g., null pointer dereferences, use-after-free, etc.), the absence of certain runtime errors (i.e., index out of bounds, panics), and the absence of some types of unexpected behavior (e.g., arithmetic overflows). Kani can also prove custom properties provided in the form of user-specified assertions.

Kani uses proof harnesses to analyze programs. Proof harnesses are similar to test harnesses, especially property-based test harnesses.

The kani engine can be selected like so:

$ cargo bolero test --engine kani my_test_target

Note that each target needs to include a #[kani::proof] attribute:


#[test]
#[cfg_attr(kani, kani::proof)]
fn my_test_target() {
    bolero::check!().with_type().for_each(|v: &u8| {
        assert_ne!(*v, 123);
    });
}

Private Testing

bolero also supports running tests inside of a project. This is useful for testing private interfaces and implementations.


#[test]
fn my_property_test() {
    bolero::check!()
        .with_type()
        .cloned()
        .for_each(|value: u64| {
            // implement property checks here
        });
}
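
The .cloned() call makes the closure receive an owned u64 rather than a reference, which is convenient for Copy types like the integer used here.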

Miri Support

bolero supports executing tests with Miri. Keep in mind that execution is significantly slower in Miri.

The isolation mode must currently be disabled in order for bolero tests to read corpuses from the file system. This can be done by setting the appropriate flags:

MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri test

Tutorials

Fibonacci

In this tutorial, we want to arrive at a bug-free fibonacci implementation. Let's start with a basic setup:

$ cargo new --lib my_fibonacci

// src/lib.rs

pub fn fibonacci(number: u64) -> u64 {
    let mut a = 0;
    let mut b = 1;

    for _ in 0..number {
        b += core::mem::replace(&mut a, b);
    }

    b
}

Now we define a test:

$ cargo bolero new fibonacci_test --generator

// tests/fibonacci_test/main.rs
use bolero::check;
use my_fibonacci::fibonacci;

fn main() {
    check!()
        .with_type()
        .cloned()
        .for_each(|number: u64| {
            fibonacci(number);
        })
}

Now let's fuzz our fibonacci function:

$ cargo bolero test fibonacci_test
    Finished test [unoptimized + debuginfo] target(s) in 0.10s
     Running target/fuzz/build_62a8ab526939db81/x86_64-apple-darwin/debug/deps/fibonacci_test-f9f8f1dcc806b6b6
...
thread 'main' panicked at 'attempt to add with overflow', my_fibonacci/tests/fibonacci_test/main.rs:8:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

======================== Test Failure ========================

Input:
93

Error:
panicked at 'attempt to add with overflow', my_fibonacci/tests/fibonacci_test/main.rs:8:9

==============================================================

Uh oh... It looks like we've got a bug! bolero was able to find that calling our function with 93 results in an integer overflow. Let's try fixing that by adding overflow checks with u64::checked_add:


// src/lib.rs

pub fn fibonacci(number: u64) -> Option<u64> {
    let mut a = 0u64;
    let mut b = 1u64;

    for _ in 0..number {
        b = b.checked_add(core::mem::replace(&mut a, b))?;
    }

    Some(b)
}

After running the test command for a few minutes things are looking better:

$ cargo bolero test fibonacci_test
    Finished test [unoptimized + debuginfo] target(s) in 0.10s
     Running target/fuzz/build_62a8ab526939db81/x86_64-apple-darwin/debug/deps/fibonacci_test-f9f8f1dcc806b6b6
...
#272    INITED cov: 469 ft: 872 corp: 17/106b lim: 4 exec/s: 0 rss: 27Mb
    NEW_FUNC[1/1]: 0x102ef95f1
#277    NEW    cov: 476 ft: 880 corp: 18/112b lim: 6 exec/s: 0 rss: 27Mb L: 6/13 MS: 5 ChangeByte-ChangeBit-CopyPart-CopyPart-CrossOver-
#293    REDUCE cov: 476 ft: 880 corp: 18/109b lim: 6 exec/s: 0 rss: 27Mb L: 3/13 MS: 1 EraseBytes-
#341    NEW    cov: 476 ft: 928 corp: 19/119b lim: 6 exec/s: 0 rss: 27Mb L: 10/13 MS: 3 CMP-CopyPart-ChangeBinInt- DE: " \x00\x00\x00\x00\x00\x00\x00"-
#369    REDUCE cov: 476 ft: 928 corp: 19/118b lim: 6 exec/s: 0 rss: 27Mb L: 2/13 MS: 3 ShuffleBytes-ShuffleBytes-EraseBytes-
#397    REDUCE cov: 476 ft: 928 corp: 19/117b lim: 6 exec/s: 0 rss: 27Mb L: 1/13 MS: 3 ShuffleBytes-ChangeByte-EraseBytes-
#409    NEW    cov: 476 ft: 984 corp: 20/130b lim: 6 exec/s: 0 rss: 27Mb L: 13/13 MS: 2 ChangeByte-ChangeBinInt-
#501    NEW    cov: 476 ft: 1033 corp: 21/143b lim: 6 exec/s: 0 rss: 27Mb L: 13/13 MS: 2 ChangeBit-ChangeBinInt-
#977    REDUCE cov: 476 ft: 1033 corp: 21/139b lim: 8 exec/s: 0 rss: 27Mb L: 9/13 MS: 1 EraseBytes-
#1289   REDUCE cov: 476 ft: 1033 corp: 21/136b lim: 11 exec/s: 0 rss: 27Mb L: 10/13 MS: 2 ChangeASCIIInt-EraseBytes-
#1670   REDUCE cov: 476 ft: 1033 corp: 21/132b lim: 14 exec/s: 0 rss: 27Mb L: 9/10 MS: 1 EraseBytes-
#1741   REDUCE cov: 476 ft: 1033 corp: 21/131b lim: 14 exec/s: 0 rss: 27Mb L: 9/10 MS: 1 EraseBytes-
#10199  REDUCE cov: 476 ft: 1033 corp: 21/127b lim: 92 exec/s: 5099 rss: 27Mb L: 5/10 MS: 2 ChangeByte-EraseBytes-
#10455  REDUCE cov: 476 ft: 1033 corp: 21/125b lim: 92 exec/s: 5227 rss: 27Mb L: 3/10 MS: 1 EraseBytes-
#11753  REDUCE cov: 476 ft: 1033 corp: 21/121b lim: 104 exec/s: 5876 rss: 27Mb L: 5/10 MS: 3 ChangeBinInt-InsertByte-EraseBytes-

Are we done? Not quite... This is a good time to point out that basic fuzz testing can only get you so far. If we look on Wikipedia we find the following table:

F0   F1   F2   F3   F4   F5   F6   F7   F8   F9   F10
0    1    1    2    3    5    8    13   21   34   55

Do we actually know if the return value is correct? All we've really made sure of is that the implementation doesn't panic. It could be returning 42 for every answer and our fuzz tests wouldn't have caught it. How do we fix this?

Test Oracle

Using test oracles in conjunction with our test can be an effective way to assert that our implementation is correct. What is a test oracle? From Write Fuzzable Code:

A test oracle decides whether a test case triggered a bug or not. By default, the only oracle available to a fuzzer like afl is provided by the OS’s page protection mechanism. In other words, it detects only crashes. We can do much better than this.

Assertions and their compiler-inserted friends — sanitizer checks — are another excellent kind of oracle. You should fuzz using as many of these checks as possible. Beyond these easy oracles, many more possibilities exist, such as:

  • function-inverse pairs: does a parse-print loop, compress-decompress loop, encrypt-decrypt loop, or similar, work as expected?
  • differential: do two different implementations, or modes of the same implementation, show the same behavior?
  • metamorphic: does the system show the same behavior when a test case is modified in a semantics-preserving way, such as adding a layer of parentheses to an expression?
  • resource: does the system consume a reasonable amount of time, memory, etc. when processing an input?
  • domain specific: for example, is a lossily-compressed image sufficiently visually similar to its uncompressed version?

We've already seen a good example of a test oracle in action. Rust includes debug assertions for unchecked integer overflows. We were able to use these assertions in finding the limits of our implementation.
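
Several of the patterns above can be expressed directly as bolero tests. As an illustration of a function-inverse pair, here is a minimal sketch that uses a toy, trivially invertible codec in place of a real implementation:

use bolero::check;

// toy stand-ins for a real encoder/decoder pair
fn encode(data: &[u8]) -> Vec<u8> {
    data.iter().map(|b| b.wrapping_add(1)).collect()
}

fn decode(data: &[u8]) -> Vec<u8> {
    data.iter().map(|b| b.wrapping_sub(1)).collect()
}

fn main() {
    check!().for_each(|input: &[u8]| {
        // function-inverse oracle: decoding an encoded input
        // must reproduce the original bytes
        assert_eq!(decode(&encode(input)), input);
    });
}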

Unit tests could also be considered test oracles and can be effective at asserting the expected behavior of well-known inputs and outputs.

Unit tests

The easiest solution is to copy the table values from Wikipedia and check our function with a unit test:


// src/lib.rs

#[test]
fn fibonacci_test() {
    assert_eq!(fibonacci(0), Some(0));
    assert_eq!(fibonacci(1), Some(1));
    assert_eq!(fibonacci(2), Some(1));
    assert_eq!(fibonacci(3), Some(2));
    assert_eq!(fibonacci(4), Some(3));
    assert_eq!(fibonacci(5), Some(5));
    assert_eq!(fibonacci(6), Some(8));
    assert_eq!(fibonacci(7), Some(13));
    assert_eq!(fibonacci(8), Some(21));
    assert_eq!(fibonacci(9), Some(34));
    assert_eq!(fibonacci(10), Some(55));
}

Let's try running our unit test:

$ cargo test
    Finished test [unoptimized + debuginfo] target(s) in 52.06s
     Running target/debug/deps/my_fibonacci-e9bfbebb80b3a5bf

running 1 test
test fibonacci_test ... FAILED

failures:

---- fibonacci_test stdout ----
thread 'fibonacci_test' panicked at 'assertion failed: `(left == right)`
  left: `Some(1)`,
 right: `Some(0)`', my_fibonacci/src/lib.rs:29:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    fibonacci_test

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out

error: test failed, to rerun pass '--lib'

We haven't handled our zero case! Let's fix that:


// src/lib.rs

pub fn fibonacci(number: u64) -> Option<u64> {
    if number == 0 {
        return Some(0);
    }

    let mut a = 0u64;
    let mut b = 1u64;

    for _ in 0..number {
        b = b.checked_add(core::mem::replace(&mut a, b))?;
    }

    Some(b)
}

Let's run the test again:

$ cargo test
    Finished test [unoptimized + debuginfo] target(s) in 52.06s
     Running target/debug/deps/my_fibonacci-e9bfbebb80b3a5bf

running 1 test
test fibonacci_test ... FAILED

failures:

---- fibonacci_test stdout ----
thread 'fibonacci_test' panicked at 'assertion failed: `(left == right)`
  left: `Some(2)`,
 right: `Some(1)`', my_fibonacci/src/lib.rs:35:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    fibonacci_test

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out

error: test failed, to rerun pass '--lib'

Another bug!? In this case we're actually looping one too many times. Here's the fix:


// src/lib.rs

pub fn fibonacci(number: u64) -> Option<u64> {
    if number == 0 {
        return Some(0);
    }

    let mut a = 0u64;
    let mut b = 1u64;

    for _ in 1..number {
        b = b.checked_add(core::mem::replace(&mut a, b))?;
    }

    Some(b)
}

After that final fix all of our tests pass:

$ cargo test
    Finished test [unoptimized + debuginfo] target(s) in 52.06s
     Running target/debug/deps/my_fibonacci-e9bfbebb80b3a5bf

running 1 test
test fibonacci_test ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/fibonacci_test-e98d85aab754d963

running 1022 tests
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
..........................................................................
test result: ok. 1022 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Differential Oracle

We could also try to use the less-efficient, recursive method to check our implementation. It's easy to understand and implement:


fn fibonacci_recursive(n: u64) -> Option<u64> {
    match n {
        0 => Some(0),
        1 => Some(1),
        _ => fibonacci_recursive(n - 1)?.checked_add(fibonacci_recursive(n - 2)?),
    }
}

The problem with that approach is that it ends up being far too slow for larger numbers, even in --release mode.
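
One way to keep a slow oracle usable is to bound the generated input. Here's a sketch of such a differential test; it assumes integer ranges can be used as generators (which bolero-generator provides), and the bound of 32 is an arbitrary choice that keeps the recursion cheap:

// tests/fibonacci_test/main.rs
use bolero::check;
use my_fibonacci::fibonacci;

fn fibonacci_recursive(n: u64) -> Option<u64> {
    match n {
        0 => Some(0),
        1 => Some(1),
        _ => fibonacci_recursive(n - 1)?.checked_add(fibonacci_recursive(n - 2)?),
    }
}

fn main() {
    check!()
        // restrict inputs so the recursive oracle stays fast
        .with_generator(0u64..=32)
        .cloned()
        .for_each(|number| {
            assert_eq!(fibonacci(number), fibonacci_recursive(number));
        })
}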

Another option is to use a third-party implementation. A quick search on crates.io turns up a crate that implements the Fibonacci sequence. There's a problem with that as well: the crate actually has the same bug as our implementation. It skips the first two values in the sequence, 0 and 1.

Conclusion

The takeaway is that some thought needs to go into how to test your implementation effectively. Oftentimes, combining multiple approaches will provide the best result.

Developer documentation

bolero is an open source project that welcomes external contributions. One of the easiest ways to contribute is to report any issue you encounter while using the tool.

If you want to contribute to its development, this chapter provides documentation that might be helpful for developers.

Build from source

In general, the following dependencies are required to build bolero from source.

Dependencies

  • The Rust toolchain (cargo, rustfmt, etc.) installed via rustup
  • make

bolero has been tested on the Ubuntu 22.04 and macOS 12 platforms.

Ubuntu 22.04

sudo apt update
sudo apt install binutils-dev libunwind-dev

make usually comes pre-installed on Ubuntu, but if for some reason it isn't available, it can be installed using the command:

sudo apt install make

macOS 12

make can be installed using the command:

xcode-select --install

No other dependencies are required.

Build and test

The Makefile located in the root directory can be used to build bolero and run it on several test suites. To execute it, just run:

make

This should compile bolero and run multiple tests. In the process, you may see the following message:

[-] Hmm, your system is configured to send core dump notifications to an
    external utility. This will cause issues: there will be an extended delay
    between stumbling upon a crash and having this information relayed to the
    fuzzer via the standard waitpid() API.

    To avoid having crashes misinterpreted as timeouts, please log in as root
    and temporarily modify /proc/sys/kernel/core_pattern, like so:

    echo core >/proc/sys/kernel/core_pattern

This message comes from AFL. You can either modify the file as indicated or re-run the make command as follows:

AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=1 make

However, this doesn't guarantee that the AFL tests will pass. If they don't, the best option is to temporarily modify the /proc/sys/kernel/core_pattern file.