Introduction
Bolero is a fuzzing and property testing front-end framework for Rust.
From Wikipedia, fuzzing is described as:
Fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks.
bolero's goal is to make implementing high-quality tests as painless and approachable as possible.
CLI Installation
bolero provides a CLI program to execute tests, cargo-bolero. It can be installed globally with cargo:
$ cargo install cargo-bolero -f
Linux Installation
cargo-bolero needs a couple of libraries installed to compile. If these libraries aren't available, the requirement can be relaxed by executing cargo install cargo-bolero --no-default-features -f
Debian/Ubuntu
$ sudo apt install binutils-dev libunwind-dev
Nix
$ nix-shell -p libbfd libunwind libopcodes
Library Installation
bolero is on crates.io and can be added to a project's dev dependencies like so:
$ cargo add --dev bolero
Then, create the fuzz profile (note that LTO is not well-supported for the fuzzing profile):
[profile.fuzz]
inherits = "dev"
opt-level = 3
incremental = false
codegen-units = 1
If you forget to add the profile, you will get the following error:
error: profile `fuzz` is not defined
Structured Test Generation
If your crate wishes to implement structured test generation on public data structures, bolero-generator can be added to the main dependencies:
$ cargo add bolero-generator
The derive attribute can now be used:
#[derive(Debug, bolero_generator::TypeGenerator)]
pub struct Coord3d {
    x: u64,
    y: u64,
    z: u64,
}
Features
Bolero has several features that make testing easy:
- Corpus Replay
- Input Shrinking
- Works on Rust Stable
- Structured Testing
- Unified Interface
- Private Testing
- Miri Support
Corpus Replay
After executing a test target, a corpus is generated. A corpus is a set of inputs that trigger unique codepaths. This corpus can now be executed using the standard cargo test command. The corpus should either be committed to the project repository or be stored/restored from external storage, like S3.
$ cargo test
Running target/debug/deps/my_test_target-9b2c2acee51634e0
running 1007 tests
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...........................................................
test result: ok. 1007 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
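Conceptually, replaying a corpus amounts to iterating over the saved input files and running each one through the test body. The sketch below illustrates the idea only; it is not bolero's actual implementation, and the directory layout and file names are hypothetical:

```rust
use std::fs;
use std::path::Path;

// Stand-in for the test body: it must simply not panic on any input.
fn run_test(input: &[u8]) {
    assert!(input.len() <= 4096, "input too large");
}

// Replay every saved input in a corpus directory; returns how many ran.
fn replay_corpus(dir: &Path) -> std::io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir(dir)? {
        let input = fs::read(entry?.path())?;
        run_test(&input);
        count += 1;
    }
    Ok(count)
}

fn main() -> std::io::Result<()> {
    // Build a tiny fake corpus in a temporary directory.
    let dir = std::env::temp_dir().join("corpus_replay_demo");
    let _ = fs::remove_dir_all(&dir);
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("input-1"), b"hello")?;
    fs::write(dir.join("input-2"), b"world")?;
    let count = replay_corpus(&dir)?;
    println!("replayed {} inputs", count);
    fs::remove_dir_all(&dir)?;
    Ok(())
}
```

In a real project the corpus directory lives alongside the test target and each file is an input discovered by the fuzzer; `cargo test` then runs each file as its own test case.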
Input Shrinking
bolero supports input shrinking for all of the provided testing engines.
What is it?
From PropEr Testing:
Shrinking is the mechanism by which a property-based testing framework can be told how to simplify failure cases enough to let it figure out exactly what the minimal reproducible case is.
Sometimes the input required to find a failure can be fairly large or complex. Finding the initial failing case may have required hundreds of attempts, and it may contain vast amounts of irrelevant information. The framework will then attempt to reduce that data set through shrinking. It generally does so by transforming all the generators used and trying to bring them back towards their own zero point.
Example
Let's suppose we're testing a MySet data structure:
use bolero::{check, generator::*};
use my_set::MySet;

#[derive(Debug, TypeGenerator)]
enum Operation {
    Insert(u64),
    Remove(u64),
    Clear,
}

fn main() {
    check!()
        .with_type::<Vec<Operation>>()
        .for_each(|operations| {
            let mut set = MySet::new();
            for operation in operations.iter() {
                match operation {
                    Operation::Insert(value) => {
                        set.insert(value);
                    }
                    Operation::Remove(value) => {
                        set.remove(value);
                    }
                    Operation::Clear => {
                        set.clear();
                    }
                }
            }
        })
}
Assume there's a hypothetical scenario in which adding 16 elements to MySet causes a panic. Without shrinking, the randomly generated inputs can be difficult to interpret:
======================== Test Failure ========================
Input:
[
Insert(9693583160302274182),
Insert(15890536247564076678),
Clear,
Insert(15914819332679195868),
Insert(9717884937115065564),
Insert(15914645609842007260),
Insert(18446738425238052060),
Remove(15912577393975689024),
Insert(15338377272073444572),
Insert(15914838024242123988),
Insert(11228695243045262548),
Insert(11212726789901900955),
Insert(11212726789901884315),
Insert(11212726789901884316),
Insert(11212726789901884317),
Insert(11212726789901884318),
Insert(11212726789901884319),
Insert(11212726789901884311),
Insert(11212726789901884312),
Insert(11212726789901884313),
Insert(11212726789901884314),
Insert(9727500806739001343),
Insert(9693583160302274182),
Insert(9693583160302274182),
Insert(5714873654208093419),
Remove(5714873654208057167),
Remove(16717362667219255119),
Insert(9726366166670698728),
Insert(9727642152175306374),
Remove(18446181099529437184),
Insert(18446744073709551615),
Insert(15336116641675083775),
Remove(11212726789901884372),
Insert(11212726789901884315),
Insert(11212666079612935067)
]
Error:
panicked at 'internal assertion', src/lib.rs:16:17
After shrinking the input, it becomes more obvious how to trigger the bug:
======================== Test Failure ========================
Input:
[
Insert(0),
Insert(1),
Insert(2),
Insert(3),
Insert(4),
Insert(5),
Insert(6),
Insert(7),
Insert(8),
Insert(9),
Insert(10),
Insert(11),
Insert(12),
Insert(13),
Insert(14),
Insert(15),
]
Error:
panicked at 'internal assertion', src/lib.rs:16:17
Works on Rust Stable
bolero does not require nightly to execute test targets:
# does not require nightly
$ cargo bolero test my_test_target --sanitizer NONE
Sanitizer support
Using a sanitizer increases the number of edge cases caught by the test, so the preference should be towards using one. Unfortunately, sanitizers require Rust nightly to compile.
cargo-bolero will use cargo +nightly instead to execute the test target:
# uses nightly, even if we're using stable by default
$ cargo bolero test --sanitizer address my_test_target
If a specific version of nightly is required, the --toolchain argument can be used:
$ cargo bolero test --sanitizer address --toolchain nightly-2020-01-01 my_test_target
Structured Testing
In addition to generating random byte slices, bolero supports generating well-formed types with the bolero-generator crate.
Operation Example
Let's suppose we've implemented a MySet data structure. It has 3 operations:
- insert(value) - inserts a value into the set
- remove(value) - removes a value from the set
- clear() - removes all values from the set
The operations can easily be modeled as an enum:
use bolero::generator::TypeGenerator;

#[derive(Debug, TypeGenerator)]
enum Operation {
    Insert(u64),
    Remove(u64),
    Clear,
}
Note that we've added TypeGenerator to the list of derives. This enables bolero to generate random values for Operation. We can combine that with a Vec<Operation> to get a list of operations to perform on our MySet data structure.
use bolero::{check, generator::*};
use my_set::MySet;

#[derive(Debug, TypeGenerator)]
enum Operation {
    Insert(u64),
    Remove(u64),
    Clear,
}

fn main() {
    check!()
        .with_type::<Vec<Operation>>()
        .for_each(|operations| {
            let mut set = MySet::new();
            for operation in operations.iter() {
                match operation {
                    Operation::Insert(value) => {
                        set.insert(value);
                    }
                    Operation::Remove(value) => {
                        set.remove(value);
                    }
                    Operation::Clear => {
                        set.clear();
                    }
                }
            }
        })
}
Controlling The Number Of Operations Generated
Using check!().with_type::<Vec<T>>() will generate vectors with lengths in the range 0..=64. If you need more control over the number of elements being generated, you can instead use .with_generator and provide a customized generator:
use bolero::{check, generator::*};
use bolero::gen;
use my_set::MySet;

#[derive(Debug, TypeGenerator)]
enum Operation {
    Insert(u64),
    Remove(u64),
    Clear,
}

fn main() {
    check!()
        // Generate 0 to 200 operations
        .with_generator(gen::<Vec<Operation>>().with().len(0..=200))
        .for_each(|operations| {
            let mut set = MySet::new();
            for operation in operations.iter() {
                match operation {
                    Operation::Insert(value) => {
                        set.insert(value);
                    }
                    Operation::Remove(value) => {
                        set.remove(value);
                    }
                    Operation::Clear => {
                        set.clear();
                    }
                }
            }
            // assertions go here
        })
}
Using Test Oracles
The basic test we constructed above will make sure we don't panic on any list of operations. We can take it a step further by using a test oracle to make sure the behavior of MySet is actually correct. Here we'll use HashSet from the std library:
use bolero::{check, generator::*};
use my_set::MySet;
use std::collections::HashSet;

#[derive(Debug, TypeGenerator)]
enum Operation {
    Insert(u64),
    Remove(u64),
    Clear,
}

fn main() {
    check!()
        .with_type::<Vec<Operation>>()
        .for_each(|operations| {
            let mut set = MySet::new();
            let mut oracle = HashSet::new();
            for operation in operations.iter() {
                match operation {
                    Operation::Insert(value) => {
                        set.insert(value);
                        oracle.insert(value);
                    }
                    Operation::Remove(value) => {
                        set.remove(value);
                        oracle.remove(value);
                    }
                    Operation::Clear => {
                        set.clear();
                        oracle.clear();
                    }
                }
            }
            assert!(set.iter().eq(oracle.iter()));
        })
}
Unified Interface
Using the interface provided by bolero, a single test target can execute under several different engines.
LibFuzzer
LibFuzzer is an in-process, coverage-guided, evolutionary fuzzing engine.
LibFuzzer is linked with the library under test, and feeds fuzzed inputs to the library via a specific fuzzing entrypoint (aka “target function”); the fuzzer then tracks which areas of the code are reached, and generates mutations on the corpus of input data in order to maximize the code coverage.
The libfuzzer engine can be selected like so:
$ cargo bolero test --engine libfuzzer my_test_target
Currently, it is also the default engine:
# will use --engine libfuzzer
$ cargo bolero test my_test_target
AFL
American fuzzy lop is a security-oriented fuzzer that employs a novel type of compile-time instrumentation and genetic algorithms to automatically discover clean, interesting test cases that trigger new internal states in the targeted binary. This substantially improves the functional coverage for the fuzzed code. The compact synthesized corpora produced by the tool are also useful for seeding other, more labor- or resource-intensive testing regimes down the road.
The afl engine can be selected like so:
$ cargo bolero test --engine afl my_test_target
Honggfuzz
Honggfuzz is a security oriented fuzzer with powerful analysis options. Supports evolutionary, feedback-driven fuzzing based on code coverage (software- and hardware-based)
The honggfuzz engine can be selected like so:
$ cargo bolero test --engine honggfuzz my_test_target
Kani
Kani is an open-source verification tool that uses automated reasoning to analyze Rust programs. Kani is particularly useful for verifying unsafe code in Rust, where many of the Rust’s usual guarantees are no longer checked by the compiler. Some example properties you can prove with Kani include memory safety properties (e.g., null pointer dereferences, use-after-free, etc.), the absence of certain runtime errors (i.e., index out of bounds, panics), and the absence of some types of unexpected behavior (e.g., arithmetic overflows). Kani can also prove custom properties provided in the form of user-specified assertions.
Kani uses proof harnesses to analyze programs. Proof harnesses are similar to test harnesses, especially property-based test harnesses.
The kani engine can be selected like so:
$ cargo bolero test --engine kani my_test_target
Note that each target needs to include a #[kani::proof] attribute:
#[test]
#[cfg_attr(kani, kani::proof)]
fn my_test_target() {
    bolero::check!().with_type().for_each(|v: &u8| {
        assert_ne!(*v, 123);
    });
}
Private Testing
bolero also supports running tests inside of a project. This is useful for testing private interfaces and implementations.
#[test]
fn my_property_test() {
    bolero::check!()
        .with_type()
        .cloned()
        .for_each(|value: u64| {
            // implement property checks here
        });
}
Miri Support
bolero supports executing tests with Miri. Keep in mind that execution is significantly slower under Miri.
Isolation mode must currently be disabled in order for bolero tests to read corpora from the file system. This can be done by setting the appropriate flags:
MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri test
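As a sketch of the kind of issue Miri catches that a normal test run may miss, consider unchecked reads through unsafe code. The hypothetical helper below upholds its bounds invariant, so Miri accepts it; removing the emptiness check and calling it on an empty slice would be undefined behavior that Miri flags, while a plain cargo test might silently pass:

```rust
// A small wrapper performing an unchecked read; the caller-visible check
// is exactly the invariant that Miri re-verifies at runtime.
fn first_byte(bytes: &[u8]) -> Option<u8> {
    if bytes.is_empty() {
        return None;
    }
    // SAFETY: we just checked that index 0 is in bounds.
    Some(unsafe { *bytes.get_unchecked(0) })
}

fn main() {
    assert_eq!(first_byte(b"abc"), Some(b'a'));
    assert_eq!(first_byte(b""), None);
    println!("ok");
}
```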
Tutorials
Fibonacci
In this tutorial, we want to arrive at a bug-free fibonacci implementation. Let's start with a basic setup:
$ cargo new --lib my_fibonacci
// src/lib.rs
pub fn fibonacci(number: u64) -> u64 {
    let mut a = 0;
    let mut b = 1;
    for _ in 0..number {
        b += core::mem::replace(&mut a, b);
    }
    b
}
Now we define a test:
$ cargo bolero new fibonacci_test --generator
// tests/fibonacci_test/main.rs
use bolero::check;
use my_fibonacci::fibonacci;

fn main() {
    check!()
        .with_type()
        .cloned()
        .for_each(|number: u64| {
            fibonacci(number);
        })
}
Now let's fuzz our fibonacci function:
$ cargo bolero test fibonacci_test
Finished test [unoptimized + debuginfo] target(s) in 0.10s
Running target/fuzz/build_62a8ab526939db81/x86_64-apple-darwin/debug/deps/fibonacci_test-f9f8f1dcc806b6b6
...
thread 'main' panicked at 'attempt to add with overflow', my_fibonacci/tests/fibonacci_test/main.rs:8:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
======================== Test Failure ========================
Input:
93
Error:
panicked at 'attempt to add with overflow', my_fibonacci/tests/fibonacci_test/fuzz_target.rs:8:9
==============================================================
Uh oh... It looks like we've got a bug! bolero was able to find that calling our function with 93 results in an integer overflow. Let's try fixing that by adding overflow checks with u64::checked_add:
// src/lib.rs
pub fn fibonacci(number: u64) -> Option<u64> {
    let mut a = 0u64;
    let mut b = 1u64;
    for _ in 0..number {
        b = b.checked_add(core::mem::replace(&mut a, b))?;
    }
    Some(b)
}
After running the test command for a few minutes, things are looking better:
$ cargo bolero test fibonacci_test
Finished test [unoptimized + debuginfo] target(s) in 0.10s
Running target/fuzz/build_62a8ab526939db81/x86_64-apple-darwin/debug/deps/fibonacci_test-f9f8f1dcc806b6b6
...
#272 INITED cov: 469 ft: 872 corp: 17/106b lim: 4 exec/s: 0 rss: 27Mb
NEW_FUNC[1/1]: 0x102ef95f1
#277 NEW cov: 476 ft: 880 corp: 18/112b lim: 6 exec/s: 0 rss: 27Mb L: 6/13 MS: 5 ChangeByte-ChangeBit-CopyPart-CopyPart-CrossOver-
#293 REDUCE cov: 476 ft: 880 corp: 18/109b lim: 6 exec/s: 0 rss: 27Mb L: 3/13 MS: 1 EraseBytes-
#341 NEW cov: 476 ft: 928 corp: 19/119b lim: 6 exec/s: 0 rss: 27Mb L: 10/13 MS: 3 CMP-CopyPart-ChangeBinInt- DE: " \x00\x00\x00\x00\x00\x00\x00"-
#369 REDUCE cov: 476 ft: 928 corp: 19/118b lim: 6 exec/s: 0 rss: 27Mb L: 2/13 MS: 3 ShuffleBytes-ShuffleBytes-EraseBytes-
#397 REDUCE cov: 476 ft: 928 corp: 19/117b lim: 6 exec/s: 0 rss: 27Mb L: 1/13 MS: 3 ShuffleBytes-ChangeByte-EraseBytes-
#409 NEW cov: 476 ft: 984 corp: 20/130b lim: 6 exec/s: 0 rss: 27Mb L: 13/13 MS: 2 ChangeByte-ChangeBinInt-
#501 NEW cov: 476 ft: 1033 corp: 21/143b lim: 6 exec/s: 0 rss: 27Mb L: 13/13 MS: 2 ChangeBit-ChangeBinInt-
#977 REDUCE cov: 476 ft: 1033 corp: 21/139b lim: 8 exec/s: 0 rss: 27Mb L: 9/13 MS: 1 EraseBytes-
#1289 REDUCE cov: 476 ft: 1033 corp: 21/136b lim: 11 exec/s: 0 rss: 27Mb L: 10/13 MS: 2 ChangeASCIIInt-EraseBytes-
#1670 REDUCE cov: 476 ft: 1033 corp: 21/132b lim: 14 exec/s: 0 rss: 27Mb L: 9/10 MS: 1 EraseBytes-
#1741 REDUCE cov: 476 ft: 1033 corp: 21/131b lim: 14 exec/s: 0 rss: 27Mb L: 9/10 MS: 1 EraseBytes-
#10199 REDUCE cov: 476 ft: 1033 corp: 21/127b lim: 92 exec/s: 5099 rss: 27Mb L: 5/10 MS: 2 ChangeByte-EraseBytes-
#10455 REDUCE cov: 476 ft: 1033 corp: 21/125b lim: 92 exec/s: 5227 rss: 27Mb L: 3/10 MS: 1 EraseBytes-
#11753 REDUCE cov: 476 ft: 1033 corp: 21/121b lim: 104 exec/s: 5876 rss: 27Mb L: 5/10 MS: 3 ChangeBinInt-InsertByte-EraseBytes-
Are we done? Not quite... This is a good time to point out that basic fuzz testing can only get you so far. If we look on Wikipedia we find the following table:
F0 | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10 |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 2 | 3 | 5 | 8 | 13 | 21 | 34 | 55 |
Do we actually know if the return value is correct? All we've really made sure of is that the implementation doesn't panic. It could be returning 42 for every answer and our fuzz tests wouldn't have caught it. How do we fix this?
Test Oracle
Using test oracles in conjunction with our tests can be an effective way to assert that our implementation is correct. What is a test oracle? From Write Fuzzable Code:
A test oracle decides whether a test case triggered a bug or not. By default, the only oracle available to a fuzzer like afl is provided by the OS’s page protection mechanism. In other words, it detects only crashes. We can do much better than this.
Assertions and their compiler-inserted friends — sanitizer checks — are another excellent kind of oracle. You should fuzz using as many of these checks as possible. Beyond these easy oracles, many more possibilities exist, such as:
- function-inverse pairs: does a parse-print loop, compress-decompress loop, encrypt-decrypt loop, or similar, work as expected?
- differential: do two different implementations, or modes of the same implementation, show the same behavior?
- metamorphic: does the system show the same behavior when a test case is modified in a semantics-preserving way, such as adding a layer of parentheses to an expression?
- resource: does the system consume a reasonable amount of time, memory, etc. when processing an input?
- domain specific: for example, is a lossily-compressed image sufficiently visually similar to its uncompressed version?
We've already seen a good example of a test oracle in action. Rust includes debug assertions for unchecked integer overflows. We were able to use these assertions in finding the limits of our implementation.
Unit tests could also be considered test oracles and can be effective at asserting the expected behavior of well-known inputs and outputs.
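As a standalone sketch of the first kind of oracle, a function-inverse pair: the toy run-length encoder and decoder below (hypothetical names, not from any crate) are checked by asserting that decoding an encoded input always returns the original. In a real bolero test, the inputs would come from the generator rather than a fixed list:

```rust
// A toy run-length encoder: b"aab" -> [(b'a', 2), (b'b', 1)].
fn rle_encode(data: &[u8]) -> Vec<(u8, usize)> {
    let mut out: Vec<(u8, usize)> = Vec::new();
    for &byte in data {
        match out.last_mut() {
            // Extend the current run if the byte repeats.
            Some((b, count)) if *b == byte => *count += 1,
            // Otherwise start a new run.
            _ => out.push((byte, 1)),
        }
    }
    out
}

// The inverse: expand each run back into repeated bytes.
fn rle_decode(runs: &[(u8, usize)]) -> Vec<u8> {
    let mut out = Vec::new();
    for &(byte, count) in runs {
        out.extend(std::iter::repeat(byte).take(count));
    }
    out
}

fn main() {
    // Function-inverse oracle: decode(encode(x)) == x for every input tried.
    for input in [&b""[..], b"a", b"aaabbbc", b"abcabc"] {
        assert_eq!(rle_decode(&rle_encode(input)), input.to_vec());
    }
    println!("round-trip oracle held");
}
```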
Unit tests
The easiest solution is to copy the table values from Wikipedia and check our function with a unit test:
// src/lib.rs
#[test]
fn fibonacci_test() {
    assert_eq!(fibonacci(0), Some(0));
    assert_eq!(fibonacci(1), Some(1));
    assert_eq!(fibonacci(2), Some(1));
    assert_eq!(fibonacci(3), Some(2));
    assert_eq!(fibonacci(4), Some(3));
    assert_eq!(fibonacci(5), Some(5));
    assert_eq!(fibonacci(6), Some(8));
    assert_eq!(fibonacci(7), Some(13));
    assert_eq!(fibonacci(8), Some(21));
    assert_eq!(fibonacci(9), Some(34));
    assert_eq!(fibonacci(10), Some(55));
}
Let's try running our unit test:
$ cargo test
Finished test [unoptimized + debuginfo] target(s) in 52.06s
Running target/debug/deps/my_fibonacci-e9bfbebb80b3a5bf
running 1 test
test fibonacci_test ... FAILED
failures:
---- fibonacci_test stdout ----
thread 'fibonacci_test' panicked at 'assertion failed: `(left == right)`
left: `Some(1)`,
right: `Some(0)`', my_fibonacci/src/lib.rs:29:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
fibonacci_test
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out
error: test failed, to rerun pass '--lib'
We haven't handled our zero case! Let's fix that:
// src/lib.rs
pub fn fibonacci(number: u64) -> Option<u64> {
    if number == 0 {
        return Some(0);
    }
    let mut a = 0u64;
    let mut b = 1u64;
    for _ in 0..number {
        b = b.checked_add(core::mem::replace(&mut a, b))?;
    }
    Some(b)
}
Let's run the test again:
$ cargo test
Finished test [unoptimized + debuginfo] target(s) in 52.06s
Running target/debug/deps/my_fibonacci-e9bfbebb80b3a5bf
running 1 test
test fibonacci_test ... FAILED
failures:
---- fibonacci_test stdout ----
thread 'fibonacci_test' panicked at 'assertion failed: `(left == right)`
left: `Some(2)`,
right: `Some(1)`', my_fibonacci/src/lib.rs:35:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
fibonacci_test
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out
error: test failed, to rerun pass '--lib'
Another bug!? In this case, we're actually looping one time too many. Here's the fix:
// src/lib.rs
pub fn fibonacci(number: u64) -> Option<u64> {
    if number == 0 {
        return Some(0);
    }
    let mut a = 0u64;
    let mut b = 1u64;
    for _ in 1..number {
        b = b.checked_add(core::mem::replace(&mut a, b))?;
    }
    Some(b)
}
After that final fix all of our tests pass:
$ cargo test
Finished test [unoptimized + debuginfo] target(s) in 52.06s
Running target/debug/deps/my_fibonacci-e9bfbebb80b3a5bf
running 1 test
test fibonacci_test ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Running target/debug/deps/fibonacci_test-e98d85aab754d963
running 1022 tests
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
...............................................................................
..........................................................................
test result: ok. 1022 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Differential Oracle
We could also try to use the less-efficient, recursive method to check our implementation. It's easy to understand and implement:
fn fibonacci_recursive(n: u64) -> Option<u64> {
    match n {
        0 => Some(0),
        1 => Some(1),
        _ => fibonacci_recursive(n - 1)?.checked_add(fibonacci_recursive(n - 2)?),
    }
}
The problem with that approach is that it ends up being far too slow for larger numbers, even in --release mode.
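Stripped of the fuzzing harness, the differential oracle boils down to comparing the two implementations on the same inputs. A minimal standalone sketch, bounding n so the recursive version stays fast:

```rust
// Iterative implementation (the corrected version from this tutorial).
fn fibonacci(number: u64) -> Option<u64> {
    if number == 0 {
        return Some(0);
    }
    let mut a = 0u64;
    let mut b = 1u64;
    for _ in 1..number {
        b = b.checked_add(core::mem::replace(&mut a, b))?;
    }
    Some(b)
}

// Recursive implementation used as the differential oracle.
fn fibonacci_recursive(n: u64) -> Option<u64> {
    match n {
        0 => Some(0),
        1 => Some(1),
        _ => fibonacci_recursive(n - 1)?.checked_add(fibonacci_recursive(n - 2)?),
    }
}

fn main() {
    // Differential oracle: both implementations must agree on every input.
    for n in 0..=25 {
        assert_eq!(fibonacci(n), fibonacci_recursive(n), "disagreement at n = {}", n);
    }
    println!("implementations agree on 0..=25");
}
```

In a bolero harness the same assertion would go inside for_each, with the generator constrained to a small range of n.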
Another option is to use a third-party implementation. A quick search on crates.io turns up a crate that implements the Fibonacci sequence. There's also a problem with that: the crate actually has the same bug as our implementation. It skips the first two values in the sequence, 0 and 1.
Conclusion
The takeaway is that some thought needs to go into how to test your implementation effectively. Oftentimes, combining multiple approaches will provide the best result.
Developer documentation
bolero is an open source project that welcomes external contributions.
One of the easiest ways to contribute is to report any issue you encounter while using the tool.
If you want to contribute to its development, this chapter provides documentation that might be helpful for developers.
Build from source
In general, the following dependencies are required to build bolero from source.
Dependencies
- The Rust toolchain (cargo, rustfmt, etc.) installed via rustup
- make
bolero has been tested on the Ubuntu 22.04 and macOS 12 platforms.
Ubuntu 22.04
sudo apt update
sudo apt install binutils-dev libunwind-dev
make comes pre-installed on Ubuntu, but if for some reason it isn't, it can be installed using the command:
sudo apt install make
macOS 12
make can be installed using the command:
xcode-select --install
No other dependencies are required.
Build and test
The Makefile located in the root directory can be used to build bolero and run it on several test suites. To execute it, just run:
make
This should compile bolero and run multiple tests. In the process, it's possible that you are shown the following message:
[-] Hmm, your system is configured to send core dump notifications to an
external utility. This will cause issues: there will be an extended delay
between stumbling upon a crash and having this information relayed to the
fuzzer via the standard waitpid() API.
To avoid having crashes misinterpreted as timeouts, please log in as root
and temporarily modify /proc/sys/kernel/core_pattern, like so:
echo core >/proc/sys/kernel/core_pattern
This message comes from AFL. You can either modify the file as indicated or re-run the make command as follows:
AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=1 make
However, this doesn't guarantee that the AFL tests will pass. In that case, the best option is to temporarily modify the /proc/sys/kernel/core_pattern file.