Introduction
This website contains teaching material for the laboratory sessions of the course Safe System Programming (in Rust) (SSP-RS), at Télécom Paris / Polytechnique Institute of Paris.
ⓒ 2022-2024 Samuel Tardieu and Stefano Zacchiroli
Lab 01 — Unsafe System Programming
In the first lab session of the course, you will mainly work on:
- Setting up your development environment for the rest of the course, and
- Experimenting with unsafe system programming, leading to buffer overflows and other nasty stuff.
Let's get to it.
Development setup
For the duration of the entire lab sessions of this course you are going to use a number of tools, for the most part related to development in the C and Rust programming languages. As the first task for this lab session you are hence going to install the relevant toolchains and a comfortable (for you) development environment.
Operating system requirements: we are going to assume that your work machine runs Linux as operating system (OS). If that is not the case, you should fix this before proceeding. Any Linux-based distribution would do, but in the following we are going to rely on package names from Debian and Debian-based distributions (e.g., Ubuntu); you should translate them to your distro if need be.
Work environment
You have various options in terms of which "machine" to work on:
- work on your real (hardware) machine: laptop or desktop computer;
- work in a Docker container;
- work in a virtual machine.
Option 1: Work on your machine
We recommend this option as it is the most flexible, but it requires more manual setup on your side—setup which we are not going to detail here as it heavily depends on your OS choice.
We only list below the tools (and corresponding Debian package names) that you should install on your machine to be able to complete the exercises of this course.
Option 2: Work in a Docker container
If you would like to work in a Docker container, we provide a Dockerfile that you can use to create locally a Docker image that contains all the tools used for lab sessions. Download the Dockerfile locally and build a matching image like this:
$ docker build -t net7212 - < Dockerfile
(It will take a few minutes to complete, depending mostly on download and I/O speeds.)
Then, create a shell script named run.sh
to easily create and run a fresh container on that image every time you need it.
It can be something like the following:
#!/bin/bash
host_labdir="${HOME}/net7212-lab"
cont_labdir="/home/net7212/lab"
docker run -ti \
--net host \
-e DISPLAY="unix${DISPLAY}" \
-v /dev/shm:/dev/shm \
-v /etc/hosts:/etc/hosts \
-v /etc/localtime:/etc/localtime:ro \
-v /tmp/.X11-unix:/tmp/.X11-unix \
-v /var/run/dbus/system_bus_socket:/var/run/dbus/system_bus_socket \
-v $HOME/.Xauthority:/home/net7212/.Xauthority \
-v "${host_labdir}:${cont_labdir}" \
--rm \
--name net7212 \
net7212
note that host_labdir
should point to a directory on your real machine, that will be mounted within the container so that you can share files between the two worlds.
All your lab session material will need to be in this directory if you want to access it from within the container.
You can then enter and use a fresh container every time you need it like this:
./run.sh
net7212@machine:~$ rustc --version
rustc 1.72.0 (5680fa18f 2023-08-23)
and that if you create a file in your home directory under ~/net7212-lab
you can also access it from within the container, e.g.:
# outside the container
$ echo "// hello.c" > ~/net7212-lab/hello.c
./run.sh
net7212@machine:~$ cat ~/lab/hello.c
// hello.c
net7212@machine:~$
Option 3: Work in a virtual machine
If you prefer to fully control your work environment (like with option (1)), but not impact your day-to-day work machine, you have the option to install a virtual machine and configure it manually. Just install in there the same tools that you would have installed, following the instructions below, on your real machine. We are not going to detail this step further, as it heavily depends on your choice of virtualization technology.
C development toolchain
If you have chosen to work on your (virtual) machine, you should install the following tools:
Tool | Debian package name |
---|---|
bats | bats |
check | check |
clang-tidy | clang-tidy |
clang | clang |
gcc | gcc |
gdb | gdb |
ltrace | ltrace |
make | make |
strace | strace |
valgrind | valgrind |
If you have chosen the Docker option, they are all already installed in the container.
Exercise 0.a: Install all the above dependencies in the work environment of your choice.
When ready, make sure you can compile and run the following Hello World program with both gcc
and clang
:
#include <stdio.h>
int main(void) {
printf("Hello, World!\n");
return 0;
} // end of hello.c
like this:
$ gcc -Wall hello.c
$ ./a.out
Hello, World!
$ clang -Wall hello.c
$ ./a.out
Hello, World!
Now make sure that all of the following commands execute properly without failing on your setup (the exact output might very depending on version, configuration, etc.; that's OK):
$ bats -v
Bats 1.8.2
$ clang-tidy --version
Debian LLVM version 14.0.6
Optimized build.
Default target: x86_64-pc-linux-gnu
Host CPU: skylake
$ gdb --version
GNU gdb (Debian 12.1-4) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ ltrace --version
ltrace version 0.7.3.
Copyright (C) 1997-2009 Juan Cespedes <cespedes@debian.org>.
This is free software; see the GNU General Public Licence
version 2 or later for copying conditions. There is NO warranty.
$ make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ strace --version
strace -- version 5.10
Copyright (c) 1991-2020 The strace developers <https://strace.io>.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Optional features enabled: stack-trace=libunwind stack-demangle m32-mpers mx32-mpers
$ valgrind --version
valgrind-3.19.0
Rust development toolchain
If you have chosen to work on your (virtual) machine, you should install the following tools:
Tool | Debian package |
---|---|
rustup | none (install by hand) |
If you have chosen the Docker option, they are all already installed in the container.
Exercise 0.b: Install all the above dependencies in the work environment of your choice.
Once installed, verify that you can successfully run the following basic tools of the Rust toolchain, like this (your output might very, depending on the installed versions):
$ rustup --version
rustup 1.26.0 (5af9b9484 2023-04-05)
info: This is the version for the rustup toolchain manager, not the rustc compiler.
info: The currently active `rustc` version is `rustc 1.72.0 (5680fa18f 2023-08-23)`
$ cargo --version
cargo 1.72.0 (103a7ff2e 2023-08-15)
$ rustc --version
rustc 1.72.0 (5680fa18f 2023-08-23)
Editor / IDE
You are also going to need a development editor or an IDE (Integrated Development Environment). As developers tend to be opinionated about this stuff we are not going to force you to use any specific one: pick your own!
We recommend either a flexible editor with IDE-like features (e.g., Emacs or Vim with LSP support for Rust) or a full-featured IDE like Visual Studio Code or NetBeans (do make sure it has support for Rust, though, as that will be needed later in the semester).
Exercise 0.c: Install the editor or IDE you want in the work environment of your choice.
When done, open the hello.c
file you used previously and make sure you can comfortably edit/compile/run it, ideally from within the editor/IDE itself.
Done?
Good! You are now all set with the developer setup!
Let's now move to the other matters of the day.
Memory unsafety
Out-of-bounds write
Exercise 1.a: write C programs (new, different from those seen in lecture 01) that exhibit weakness CWE 787: Out-of-bounds write. Your goal is to make your program segfault due to this CWE! Try out-of-bound writes in various memory segments, writing one program for each of the following cases:
- out-of-bound write in dynamically allocated memory on the heap (e.g., with
malloc
), - out-of-bound write in statically allocated memory (i.e., global variable, preinitialized or not),
- out-of-bound write in automatic memory (i.e., variables allocated on the stack).
Bonus point: make sure your programs do not emit any warnings when compiled invoking the compiler with the --Wall
flag (which asks to enable all compile-time warnings and that you should always use anyway!).
Try to determine experimentally what each program was doing just before the segfault (tip: try with gdb, nm, ltrace, strace).
What helped you determine this?
Exercise 1.b: consider the following local variable declarations at the beginning of a function (possibly main
):
char s1[16];
char s2[16];
write a program that does not segfault, but that with a single memory (or string) copy operation that writes to only one of these variables (s1
, s2
) modifies the content of both variables, with data of your choice.
Verify the result experimentally (e.g., with printf
or strcmp
).
Warning: do not forget about string \0
terminators!
Out-of-bounds read
Exercise 1.c: continuing from the idea from the previous exercise, write a program that exhibits at least one variant of CWE 125: Out-of-bounds read. The program should not segfault, but by performing a memory ready operation on a given variable, it should be able to read the content of another variable, correctly and in full.
Bonus point: make your program read the content of an out-of-scope local variable, e.g., one declared in a calling function.
Smashing the stack
Segmentation fault
Considering the following program:
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
}
int main() {
function(1, 2, 3);
return 0;
}
Exercise 2.a: compile the program to assembly language like this:
$ gcc -S -o example1.s example1.c
and identify in the generated assembly code where function
is called. How are arguments passed to the function? (The answer might vary depending on the compiler and architecture, but the assembly code will tell you!)
Exercise 2.b: compile, execute, and debug the following program:
#include <string.h>
void function(char *str) {
char buffer[16];
strcpy(buffer, str);
}
int main() {
char large_string[256];
int i;
for(i = 0; i < 255; i++) {
large_string[i] = 'A';
}
function(large_string);
return 0;
}
Explain what's happening and why.
Subverting the control flow
Historically (on 32 bit architectures and with less built-in defenses in C compilers) subverting the control flow via stack overflows was pretty easy. You can get an idea of how easy it was by reading the famous tutorial article Smashing The Stack For Fun And Profit by Aleph One (1996).
Nowadays it has become a little bit more complicated, but still very doable (and regularly done!, as we have seen in the course examples).
Exercise 2.c: go through the modern incarnation of Aleph One's tutorial, namely: Smashing the Stack in the 21st Century by Jon Gjengset (archived copy), reproducing the reported experiences on your machine. (Next week we will go through some of the defenses that you will have to manually disable to exploit a stack overflow, and discuss what they are good for and what they are not.)
Lab 02 — Hardening System Programming
In this lab session we will take a practical tour of existing techniques to secure system programs implemented in traditional system languages. We will experiment with: testing, fuzzing, code instrumentation (of both binary and source code), and static analysis.
By the end of the session you will have a better understanding of the state-of-the-art of hardening system programming, including its limitations.
Credits
The exercises of this lab session contain material and ideas reused with permission from week 1 assignments of Stanford's course CS 110L (2022) by Ryan Eberhardt, Armin Namavari, Will Crichton, Julio Ballista, and Thea Rossman.
Unit testing
Setup
Consider the initial C implementation of a simple currency type (a pair of a currency name and a currency amount) given in money.h and money.c.
We want to reassure ourselves that the code thus far is correct, using unit testing.
In check_money_tiny.c you can find the skeleton of a C program that uses money.c
as a library and unit tests it with a single test case.
It uses the Check unit testing framework for C, which you should have already installed as part of the first lab session.
It can be built and run like this:
$ gcc -Wall -c -o money.o money.c
$ gcc -Wall -c -o check_money_tiny.o check_money_tiny.c
$ gcc -Wall -o check_money_tiny check_money_tiny.o money.o `pkg-config --libs check`
$ ./check_money_tiny
Running suite(s): Money
100%: Checks: 1, Failures: 0, Errors: 0
check_money_tiny.c:17:P:Core:test_money_create_amount:0: Passed
Exercise 1.a: read through the implementation of money.c
and check_money_tiny.c
to make sure you understand what the code does.
Help yourself with the Check documentation as needed.
Code coverage
The notion of code coverage is a measure of how much of your implementation code is exercised during the execution of your test suite. It can be measured in various ways, we will focus on the simplest metric: how many lines of code are (not) executed during test suite running (i.e., line coverage). You can verify the line coverage of your current test suite using gcov, which is part of gcc by:
- compile/link your project passing the
-fprofile-arcs -ftest-coverage
options to compiler and linker, - execute your test suite (
./check_money_tiny
in this case), - inspect code coverage of a source code file you care about, e.g.:
$ gcov money.c
File 'money.c'
Lines executed:75.00% of 16
Creating 'money.c.gcov'
Lines executed:75.00% of 16
The summary tells you the percentage of lines of code exercised during execution.
The mentioned file money.c.gcov
is an annotated version of money.c
that shows precisely which lines have been executed (or not).
Exercise 1.b: measure the code coverage of the current test suite for money.c
.
Then, add all the test cases you need to maximize code coverage.
Can you reach 100% line coverage?
Why or why not?
Exercise 1.c: (optional and tricky, feel free to postpone this one to the end of the session) add a memory unsafety issue to the money.c
implementation, and specifically one that crashes the execution of your test case.
For example, you can implement a new money_add
method that performs an out-of-bound read or write.
Then add a test that run the buggy code, crashing the execution of the test case.
Does the test case crash cause the crash of the entire test suite? (i.e., do other test cases supposed to run after it keep running or not?)
Why or why not?
Stack overflow detection
Setup
Retrieve the program uppercase.c, compile it (recommended compiler options for this session are: -g -O0 -Wall -Wextra
), and run it, e.g., like this:
$ gcc -g -O0 -Wall -Wextra -o uppercase uppercase.c
$ ./uppercase 'Hello, World!'
HELLO, WORLD!
Now take some time to study the code of the program.
Exercise 2.a: the program is affected by a memory-related bug (EEEK!). Explain where the bug is, what its consequences might be, and how to fix it.
Static analysis with Clang-Tidy
Clang-Tidy (which you have installed already, as part of last week's lab session) performs some simple static analysis checks, ranging from basic linting to more advanced techniques. From the tool documentation:
clang-tidy is a clang-based C++ “linter” tool. Its purpose is to provide an extensible framework for diagnosing and fixing typical programming errors, like style violations, interface misuse, or bugs that can be deduced via static analysis.
Exercise 2.b: go (briefly) through the documentation of clang-tidy
and learn how to use it to analyze C source code files.
Then use clang-tidy
on uppercase.c
to check if it detects the bug you have identified in the previous exercise.
Does clang-tidy
catch the problem?
Based on what you have learned in last lecture, explain why clang-tidy
is capable to identify the problem or why it isn't.
Dynamic analysis with Valgrind
Clang-Tidy performs static analysis, Valgrind on the other hand performs dynamic analysis, based on dynamic binary re-compilation, as we have seen in last lecture. Let's see how Valgrind fares in detecting the bug with this program. After compiling your program, we can give it a try like this:
$ valgrind [--tool=memcheck] ./uppercase 'Hello, World!'
(Note: --tool=memcheck
is optional because Memcheck is the default tool that Valgrind uses unless otherwise requested.)
Exercise 2.c: does Valgrind detect the issue? Based on what you have learned in last lecture, explain why or why not.
Exercise 2.d: benchmark the time required to execute the program with and without Valgrind.
(Tip: as a simple way to do that you can use the standard UNIX time
command.
If you want to be fancier, this is your chance to learn about the amazing benchmarking tool Hyperfine and use it for a more scientifically accurate comparison.)
Dynamic analysis with LLVM sanitizers
As we have seen in last lecture, dynamic analysis can also be performed via source code instrumentation, which happens at compile time to add runtime checks. As we have seen LLVM sanitizers (look for the various entries with "Sanitizer" in their name in the Clang documentation) are a set of plugins for the Clang compiler that do that.
Exercise 2.e: change the compiler you are using to build from gcc
to clang
and first rebuild the program with it. (You can pass the same options suggested above, as they are compatible between the two compilers.)
Then, add all of the following sanitizers (check the documentation for how to do that) and rebuild the program: AddressSanitizer, LeakSanitizer, UndefinedBehaviorSanitizer.
Compare the size of the executables built with and without sanitizers. How different are they and why?
Exercise 2.f: run the instrumented program. Do sanitizers detect the issue? Based on what you have learned in last lecture, explain why or why not.
Let's streamline our build process a bit, as it is becoming a bit of a mess.
Retrieve the generic Makefile we make available for building the programs associated to this lab session.
If you are not familiar with Makefiles, go (briefly) through the GNU make documentation to make sure you know the basics.
In particular, read through the introductory material and make sure you understand the parts of Makefile syntax used in the given Makefile
, in particular: rules, variables, and static pattern rules.
(If you know about Makefiles already, feel free to skip this documentation study part.)
Exercise 2.g: change the retrieved Makefile so that:
- it only builds the
uppercase.c
programs (other programs mentioned in there will arrive later in this lab session!), and - passes the sanitizer flags to the compiler when compiling.
Recheck the result of the previous exercise about sanitizers to make sure they are there when building with make
.
Memory leak detection
Linked lists
Now retrieve the program linkedlist.c, add it to the Makefile, and compile it as before.
Take some time to study the program and understand what it does.
Exercise 3.a: this program is affected by (at least) two memory-related bugs this time, but of a different nature than before. Identify both of them, explain what the consequences might be, and how to fix them.
Exercise 3.b: try linting/static analysis with Clang-Tidy on this program, as you did before.
Does it detect the (right) issues? (Tip: beware of false positives!)
Exercise 3.c: try source code instrumentation with LLVM sanitizers on this program, as you did before.
Do they detect the issues?
In both cases feel free to speculate about why the used approaches did or did not identify the memory issues affecting this program.
String parsing
You know the drill!
Next program is bracket-parser.c, that extracts a "substring [within brackets]"
from a larger string.
Exercise 4.a: read the program, find the memory problem with it, try Clang-Tidy and LLVM sanitizers on it, explaining what you are seeing.
Exercise 4.b: on most inputs the LLVM sanitizers will not detect a memory issue.
Based on your understanding of the bug and your reading of the code (= white-box inspection), can you craft a specific input for bracket-parser
that will make the LLVM sanitizer spot the bug at runtime?
(And while we are at it: what does this tell you about the reliability of dynamic analysis to detect memory issues?)
Fuzzing with LibFuzzer
Let's now see if fuzzing can help us finding automatically both the memory issue affecting bracket-parser
and a test input that proves the issue is there.
Read the introductory documentation of LibFuzzer.
Make sure to understand how to add fuzz targets to your code, what the required API is, and the fact you should not have a main
function when using LibFuzzer (because it adds its own main
).
Exercise 4.c: add a fuzz target to your code, that allows you to fuzz the input of the parse
function of bracket-parser.c
.
Tip: you can take inspiration from the example seen in lecture, but remember that in this case you should pass well-formed C strings (trailing \0
!) to avoid triggering bugs other than the one you are looking for.
Exercise 4.d: build the fuzzer (adding fuzzer
to the list of -fsanitize=...
sanitizers) and run it.
Does it find the issue?
Does it produce a test case with a sample input that proves the issue is there?
Explain why.
(Pretty cool, huh?)
Lab 03: Rust and references
In this lab session you will build and run your first Rust program. You will manipulate references and objects lifetimes.
By the end of the session you will have a better understanding of the lifecycle of Rust objects.
Your first Rust program
In this section, you will learn how to create and compile your first Rust program using tools such as cargo
.
Anatomy of a Rust program
A basic Rust program is organized as follows:
Cargo.toml
Cargo.lock
src
└── main.rs
target
└── debug
└── myprogram
- The file
Cargo.toml
describes your program (name, version, authors, …). - The file
Cargo.lock
contains information about the version of the other packages used when compiling your program. src
contains the source of your program —main.rs
is the file containing themain
function.target
is a directory created by the compiler; it will contain the output of the compilation, including the executable produced from your source files, indebug/myprogram
(ifmyprogram
is the name of your program as described inCargo.toml
). Thetarget
directory should never be checked in your version control software. It should be ignored completely (e.g., by adding it to your.gitignore
if you are usinggit
).
Creating your first Rust program: intro
Exercise 0.a: Create the skeleton of your first Rust program by using the command cargo new
:
$ cargo new intro
Created binary (application) `intro` package
This will create a intro
directory with a complete skeleton of a Rust program:
$ tree intro
intro
├── Cargo.toml
└── src
└── main.rs
(you might not have the tree
on your computer – never mind, we will not need it)
Change your current directory to intro
and display the content of main.rs
:
$ cd intro
$ cat src/main.rs
fn main() {
println!("Hello, world!");
}
As you can see, the program is pretty basic. You can compile it by using cargo build
:
$ cargo build
Compiling intro v0.1.0 (/tmp/intro)
Finished dev [unoptimized + debuginfo] target(s) in 0.54s
Your program has compiled without errors. You can execute it by using cargo run
:
$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/intro`
Hello, world!
Note that you can execute cargo run
without first executing cargo build
: the application will be built automatically.
Editing the Rust program
Exercise 0.b: Using your editor of choice (for example Visual Studio code or NetBeans), edit main.rs
so that the main program now displays a count from 1 to 5 inclusive:
fn main() { for i in 1..=5 { println!("i = {i}"); } }
Ensure that your editor is Rust-aware, or install the necessary extensions if needed. For Visual Studio Code, we suggest the installation of the "rust-analyzer" extension.
You can now execute your code. If everything went well, you should see the following output:
$ cargo run
Compiling intro v0.1.0 (/tmp/intro)
Finished dev [unoptimized + debuginfo] target(s) in 0.20s
Running `target/debug/intro`
i = 1
i = 2
i = 3
i = 4
i = 5
You can also compile your program and run the executable directly. As cargo run
output indicates it is located in target/debug/intro
:
$ cargo build
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
$ ls -l target/debug/intro
-rwxr-xr-x 2 sam sam 4203904 Jan 16 15:02 target/debug/intro
$ target/debug/intro
i = 1
i = 2
i = 3
i = 4
i = 5
Using a Rust tutorial
There exists an interactive tutorial at https://rust-book.cs.brown.edu/. While reading this tutorial, you will be able to take quizzes and to let annotations that can be re-read later on.
You will not have to read the full tutorial at this time. We suggest that you ensure that you understand the concepts (and pass the quizzes) in the following chapters:
- 3.1 Variables and Mutability
- 3.3 Functions
- 4.1 What is Ownership?
- 4.2 References and Borrowing
- 4.3 The Slice Type
Of course, you are invited to read more chapters and take more quizzes as the lecturers describe new Rust concepts.
If you want to do more
Another way to familiarize yourself with Rust programming is to use the rustlings repository and program. Start by installing the rustings
program and clone the rustlings
repository containing the exercises:
$ cargo install rustlings
$ git clone https://github.com/rust-lang/rustlings/
$ cd rustlings
$ rustlings watch
The source for this exercise is in `exercises/intro/intro1.rs`. Have a look!
Going forward, the source of the exercises will always be in the success/failure output.
====================
You can keep working on this exercise,
or jump into the next one by removing the `I AM NOT DONE` comment
[…waits forever…]
The latest command rustlings watch
will watch your progress while editing the exercises in the exercises
directory of the repository.
For example, following the instructions given by rustlings watch
, edit the file exercises/intro/intro1.rs
with your favourite editor. As an exception, this exercise already compiles succesfully. All you have to do is remove the line // I AM NOT DONE
so that rustlings watch
will go to the next exercise in the series, exercises/intro/intro2.rs
.
This one will not compile. You have to edit the code so that it compiles successfully, then when you see that it does you can remove the // I AM NOT DONE
line, and so on.
You can stop doing the exercises whenever you want (just stop the rustlings watch
program). When you run rustlings watch
again it will start from where you left off.
References
Mutable references
The function std::mem::swap()
takes two mutable references on the same type as parameters and swap them.
Swapping values
Exercise 1.a: Complete the following function (and call it from main()
to see its effect) so that it has the expected effect:
#![allow(unused)] fn main() { fn do_swap() { let mut a = String::from("apple"); let mut b = String::from("bread"); println!("Before swap: a = {a}, b = {b}"); todo!(); // REPLACE ME with something using std::mem::swap() println!("After swap: a = {a}, b = {b}"); } }
The output should look like:
Before swap: a = apple, b = bread
After swap: a = bread, b = apple
You may realize that swapping two String
values do not move the various characters composing the strings in the heap. Only the three fields making up a String
(capacity, pointer, and length) are swapped.
Smallest first
Exercise 1.b: Write a function "smallest_first" with the following prototype which ensures that its first argument is smaller than its second argument and swap them otherwise:
fn smallest_first(a: &mut i32, b: &mut i32) { todo!(); // REPLACE ME } fn main() { let mut a = 42; let mut b = 10; println!("Before calling smallest_first: a = {a}, b = {b}"); smallest_first(&mut a, &mut b); println!("After calling smallest_first: a = {a}, b = {b}"); smallest_first(&mut a, &mut b); println!("After calling smallest_first again: a = {a}, b = {b}"); }
should display
Before calling smallest first: a = 42, b = 10
After calling smallest first: a = 10, b = 42
After calling smallest first again: a = 10, b = 42
Lifetimes
Sorting an immutable collection of strings
Let us assume that we want to write a function which sorts an immutable collection of String
elements. Here, "immutable" means that we cannot modify either the strings or the collection itself. We will have to return a Vec
containing the result.
Exercise 2.a: write a function sorted
which takes a reference to a slice of String
instances and return a Vec<String>
with the sorted strings.
The function will have the following signature:
#![allow(unused)] fn main() { fn sorted(names: &[String]) -> Vec<String> { todo!() // Write the code here } }
You can use the following constructs to help you to so:
vec![]
creates an emptyVec
(the type of the elements will be deduced from the context).- If
s
is aString
(or a&String
),s.clone()
will produce a newString
with the same content ass
. - You can iterate over the content of a slice
names
of type&[String]
withfor name in names { … }
.name
will have type&String
within the loop. - You can use
.sort()
on a mutable slice: it will sort your elements in the natural order.
You can test your function with the following main
program:
fn main() { let people = [String::from("John Doe"), String::from("Jane Eyre"), String::from("Jane Doe")]; let in_order = sorted(&people); for n in in_order { println!("- {n}"); } }
This should print "Jane Doe", "Jane Eyre", and "John Doe" in order.
Avoiding cloning the strings
While the previous program works, you had to clone the immutable strings. This is unfortunate as this consumes more memory. Why not work directly with a Vec
of &String
and sort it?
Exercise 2.b: modify your sorted
function so that it returns a Vec<&String>
instead of a Vec<String>
.
Note that .sort()
will work on a collection of &String
: it will sort them the same way it sorted String
elements.
Testing the guarantees of lifetimes
The signature of your sorted
function is:
#![allow(unused)] fn main() { fn sorted(names: &[String]) -> Vec<&String> }
Since you have not indicated any explicit lifetime (remember, lifetimes start with a single tick character '
), lifetimes elision kicked in: the lifetime of the &[String]
slice was copied over the &String
elements inside the Vec
.
It means that the result of sorted()
cannot live longer than its argument names
.
Exercise 2.c: experiment with the following modified main program, which will not compile, and make sure you understand why the compiler is telling you that you cannot store the output of sorted(&people)
into in_order
. Note how it prevents you from referencing strings that have been destroyed when people
went out of scope. You would not have had any such guarantee in C.
fn main() { let in_order = { let people = [String::from("John Doe"), String::from("Jane Eyre"), String::from("Jane Doe")]; sorted(&people) }; for n in in_order { println!("- {n}"); } }
Lifetimes: a complex case
Warning: in this page, we will touch several Rust areas that have not yet been explained in class. Do not hesitate to read the Rust documentation or to take things as granted for now.
Exercise 3.a: follow the code described below, implement it, make sure you understand how it works.
Problem statement
Sometimes, we would like to manipulate a string, for example to make it lowercase. However, the string we manipulate might already be in lowercase: in this situation, we would like to avoid copying the string and return a reference to the string we got as a parameter.
In short, we will need a type which can hold:
- either a proper
String
if we had to build it; - or a reference to another string (as a
&str
) if we could just reuse an existing one.
Building such a type
We will build a type, named StringOrRef
, which can either hold an owned String
or store a borrowed &str
reference.
Enumerated types
Rust has support for enumerated types with values, which means that we can write something like (incorrect for now):
#![allow(unused)] fn main() { pub enum StringOrRef { Owned(String), Borrowed(&str), } }
A StringOrRef
object will be either a StringOrRef::Owned
or a StringOrRef::Borrowed
variant. When such an object is destroyed, it will destroy its content:
- If it is a
StringOrRef::Owned
, it will destroy theString
it owned (and thus free the heap memory associated with theString
). - If it is a
StringOrRef::Borrowed
, it will destroy the&str
it owned, which does nothing since destroying a reference does not destroy the object it points to (as the reference does not own it).
Fixing the type
Our type will not compile because we haven't told the compiler how long the reference in StringOrRef::Borrowed
is supposed to be valid. Since we do not know that ourselves in advance, we have to make it a generic parameter of the type:
#![allow(unused)] fn main() { pub enum StringOrRef<'a> { Owned(String), Borrowed(&'a str), } }
We say that the type StringOrRef
is parameterized by the generic lifetime parameter 'a
.
We can create an owned value by using StringOrRef::Owned(String::from("Hello"))
, or make a reference to a string s
with StringOrRef::Borrowed(&s)
. In the former case, the lifetime 'a
can be anything since the StringOrRef
object owns the string. In the later case, 'a
will be set to the lifetime of the referenced string s
:
Important note: when a type is parameterized by a generic lifetime parameter, an object of this type can never live longer than this lifetime. For example, if s
has a lifetime 's
, StringOrRef::Borrowed(s)
is an object that cannot live longer than 's
. This is intuitively sound: since we store a reference to s
(wich has lifetime 's
) inside our StringOrRef
, the StringOrRef
cannot survive the disappearance of s
as that would leave us with a dangling pointer.
Exploring the type
Our type can be used through pattern-matching:
#![allow(unused)] fn main() { fn display(x: &StringOrRef<'_>) { // '_ means that the lifetime has no importance here match x { StringOrRef::Owned(s) => println!("owned string: {s}"), StringOrRef::Borrowed(s) => println!("borrowed string: {s}"), } } }
We can also write a function which returns a &str
from our object:
#![allow(unused)] fn main() { pub fn as_str<'a>(x: &'a StringOrRef<'_>) -> &'a str { match x { StringOrRef::Owned(s) => &s, StringOrRef::Borrowed(s) => s, } } }
Note how we didn't have to give the lifetime of the StringOrRef
generic parameter 'a
and used '_
which means "any lifetime": since the StringOrRef
reference has a lifetime of 'a
which is necessarily shorter or equal than the generic lifetime parameter (see the "Important note" above), we now that the returned reference is shorter than the one used as a generic parameter.
Implementing as_str()
as a method
Rather than using a standalone function, we can implement as_str()
as a method on StringOrRef
objects. Methods are implemented in a impl
block. In an impl
block, Self
designates the type itself. In methods parameters, self
in first position designates receiving the current object (it is a shortcut for self: Self
), &self
is a shortcut for self: &Self
and &mut self
is a shortcut for self: &mut Self
.
Let us rewrite as_str()
as a method:
#![allow(unused)] fn main() { impl StringOrRef<'_> { pub fn as_str(&self) -> &str { match self { StringOrRef::Owned(s) => &s, StringOrRef::Borrowed(s) => s, } } } }
You can note some interesting points about lifetimes:
- We used
<'_>
in ourimpl
block: our method defined in this block works with any generic lifetime parameter, as explained below. - We didn't explicitely write in the
as_str()
signature that the returned&str
has the same lifetime as&self
. This mecanism is called "lifetime elision": when a method has a&self
parameter, by default all outputs lifetime which are not explicit will have the same lifetime as&self
. This is a shortcut forpub fn as_str<'a>(&'a self) -> &'a str
.
Using or StringOrRef
type
We can now use our StringOrRef
type. For example, let us write a function which returns a lowercase version of a string, but allocates memory on the heap only when the string is not lowercase already:
#![allow(unused)] fn main() { // The lifetime of s will be copied into the generic lifetime parameter of StringOrRef. // Again, this is because of elision rules: if there is only one lifetime parameter in // the input, it will be copied into all non-explicit lifetime parameters in the output. pub fn to_lowercase(s: &str) -> StringOrRef<'_> { if s.chars().all(|c| c.is_lowercase()) { // All characters in the string are lowercase already, return a reference StringOrRef::Borrowed(s) } else { // We need to create a new String with a lowercase version StringOrRef::Owned(s.to_lowercase()) } } }
We can now use it in our main program and see that it works:
fn main() { let s1 = to_lowercase("HeLlO"); let s2 = to_lowercase("world"); println!("s1 = {}, s2 = {}", s1.as_str(), s2.as_str()); }
This will display "s1 = hello, s2 = world". Nothing indicates that "world" has not been copied. Let's enhance the program with the matches!
macro which can test if some expression matches a pattern, as in a match
expression:
fn variant(x: &StringOrRef<'_>) -> &'static str { if matches!(x, StringOrRef::Owned(_)) { "owned" } else { "borrowed" } } fn main() { let s1 = to_lowercase("HeLlO"); let s2 = to_lowercase("world"); println!("s1 = {}, s2 = {}", s1.as_str(), s2.as_str()); println!("s1 is {}, s2 is {}", variant(&s1), variant(&s2)); }
The output is now
s1 = hello, s2 = world
s1 is owned, s2 is borrowed
as expected. Neat eh?
Adding a destructor
When an StringOrRef
object is dropped (goes out of scope), it will get destroyed: the destructor for every field will be called (if any). For example, if it holds a StringOrRef::Owned
variant, the String
contained in this variant will be dropped and its destructor will be called, freeing memory on the heap.
We can visualize what happens by adding a destructor on StringOrRef
. It is done by implementing the Drop
trait:
#![allow(unused)] fn main() { impl Drop for StringOrRef<'_> { fn drop(&mut self) { print!( "Destroying the StringOrRef containing {} which is {}: ", self.as_str(), variant(self), ); if matches!(self, StringOrRef::Owned(_)) { println!("memory on the heap will be freed"); } else { // Dropping a reference doesn't free memory on the heap println!("no memory on the heap will be freed"); } } } }
If we execute our program, we will now read:
s1 = hello, s2 = world
s1 is owned, s2 is borrowed
Destroying the StringOrRef containing world which is borrowed: no memory on the heap will be freed
Destroying the StringOrRef containing hello which is owned: memory on the heap will be freed
s2
and s1
are destroyed in the reverse order of their creation when they go out of scope. No memory on the heap was ever allocated for string "world", which comes from a read-only memory area and has only been referenced. However, the string "hello" has been built into the heap while lowercasing the string "HeLlO" and needs to be freed: this
happens automatically when dropping s1
.
Conclusion
Types in Rust are powerful and allow easy memory management without needing a garbage collector. Doing the same thing in C would require extra fields (is the string owner or borrowed, code the deallocation by hand). We will later see even more powerful type manipulation in Rust.
Lab 04: Error handling
In this lab session you will:
- manipulate APIs that can return errors, such as network or disk ones;
- use external crates (i.e. Rust libraries);
- write regular expressions;
- determine which French people are trending in the news;
- display errors with colors.
By the end of the session you will have a better understanding of the way errors are typically handled in Rust.
Web page retrieval
In this part, you will have to retrieve a web page from a given URL and return its content as a String
or return a proper error if the page retrieval fails.
Initialization
But first of all, ensure that your Rust installation is up-to-date by typing:
$ rustup update
If you get an error, it means that you have not installed Rust through rustup
. In this case, make sure that your system is up-to-date.
For this lab, you will create a new Rust program named "lab4". Create a new binary (executable) project by typing
$ cargo new --bin lab4
$ cd lab4
Everything you will do today will happen in the lab4
directory, in particular in the src/main.rs
file.
Adding a dependency
You could make HTTP requests "by hand" by opening a TCP socket to the right host and port, and by sending HTTP commands and parsing the responses. However, you can take advantage of existing libraries written in Rust that can do that already.
The library ("crate" in Rust terminology) you will want to use is called reqwest (this is not a typo). In order to use it in your program, you have to add it to Cargo.toml
in the [dependencies]
section.
Rather than adding it by hand, you can use the cargo add
command. Moreover, we want to use the "blocking" API of reqwest which is the simplest one at this stage. This "blocking" API is enabled by requesting (ah ah) the "blocking" feature or reqwest:
$ cargo add reqwest --features blocking
If you look at your Cargo.toml
, you should see something like:
[dependencies]
reqwest = { version = "0.12.8", features = ["blocking"] }
It indicates that your Rust program will use version 0.12.8 of the "reqwest" crate with the "blocking" feature enabled. 0.12.8 is the latest published version of the crate at the time this page has been written.
Adding the "reqwest" crate as a dependency means that you will be able to use its types and functions by prefixing them with reqwest::
.
⚠️ If you later receive a compilation error, you might need to install the
pkg-config
andlibssl-dev
packages (or equivalent) on your system using your package manager, so thatreqwest
can find the library it needs (namely, OpenSSL for cryptographic routines required forhttps://
handling).
Fetching a web page
Your first function will retrieve a web page from its URL using the blocking API of reqwest, whose documentation is accessible online.
Exercise 1.a: write a function with signature fn get(url: &str) -> Result<String, reqwest::Error>
which returns the content of the web page located at url
(use a code similar to the one in the documention).
Use the following main()
program to test it:
fn main() -> Result<(), reqwest::Error> { println!("{}", get("https://rfc1149.net/")?); Ok(()) }
Note how main()
can return a Result<(), E>
instead of returning nothing (which is written ()
in Rust and is the equivalent of void
in C-like languages). Either get("https://rfc1149.net/")
returns an error and it will be propagated to main()
by the ?
operator, or it returns a String
which will be displayed. At the end of the main()
program, Ok(())
ensures that ()
is returned in an Ok
.
Returning a better error
Now, try changing the URL with one returning a 404 (not found) code. You can use "https://rfc1149.net/nonexistent"
, which does not exist.
Note how your get()
function returns without an error: it returns the content of the error page. This is not a good idea: in your program, you only want to return pages which were successfully found.
However, you will not be able to return a reqwest::Error
to indicate that the page was not found, as it is not an existing error condition for reqwest::Error
. You will have to write your own Error
type with two variants for now:
#![allow(unused)] fn main() { #[derive(Debug)] enum Error { Reqwest(reqwest::Error), BadHttpResult(u16), } }
This Error
type can be a Error::Reqwest
and encapsulate a reqwest::Error
, or it can be a Error::BadHttpResult
and encapsulate the HTTP error code returned by the web server (for example 404 for "not found").
The #[derive(Debug)]
will be explained in a later class. For the time being, you just need to know that it will allow an Error
object to be displayed by the main()
program if needed. Also, you can display an Error
yourself by using the {:?}
placeholder:
#![allow(unused)] fn main() { let my_error = Error::BadHttpResult(404); println!("The error is {my_error:?}."); }
would display
The error is Error::BadHttpResult(404).
Also, in order to take advantage of the automatic conversion performed by the ?
operator, you want to implement From<reqwest::Error>
for your Error
type:
#![allow(unused)] fn main() { impl From<reqwest::Error> for Error { fn from(e: reqwest::Error) -> Error { // Encapsulate the error into a Error::Reqwest variant Error::Reqwest(e) } } }
Exercise 1.b: update your get()
function so that it checks the status code of the response before reading its text. Your get()
function will return a Result<String, Error>
, and your main()
function will return a Result<(), Error>
, in order to accomodate both error conditions.
Look at the documentation for the reqwest::blocking::get()
method: what is its return type? What method can be called on a Response
to get a StatusCode
and compare it with StatusCode::Ok
? How can you get the numerical u16
code of a StatusCode
?
Note: instead of typing qualified type names such as reqwest::StatusCode
, you can add a use reqwest::StatusCode;
at the beginning of your program: this will import StatusCode
in your namespace, and you will be able to use StatusCode
instead of the longer reqwest::StatusCode
.
Check that your new version of get()
works by ensuring that an error is displayed when trying to print the content of "https://rfc1149.net/nonexistent"
. You should see a 404 error.
Name identification
Now that you are able to retrieve a web page, you will try to identify names written in it. A name looks like "John Doe": an uppercase letter followed by lowercase letters followed by a space followed by an uppercase letter followed by lowercase letters.
Rather than looking for such a pattern yourself, you might want to use the "regex" crate which implements regular expressions.
Using the "regex" crate
Add the "regex" crate to your Cargo.toml
:
$ cargo add regex
From now on, you are able to use the entities defined in this crate, in particular the regex::Regex
type. To be able to refer it as Regex
instead of regex::Regex
, you might want to put a use regex::Regex;
near the beginning of your program.
Of course we want our regular expression to match French names containing accented characters. You can use the following regular expression patterns to identify uppercase and lowercase letters:
\p{Uppercase}
will match any Unicode uppercase letter (for example it would match the Greek delta letter "Δ")\p{Lowercase}
will match any Unicode lowercase letter, such as the Greek delta letter "δ"+
after a pattern means "one or more"
You can then deduce that a name component can be represented as \p{Uppercase}\p{Lowercase}+
. This would match "John", "Doe", "François" or "Strauß". The whole regular expression can then be \p{Uppercase}\p{Lowercase}+ \p{Uppercase}\p{Lowercase}+
(two components separated by a space).
Creating the Regex
A regular expression is created using Regex::new()
, which returns a Result
. Since we know that our expression is valid, we can call unwrap()
on the result:
#![allow(unused)] fn main() { let re = Regex::new("\p{Uppercase}\p{Lowercase}+ \p{Uppercase}\p{Lowercase}+").unwrap(); }
However, it you do this, you will likely get an error: \p
in the string is interpreted as an escaped p
. As \n
represents a "line feed", \p
represents… nothing known. This is an invalid escape sequence in a string.
Fortunately, Rust has raw strings, in which no escape character is recognized. The syntax of a raw string is r"…"
.
Also, you can also put double quotes in a raw string if you need it, by using a pound sign to the double quote delimeters:
#![allow(unused)] fn main() { let s = r#"This is a string with some "quotes" in it"#; }
But what if you need to put a double quote followed by a pound sign ("#
) in the raw string? This is easy, you can change the start and end marker and increase the number of pound signs provided they match:
#![allow(unused)] fn main() { let s = r###"You need a quote + 3 pound signs to end the string"###; let t = r###"You can put "## inside without any problem!"###; }
Writing the extract_names()
function
Using the .find_iter()
method on a regex::Regex
object, you can iterate over the matches found in the input string as shown in the following example which displays all sequences of uppercase characters (with at least two of them) found in the file "text.txt":
fn main() { // Read the content of the "text.txt" file into variable s let s: String = std::fs::read_to_string("text.txt").unwrap(); // Match at least two consecutive uppercase character let re = regex::Regex::new(r#"\p{Uppercase}{2,}"#).unwrap(); println!("All uppercase sequences found:"); for m in re.find_iter(&s) { // Inside the loop m is a regex::Match, which as a .as_str() method println!(" - {}", m.as_str()); } }
Exercise 2.a: write a function with signature fn extract_names(s: &str) -> Vec<String>
which returns all plausible names in a string.
In this function, you will (those are mere suggestions, you are free to do it any other way you see fit):
- create a
re
variable containing aRegex
with the pattern seen above - create an empty vector (using
vec![]
) and store it into a mutable variable (this is the vector you will return at the end of the function) - iterate over
re.find_iter()
to find all the matches - on every match
m
you can call.as_str()
to get a&str
representing the text of the match (for example "John Doe"), that you can transform into a string using.to_owned()
orString::from()
(as usual) - push the
String
in theVec<String>
that you plan to return (.push()
is the method you want to use) - return the vector
Try your function with the following main()
function:
fn main() { let names = extract_names("Yesterday, John Doe met François Grüß in a tavern"); println!("{names:?}"); }
Since String
implements Debug
, Vec<String>
also implements Debug
and can be displayed using the {:?}
placeholder which is handy for debugging.
Deduplicating the names
Unfortunately, we may end up with duplicates. If a text contains the same name several times, it will be present several times in the output.
A vector is not really the best structure to represent a set of objects, like a set of names. We do not care about the order, only about the presence of a name.
Rust has a std::collections::HashSet<T>
type in its standard library (std
). Provided that you added a use std::collections::HashSet;
near the beginning of your program, you can:
- create a new
HashSet<T>
withHashSet::new()
- insert an element in a set
h
withh.insert(element)
; nothing happens if the element is already present in the set - iterate over the elements of a set
h
like you did with a vector:for element in h { … }
(ifh
is aHashSet<T>
,element
will be a&T
in the loop) - display a
HashSet<T>
using{:?}
as long as its element typeT
implementsDebug
Exercise 2.b: make your extract_names()
function return a HashSet<String>
(instead of a Vec<String>
).
Using the following main()
function, you will see that there can be no duplicates:
fn main() { let names = extract_names("John Doe, François Grüß, John Doe"); println!("{names:?}"); }
Returning the names in a page
Exercise 2.c: add a function with signature fn names(url: &str) -> Result<HashSet<String>, Error>
which returns all plausible names in a web page.
Check your function by displaying the names present on "https://www.liberation.fr"
.
Extra: if you have more time
If you have more time, you can fix the name detector such that it accepts multipart names, such as "Jean-Pierre Elkabbach".
Sets intersection
If you have two HashSet<T>
h1
and h2
, you can iterate over their intersection which is denoted by h1.intersection(&h2)
. This will iterate over all common elements using a reference of type &T
.
Exercise 3.a: write a function with signature fn trending() -> Result<Vec<String>, Error>
which returns a vector with the names present in both "https://www.liberation.fr"
and "https://www.mediapart.fr"
. The vector will be sorted before being returned.
Print the trending names from your main()
function:
fn main() -> Result<(), Error> { println!("Trending names:"); for name in trending()? { println!(" - {name}"); } Ok(()) }
File system
We want to store the list of trending people in a file on a disk.
Exercise 4.a: write a function with signature fn write_to_disk(path: &str, data: Vec<String>) -> Result<(), io::Error>
which writes every string in data
on its own line in file located at path
.
The method join("\n")
applied to a Vec<String>
will return a new String
with the \n
character (line feed) inserted between each line. Do not forget to add an extra '\n' character at the end of the returned string.
To write a string content
into a file file_name
, you can use std::fs::write(file_name, content)
which returns a Result<(), std::io::Error>
.
Exercise 4.b: modify your main()
function such that you write the trending names into a file named "trending.txt"
.
You will have to add a new variant to your Error
enum in order to encapsulate the std::io::Error
if there is a problem when writing a file.
Check that the error detection and reporting works fine by attempting to write in a file you cannot write to (such as "/trending.txt"
, you most likely don't have the rights to write at the root of your file system).
Easier errors
You may have noticed that it is rather cumbersome to write all those From
methods to convert existing errors into our own variant. Fortunately, a crate exists to make this job easier: thiserror.
After adding thiserror
to your Cargo.toml
$ cargo add thiserror
you will be able to define your Error
type as follows:
#![allow(unused)] fn main() { #[derive(Debug, thiserror::Error)] enum Error { #[error("Wrong HTTP status code: {0}")] BadHttpResult(u16), #[error(transparent)] Io(#[from] std::io::Error), #[error(transparent)] ReqwestError(#[from] reqwest::Error), } }
Our Error
type looks the same as before with a few additions:
- It derives
thiserror::Error
. This is the heart of thethiserror
crate and makes the other things below work. Behind the scenes, it uses "procedural macros" which is a way to transform some Rust code into another Rust code during the compilation. - The
#[error]
attribute indicates how to display the error.{0}
refers to the first field. The fields could also be named, a fieldxyz
would use{xyz}
in the#[error]
attribute. If you are encapsulating another error, you may usetransparent
to indicate that you simply want to display the inner error, and that you have nothing to add. - The
#[from]
attribute on the field of a variant indicates that aFrom
implementation should be generated from the type following#[from]
to anError
. - Your
Error
type will implement thestd::error::Error
trait which allows proper error chaining: the system will know (and will be able to display) that the cause of aError::ReqwestError
is areqwest::Error
, which might also be caused by another deeper error, and so on.
Exercise 5.a: modify your Error
type to use thiserror
and remove your existing From
implementations.
Check that everything still works, including when you make errors happens by changing the urls or the file name.
Prettier messages
You might notice that the error messages printed by default by main
are quite ugly. This is due to the default Debug
implementation for the error types.
Fortunately we can do better by using the anyhow
crate. This crate adds a anyhow::Error
type which is compatible with all errors implementing the std::error::Error
trait.
Exercise 6.a: after adding the anyhow
crate to your Cargo.toml
(you should know how to do it by then), modify your main()
function so that it returns a Result<(), anyhow::Error>
.
Note that in anyhow
you will find the following definition
#![allow(unused)] fn main() { type Result<T> = std::result::Result<T, anyhow::Error>; }
(std::result::Result
being the "default" Result
type that we use)
It means that you can use anyhow::Result<T>
everywhere you would be using Result<T, anyhow::Error>
, as a shortcut.
Exercise 6.b: make your main()
function return a anyhow::Result<()>
.
Look at the produced errors. Aren't they beautiful, compared to what we had before?
Even prettier messages
There exists an even better crate derived from anyhow
, which is called eyre
. The idea of eyre
is the same as anyhow
: the error type is called eyre::Report
(instead of anyhow::Error
), and eyre::Result<T>
can be used instead of anyhow::Result<T>
.
In addition, reporters can be installed to display the eyre::Report
errors in a particular way. For example, the [color-eyre](https://crates.io/crates/color-eyre)
crate will display the messages with colors to facilitate error reading. All you have to do is call color_eyre::install()?;
at the beginning of your main()
function.
Exercise 7.a: add the eyre
and color-eyre
crates to your Cargo.toml
, initialize color_eyre
as explained above, and make your main()
function return a eyre::Result<()>
.
Now make some errors happen and see how beautifully they are rendered.
Some thoughts about error handling
In our case, we never use the fact that we can match on the various variant of our Error
type. We could have used Result<T, eyre::Report>
(or the equivalent eyre::Result<T>
) as the return type of all our functions that can fail.
However, this is not always the case: if you are developing a library, your users might want to get precise information about an error you give them back. In this case, it is important to use an Error
enumeration allowing the library user to match the error against the various cases.
But here, let's simplify our code:
Exercise 7.b: remove your Error
type, and make every function that can fail return a eyre::Result<T>
(where T
is the type of a successful result).
However, you still have the problem of BadHttpResult
which is not an existing error. Don't worry, you can return prematurely with a textual error by using the macro eyre::bail!
with works like format!()
or println!()
:
#![allow(unused)] fn main() { // Return an error immediately with the given formatted text eyre::bail!("This returns an eyre::Report with {}", 42); }
Adding context
You may have noticed that the error message describes what the error is but not what you were trying to do. For example, when a web page fails to be retrieved, the URL is not shown anywhere, or when the file cannot be created, the file name is not to be seen.
By adding use eyre::Context;
near the beginning of a file, you import an extension trait of the Result
type which allows some contextual information to be added.
Let us take the example of the write_to_disk()
function:
#![allow(unused)] fn main() { fn write_to_disk(path: &str, data: Vec<String>) -> eyre::Result<()> { let mut f = std::fs::File::create(path)?; for l in data { f.write_all(format!("{l}\n").as_bytes())?; } Ok(()) } }
Every intermediate Result
can be enriched with a context if it contains an error before the error is returned:
#![allow(unused)] fn main() { fn write_to_disk(path: &str, data: Vec<String>) -> eyre::Result<()> { let mut f = std::fs::File::create(path).context(format!("Unable to create {path}"))?; for l in data { f.write_all(format!("{l}\n").as_bytes()) .context(format!("Error while writing {path}"))?; } Ok(()) } }
In case of an error, the context will be displayed:
$ cargo run
Error:
0: Unable to create /trending.txt
1: Permission denied (os error 13)
Location:
src/main.rs:45
[…]
Exercise 7.c: add a context to all errors before returning them.
Check by using a bad URL or a bad file name that the context is correctly printed and contains all the useful information.
Congratulations, you are now a master in Rust error handing.
Lab 05: Complex data structures
In this lab session you will learn about:
- Implementing recursive data structures in Rust;
- Implementing standard traits for both your own and foreign data types;
- Implementing your own iterators, to refactor iteration logic;
- Making all of the above generic, with appropriate trait bounds.
Implementing a mutable linked list
We want to implement a singly linked list in Rust, capable of storing i32
values in its nodes.
As you certainly know, such a data structure is recursive, made of two types of nodes, with each non-empty node pointing to the next node until the final (empty) node is reached, like this:
Rust enums are the right tool for the job, so as a first approximation we can start from:
#![allow(unused)] fn main() { pub enum List { Nil, Cons(i32, List), } }
Exercise 1.a: Try the above definition to see if it works. If not, explain in your own words why it doesn't, building upon the knowledge of some of the traits we have learned about in today's lecture.
Exercise 1.b:
The BoxList
type definition so that it compiles.
Then, create by hand (using Nil
, Cons
, and Box
methods) sample list values:
- the empty list,
- a list containing one element,
- a list containing two elements,
- up to the list depicted in the figure above.
To help debugging the values you are building, implement the Debug
trait for List
, so that you can print your lists with the {:?}
format string.
Exercise 1.c:
Let's now use the linked list to build a proper abstract data type: a stack.
Create a struct Stack
that uses List
internally to store its elements but also has a separate size
field to keep track of the list size.
All stack manipulations must respect the invariant that size
always corresponds to the total number of non-Nil
cells in the list.
Add to your Stack
an impl
block that implements at least the following methods:
new
that creates a fresh empty stack,len
that returns the stack length,push
that adds ani32
on top of the stack,pop
that removes and returns the top element (if it exists).
Make sure push
and pop
modify the stack list in place.
(But feel free to implement an alternative purely functional version of the stack—returning fresh versions of the stack at each method call—as a separate exercise!)
Tip: due to memory ownership rules, implementing push
and pop
might be trickier than you would expect based on experience with other programming language.
If you get stuck, have a look at the std::mem::replace
stdlib function and ponder how it can help.
Standard traits
Implementing standard traits for your own types make them work more nicely (and more idiomatically) with the rest of the Rust ecosystem. So let's make the data types you have defined well-behaved ecosystem citizens!
Exercise 2.a:
Implement the Display
trait for your Stack
(and possibly also for List
, if it helps), so that stacks can be pretty-printed like this:
#![allow(unused)] fn main() { let mut s = Stack::new(); s.push(10); s.push(20); s.push(30); println!("stack content: {s}"); }
stack content: [ 30 20 10 ]
Exercise 2.b:
You lists and stacks, after all, are just glorified integer vectors.
It makes sense to ease conversions between them and Vec<i32>
and we have seen how the From
trait is the idiomatic Rust way for supporting type conversions.
Implement conversions in both directions, from Vec<i32>
to List
and from List
to Vec<i32>
.
Once you have done this, you can do the same for Stack
, by simply calling into the conversions you have already implemented for List
(with some little extra work to precompute the stack length).
At the end, something like the following should work for your types:
#![allow(unused)] fn main() { // Vec<i32> -> List let v = vec![1, 2, 3]; assert_eq!( List::from(v), Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil)))))) ); // List -> Vec<i32> let l = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil)))))); assert_eq!( vec![1, 2, 3], Vec::<i32>::from(l) ); }
Note that this is the first time we implement a trait for a type in a different crate than our own, specifically we are implementing From<List>
for Vec<i32>
, which is a generic type defined in the standard library.
Doing so is permitted by Rust provided that you respect the orphan rule, i.e.: either the trait or the type for which you are implementing the trait must be defined in the same crate as the implementation.
Exercise 2.c:
Implement the FromStr
trait for your Stack
(and possibly List
), so that stacks can be parsed from string like this:
#![allow(unused)] fn main() { let mut s1 = Stack::new(); s1.push(10); s1.push(20); s1.push(30); let s2: Stack = "[ 30 20 10 ]".parse().unwrap(); assert_eq!(s1, s2); }
Refactoring with iterators
By this point, your code probably contains a number of places where you iterate over the elements of a list to, e.g., print its content, count its elements, fill an empty vector with them, etc. As fans of the DRY principle we cannot stand this situation, let's refactor our way out of this! Obviously, the situation calls for the introduction of an iterator over list elements.
Exercise 3.a:
Add to your List
data type a iter()
method that returns a type implementing the Iterator
trait.
When done, the following should work out of the box:
#![allow(unused)] fn main() { let l = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil)))))); let mut i = l.iter(); assert_eq!(i.next(), Some(1)); assert_eq!(i.next(), Some(2)); assert_eq!(i.next(), Some(3)); assert_eq!(i.next(), None); }
Exercise 3.b:
Make your List
implement IntoIterator
, so that your lists can work out of the box with for
loops, like this:
#![allow(unused)] fn main() { let l = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil)))))); for e in &l { println!("list element: {e}"); } }
Exercise 3.c:
Refactor previous code you have written for List
(and/or Stack
) to use iterators so that you can avoid repeating the iteration logic in multiple places.
Exercise 3.d: (Bonus) Try to make your iterator(s) work like those of stdlib collections, specifically:
#![allow(unused)] fn main() { for element in &collection { ... } // produce shared refs (&i32 in our case), do not consume collection for element in &mut collection { ... } // produce mutable refs (&mut i32), do not consume collection for element in collection { ... } // produce items (i32), consume the collection }
Going generic
Wait, your data structures are not specific to i32
values at all!
It is perfectly legitimate to have lists and stacks containing different types.
Let's make them generic data structures that can do so.
Exercise 4.a:
Generalize your implementation of List
so that it takes a type parameter T
and can hold value of as many different types as possible.
Tip: in the process you will face probably face a dilemma, either give up on some of the features and traits you have implemented in the past for i32
lists, or add trait bounds to restrict the list of types your lists can contain.
It's up to you to decide what to do, as long as you understand the implications.
Exercise 4.b:
Your lists are nice because they are uniform, each list can contain multiple values of the same type.
In some cases is desirable to create non-uniform lists, containing values of different types.
By using trait objects, create a new data structure that can contain values of different types, that all share a trait (e.g., Display
, so that you can easily show what your lists contain).
Try filling your non-uniform lists with values of different types and verify that you can still pretty-print them to stdout with println!("{}", ...)
.
Now try to add a .len()
method to your non-uniform lists. Can you?
How about a .min()
method, analogous to the iterator consumer we have seen in class?
PS No! You don't need to implemented linked lists by hand in Rust, you have doubly-linked lists already available in the stdlib. Also, for most practical purposes, vectors are much better anyway.
Lab 06 — Packaging, testing, and fuzzing
In this lab session you will:
- create packages and crates;
- test them in various ways (unit, integration, doc);
- track and maximize the code coverage of your tests;
- property test and fuzz Rust code to find unexpected bugs.
By the end of the session you will be familiar with the Rust tools and techniques used to perform "serious sanity checks" on production Rust code.
Packaging Rust code
Consider the following structure representing user accounts.
Each account has two String
fields: a public username
and a private password
.
#![allow(unused)] fn main() { pub struct Account { pub username: String, password: String, } }
Exercise 1.a:
implement the following associated functions to create an Account
from a pair of string( slice)s and parse one from an existing string:
use std::str::FromStr; impl Account { /// Create a new account with a given username and password. fn new(username: &str, password: &str) -> Self { todo!() } } impl FromStr for Account { /// Create a new account by parsing its data from a string in the form "USERNAME:PASSWORD". fn from_str(s: &str) -> Result<Self, Self::Err> { todo!() } } fn main() { let user1 = Account::new("zack", "cne3bfvkw5eerv30jy3990yz0w"); let user2 = "sam:bf6ddgabh6jghr89beyg3gn7e8".parse::<Account>().unwrap(); let user1_name = &user1.username; let user2_name = &user2.username; println!("let's welcome our new users: {user1_name} and {user2_name}"); // let user1_pwd = &user1.password; // error[E0616]: field `password` of struct `Account` is private }
A few points worth nothing:
- The
FromStr
trait is a variant of theFrom
trait you have already seen, specifically meant to parse data from strings. When implemented, it enables the&str::parse()
function, used here inmain()
. - To implement
FromStr
you will have to pick anErr
type. - The
todo!
macro is very useful for code prototyping; it will make your code panic at runtime, but pass type checking. have a look at its documentation for details.
When done make sure that the above main
function runs without panicking.
Exercise 1.b:
put your code into a new package called accountmgr
containing two crates: one library crate (implementing the type alias and the functions new_account
and parse_account
) and one binary crate containing the main
function.
Make sure your crates build and that the binary runs without panicking.
Now try to uncomment the last line of main
, to make sure the module system does not allow you to access private fields from outside the defining module.
Remember: you now have two completely separate crates: a library crate with a public API that does not export the password
field of the Account
structure; and a binary crate using it as client code via that API.
Exercise 1.c:
(optional)
in the next few sections you will test the functionalities of your accountmgr
crate, which for now is quite bare bone.
To make things more interesting you can extend it by adding the following functionalities, with the API you please:
- Add a separate constructor function that randomizes password creation instead of taking one as input. For this you can use the
rand
crate and implement various "strong password" strategies to produce high-entropy passwords. - Add a function that returns the hashed version of a password account. You can use the
sha1
crate for this. - Add a batch constructor
fn from_file(filename: &Path) -> Vec<Self>
to read account data from file, one account per line. Here is a file with many accounts (~20k) that you can use for testing purposes.
Later on, when asked to test accountmgr
in the remainder of this lab session, also add tests for these additional functionalities!
If you decide not to do this exercise right now, but finish early with the rest of the lab assignments, come back to this one later, add missing functionalities above, add corresponding tests, etc., until done.
Testing Rust code
Unit testing
Unit tests are shipped together with the implementation code of a crate. They have access to all code entities in the implementing module, no matter their visibility from other modules.
Exercise 2.a:
add a sub-module tests
to the library crate of your accountmgr
package.
Add to it basic unit tests to convince yourself that the functionalities you have implemented thus far are correct.
It is up to you what test cases you want to use, but follow the usual rule of thumbs of testing weird/edge cases as well as some easy ones.
Make sure that cargo test
pass.
Make sure that you also test the internal private state of your values (e.g., that a password field is properly set to the same value passed to the constructor), because unit tests are your only chance to do so.
Integration testing
Integration tests are shipped in separate files located under the top-level tests/
directory and are compiled to independent binary crates linked to the main library crate.
As such, they can only access the public API of your library crate and not its private entities.
Exercise 2.b:
move all the tests you can from the tests
sub-module to integration tests located under tests/
and make sure that cargo test
still pass.
It should be possible to turn most of them to integration tests, but not all of them.
Which tests couldn't you turn into integration tests and why?
Documentation tests
Documentation tests are integration tests that reside alongside with the implementation code, in docstrings.
Exercise 2.c:
Add documentation to your code in the form of docstrings.
At the very minimum, for each public entity in the module provide a succinct explanation of what it does.
Check the documentation of rustdoc on how to write documentation and the details about the docstring markup language.
Verify by running cargo doc
that documentation is properly generated under target/doc/
for all the crates related to your package.
Exercise 2.d:
Either add new or turn some of your existing integration tests into documentation tests integrated into your docstrings.
Try to come up with doctests that are actually useful as documentation, because that is one of the key points of doctests.
For example, have you documented what should happen when trying to parse: a (1) well-formed account string, (2) an empty account string, (3) a non-empty account string that lacks the colon separator?
If not, this is your chance to do so!
Make sure that cargo test
still pass.
Code coverage
In a previous lab session you have seen how to use code coverage in C/C++ to guide the addition of test cases, so that all your code is exercised during the execution of a test suite.
Code coverage tooling is available for Rust as well.
You are going to give a try to Tarpaulin a code coverage tool for Rust, that can trace exercised code during test suite execution via either ptrace on Linux/x86_64 (by default) or LLVM's SanitizerCoverage (passing the --engine llvm
flag, supported on platforms other than Linux/x86_64).
Exercise 2.e:
Read the documentation of tarpaulin crate and install it so that the cargo tarpaulin
command become available in your Rust installation.
When you run cargo tarpaulin
in a Rust package, it will run the same tests of cargo test
(in fact: Tarpaulin even strives to offer a fully-compatible CLI), but under code coverage instrumentation.
At the end of the execution Tarpaulin will emit a report about covered/non-covered lines for each module.
A coverage "diff" w.r.t. the previous run will also be emitted.
Note that currently Tarpaulin only offers line coverage; branch coverage is not implemented yet.
Exercise 2.f:
Run cargo tarpaulin
on your package and verify what's your current line coverage.
By default, only a succinct textual report is returned; try adding the --out html
flag to obtain a browsable version of your code, with covered/non-covered lines highlighted.
If your line coverage is not 100%, try to reach that percentage by adding new test cases that cover the missing lines. If it's already 100% go back to Exercise 1.c, add a missing functionality, verify it decreases line coverage, and then make it reach full coverage again.
Property testing
Recall from lecture that:
Property testing is a testing technique [...] where we:
- randomly select test inputs,
- assert general properties that are automatically verifiable on test input/output pairs,
- verify that properties hold on the obtained test input/output pairs.
You are going to use the proptest
crate to verify some properties about the accountmgr
crate.
Add proptest
as a development dependency to your crate with cargo add --dev proptest
; doing so will make your package only conditionally depend on protest
, specifically when compiling tests, examples, and benchmarks.
Verify within Cargo.toml
that the dependencies has indeed been added under the [dev-dependencies]
section.
Exercise 3.a:
Add a new unit test to those already present in your lib.rs
module, which randomizes the parse string so that only valid account strings are formed (e.g., you can start with a simple regex like r"[a-z]*:[a-z]*"
representing letters followed by a colon followed by letters, and then improve it to be more "nasty" by adding other special characters).
Note that the new test will need to be put within a proptest!
environment.
Add an assertion to verify that the account string is properly parsed (i.e., that parse_account
does not return an error).
Here are some useful references about proptest
: crate, documentation, book.
Let's now strengthen our parsing requirements, because accounts with empty usernames and/or passwords are no good, right?
Exercise 3.b:
Change your parse_account
function so that it refuses accounts strings containing empty username and/or passwords.
Run your unit tests again. proptest
tests should catch the fact that empty usernames/passwords are no longer correctly parsed and give you examples of account strings that fail.
Improve your proptest
regex so that only (new) valid account strings are produced.
Add specific (non-random) unit tests to handle the cases of empty usernames/passwords.
Make sure your line coverage is still 100%.
Fuzzing Rust code
And now for something completely different.
ELF is the file format used by most executable binaries on Unix systems, including Linux. For $reasons you want to implement a command-line tool to check whether a file is a valid ELF binary or not. When invoked on an existing file it should return 0 (and output a nice explanation message) if the file is a valid ELF binary, 1 otherwise (and output another nice explanation message).
Oh, look! There is a handy elf_rs
crate on crates.io, let's use that as plumbing for implementing your tool.
Exercise 4.a: To begin with:
- create a package
check_elf
containing a library crate, - add to it as a dependency version 0.1.3 of
elf_rs
(e.g., by running:cargo add elf_rs@0.1.3
) and verify inCargo.toml
that the dependency has been added correctly, - in
src/lib.ml
implement the following functions by filling in theirtodo!()
-s:
#![allow(unused)] fn main() { pub fn is_valid_elf(content: &[u8]) -> bool { todo!() } pub fn is_valid_elf_file<P: AsRef<Path>>(filename: P) -> bool { todo!() } }
Tips:
- To check if a file is valid ELF with
elf_rs
just built anElf
structure usingElf::from_bytes
on the bytes (Vec<u8>
) read from a file. Iff you obtain anOk(_)
result the file is valid ELF. - Use the former function to implement the latter.
In addition to integration tests, Rust packages can contain examples located under the top-level examples/
directory.
Each *.rs
file in there is a standalone binary crate, with a main()
function as its entry point.
Individual examples can be built and run using, respectively, cargo build --example=NAME
and cargo run --example=NAME
.
Examples are useful both as sample usage of a library crate and as standalone executables that do something useful.
Exercise 4.b:
Create an example file examples/check_elf.rs
by completing the following skeleton:
fn main() { let filename = std::env::args().nth(1).unwrap(); // get the first argument on the CLI or panic todo!(); }
where you can already find an example of how to read the first CLI argument.
Instead of todo!()
you should use your library crate functions to check if the specified executable is a valid ELF or not.
If it is output a message saying so and exit with code 0 (check out the std::process::exit
function from the stdlib for that); if it isn't print the opposite message and exit with code 1.
When done test your example program to make sure it behaves like this:
$ cargo run --example=check_elf /bin/ls
Finished dev [unoptimized + debuginfo] target(s) in 0.01s
Running `target/debug/examples/check_elf /bin/ls`
/bin/ls is a valid ELF file
$ echo $?
0
$ cargo run --example=check_elf ./Cargo.toml
Finished dev [unoptimized + debuginfo] target(s) in 0.01s
Running `target/debug/examples/check_elf ./Cargo.toml`
./Cargo.toml is NOT a valid ELF file
$ echo $?
1
Looks like your tool is working! Or is it? Let's use fuzzing to check our robust it is against malformed inputs.
First, you need to install the cargo fuzz
sub-command and initialize fuzzing in your check_elf
package:
$ rustup toolchain add nightly
$ cargo +nightly fuzz init
$ ls fuzz/fuzz_targets/
fuzz_target_1.rs
Note that you need to use the "nightly" Rust toolchain (and hence install it beforehand if you haven't yet), because cargo fuzz
is not fully stable yet.
The +nightly
flag passed to cargo
ensures that the subsequent command (fuzz init
in this case) are run using that toolchain.
Exercise 4.c:
Edit fuzz_target_1.rs
to fuzz the input of your is_valid_elf
function.
(It should be straightforward to do so, as the type of fuzzed payload and what that function expects are very close.)
Now run cargo +nightly fuzz run fuzz_target_1
to fuzz is_valid_elf
.
It will (most likely) identify a pretty serious issue.
What is it?
How do you explain an issue like that arising in a Rust program?
(Spoiler: you will learn a lot more about this in the next lecture.)
Exercise 4.d:
The problem you have encountered has been reported as a bug to the elf_rs
maintainers a while ago.
Since then, the issue has been fixed and the most recent version of elf_rs
at the time of writing (0.3.0) is no longer affected by this nasty problem.
Update the version of elf_rs
you are using in Cargo.toml
and try fuzzing again.
Is the problem still there?
(Phew!)