Introduction
This website contains teaching material for the laboratory sessions of the course Safe System Programming (in Rust) (SSP-RS), at Télécom Paris / Polytechnique Institute of Paris.
© 2022-2024 Samuel Tardieu and Stefano Zacchiroli
Lab 01 — Unsafe System Programming
In the first lab session of the course, you will mainly work on:
- Setting up your development environment for the rest of the course, and
- Experimenting with unsafe system programming, leading to buffer overflows and other nasty stuff.
Let's get to it.
Development setup
Throughout the lab sessions of this course you are going to use a number of tools, for the most part related to development in the C and Rust programming languages. As the first task of this lab session, you are therefore going to install the relevant toolchains and a development environment that is comfortable for you.
Operating system requirements: we are going to assume that your work machine runs Linux as operating system (OS). If that is not the case, you should fix this before proceeding. Any Linux-based distribution would do, but in the following we are going to rely on package names from Debian and Debian-based distributions (e.g., Ubuntu); you should translate them to your distro if need be.
Work environment
You have various options in terms of which "machine" to work on:
- work on your real (hardware) machine: laptop or desktop computer;
- work in a Docker container;
- work in a virtual machine.
Option 1: Work on your machine
We recommend this option as it is the most flexible, but it requires more manual setup on your side—setup which we are not going to detail here as it heavily depends on your OS choice.
We only list below the tools (and corresponding Debian package names) that you should install on your machine to be able to complete the exercises of this course.
Option 2: Work in a Docker container
If you would like to work in a Docker container, we provide a Dockerfile that you can use to locally build a Docker image containing all the tools used for lab sessions. Download the Dockerfile locally and build a matching image like this:
$ docker build -t net7212 - < Dockerfile
(It will take a few minutes to complete, depending mostly on download and I/O speeds.)
Then, create a shell script named run.sh to easily create and run a fresh container based on that image every time you need it. It can be something like the following:
#!/bin/bash
host_labdir="${HOME}/net7212-lab"
cont_labdir="/home/net7212/lab"
docker run -ti \
--net host \
-e DISPLAY="unix${DISPLAY}" \
-v /dev/shm:/dev/shm \
-v /etc/hosts:/etc/hosts \
-v /etc/localtime:/etc/localtime:ro \
-v /tmp/.X11-unix:/tmp/.X11-unix \
-v /var/run/dbus/system_bus_socket:/var/run/dbus/system_bus_socket \
-v $HOME/.Xauthority:/home/net7212/.Xauthority \
-v "${host_labdir}:${cont_labdir}" \
--rm \
--name net7212 \
net7212
Note that host_labdir should point to a directory on your real machine, which will be mounted within the container so that you can share files between the two worlds. All your lab session material will need to be in this directory if you want to access it from within the container.
You can then enter and use a fresh container every time you need it like this:
./run.sh
net7212@machine:~$ rustc --version
rustc 1.72.0 (5680fa18f 2023-08-23)
Also check that, if you create a file in your home directory under ~/net7212-lab, you can access it from within the container, e.g.:
# outside the container
$ echo "// hello.c" > ~/net7212-lab/hello.c
./run.sh
net7212@machine:~$ cat ~/lab/hello.c
// hello.c
net7212@machine:~$
Option 3: Work in a virtual machine
If you prefer to fully control your work environment (as with option (1)) without impacting your day-to-day work machine, you can install a virtual machine and configure it manually. Just install in it the same tools that you would have installed on your real machine, following the instructions below. We are not going to detail this step further, as it heavily depends on your choice of virtualization technology.
C development toolchain
If you have chosen to work on your (virtual) machine, you should install the following tools:
Tool | Debian package name |
---|---|
bats | bats |
check | check |
clang-tidy | clang-tidy |
clang | clang |
gcc | gcc |
gdb | gdb |
ltrace | ltrace |
make | make |
strace | strace |
valgrind | valgrind |
If you have chosen the Docker option, they are all already installed in the container.
Exercise 0.a: Install all the above dependencies in the work environment of your choice.
When ready, make sure you can compile and run the following Hello World program with both gcc and clang:
#include <stdio.h>
int main(void) {
printf("Hello, World!\n");
return 0;
} // end of hello.c
like this:
$ gcc -Wall hello.c
$ ./a.out
Hello, World!
$ clang -Wall hello.c
$ ./a.out
Hello, World!
Now make sure that all of the following commands execute properly without failing on your setup (the exact output might vary depending on version, configuration, etc.; that's OK):
$ bats -v
Bats 1.8.2
$ clang-tidy --version
Debian LLVM version 14.0.6
Optimized build.
Default target: x86_64-pc-linux-gnu
Host CPU: skylake
$ gdb --version
GNU gdb (Debian 12.1-4) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ ltrace --version
ltrace version 0.7.3.
Copyright (C) 1997-2009 Juan Cespedes <cespedes@debian.org>.
This is free software; see the GNU General Public Licence
version 2 or later for copying conditions. There is NO warranty.
$ make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ strace --version
strace -- version 5.10
Copyright (c) 1991-2020 The strace developers <https://strace.io>.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Optional features enabled: stack-trace=libunwind stack-demangle m32-mpers mx32-mpers
$ valgrind --version
valgrind-3.19.0
Rust development toolchain
If you have chosen to work on your (virtual) machine, you should install the following tools:
Tool | Debian package |
---|---|
rustup | none (install by hand) |
If you have chosen the Docker option, they are all already installed in the container.
Exercise 0.b: Install all the above dependencies in the work environment of your choice.
Once installed, verify that you can successfully run the following basic tools of the Rust toolchain, like this (your output might vary, depending on the installed versions):
$ rustup --version
rustup 1.26.0 (5af9b9484 2023-04-05)
info: This is the version for the rustup toolchain manager, not the rustc compiler.
info: The currently active `rustc` version is `rustc 1.72.0 (5680fa18f 2023-08-23)`
$ cargo --version
cargo 1.72.0 (103a7ff2e 2023-08-15)
$ rustc --version
rustc 1.72.0 (5680fa18f 2023-08-23)
Editor / IDE
You are also going to need a development editor or an IDE (Integrated Development Environment). As developers tend to be opinionated about this stuff we are not going to force you to use any specific one: pick your own!
We recommend either a flexible editor with IDE-like features (e.g., Emacs or Vim with LSP support for Rust) or a full-featured IDE like Visual Studio Code or NetBeans (do make sure it has support for Rust, though, as that will be needed later in the semester).
Exercise 0.c: Install the editor or IDE you want in the work environment of your choice.
When done, open the hello.c file you used previously and make sure you can comfortably edit/compile/run it, ideally from within the editor/IDE itself.
Done?
Good! You are now all set with the developer setup!
Let's now move to the other matters of the day.
Memory unsafety
Out-of-bounds write
Exercise 1.a: write C programs (new ones, different from those seen in lecture 01) that exhibit weakness CWE-787: Out-of-bounds Write. Your goal is to make your programs segfault due to this CWE! Try out-of-bounds writes in various memory segments, writing one program for each of the following cases:
- out-of-bounds write in dynamically allocated memory on the heap (e.g., with malloc),
- out-of-bounds write in statically allocated memory (i.e., a global variable, preinitialized or not),
- out-of-bounds write in automatic memory (i.e., variables allocated on the stack).
Bonus point: make sure your programs do not emit any warnings when compiled with the -Wall flag (which enables all common compile-time warnings and which you should always use anyway!).
Try to determine experimentally what each program was doing just before the segfault (tip: try with gdb, nm, ltrace, strace).
What helped you determine this?
Exercise 1.b: consider the following local variable declarations at the beginning of a function (possibly main):
char s1[16];
char s2[16];
Write a program that does not segfault but that, with a single memory (or string) copy operation writing to only one of these variables (s1 or s2), modifies the content of both variables with data of your choice. Verify the result experimentally (e.g., with printf or strcmp).
Warning: do not forget about string \0 terminators!
Out-of-bounds read
Exercise 1.c: continuing with the idea from the previous exercise, write a program that exhibits at least one variant of CWE-125: Out-of-bounds Read. The program should not segfault, but by performing a memory read operation on a given variable, it should be able to read the content of another variable, correctly and in full.
Bonus point: make your program read the content of an out-of-scope local variable, e.g., one declared in a calling function.
Smashing the stack
Segmentation fault
Consider the following program:
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
}
int main() {
function(1, 2, 3);
return 0;
}
Exercise 2.a: compile the program to assembly language like this:
$ gcc -S -o example1.s example1.c
and identify in the generated assembly code where function is called. How are arguments passed to the function? (The answer might vary depending on the compiler and architecture, but the assembly code will tell you!)
Exercise 2.b: compile, execute, and debug the following program:
#include <string.h>
void function(char *str) {
char buffer[16];
strcpy(buffer, str);
}
int main() {
char large_string[256];
int i;
for(i = 0; i < 255; i++) {
large_string[i] = 'A';
}
function(large_string);
return 0;
}
Explain what's happening and why.
Subverting the control flow
Historically (on 32-bit architectures and with fewer built-in defenses in C compilers), subverting the control flow via stack overflows was pretty easy. You can get an idea of how easy it was by reading the famous tutorial article Smashing The Stack For Fun And Profit by Aleph One (1996).
Nowadays it has become a little more complicated, but it is still very doable (and regularly done, as we have seen in the course examples).
Exercise 2.c: go through the modern incarnation of Aleph One's tutorial, namely: Smashing the Stack in the 21st Century by Jon Gjengset (archived copy), reproducing the reported experiences on your machine. (Next week we will go through some of the defenses that you will have to manually disable to exploit a stack overflow, and discuss what they are good for and what they are not.)
Lab 02 — Hardening System Programming
In this lab session we will take a practical tour of existing techniques to secure system programs implemented in traditional system languages. We will experiment with: testing, fuzzing, code instrumentation (of both binary and source code), and static analysis.
By the end of the session you will have a better understanding of the state-of-the-art of hardening system programming, including its limitations.
Credits
The exercises of this lab session contain material and ideas reused with permission from week 1 assignments of Stanford's course CS 110L (2022) by Ryan Eberhardt, Armin Namavari, Will Crichton, Julio Ballista, and Thea Rossman.
Unit testing
Setup
Consider the initial C implementation of a simple currency type (a pair of a currency name and a currency amount) given in money.h and money.c.
We want to reassure ourselves that the code thus far is correct, using unit testing.
In check_money_tiny.c you can find the skeleton of a C program that uses money.c as a library and unit tests it with a single test case.
It uses the Check unit testing framework for C, which you should have already installed as part of the first lab session.
It can be built and run like this:
$ gcc -Wall -c -o money.o money.c
$ gcc -Wall -c -o check_money_tiny.o check_money_tiny.c
$ gcc -Wall -o check_money_tiny check_money_tiny.o money.o `pkg-config --libs check`
$ ./check_money_tiny
Running suite(s): Money
100%: Checks: 1, Failures: 0, Errors: 0
check_money_tiny.c:17:P:Core:test_money_create_amount:0: Passed
Exercise 1.a: read through the implementation of money.c and check_money_tiny.c to make sure you understand what the code does.
Help yourself with the Check documentation as needed.
Code coverage
Code coverage is a measure of how much of your implementation code is exercised by your test suite. It can be measured in various ways; we will focus on the simplest metric: how many lines of code are (not) executed while the test suite runs (i.e., line coverage). You can verify the line coverage of your current test suite using gcov, which is part of gcc, as follows:
- compile/link your project passing the -fprofile-arcs -ftest-coverage options to the compiler and linker,
- execute your test suite (./check_money_tiny in this case),
- inspect the code coverage of a source file you care about, e.g.:
$ gcov money.c
File 'money.c'
Lines executed:75.00% of 16
Creating 'money.c.gcov'
Lines executed:75.00% of 16
The summary tells you the percentage of lines of code exercised during execution.
The mentioned file money.c.gcov is an annotated version of money.c that shows precisely which lines have been executed (or not).
Exercise 1.b: measure the code coverage of the current test suite for money.c.
Then, add all the test cases you need to maximize code coverage.
Can you reach 100% line coverage?
Why or why not?
Exercise 1.c: (optional and tricky; feel free to postpone this one to the end of the session) add a memory unsafety issue to the money.c implementation, specifically one that crashes the execution of your test case. For example, you can implement a new money_add function that performs an out-of-bounds read or write. Then add a test that runs the buggy code, crashing the execution of the test case.
Does the crash of that test case bring down the entire test suite (i.e., do the test cases supposed to run after it keep running or not)?
Why or why not?
Stack overflow detection
Setup
Retrieve the program uppercase.c, compile it (the recommended compiler options for this session are -g -O0 -Wall -Wextra), and run it, e.g., like this:
$ gcc -g -O0 -Wall -Wextra -o uppercase uppercase.c
$ ./uppercase 'Hello, World!'
HELLO, WORLD!
Now take some time to study the code of the program.
Exercise 2.a: the program is affected by a memory-related bug (EEEK!). Explain where the bug is, what its consequences might be, and how to fix it.
Static analysis with Clang-Tidy
Clang-Tidy (which you have installed already, as part of last week's lab session) performs some simple static analysis checks, ranging from basic linting to more advanced techniques. From the tool documentation:
clang-tidy is a clang-based C++ “linter” tool. Its purpose is to provide an extensible framework for diagnosing and fixing typical programming errors, like style violations, interface misuse, or bugs that can be deduced via static analysis.
Exercise 2.b: go (briefly) through the documentation of clang-tidy and learn how to use it to analyze C source code files.
Then use clang-tidy on uppercase.c to check whether it detects the bug you identified in the previous exercise.
Does clang-tidy catch the problem?
Based on what you learned in the last lecture, explain why clang-tidy is or is not capable of identifying the problem.
Dynamic analysis with Valgrind
Clang-Tidy performs static analysis; Valgrind, on the other hand, performs dynamic analysis, based on dynamic binary recompilation, as we have seen in the last lecture. Let's see how Valgrind fares at detecting the bug in this program. After compiling the program, give it a try like this:
$ valgrind [--tool=memcheck] ./uppercase 'Hello, World!'
(Note: --tool=memcheck is optional, because Memcheck is the default tool Valgrind uses unless otherwise requested.)
Exercise 2.c: does Valgrind detect the issue? Based on what you learned in the last lecture, explain why or why not.
Exercise 2.d: benchmark the time required to execute the program with and without Valgrind.
(Tip: as a simple way to do that, you can use the standard UNIX time command. If you want to be fancier, this is your chance to learn about the amazing benchmarking tool Hyperfine and use it for a more scientifically accurate comparison.)
Dynamic analysis with LLVM sanitizers
As we have seen in the last lecture, dynamic analysis can also be performed via source code instrumentation, which happens at compile time and adds runtime checks. LLVM sanitizers (look for the various entries with "Sanitizer" in their name in the Clang documentation) are a set of plugins for the Clang compiler that do exactly that.
Exercise 2.e: change the compiler you are using to build from gcc to clang and rebuild the program. (You can pass the same options suggested above, as they are compatible between the two compilers.)
Then enable all of the following sanitizers (check the documentation for how to do that) and rebuild the program: AddressSanitizer, LeakSanitizer, UndefinedBehaviorSanitizer.
Compare the size of the executables built with and without sanitizers. How different are they and why?
Exercise 2.f: run the instrumented program. Do the sanitizers detect the issue? Based on what you learned in the last lecture, explain why or why not.
Let's streamline our build process, as it is becoming a bit of a mess.
Retrieve the generic Makefile we make available for building the programs associated with this lab session.
If you are not familiar with Makefiles, go (briefly) through the GNU make documentation to make sure you know the basics.
In particular, read through the introductory material and make sure you understand the parts of Makefile syntax used in the given Makefile, in particular: rules, variables, and static pattern rules.
(If you already know about Makefiles, feel free to skip this documentation study part.)
Exercise 2.g: change the retrieved Makefile so that:
- it only builds the uppercase.c program (other programs mentioned in there will arrive later in this lab session!), and
- it passes the sanitizer flags to the compiler when compiling.
Recheck the result of the previous exercise about sanitizers to make sure the flags are there when building with make.
Memory leak detection
Linked lists
Now retrieve the program linkedlist.c, add it to the Makefile, and compile it as before.
Take some time to study the program and understand what it does.
Exercise 3.a: this program is affected by (at least) two memory-related bugs this time, but of a different nature than before. Identify both of them, explain what the consequences might be, and how to fix them.
Exercise 3.b: try linting/static analysis with Clang-Tidy on this program, as you did before.
Does it detect the (right) issues? (Tip: beware of false positives!)
Exercise 3.c: try source code instrumentation with LLVM sanitizers on this program, as you did before.
Do they detect the issues?
In both cases, feel free to speculate about why the approaches used did or did not identify the memory issues affecting this program.
String parsing
You know the drill!
The next program is bracket-parser.c, which extracts a "substring [within brackets]" from a larger string.
Exercise 4.a: read the program, find the memory problem with it, try Clang-Tidy and LLVM sanitizers on it, explaining what you are seeing.
Exercise 4.b: on most inputs, the LLVM sanitizers will not detect a memory issue.
Based on your understanding of the bug and your reading of the code (i.e., white-box inspection), can you craft a specific input for bracket-parser that will make the LLVM sanitizers spot the bug at runtime?
(And while we are at it: what does this tell you about the reliability of dynamic analysis for detecting memory issues?)
Fuzzing with LibFuzzer
Let's now see if fuzzing can help us automatically find both the memory issue affecting bracket-parser and a test input that proves the issue is there.
Read the introductory documentation of LibFuzzer.
Make sure you understand how to add fuzz targets to your code, what the required API is, and the fact that you should not have a main function when using LibFuzzer (because it adds its own main).
Exercise 4.c: add a fuzz target to your code that allows you to fuzz the input of the parse function of bracket-parser.c.
Tip: you can take inspiration from the example seen in lecture, but remember that in this case you should pass well-formed C strings (with trailing \0!) to avoid triggering bugs other than the one you are looking for.
Exercise 4.d: build the fuzzer (adding fuzzer to the list of -fsanitize=... sanitizers) and run it.
Does it find the issue?
Does it produce a test case with a sample input that proves the issue is there?
Explain why.
(Pretty cool, huh?)
Lab 03 — Rust and references
In this lab session you will build and run your first Rust program. You will also manipulate references and object lifetimes.
By the end of the session you will have a better understanding of the lifecycle of Rust objects.
Your first Rust program
In this section, you will learn how to create and compile your first Rust program using tools such as cargo.
Anatomy of a Rust program
A basic Rust program is organized as follows:
Cargo.toml
Cargo.lock
src
└── main.rs
target
└── debug
└── myprogram
- The file Cargo.toml describes your program (name, version, authors, …).
- The file Cargo.lock contains information about the versions of the other packages used when compiling your program.
- src contains the source of your program; main.rs is the file containing the main function.
- target is a directory created by the compiler; it will contain the output of the compilation, including the executable produced from your source files, in debug/myprogram (if myprogram is the name of your program as described in Cargo.toml). The target directory should never be checked into your version control system; it should be ignored completely (e.g., by adding it to your .gitignore if you are using git).
Creating your first Rust program: intro
Exercise 0.a: Create the skeleton of your first Rust program by using the command cargo new:
$ cargo new intro
Created binary (application) `intro` package
This will create an intro directory with a complete skeleton of a Rust program:
$ tree intro
intro
├── Cargo.toml
└── src
└── main.rs
(You might not have the tree command on your computer; never mind, we will not need it.)
Change your current directory to intro and display the content of main.rs:
$ cd intro
$ cat src/main.rs
fn main() {
println!("Hello, world!");
}
As you can see, the program is pretty basic. You can compile it by using cargo build:
$ cargo build
Compiling intro v0.1.0 (/tmp/intro)
Finished dev [unoptimized + debuginfo] target(s) in 0.54s
Your program has compiled without errors. You can execute it by using cargo run:
$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/intro`
Hello, world!
Note that you can execute cargo run without first executing cargo build: the application will be built automatically.
Editing the Rust program
Exercise 0.b: Using your editor of choice (for example Visual Studio Code or NetBeans), edit main.rs so that the main program now displays a count from 1 to 5 inclusive:
fn main() {
    for i in 1..=5 {
        println!("i = {i}");
    }
}
Ensure that your editor is Rust-aware, or install the necessary extensions if needed. For Visual Studio Code, we suggest the installation of the "rust-analyzer" extension.
You can now execute your code. If everything went well, you should see the following output:
$ cargo run
Compiling intro v0.1.0 (/tmp/intro)
Finished dev [unoptimized + debuginfo] target(s) in 0.20s
Running `target/debug/intro`
i = 1
i = 2
i = 3
i = 4
i = 5
You can also compile your program and run the executable directly. As the cargo run output indicates, it is located in target/debug/intro:
$ cargo build
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
$ ls -l target/debug/intro
-rwxr-xr-x 2 sam sam 4203904 Jan 16 15:02 target/debug/intro
$ target/debug/intro
i = 1
i = 2
i = 3
i = 4
i = 5
Using a Rust tutorial
There is an interactive tutorial at https://rust-book.cs.brown.edu/. While reading it, you will be able to take quizzes and to leave annotations that can be re-read later on.
You will not have to read the full tutorial at this time. We suggest that you ensure that you understand the concepts (and pass the quizzes) in the following chapters:
- 3.1 Variables and Mutability
- 3.3 Functions
- 4.1 What is Ownership?
- 4.2 References and Borrowing
- 4.3 The Slice Type
Of course, you are invited to read more chapters and take more quizzes as the lecturers describe new Rust concepts.
If you want to do more
Another way to familiarize yourself with Rust programming is to use the rustlings program and repository. Start by installing the rustlings program and cloning the rustlings repository containing the exercises:
$ cargo install rustlings
$ git clone https://github.com/rust-lang/rustlings/
$ cd rustlings
$ rustlings watch
The source for this exercise is in `exercises/intro/intro1.rs`. Have a look!
Going forward, the source of the exercises will always be in the success/failure output.
====================
You can keep working on this exercise,
or jump into the next one by removing the `I AM NOT DONE` comment
[…waits forever…]
The last command, rustlings watch, will watch your progress while you edit the exercises in the exercises directory of the repository.
For example, following the instructions given by rustlings watch, edit the file exercises/intro/intro1.rs with your favourite editor. As an exception, this exercise already compiles successfully. All you have to do is remove the line // I AM NOT DONE so that rustlings watch will go to the next exercise in the series, exercises/intro/intro2.rs.
This one will not compile. You have to edit the code so that it compiles successfully; once you see that it does, you can remove the // I AM NOT DONE line, and so on.
You can stop doing the exercises whenever you want (just stop the rustlings watch program). When you run rustlings watch again, it will start from where you left off.
References
Mutable references
The function std::mem::swap() takes two mutable references to values of the same type as parameters and swaps them.
Swapping values
Exercise 1.a: Complete the following function (and call it from main()) so that it has the expected effect:
#![allow(unused)]
fn main() {
fn do_swap() {
    let mut a = String::from("apple");
    let mut b = String::from("bread");
    println!("Before swap: a = {a}, b = {b}");
    todo!(); // REPLACE ME with something using std::mem::swap()
    println!("After swap: a = {a}, b = {b}");
}
}
The output should look like:
Before swap: a = apple, b = bread
After swap: a = bread, b = apple
You may realize that swapping two String values does not move the characters composing the strings in the heap. Only the three fields making up a String (capacity, pointer, and length) are swapped.
Smallest first
Exercise 1.b: Write a function smallest_first with the following prototype, which ensures that its first argument is smaller than its second argument, swapping them otherwise:
fn smallest_first(a: &mut i32, b: &mut i32) {
    todo!(); // REPLACE ME
}

fn main() {
    let mut a = 42;
    let mut b = 10;
    println!("Before calling smallest_first: a = {a}, b = {b}");
    smallest_first(&mut a, &mut b);
    println!("After calling smallest_first: a = {a}, b = {b}");
    smallest_first(&mut a, &mut b);
    println!("After calling smallest_first again: a = {a}, b = {b}");
}
should display:
Before calling smallest_first: a = 42, b = 10
After calling smallest_first: a = 10, b = 42
After calling smallest_first again: a = 10, b = 42
Lifetimes
Sorting an immutable collection of strings
Let us assume that we want to write a function which sorts an immutable collection of String elements. Here, "immutable" means that we cannot modify either the strings or the collection itself. We will have to return a Vec containing the result.
Exercise 2.a: write a function sorted which takes a reference to a slice of String instances and returns a Vec<String> with the sorted strings.
The function will have the following signature:
#![allow(unused)]
fn main() {
fn sorted(names: &[String]) -> Vec<String> {
    todo!() // Write the code here
}
}
You can use the following constructs to help you do so:
- vec![] creates an empty Vec (the type of the elements will be deduced from the context).
- If s is a String (or a &String), s.clone() will produce a new String with the same content as s.
- You can iterate over the content of a slice names of type &[String] with for name in names { … }. name will have type &String within the loop.
- You can use .sort() on a mutable slice: it will sort your elements in the natural order.
You can test your function with the following main program:
fn main() {
    let people = [String::from("John Doe"), String::from("Jane Eyre"), String::from("Jane Doe")];
    let in_order = sorted(&people);
    for n in in_order {
        println!("- {n}");
    }
}
This should print "Jane Doe", "Jane Eyre", and "John Doe" in order.
Avoiding cloning the strings
While the previous program works, you had to clone the immutable strings. This is unfortunate, as it consumes more memory. Why not work directly with a Vec of &String and sort it?
Exercise 2.b: modify your sorted function so that it returns a Vec<&String> instead of a Vec<String>.
Note that .sort() will work on a collection of &String: it will sort them the same way it sorted String elements.
Testing the guarantees of lifetimes
The signature of your sorted function is:
#![allow(unused)]
fn main() {
fn sorted(names: &[String]) -> Vec<&String>
}
Since you have not indicated any explicit lifetime (remember, lifetimes start with a single tick character '), lifetime elision kicked in: the lifetime of the &[String] slice was copied over to the &String elements inside the Vec.
It means that the result of sorted() cannot live longer than its argument names.
Exercise 2.c: experiment with the following modified main program, which will not compile, and make sure you understand why the compiler is telling you that you cannot store the output of sorted(&people) into in_order. Note how it prevents you from referencing strings that have been destroyed when people went out of scope. You would not have had any such guarantee in C.
fn main() {
    let in_order = {
        let people = [String::from("John Doe"), String::from("Jane Eyre"), String::from("Jane Doe")];
        sorted(&people)
    };
    for n in in_order {
        println!("- {n}");
    }
}
Lifetimes: a complex case
Warning: on this page, we will touch on several Rust areas that have not yet been explained in class. Do not hesitate to read the Rust documentation, or take things as given for now.
Exercise 3.a: follow the code described below, implement it, and make sure you understand how it works.
Problem statement
Sometimes, we would like to manipulate a string, for example to make it lowercase. However, the string we manipulate might already be in lowercase: in this situation, we would like to avoid copying the string and return a reference to the string we got as a parameter.
In short, we will need a type which can hold:
- either a proper `String` if we had to build it;
- or a reference to another string (as a `&str`) if we could just reuse an existing one.
Building such a type
We will build a type, named StringOrRef
, which can either hold an owned String
or store a borrowed &str
reference.
Enumerated types
Rust has support for enumerated types with values, which means that we can write something like (incorrect for now):
#![allow(unused)] fn main() { pub enum StringOrRef { Owned(String), Borrowed(&str), } }
A StringOrRef
object will be either a StringOrRef::Owned
or a StringOrRef::Borrowed
variant. When such an object is destroyed, it will destroy its content:
- If it is a `StringOrRef::Owned`, it will destroy the `String` it owns (and thus free the heap memory associated with that `String`).
- If it is a `StringOrRef::Borrowed`, it will destroy the `&str` it holds, which does nothing: destroying a reference does not destroy the object it points to (the reference does not own it).
Fixing the type
Our type will not compile because we haven't told the compiler how long the reference in StringOrRef::Borrowed
is supposed to be valid. Since we do not know that ourselves in advance, we have to make it a generic parameter of the type:
#![allow(unused)] fn main() { pub enum StringOrRef<'a> { Owned(String), Borrowed(&'a str), } }
We say that the type StringOrRef
is parameterized by the generic lifetime parameter 'a
.
We can create an owned value by using StringOrRef::Owned(String::from("Hello"))
, or make a reference to a string s
with StringOrRef::Borrowed(&s)
. In the former case, the lifetime 'a
can be anything since the StringOrRef
object owns the string. In the latter case, 'a
will be set to the lifetime of the referenced string s
.
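For example, both cases can be sketched in a standalone program (the enum is repeated here so the example runs on its own):

```rust
// StringOrRef, as defined above, repeated so this example is self-contained.
pub enum StringOrRef<'a> {
    Owned(String),
    Borrowed(&'a str),
}

fn main() {
    // Owned: the lifetime parameter is unconstrained, the object owns its data.
    let o = StringOrRef::Owned(String::from("Hello"));
    let s = String::from("World");
    // Borrowed: 'a is tied to the lifetime of s.
    let b = StringOrRef::Borrowed(&s);
    assert!(matches!(o, StringOrRef::Owned(_)));
    assert!(matches!(b, StringOrRef::Borrowed(_)));
    println!("ok");
}
```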
Important note: when a type is parameterized by a generic lifetime parameter, an object of this type can never live longer than this lifetime. For example, if s
has a lifetime 's
, StringOrRef::Borrowed(s)
is an object that cannot live longer than 's
. This is intuitively sound: since we store a reference to s
(which has lifetime 's
) inside our StringOrRef
, the StringOrRef
cannot survive the disappearance of s
as that would leave us with a dangling pointer.
Exploring the type
Our type can be used through pattern-matching:
#![allow(unused)] fn main() { fn display(x: &StringOrRef<'_>) { // '_ means that the lifetime has no importance here match x { StringOrRef::Owned(s) => println!("owned string: {s}"), StringOrRef::Borrowed(s) => println!("borrowed string: {s}"), } } }
We can also write a function which returns a &str
from our object:
#![allow(unused)] fn main() { pub fn as_str<'a>(x: &'a StringOrRef<'_>) -> &'a str { match x { StringOrRef::Owned(s) => &s, StringOrRef::Borrowed(s) => s, } } }
Note how we didn't have to give the `StringOrRef` generic lifetime parameter a name and used `'_`, which means "any lifetime": since the reference to the `StringOrRef` has lifetime `'a`, which is necessarily shorter than or equal to the generic lifetime parameter (see the "Important note" above), we know that the returned reference is shorter than the one used as a generic parameter.
Implementing as_str()
as a method
Rather than using a standalone function, we can implement as_str()
as a method on StringOrRef
objects. Methods are implemented in a impl
block. In an impl
block, Self
designates the type itself. In method parameters, `self` in first position designates the receiver, i.e. the current object (it is a shortcut for self: Self
), &self
is a shortcut for self: &Self
and &mut self
is a shortcut for self: &mut Self
.
Let us rewrite as_str()
as a method:
#![allow(unused)] fn main() { impl StringOrRef<'_> { pub fn as_str(&self) -> &str { match self { StringOrRef::Owned(s) => &s, StringOrRef::Borrowed(s) => s, } } } }
You can note some interesting points about lifetimes:
- We used `<'_>` in our `impl` block: the methods defined in this block work with any generic lifetime parameter, as explained below.
- We didn't explicitly write in the `as_str()` signature that the returned `&str` has the same lifetime as `&self`. This mechanism is called "lifetime elision": when a method has a `&self` parameter, by default all output lifetimes which are not explicit get the same lifetime as `&self`. The signature is a shortcut for `pub fn as_str<'a>(&'a self) -> &'a str`.
Using our StringOrRef
type
We can now use our StringOrRef
type. For example, let us write a function which returns a lowercase version of a string, but allocates memory on the heap only when the string is not lowercase already:
#![allow(unused)] fn main() { // The lifetime of s will be copied into the generic lifetime parameter of StringOrRef. // Again, this is because of elision rules: if there is only one lifetime parameter in // the input, it will be copied into all non-explicit lifetime parameters in the output. pub fn to_lowercase(s: &str) -> StringOrRef<'_> { if s.chars().all(|c| c.is_lowercase()) { // All characters in the string are lowercase already, return a reference StringOrRef::Borrowed(s) } else { // We need to create a new String with a lowercase version StringOrRef::Owned(s.to_lowercase()) } } }
We can now use it in our main program and see that it works:
fn main() { let s1 = to_lowercase("HeLlO"); let s2 = to_lowercase("world"); println!("s1 = {}, s2 = {}", s1.as_str(), s2.as_str()); }
This will display "s1 = hello, s2 = world". Nothing indicates that "world" has not been copied. Let's enhance the program with the matches!
macro which can test if some expression matches a pattern, as in a match
expression:
fn variant(x: &StringOrRef<'_>) -> &'static str { if matches!(x, StringOrRef::Owned(_)) { "owned" } else { "borrowed" } } fn main() { let s1 = to_lowercase("HeLlO"); let s2 = to_lowercase("world"); println!("s1 = {}, s2 = {}", s1.as_str(), s2.as_str()); println!("s1 is {}, s2 is {}", variant(&s1), variant(&s2)); }
The output is now
s1 = hello, s2 = world
s1 is owned, s2 is borrowed
as expected. Neat eh?
Adding a destructor
When a StringOrRef
object is dropped (goes out of scope), it will get destroyed: the destructor for every field will be called (if any). For example, if it holds a StringOrRef::Owned
variant, the String
contained in this variant will be dropped and its destructor will be called, freeing memory on the heap.
We can visualize what happens by adding a destructor on StringOrRef
. It is done by implementing the Drop
trait:
#![allow(unused)] fn main() { impl Drop for StringOrRef<'_> { fn drop(&mut self) { print!( "Destroying the StringOrRef containing {} which is {}: ", self.as_str(), variant(self), ); if matches!(self, StringOrRef::Owned(_)) { println!("memory on the heap will be freed"); } else { // Dropping a reference doesn't free memory on the heap println!("no memory on the heap will be freed"); } } } }
If we execute our program, we will now read:
s1 = hello, s2 = world
s1 is owned, s2 is borrowed
Destroying the StringOrRef containing world which is borrowed: no memory on the heap will be freed
Destroying the StringOrRef containing hello which is owned: memory on the heap will be freed
s2
and s1
are destroyed in the reverse order of their creation when they go out of scope. No memory on the heap was ever allocated for string "world", which comes from a read-only memory area and has only been referenced. However, the string "hello" has been built on the heap while lowercasing the string "HeLlO" and needs to be freed: this
happens automatically when dropping s1
.
Conclusion
Types in Rust are powerful and allow easy memory management without needing a garbage collector. Doing the same thing in C would require an extra field (is the string owned or borrowed?) and hand-written deallocation code. We will later see even more powerful type manipulation in Rust.
Lab 04: Error handling
In this lab session you will:
- manipulate APIs that can return errors, such as network or disk ones;
- use external crates (i.e. Rust libraries);
- write regular expressions;
- determine which French people are trending in the news;
- display errors with colors.
By the end of the session you will have a better understanding of the way errors are typically handled in Rust.
Web page retrieval
In this part, you will have to retrieve a web page from a given URL and return its content as a String
or return a proper error if the page retrieval fails.
Initialization
But first of all, ensure that your Rust installation is up-to-date by typing:
$ rustup update
If you get an error, it means that you have not installed Rust through rustup
. In this case, make sure that your system is up-to-date.
For this lab, you will create a new Rust program named "lab4". Create a new binary (executable) project by typing
$ cargo new --bin lab4
$ cd lab4
Everything you will do today will happen in the lab4
directory, in particular in the src/main.rs
file.
Adding a dependency
You could make HTTP requests "by hand" by opening a TCP socket to the right host and port, and by sending HTTP commands and parsing the responses. However, you can take advantage of existing libraries written in Rust that can do that already.
The library ("crate" in Rust terminology) you will want to use is called reqwest (this is not a typo). In order to use it in your program, you have to add it to Cargo.toml
in the [dependencies]
section.
Rather than adding it by hand, you can use the cargo add
command. Moreover, we want to use the "blocking" API of reqwest, which is the simplest one at this stage. This "blocking" API is enabled by requesting (ah ah) the "blocking" feature of reqwest:
$ cargo add reqwest --features blocking
If you look at your Cargo.toml
, you should see something like:
[dependencies]
reqwest = { version = "0.12.8", features = ["blocking"] }
It indicates that your Rust program will use version 0.12.8 of the "reqwest" crate with the "blocking" feature enabled. 0.12.8 was the latest published version of the crate at the time this page was written.
Adding the "reqwest" crate as a dependency means that you will be able to use its types and functions by prefixing them with reqwest::
.
⚠️ If you later receive a compilation error, you might need to install the
pkg-config
andlibssl-dev
packages (or equivalent) on your system using your package manager, so thatreqwest
can find the library it needs (namely, OpenSSL for cryptographic routines required forhttps://
handling).
Fetching a web page
Your first function will retrieve a web page from its URL using the blocking API of reqwest, whose documentation is accessible online.
Exercise 1.a: write a function with signature fn get(url: &str) -> Result<String, reqwest::Error>
which returns the content of the web page located at url
(use code similar to the one in the documentation).
Use the following main()
program to test it:
fn main() -> Result<(), reqwest::Error> { println!("{}", get("https://rfc1149.net/")?); Ok(()) }
Note how main()
can return a Result<(), E>
instead of returning nothing (which is written ()
in Rust and is the equivalent of void
in C-like languages). Either get("https://rfc1149.net/")
returns an error and it will be propagated to main()
by the ?
operator, or it returns a String
which will be displayed. At the end of the main()
program, Ok(())
ensures that ()
is returned in an Ok
.
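The same propagation pattern can be tried without any network access, using a hypothetical parse_port() helper built only on the standard library:

```rust
use std::num::ParseIntError;

// Hypothetical helper showing the same pattern: `?` propagates the error,
// and Ok(...) wraps the success value.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    let port: u16 = s.parse()?; // on failure, returns Err(...) to the caller
    Ok(port)
}

fn main() -> Result<(), ParseIntError> {
    println!("port: {}", parse_port("8080")?);
    assert!(parse_port("not a number").is_err());
    Ok(())
}
```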
Returning a better error
Now, try changing the URL with one returning a 404 (not found) code. You can use "https://rfc1149.net/nonexistent"
, which does not exist.
Note how your get()
function returns without an error: it returns the content of the error page. This is not a good idea: in your program, you only want to return pages which were successfully found.
However, you will not be able to return a reqwest::Error
to indicate that the page was not found, as a 404 response is not an error condition for reqwest::Error
. You will have to write your own Error
type with two variants for now:
#![allow(unused)] fn main() { #[derive(Debug)] enum Error { Reqwest(reqwest::Error), BadHttpResult(u16), } }
This Error
type can be an Error::Reqwest
and encapsulate a reqwest::Error
, or it can be an Error::BadHttpResult
and encapsulate the HTTP error code returned by the web server (for example 404 for "not found").
The #[derive(Debug)]
will be explained in a later class. For the time being, you just need to know that it will allow an Error
object to be displayed by the main()
program if needed. Also, you can display an Error
yourself by using the {:?}
placeholder:
#![allow(unused)] fn main() { let my_error = Error::BadHttpResult(404); println!("The error is {my_error:?}."); }
would display
The error is BadHttpResult(404).
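You can check the derived Debug output with a self-contained sketch (the Reqwest variant payload is replaced by a stand-in String here, so that the example does not depend on the reqwest crate):

```rust
// Stand-in for the lab's Error type; the real Reqwest variant wraps reqwest::Error.
#[derive(Debug)]
#[allow(dead_code)]
enum Error {
    Reqwest(String),
    BadHttpResult(u16),
}

fn main() {
    let my_error = Error::BadHttpResult(404);
    // The derived Debug output does not include the enum name prefix.
    assert_eq!(format!("{my_error:?}"), "BadHttpResult(404)");
    println!("The error is {my_error:?}.");
}
```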
Also, in order to take advantage of the automatic conversion performed by the ?
operator, you want to implement From<reqwest::Error>
for your Error
type:
#![allow(unused)] fn main() { impl From<reqwest::Error> for Error { fn from(e: reqwest::Error) -> Error { // Encapsulate the error into a Error::Reqwest variant Error::Reqwest(e) } } }
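To see why this From implementation enables the automatic conversion, here is a minimal self-contained sketch with toy error types (LowError and HighError are illustrative, not part of the lab):

```rust
// A low-level error and a higher-level error wrapping it.
#[derive(Debug)]
struct LowError;

#[derive(Debug)]
enum HighError {
    Low(LowError),
}

impl From<LowError> for HighError {
    fn from(e: LowError) -> HighError {
        HighError::Low(e)
    }
}

fn fallible() -> Result<(), LowError> {
    Err(LowError)
}

fn caller() -> Result<(), HighError> {
    fallible()?; // the LowError is converted to HighError via From
    Ok(())
}

fn main() {
    assert!(matches!(caller(), Err(HighError::Low(LowError))));
    println!("ok");
}
```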
Exercise 1.b: update your get()
function so that it checks the status code of the response before reading its text. Your get()
function will return a Result<String, Error>
, and your main()
function will return a Result<(), Error>
, in order to accommodate both error conditions.
Look at the documentation for the reqwest::blocking::get()
function: what is its return type? What method can be called on a Response
to get a StatusCode
and compare it with StatusCode::OK
? How can you get the numerical u16
code of a StatusCode
?
Note: instead of typing qualified type names such as reqwest::StatusCode
, you can add a use reqwest::StatusCode;
at the beginning of your program: this will import StatusCode
in your namespace, and you will be able to use StatusCode
instead of the longer reqwest::StatusCode
.
Check that your new version of get()
works by ensuring that an error is displayed when trying to print the content of "https://rfc1149.net/nonexistent"
. You should see a 404 error.
Name identification
Now that you are able to retrieve a web page, you will try to identify names written in it. A name looks like "John Doe": an uppercase letter followed by lowercase letters followed by a space followed by an uppercase letter followed by lowercase letters.
Rather than looking for such a pattern yourself, you might want to use the "regex" crate which implements regular expressions.
Using the "regex" crate
Add the "regex" crate to your Cargo.toml
:
$ cargo add regex
From now on, you are able to use the entities defined in this crate, in particular the regex::Regex
type. To be able to refer to it as Regex
instead of regex::Regex
, you might want to put a use regex::Regex;
near the beginning of your program.
Of course we want our regular expression to match French names containing accented characters. You can use the following regular expression patterns to identify uppercase and lowercase letters:
- `\p{Uppercase}` will match any Unicode uppercase letter (for example it would match the Greek uppercase delta "Δ")
- `\p{Lowercase}` will match any Unicode lowercase letter, such as the Greek lowercase delta "δ"
- `+` after a pattern means "one or more"
You can then deduce that a name component can be represented as \p{Uppercase}\p{Lowercase}+
. This would match "John", "Doe", "François" or "Strauß". The whole regular expression can then be \p{Uppercase}\p{Lowercase}+ \p{Uppercase}\p{Lowercase}+
(two components separated by a space).
Creating the Regex
A regular expression is created using Regex::new()
, which returns a Result
. Since we know that our expression is valid, we can call unwrap()
on the result:
#![allow(unused)] fn main() { let re = Regex::new("\p{Uppercase}\p{Lowercase}+ \p{Uppercase}\p{Lowercase}+").unwrap(); }
However, if you do this, you will likely get an error: \p
in the string is interpreted as an escaped p
. As \n
represents a "line feed", \p
represents… nothing known. This is an invalid escape sequence in a string.
Fortunately, Rust has raw strings, in which no escape character is recognized. The syntax of a raw string is r"…"
.
You can also put double quotes in a raw string if you need to, by adding a pound sign to the double-quote delimiters:
#![allow(unused)] fn main() { let s = r#"This is a string with some "quotes" in it"#; }
But what if you need to put a double quote followed by a pound sign ("#
) in the raw string? This is easy, you can change the start and end marker and increase the number of pound signs provided they match:
#![allow(unused)] fn main() { let s = r###"You need a quote + 3 pound signs to end the string"###; let t = r###"You can put "## inside without any problem!"###; }
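These escaping rules can be verified with a short self-contained example:

```rust
fn main() {
    // In a normal string, \n is a single character (line feed).
    assert_eq!("a\nb".len(), 3);
    // In a raw string, the backslash and the n stay two separate characters.
    assert_eq!(r"a\nb".len(), 4);
    // One pound sign allows double quotes inside the raw string.
    assert_eq!(r#"He said "hi""#, "He said \"hi\"");
    // More pound signs allow "# sequences inside.
    let t = r##"contains "# inside"##;
    assert_eq!(t, "contains \"# inside");
    println!("ok");
}
```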
Writing the extract_names()
function
Using the .find_iter()
method on a regex::Regex
object, you can iterate over the matches found in the input string as shown in the following example which displays all sequences of uppercase characters (with at least two of them) found in the file "text.txt":
fn main() { // Read the content of the "text.txt" file into variable s let s: String = std::fs::read_to_string("text.txt").unwrap(); // Match at least two consecutive uppercase characters let re = regex::Regex::new(r#"\p{Uppercase}{2,}"#).unwrap(); println!("All uppercase sequences found:"); for m in re.find_iter(&s) { // Inside the loop m is a regex::Match, which has an .as_str() method println!(" - {}", m.as_str()); } }
Exercise 2.a: write a function with signature fn extract_names(s: &str) -> Vec<String>
which returns all plausible names in a string.
In this function, you will (those are mere suggestions, you are free to do it any other way you see fit):
- create a `re` variable containing a `Regex` with the pattern seen above
- create an empty vector (using `vec![]`) and store it into a mutable variable (this is the vector you will return at the end of the function)
- iterate over `re.find_iter()` to find all the matches
- on every match `m` you can call `.as_str()` to get a `&str` representing the text of the match (for example "John Doe"), that you can turn into a `String` using `.to_owned()` or `String::from()` (as usual)
- push the `String` into the `Vec<String>` that you plan to return (`.push()` is the method you want to use)
- return the vector
Try your function with the following main()
function:
fn main() { let names = extract_names("Yesterday, John Doe met François Grüß in a tavern"); println!("{names:?}"); }
Since String
implements Debug
, Vec<String>
also implements Debug
and can be displayed using the {:?}
placeholder which is handy for debugging.
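For instance, a quick self-contained check of this Debug output (the names are illustrative):

```rust
fn main() {
    let names = vec![String::from("John Doe"), String::from("Jane Doe")];
    // Debug output of a Vec<String> shows the quoted elements between brackets.
    assert_eq!(format!("{names:?}"), r#"["John Doe", "Jane Doe"]"#);
    println!("{names:?}");
}
```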
Deduplicating the names
Unfortunately, we may end up with duplicates. If a text contains the same name several times, it will be present several times in the output.
A vector is not really the best structure to represent a set of objects, like a set of names. We do not care about the order, only about the presence of a name.
Rust has a std::collections::HashSet<T>
type in its standard library (std
). Provided that you added a use std::collections::HashSet;
near the beginning of your program, you can:
- create a new `HashSet<T>` with `HashSet::new()`
- insert an element into a set `h` with `h.insert(element)`; nothing happens if the element is already present in the set
- iterate over the elements of a set like you did with a vector: `for element in &h { … }` (if `h` is a `HashSet<T>`, `element` will be a `&T` in the loop)
- display a `HashSet<T>` using `{:?}` as long as its element type `T` implements `Debug`
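Here is a short self-contained sketch of these operations (the names are illustrative):

```rust
use std::collections::HashSet;

fn main() {
    let mut h: HashSet<String> = HashSet::new();
    h.insert(String::from("John Doe"));
    h.insert(String::from("Jane Doe"));
    // Inserting a duplicate leaves the set unchanged.
    h.insert(String::from("John Doe"));
    assert_eq!(h.len(), 2);
    assert!(h.contains("Jane Doe"));
    // Iterating over &h borrows the elements as &String.
    for element in &h {
        println!("- {element}");
    }
}
```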
Exercise 2.b: make your extract_names()
function return a HashSet<String>
(instead of a Vec<String>
).
Using the following main()
function, you will see that there can be no duplicates:
fn main() { let names = extract_names("John Doe, François Grüß, John Doe"); println!("{names:?}"); }
Returning the names in a page
Exercise 2.c: add a function with signature fn names(url: &str) -> Result<HashSet<String>, Error>
which returns all plausible names in a web page.
Check your function by displaying the names present on "https://www.liberation.fr"
.
Extra: if you have more time
If you have more time, you can fix the name detector such that it accepts multipart names, such as "Jean-Pierre Elkabbach".
Sets intersection
If you have two HashSet<T>
h1
and h2
, you can iterate over their intersection which is denoted by h1.intersection(&h2)
. This will iterate over all common elements using a reference of type &T
.
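A minimal sketch of intersection() in action (the names are illustrative):

```rust
use std::collections::HashSet;

fn main() {
    let h1: HashSet<String> =
        ["John Doe", "Jane Doe"].iter().map(|s| s.to_string()).collect();
    let h2: HashSet<String> =
        ["Jane Doe", "Jean Dupont"].iter().map(|s| s.to_string()).collect();
    // intersection() yields references (&String) to the common elements.
    let common: Vec<&String> = h1.intersection(&h2).collect();
    assert_eq!(common, vec!["Jane Doe"]);
    println!("{common:?}");
}
```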
Exercise 3.a: write a function with signature fn trending() -> Result<Vec<String>, Error>
which returns a vector with the names present in both "https://www.liberation.fr"
and "https://www.mediapart.fr"
. The vector will be sorted before being returned.
Print the trending names from your main()
function:
fn main() -> Result<(), Error> { println!("Trending names:"); for name in trending()? { println!(" - {name}"); } Ok(()) }
File system
We want to store the list of trending people in a file on a disk.
Exercise 4.a: write a function with signature fn write_to_disk(path: &str, data: Vec<String>) -> Result<(), std::io::Error>
which writes every string in data
on its own line in file located at path
.
The method join("\n")
applied to a Vec<String>
will return a new String
with the \n
character (line feed) inserted between each line. Do not forget to add an extra '\n' character at the end of the returned string.
To write a string content
into a file file_name
, you can use std::fs::write(file_name, content)
which returns a Result<(), std::io::Error>
.
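Both steps can be tried together in a self-contained sketch (the file name trending_demo.txt and the temporary directory are illustrative choices, not the lab's actual output path):

```rust
fn main() -> Result<(), std::io::Error> {
    let data = vec![String::from("John Doe"), String::from("Jane Doe")];
    // join("\n") puts a line feed between elements; add the trailing one by hand.
    let mut content = data.join("\n");
    content.push('\n');
    assert_eq!(content, "John Doe\nJane Doe\n");
    // Write to a file in the temporary directory, then read it back.
    let path = std::env::temp_dir().join("trending_demo.txt");
    std::fs::write(&path, &content)?;
    assert_eq!(std::fs::read_to_string(&path)?, content);
    std::fs::remove_file(&path)?;
    println!("ok");
    Ok(())
}
```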
Exercise 4.b: modify your main()
function such that you write the trending names into a file named "trending.txt"
.
You will have to add a new variant to your Error
enum in order to encapsulate the std::io::Error
if there is a problem when writing a file.
Check that the error detection and reporting works fine by attempting to write in a file you cannot write to (such as "/trending.txt"
, you most likely don't have the rights to write at the root of your file system).
Easier errors
You may have noticed that it is rather cumbersome to write all those From
methods to convert existing errors into our own variant. Fortunately, a crate exists to make this job easier: thiserror.
After adding thiserror
to your Cargo.toml
$ cargo add thiserror
you will be able to define your Error
type as follows:
#![allow(unused)] fn main() { #[derive(Debug, thiserror::Error)] enum Error { #[error("Wrong HTTP status code: {0}")] BadHttpResult(u16), #[error(transparent)] Io(#[from] std::io::Error), #[error(transparent)] ReqwestError(#[from] reqwest::Error), } }
Our Error
type looks the same as before with a few additions:
- It derives `thiserror::Error`. This is the heart of the `thiserror` crate and makes the other things below work. Behind the scenes, it uses "procedural macros", which are a way to transform some Rust code into other Rust code during compilation.
- The `#[error]` attribute indicates how to display the error. `{0}` refers to the first field. The fields could also be named; a field `xyz` would use `{xyz}` in the `#[error]` attribute. If you are encapsulating another error, you may use `transparent` to indicate that you simply want to display the inner error, and that you have nothing to add.
- The `#[from]` attribute on the field of a variant indicates that a `From` implementation should be generated from the type following `#[from]` to an `Error`.
- Your `Error` type will implement the `std::error::Error` trait, which allows proper error chaining: the system will know (and will be able to display) that the cause of an `Error::ReqwestError` is a `reqwest::Error`, which might itself be caused by another deeper error, and so on.
Exercise 5.a: modify your Error
type to use thiserror
and remove your existing From
implementations.
Check that everything still works, including when you make errors happen by changing the URLs or the file name.
Prettier messages
You might notice that the error messages printed by default by main
are quite ugly. This is due to the default Debug
implementation for the error types.
Fortunately we can do better by using the anyhow
crate. This crate adds an anyhow::Error
type which is compatible with all errors implementing the std::error::Error
trait.
Exercise 6.a: after adding the anyhow
crate to your Cargo.toml
(you should know how to do it by now), modify your main()
function so that it returns a Result<(), anyhow::Error>
.
Note that in anyhow
you will find the following definition
#![allow(unused)] fn main() { type Result<T> = std::result::Result<T, anyhow::Error>; }
(std::result::Result
being the "default" Result
type that we use)
It means that you can use anyhow::Result<T>
everywhere you would be using Result<T, anyhow::Error>
, as a shortcut.
Exercise 6.b: make your main()
function return an anyhow::Result<()>
.
Look at the produced errors. Aren't they beautiful, compared to what we had before?
Even prettier messages
There exists an even better crate derived from anyhow
, which is called eyre
. The idea of eyre
is the same as anyhow
: the error type is called eyre::Report
(instead of anyhow::Error
), and eyre::Result<T>
can be used instead of anyhow::Result<T>
.
In addition, reporters can be installed to display the eyre::Report
errors in a particular way. For example, the [color-eyre](https://crates.io/crates/color-eyre)
crate will display the messages with colors to facilitate error reading. All you have to do is call color_eyre::install()?;
at the beginning of your main()
function.
Exercise 7.a: add the eyre
and color-eyre
crates to your Cargo.toml
, initialize color_eyre
as explained above, and make your main()
function return an eyre::Result<()>
.
Now make some errors happen and see how beautifully they are rendered.
Some thoughts about error handling
In our case, we never use the fact that we can match on the various variants of our Error
type. We could have used Result<T, eyre::Report>
(or the equivalent eyre::Result<T>
) as the return type of all our functions that can fail.
However, this is not always the case: if you are developing a library, your users might want to get precise information about an error you give them back. In this case, it is important to use an Error
enumeration allowing the library user to match the error against the various cases.
But here, let's simplify our code:
Exercise 7.b: remove your Error
type, and make every function that can fail return an eyre::Result<T>
(where T
is the type of a successful result).
However, you still have the problem of BadHttpResult
which is not an existing error. Don't worry, you can return prematurely with a textual error by using the macro eyre::bail!
which works like format!()
or println!()
:
#![allow(unused)] fn main() { // Return an error immediately with the given formatted text eyre::bail!("This returns an eyre::Report with {}", 42); }
Adding context
You may have noticed that the error message describes what the error is but not what you were trying to do. For example, when a web page fails to be retrieved, the URL is not shown anywhere, or when the file cannot be created, the file name is not to be seen.
By adding use eyre::Context;
near the beginning of a file, you import an extension trait of the Result
type which allows some contextual information to be added.
Let us take the example of the write_to_disk()
function:
#![allow(unused)] fn main() { use std::io::Write; fn write_to_disk(path: &str, data: Vec<String>) -> eyre::Result<()> { let mut f = std::fs::File::create(path)?; for l in data { f.write_all(format!("{l}\n").as_bytes())?; } Ok(()) } }
Every intermediate Result
can be enriched with a context if it contains an error before the error is returned:
#![allow(unused)] fn main() { use std::io::Write; use eyre::Context; fn write_to_disk(path: &str, data: Vec<String>) -> eyre::Result<()> { let mut f = std::fs::File::create(path).context(format!("Unable to create {path}"))?; for l in data { f.write_all(format!("{l}\n").as_bytes()) .context(format!("Error while writing {path}"))?; } Ok(()) } }
In case of an error, the context will be displayed:
$ cargo run
Error:
0: Unable to create /trending.txt
1: Permission denied (os error 13)
Location:
src/main.rs:45
[…]
Exercise 7.c: add a context to all errors before returning them.
Check by using a bad URL or a bad file name that the context is correctly printed and contains all the useful information.
Congratulations, you are now a master in Rust error handling.
Lab 05: Complex data structures
In this lab session you will learn about:
- Implementing recursive data structures in Rust;
- Implementing standard traits for both your own and foreign data types;
- Implementing your own iterators, to refactor iteration logic;
- Making all of the above generic, with appropriate trait bounds.
Implementing a mutable linked list
We want to implement a singly linked list in Rust, capable of storing i32
values in its nodes.
As you certainly know, such a data structure is recursive, made of two types of nodes, with each non-empty node pointing to the next node until the final (empty) node is reached, like this:
Rust enums are the right tool for the job, so as a first approximation we can start from:
#![allow(unused)] fn main() { pub enum List { Nil, Cons(i32, List), } }
Exercise 1.a: Try the above definition to see if it works. If not, explain in your own words why it doesn't, building upon the knowledge of some of the traits we have learned about in today's lecture.
Exercise 1.b:
Use `Box` to fix the `List` type definition so that it compiles.
Then, create by hand (using Nil
, Cons
, and Box
methods) sample list values:
- the empty list,
- a list containing one element,
- a list containing two elements,
- up to the list depicted in the figure above.
To help debugging the values you are building, implement the Debug
trait for List
, so that you can print your lists with the {:?}
format string.
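If you are stuck, here is one possible shape of the boxed definition together with some hand-built values and a derived Debug implementation (a sketch, not necessarily the exact solution expected by the exercise):

```rust
// One possible fix: box the recursive field so that List has a known size.
#[derive(Debug)]
enum List {
    Nil,
    Cons(i32, Box<List>),
}

use List::{Cons, Nil};

fn main() {
    let empty = Nil;
    let one = Cons(1, Box::new(Nil));
    let two = Cons(1, Box::new(Cons(2, Box::new(Nil))));
    println!("{empty:?} / {one:?}");
    // Box's Debug implementation is transparent: it prints the inner value.
    assert_eq!(format!("{two:?}"), "Cons(1, Cons(2, Nil))");
}
```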
Exercise 1.c:
Let's now use the linked list to build a proper abstract data type: a stack.
Create a struct Stack
that uses List
internally to store its elements but also has a separate size
field to keep track of the list size.
All stack manipulations must respect the invariant that size
always corresponds to the total number of non-Nil
cells in the list.
Add to your Stack
an impl
block that implements at least the following methods:
- `new` that creates a fresh empty stack,
- `len` that returns the stack length,
- `push` that adds an `i32` on top of the stack,
- `pop` that removes and returns the top element (if it exists).
Make sure push
and pop
modify the stack list in place.
(But feel free to implement an alternative purely functional version of the stack—returning fresh versions of the stack at each method call—as a separate exercise!)
Tip: due to memory ownership rules, implementing push
and pop
might be trickier than you would expect based on experience with other programming languages.
If you get stuck, have a look at the std::mem::replace
stdlib function and ponder how it can help.
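A minimal illustration of std::mem::replace, independent of the stack exercise:

```rust
fn main() {
    let mut v = vec![1, 2, 3];
    // replace() swaps in a new value and hands back the old one,
    // without ever leaving `v` in an invalid state.
    let old = std::mem::replace(&mut v, Vec::new());
    assert_eq!(old, vec![1, 2, 3]);
    assert!(v.is_empty());
    println!("ok");
}
```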
Standard traits
Implementing standard traits for your own types makes them work more nicely (and more idiomatically) with the rest of the Rust ecosystem. So let's make the data types you have defined well-behaved ecosystem citizens!
Exercise 2.a:
Implement the Display
trait for your Stack
(and possibly also for List
, if it helps), so that stacks can be pretty-printed like this:
#![allow(unused)] fn main() { let mut s = Stack::new(); s.push(10); s.push(20); s.push(30); println!("stack content: {s}"); }
stack content: [ 30 20 10 ]
Exercise 2.b:
Your lists and stacks, after all, are just glorified integer vectors.
It makes sense to ease conversions between them and Vec<i32>
and we have seen how the From
trait is the idiomatic Rust way for supporting type conversions.
Implement conversions in both directions, from Vec<i32>
to List
and from List
to Vec<i32>
.
Once you have done this, you can do the same for Stack
, by simply calling into the conversions you have already implemented for List
(with some little extra work to precompute the stack length).
At the end, something like the following should work for your types:
// Vec<i32> -> List
let v = vec![1, 2, 3];
assert_eq!(
    List::from(v),
    Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))))
);

// List -> Vec<i32>
let l = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
assert_eq!(
    vec![1, 2, 3],
    Vec::<i32>::from(l)
);
Note that this is the first time we implement a trait for a type defined in a different crate than our own: specifically, we are implementing `From<List>` for `Vec<i32>`, a generic type defined in the standard library. Doing so is permitted by Rust provided that you respect the orphan rule, i.e., either the trait or the type for which you are implementing the trait must be defined in the same crate as the implementation.
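To make the rule concrete, here is a minimal, self-contained illustration with a hypothetical local type `Pair` (not your `List`): the impl below is accepted only because the source type is defined locally, even though both `From` and `Vec` come from the standard library:

```rust
// A local type defined in this crate.
struct Pair(i32, i32);

// Allowed by the orphan rule: `From` and `Vec` are foreign (stdlib),
// but `Pair` is local, so this impl belongs to our crate.
impl From<Pair> for Vec<i32> {
    fn from(p: Pair) -> Self {
        vec![p.0, p.1]
    }
}

fn main() {
    // `Into` comes for free from the `From` impl.
    let v: Vec<i32> = Pair(1, 2).into();
    assert_eq!(v, vec![1, 2]);
}
```

By contrast, `impl From<String> for Vec<i32>` would be rejected, since both the trait and both types involved are foreign.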
Exercise 2.c:
Implement the `FromStr` trait for your `Stack` (and possibly `List`), so that stacks can be parsed from strings like this:
let mut s1 = Stack::new();
s1.push(10);
s1.push(20);
s1.push(30);
let s2: Stack = "[ 30 20 10 ]".parse().unwrap();
assert_eq!(s1, s2);
Refactoring with iterators
By this point, your code probably contains a number of places where you iterate over the elements of a list to, e.g., print its content, count its elements, fill an empty vector with them, etc. As fans of the DRY principle we cannot stand this situation, let's refactor our way out of this! Obviously, the situation calls for the introduction of an iterator over list elements.
Exercise 3.a:
Add to your `List` data type an `iter()` method that returns a type implementing the `Iterator` trait.
When done, the following should work out of the box:
let l = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
let mut i = l.iter();
assert_eq!(i.next(), Some(1));
assert_eq!(i.next(), Some(2));
assert_eq!(i.next(), Some(3));
assert_eq!(i.next(), None);
Exercise 3.b:
Make your `List` implement `IntoIterator`, so that your lists work out of the box with `for` loops, like this:
let l = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
for e in &l {
    println!("list element: {e}");
}
Exercise 3.c:
Refactor previous code you have written for `List` (and/or `Stack`) to use iterators, so that you avoid repeating the iteration logic in multiple places.
Exercise 3.d: (Bonus) Try to make your iterator(s) work like those of stdlib collections, specifically:
for element in &collection { ... }     // produce shared refs (&i32 in our case), do not consume collection
for element in &mut collection { ... } // produce mutable refs (&mut i32), do not consume collection
for element in collection { ... }      // produce items (i32), consume the collection
Going generic
Wait, your data structures are not specific to `i32` values at all!
It is perfectly legitimate to have lists and stacks containing different types.
Let's make them generic data structures that can do so.
Exercise 4.a:
Generalize your implementation of `List` so that it takes a type parameter `T` and can hold values of as many different types as possible.
Tip: in the process you will probably face a dilemma: either give up on some of the features and traits you have implemented in the past for `i32` lists, or add trait bounds to restrict the set of types your lists can contain.
It's up to you to decide what to do, as long as you understand the implications.
Exercise 4.b:
Your lists are nice because they are uniform: each list can contain multiple values of the same type. In some cases it is desirable to create non-uniform lists, containing values of different types.
Using trait objects, create a new data structure that can contain values of different types, which all share a trait (e.g., `Display`, so that you can easily show what your lists contain).
Try filling your non-uniform lists with values of different types and verify that you can still pretty-print them to stdout with `println!("{}", ...)`.
Now try to add a `.len()` method to your non-uniform lists. Can you? How about a `.min()` method, analogous to the iterator consumer we have seen in class?
PS: No! You don't need to implement linked lists by hand in Rust: doubly-linked lists are already available in the stdlib. Also, for most practical purposes, vectors are much better anyway.
Lab 06 — Packaging, testing, and fuzzing
In this lab session you will:
- create packages and crates;
- test them in various ways (unit, integration, doc);
- track and maximize the code coverage of your tests;
- property test and fuzz Rust code to find unexpected bugs.
By the end of the session you will be familiar with the Rust tools and techniques used to perform "serious sanity checks" on production Rust code.
Packaging Rust code
Consider the following structure representing user accounts.
Each account has two `String` fields: a public `username` and a private `password`.
pub struct Account {
    pub username: String,
    password: String,
}
Exercise 1.a:
implement the following associated functions to create an `Account` from a pair of string (slice)s and parse one from an existing string:
use std::str::FromStr;

impl Account {
    /// Create a new account with a given username and password.
    fn new(username: &str, password: &str) -> Self {
        todo!()
    }
}

impl FromStr for Account {
    /// Create a new account by parsing its data from a string in the form "USERNAME:PASSWORD".
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        todo!()
    }
}

fn main() {
    let user1 = Account::new("zack", "cne3bfvkw5eerv30jy3990yz0w");
    let user2 = "sam:bf6ddgabh6jghr89beyg3gn7e8".parse::<Account>().unwrap();
    let user1_name = &user1.username;
    let user2_name = &user2.username;
    println!("let's welcome our new users: {user1_name} and {user2_name}");
    // let user1_pwd = &user1.password;
    // error[E0616]: field `password` of struct `Account` is private
}
A few points worth noting:
- The `FromStr` trait is a variant of the `From` trait you have already seen, specifically meant to parse data from strings. When implemented, it enables the `&str::parse()` function, used here in `main()`.
- To implement `FromStr` you will have to pick an `Err` type.
- The `todo!` macro is very useful for code prototyping; it will make your code panic at runtime, but pass type checking. Have a look at its documentation for details.
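The behaviour of `todo!` can be seen in a tiny standalone sketch: the function below type-checks as returning `u32` even though nothing is implemented, and panics only when actually called:

```rust
fn not_yet_implemented() -> u32 {
    // Type-checks as returning u32, but panics with
    // "not yet implemented" if executed.
    todo!()
}

fn main() {
    // Silence the default panic message printed to stderr.
    std::panic::set_hook(Box::new(|_| {}));
    // Demonstrate the panic by catching it.
    let result = std::panic::catch_unwind(not_yet_implemented);
    assert!(result.is_err());
    println!("todo! example ok");
}
```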
When done, make sure that the above `main` function runs without panicking.
Exercise 1.b:
put your code into a new package called `accountmgr` containing two crates: one library crate (implementing the `Account` type and its creation/parsing functions) and one binary crate containing the `main` function.
Make sure your crates build and that the binary runs without panicking.
Now try to uncomment the last line of `main` to make sure the module system does not allow you to access private fields from outside the defining module. Remember: you now have two completely separate crates: a library crate with a public API that does not export the `password` field of the `Account` structure, and a binary crate using it as client code via that API.
Exercise 1.c:
(optional) in the next few sections you will test the functionalities of your `accountmgr` crate, which for now is quite bare-bones. To make things more interesting, you can extend it by adding the following functionalities, with whatever API you please:
- Add a separate constructor function that randomizes password creation instead of taking one as input. For this you can use the `rand` crate and implement various "strong password" strategies to produce high-entropy passwords.
- Add a function that returns the hashed version of an account password. You can use the `sha1` crate for this.
- Add a batch constructor `fn from_file(filename: &Path) -> Vec<Self>` to read account data from a file, one account per line. Here is a file with many accounts (~20k) that you can use for testing purposes.
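One possible shape for such a batch constructor, sketched with a simplified `Account` and minimal error handling (malformed lines are silently skipped here; your API may well differ):

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::Path;

pub struct Account {
    pub username: String,
    password: String,
}

impl Account {
    /// Parse one "USERNAME:PASSWORD" line; None if malformed.
    fn parse(line: &str) -> Option<Self> {
        let (username, password) = line.split_once(':')?;
        Some(Account {
            username: username.to_string(),
            password: password.to_string(),
        })
    }

    /// Read account data from a file, one account per line.
    pub fn from_file(filename: &Path) -> Vec<Self> {
        let file = File::open(filename).expect("cannot open account file");
        BufReader::new(file)
            .lines()
            .filter_map(|line| line.ok())
            .filter_map(|line| Account::parse(&line))
            .collect()
    }
}

fn main() {
    // Demonstrate on a small temporary file.
    let path = std::env::temp_dir().join("accounts_demo.txt");
    std::fs::write(&path, "zack:pw1\nsam:pw2\n").unwrap();
    let accounts = Account::from_file(&path);
    assert_eq!(accounts.len(), 2);
    assert_eq!(accounts[0].username, "zack");
    assert_eq!(accounts[0].password, "pw1"); // same module, private field visible
    println!("from_file sketch ok");
}
```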
Later on, when asked to test `accountmgr` in the remainder of this lab session, also add tests for these additional functionalities!
If you decide not to do this exercise right now, but finish early with the rest of the lab assignments, come back to this one later, add missing functionalities above, add corresponding tests, etc., until done.
Testing Rust code
Unit testing
Unit tests are shipped together with the implementation code of a crate. They have access to all code entities in the implementing module, no matter their visibility from other modules.
Exercise 2.a:
add a sub-module `tests` to the library crate of your `accountmgr` package.
Add to it basic unit tests to convince yourself that the functionalities you have implemented thus far are correct.
It is up to you what test cases you want to use, but follow the usual rules of thumb of testing weird/edge cases as well as some easy ones.
Make sure that `cargo test` passes.
Make sure that you also test the internal private state of your values (e.g., that a password field is properly set to the same value passed to the constructor), because unit tests are your only chance to do so.
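As an illustration, a unit test inside the defining module can assert on a private field directly, which is something an external crate could never do (sketch with a simplified `Account`):

```rust
mod account {
    pub struct Account {
        pub username: String,
        password: String,
    }

    impl Account {
        pub fn new(username: &str, password: &str) -> Self {
            Account {
                username: username.to_string(),
                password: password.to_string(),
            }
        }
    }

    #[cfg(test)]
    mod tests {
        use super::*;

        #[test]
        fn password_is_stored() {
            let a = Account::new("zack", "secret");
            // Allowed: unit tests live in the defining module's subtree,
            // so the private `password` field is visible here.
            assert_eq!(a.password, "secret");
        }
    }
}

fn main() {
    let a = account::Account::new("zack", "secret");
    assert_eq!(a.username, "zack");
    println!("unit test sketch ok (run `cargo test` to execute the test)");
}
```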
Integration testing
Integration tests are shipped in separate files located under the top-level `tests/` directory, and are compiled to independent binary crates linked to the main library crate. As such, they can only access the public API of your library crate, not its private entities.
Exercise 2.b:
move all the tests you can from the `tests` sub-module to integration tests located under `tests/`, and make sure that `cargo test` still passes.
It should be possible to turn most of them into integration tests, but not all of them.
Which tests couldn't you turn into integration tests and why?
Documentation tests
Documentation tests are integration tests that reside alongside the implementation code, in docstrings.
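For example, a docstring with an embedded test might look like this (a sketch: the `double` function and the `accountmgr` crate path in the hidden `use` line are hypothetical):

```rust
/// Double a number.
///
/// Fenced code blocks in docstrings are extracted and run by
/// `cargo test` as documentation tests:
///
/// ```
/// # use accountmgr::double; // hypothetical crate path, hidden in rendered docs
/// assert_eq!(double(21), 42);
/// ```
pub fn double(x: i32) -> i32 {
    x * 2
}

fn main() {
    // The same assertion the doctest above runs.
    assert_eq!(double(21), 42);
    println!("doctest sketch ok");
}
```

Lines starting with `# ` inside the doctest are executed but hidden in the rendered documentation, which keeps examples short while remaining self-contained.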
Exercise 2.c:
Add documentation to your code in the form of docstrings.
At the very minimum, for each public entity in the module provide a succinct explanation of what it does.
Check the documentation of rustdoc on how to write documentation and the details about the docstring markup language.
Verify by running `cargo doc` that documentation is properly generated under `target/doc/` for all the crates related to your package.
Exercise 2.d:
Either add new documentation tests, or turn some of your existing integration tests into doctests integrated into your docstrings.
Try to come up with doctests that are actually useful as documentation, because that is one of the key points of doctests.
For example, have you documented what should happen when trying to parse: (1) a well-formed account string, (2) an empty account string, (3) a non-empty account string that lacks the colon separator?
If not, this is your chance to do so!
Make sure that `cargo test` still passes.
Code coverage
In a previous lab session you have seen how to use code coverage in C/C++ to guide the addition of test cases, so that all your code is exercised during the execution of a test suite.
Code coverage tooling is available for Rust as well.
You are going to give a try to Tarpaulin, a code coverage tool for Rust that can trace exercised code during test suite execution via either ptrace on Linux/x86_64 (by default) or LLVM's SanitizerCoverage (passing the `--engine llvm` flag; supported on platforms other than Linux/x86_64).
Exercise 2.e:
Read the documentation of the tarpaulin crate and install it so that the `cargo tarpaulin` command becomes available in your Rust installation.
When you run `cargo tarpaulin` in a Rust package, it will run the same tests as `cargo test` (in fact, Tarpaulin even strives to offer a fully-compatible CLI), but under code coverage instrumentation.
At the end of the execution Tarpaulin will emit a report about covered/non-covered lines for each module.
A coverage "diff" w.r.t. the previous run will also be emitted.
Note that currently Tarpaulin only offers line coverage; branch coverage is not implemented yet.
Exercise 2.f:
Run `cargo tarpaulin` on your package and check your current line coverage. By default, only a succinct textual report is produced; try adding the `--out html` flag to obtain a browsable version of your code, with covered/non-covered lines highlighted.
If your line coverage is not 100%, try to reach that percentage by adding new test cases that cover the missing lines. If it's already 100% go back to Exercise 1.c, add a missing functionality, verify it decreases line coverage, and then make it reach full coverage again.
Property testing
Recall from lecture that:
Property testing is a testing technique [...] where we:
- randomly select test inputs,
- assert general properties that are automatically verifiable on test input/output pairs,
- verify that properties hold on the obtained test input/output pairs.
You are going to use the `proptest` crate to verify some properties of the `accountmgr` crate. Add `proptest` as a development dependency to your crate with `cargo add --dev proptest`; doing so makes your package depend on `proptest` only conditionally, specifically when compiling tests, examples, and benchmarks.
Verify within `Cargo.toml` that the dependency has indeed been added under the `[dev-dependencies]` section.
Exercise 3.a:
Add a new unit test to those already present in your `lib.rs` module, which randomizes the parsed string so that only valid account strings are formed (e.g., you can start with a simple regex like `r"[a-z]*:[a-z]*"`, representing letters followed by a colon followed by letters, and then make it more "nasty" by adding other special characters). Note that the new test will need to be put within a `proptest!` environment. Add an assertion to verify that the account string is properly parsed (i.e., that `parse_account` does not return an error).
Here are some useful references about `proptest`: crate, documentation, book.
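The essence of such a property test can be sketched without the `proptest` crate, using a tiny deterministic pseudo-random generator in place of proptest's regex strategies (which are, of course, far more powerful); the `parse_account` function below is a hypothetical stand-in for the parser under test:

```rust
// Minimal linear congruential generator standing in for proptest's
// input-generation strategies.
struct Lcg(u64);

impl Lcg {
    fn next(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }

    // Generate a lowercase word of length 1..=8, mimicking r"[a-z]+".
    fn word(&mut self) -> String {
        let len = 1 + (self.next() % 8) as usize;
        (0..len)
            .map(|_| (b'a' + (self.next() % 26) as u8) as char)
            .collect()
    }
}

// Hypothetical parser under test: accepts "USERNAME:PASSWORD"
// with non-empty parts.
fn parse_account(s: &str) -> Result<(String, String), String> {
    match s.split_once(':') {
        Some((u, p)) if !u.is_empty() && !p.is_empty() => {
            Ok((u.to_string(), p.to_string()))
        }
        _ => Err(format!("malformed account string: {s:?}")),
    }
}

fn main() {
    let mut rng = Lcg(42);
    // Property: every generated valid account string parses successfully.
    for _ in 0..1000 {
        let s = format!("{}:{}", rng.word(), rng.word());
        assert!(parse_account(&s).is_ok(), "failed on {s:?}");
    }
    println!("property sketch ok");
}
```

With real `proptest`, the generator and the loop are replaced by a strategy (e.g., a regex) and the `proptest!` macro, which also shrinks failing inputs to minimal counterexamples.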
Let's now strengthen our parsing requirements, because accounts with empty usernames and/or passwords are no good, right?
Exercise 3.b:
Change your `parse_account` function so that it refuses account strings containing an empty username and/or password. Run your unit tests again. `proptest` tests should catch the fact that empty usernames/passwords are no longer correctly parsed and give you examples of account strings that fail.
Improve your `proptest` regex so that only (new) valid account strings are produced.
Add specific (non-random) unit tests to handle the cases of empty usernames/passwords.
Make sure your line coverage is still 100%.
Fuzzing Rust code
And now for something completely different.
ELF is the file format used by most executable binaries on Unix systems, including Linux. For $reasons you want to implement a command-line tool to check whether a file is a valid ELF binary or not. When invoked on an existing file it should return 0 (and output a nice explanation message) if the file is a valid ELF binary, 1 otherwise (and output another nice explanation message).
Oh, look! There is a handy `elf_rs` crate on crates.io; let's use that as plumbing for implementing your tool.
Exercise 4.a: To begin with:
- create a package `check_elf` containing a library crate,
- add to it as a dependency version 0.1.3 of `elf_rs` (e.g., by running `cargo add elf_rs@0.1.3`) and verify in `Cargo.toml` that the dependency has been added correctly,
- in `src/lib.rs`, implement the following functions by filling in their `todo!()`-s:
use std::path::Path;

pub fn is_valid_elf(content: &[u8]) -> bool {
    todo!()
}

pub fn is_valid_elf_file<P: AsRef<Path>>(filename: P) -> bool {
    todo!()
}
Tips:
- To check if a file is valid ELF with `elf_rs`, just build an `Elf` structure using `Elf::from_bytes` on the bytes (`Vec<u8>`) read from a file. Iff you obtain an `Ok(_)` result, the file is valid ELF.
- Use the former function to implement the latter.
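As a baseline intuition (this deliberately does not use `elf_rs`, which validates far more than this), the very first thing ELF validation checks is the 4-byte magic number at the start of the file:

```rust
use std::fs;
use std::path::Path;

/// Minimal check: a valid ELF file starts with the magic bytes
/// 0x7f 'E' 'L' 'F'. This sketch shows only that first step;
/// elf_rs goes on to validate headers, sections, and more.
pub fn has_elf_magic(content: &[u8]) -> bool {
    content.starts_with(b"\x7fELF")
}

/// File-level variant, reusing the byte-level check
/// (mirroring the is_valid_elf / is_valid_elf_file split).
pub fn file_has_elf_magic<P: AsRef<Path>>(filename: P) -> bool {
    fs::read(filename)
        .map(|bytes| has_elf_magic(&bytes))
        .unwrap_or(false)
}

fn main() {
    assert!(has_elf_magic(b"\x7fELF rest of file..."));
    assert!(!has_elf_magic(b"not an elf"));
    println!("elf magic sketch ok");
}
```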
In addition to integration tests, Rust packages can contain examples, located under the top-level `examples/` directory. Each `*.rs` file in there is a standalone binary crate, with a `main()` function as its entry point. Individual examples can be built and run using, respectively, `cargo build --example=NAME` and `cargo run --example=NAME`.
Examples are useful both as sample usage of a library crate and as standalone executables that do something useful.
Exercise 4.b:
Create an example file `examples/check_elf.rs` by completing the following skeleton:
fn main() {
    let filename = std::env::args().nth(1).unwrap(); // get the first argument on the CLI or panic
    todo!();
}
where you can already see how to read the first CLI argument. Instead of `todo!()` you should use your library crate functions to check whether the specified file is a valid ELF binary. If it is, output a message saying so and exit with code 0 (check out the `std::process::exit` function from the stdlib for that); if it isn't, print the opposite message and exit with code 1.
When done test your example program to make sure it behaves like this:
$ cargo run --example=check_elf /bin/ls
Finished dev [unoptimized + debuginfo] target(s) in 0.01s
Running `target/debug/examples/check_elf /bin/ls`
/bin/ls is a valid ELF file
$ echo $?
0
$ cargo run --example=check_elf ./Cargo.toml
Finished dev [unoptimized + debuginfo] target(s) in 0.01s
Running `target/debug/examples/check_elf ./Cargo.toml`
./Cargo.toml is NOT a valid ELF file
$ echo $?
1
Looks like your tool is working! Or is it? Let's use fuzzing to check how robust it is against malformed inputs.
First, you need to install the `cargo fuzz` sub-command and initialize fuzzing in your `check_elf` package:
$ rustup toolchain add nightly
$ cargo +nightly fuzz init
$ ls fuzz/fuzz_targets/
fuzz_target_1.rs
Note that you need to use the "nightly" Rust toolchain (and hence install it beforehand if you haven't yet), because `cargo fuzz` is not fully stable yet. The `+nightly` flag passed to `cargo` ensures that the subsequent command (`fuzz init` in this case) is run using that toolchain.
Exercise 4.c:
Edit `fuzz_target_1.rs` to fuzz the input of your `is_valid_elf` function. (It should be straightforward to do so, as the type of the fuzzed payload and what that function expects are very close.) Now run `cargo +nightly fuzz run fuzz_target_1` to fuzz `is_valid_elf`.
It will (most likely) identify a pretty serious issue.
What is it?
How do you explain an issue like that arising in a Rust program?
(Spoiler: you will learn a lot more about this in the next lecture.)
Exercise 4.d:
The problem you have encountered was reported as a bug to the `elf_rs` maintainers a while ago. Since then, the issue has been fixed, and the most recent version of `elf_rs` at the time of writing (0.3.0) is no longer affected by this nasty problem. Update the version of `elf_rs` you are using in `Cargo.toml` and try fuzzing again.
Is the problem still there?
(Phew!)
Lab 07 - Unsafe and FFI, and procedural macros
This lab session is made of two parts: unsafe/FFI first, macros second.
Part 1: unsafe and FFI
You will learn how to:
- write a build script;
- interface with a C package using bindgen;
- write a higher level interface for your bindings.
By the end of this part you will be familiar with the Rust tools and techniques used to safely interface with a foreign language.
(This part has been designed by Guillaume Duc. The lab instructions have been translated from French to English. All translation mistakes are ours.)
Part 2: writing a procedural macro
You will learn how to:
- write a procedural macro which works directly on the `TokenStream`;
- write a procedural macro which uses a visitor pattern;
- write a derive macro that adds new capabilities to a structure.
You are not expected to complete all three macro exercises in the allotted time. Each exercise is self-contained; you can choose what you want to work on after reading each exercise text.
By the end of this part you will be able to write procedural macros to enhance the Rust language capabilities.
Unsafe programming and FFI: setup
In this lab, we will create a Rust interface for the `zstd` library.
Prerequisite
Check that the `zstd` library and its development files (headers and `.so` link) are installed on your computer (`libzstd1` and `libzstd-dev` packages on Debian-based installations, `zstd` on Arch Linux).
Creating a workspace
For this lab, we will need a cargo workspace containing two library crates that you will create later: `zstd-sys` (low-level interface) and `zstd` (high-level interface).
Exercise 0.a: Create the cargo workspace with an empty members list.
Automatic creation of the bindings using bindgen
Exercise 1.a: Create a new library crate named `zstd-sys` inside the cargo workspace. Don't forget to add the name of the library to the `members` list in your workspace definition file.
⚠️ For the rest of this page, everything happens in the `zstd-sys` crate.
Exercise 1.b: Add `bindgen` to the `build-dependencies` section of `Cargo.toml`. You can use `cargo add` with the appropriate options to do so.
Exercise 1.c: Create a `src/header.h` file containing:
#include <zstd.h>
Exercise 1.d: Create a `build.rs` file next to `Cargo.toml`. Using the example given in class, add the code to generate the bindings. The library you want to link is `zstd`, and all types and functions start with `ZSTD_`.
Exercise 1.e: In `src/lib.rs`, include the file generated by `bindgen`. Since the zstd library uses global variables whose names are not in uppercase, and some function names are not in snake case (`words_separated_by_underscores`), you can add this near the top of your `src/lib.rs` file:
#![allow(non_upper_case_globals)]
#![allow(non_snake_case)]
Check that your library builds without errors using `cargo build`.
Exercise 1.f (optional): Install `cargo-expand` (`cargo install cargo-expand`) and look at the code generated by bindgen using `cargo expand --lib`. To use it you need the nightly Rust toolchain. If you don't want to do that, you may instead look for the `bindings.rs` file in `target` and display its content (assuming you are in the `zstd-sys` crate root):
$ find ../target -name bindings.rs -exec cat {} \;
Testing the low-level interface
Exercise 2.a: Add a test that checks that `ZSTD_compress()` and `ZSTD_decompress()` work as expected. To do so, generate a buffer containing random data (for example 1024 bytes of random data; see the `rand` crate), compress it, decompress it, and check that the decompressed data match the original data.
This test should be created in a new file located in the `tests` directory (which you must create) of the `zstd-sys` crate.
You will need to use the zstd API documentation. You will also have to use the following auxiliary functions:
- `ZSTD_isError()` to check the values returned by `ZSTD_compress()` and `ZSTD_decompress()`;
- `ZSTD_compressBound()` to get the maximum size of compressed data;
- `ZSTD_getFrameContentSize()` to get the size of the decompressed data.
High-level interface
It is now time to design a higher-level interface to the zstd library. Our goal is to no longer expose any `unsafe` function to the user of our library.
Exercise 3.a: Create a new library crate named `zstd` inside the cargo workspace.
⚠️ For the rest of this page, everything happens in the `zstd` crate.
Exercise 3.b: Make `zstd-sys` a dependency of the `zstd` crate. You may have to look up how to define a dependency by its path (which will be relative to the `zstd` crate).
Exercise 3.c: Write a `compress_buffer()` function with the following signature:
pub fn compress_buffer(src: &[u8], dst: &mut [u8], level: i32) -> Result<usize, ZstdError> {
// ...
}
`ZstdError` is an error type that you will define yourself. You may use the `thiserror` crate if you wish.
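One possible shape for `ZstdError` using only the standard library (the variants here are illustrative assumptions; `thiserror` would generate the `Display` and `Error` impls for you):

```rust
use std::fmt;

// Illustrative error variants; adapt them to the failures
// your wrappers can actually encounter.
#[derive(Debug)]
pub enum ZstdError {
    Compression(usize),
    Decompression(usize),
    BufferTooSmall,
}

impl fmt::Display for ZstdError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ZstdError::Compression(code) => write!(f, "compression failed (code {code})"),
            ZstdError::Decompression(code) => write!(f, "decompression failed (code {code})"),
            ZstdError::BufferTooSmall => write!(f, "destination buffer too small"),
        }
    }
}

impl std::error::Error for ZstdError {}

fn main() {
    let e = ZstdError::BufferTooSmall;
    assert_eq!(e.to_string(), "destination buffer too small");
    println!("error type sketch ok");
}
```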
Exercise 3.d: Write a `decompress_buffer()` function with the following signature:
pub fn decompress_buffer(src: &[u8], dst: &mut [u8]) -> Result<usize, ZstdError> {
// ...
}
Exercise 3.e: Write a `max_compressed_size()` function with the following signature:
function with the following signature:
pub fn max_compressed_size(size: usize) -> usize {
// ...
}
Exercise 3.f: Write a `decompressed_size()` function with the following signature:
function with the following signature:
pub fn decompressed_size(src: &[u8]) -> Result<usize, ZstdError> {
// ...
}
Exercise 3.g: Add one or more tests, as you did for the `zstd-sys` crate, to check that those functions work as expected.
Procedural Macros
During these practical exercises, we will be building some procedural macros. We will need to write new ones during the section dedicated to asynchronous programming.
Exercise 5.a: Create a crate called `macros` that specifies in its `Cargo.toml` that it is the source of procedural macros:
[lib]
proc-macro = true
Exercise 5.b: Add the dependencies that we will use to manipulate tokens and the abstract syntax tree, as well as to report errors:
- `proc-macro2` for token manipulation
- `quote` for code generation with templates
- `syn` with the `full`, `visit-mut`, and `parsing` features
- `proc-macro-error` for better error messages
- `trybuild` to test the errors returned by our macros
Throughout this section, you are encouraged to use sub-modules to store your utility functions. Only the definitions of the procedural macros themselves should be at the top level of the crate.
Translating strings from English to French
We want to write a `#[translate]` attribute macro that will translate English strings to French for string literals that represent numbers. For example, the following code:
#[translate]
fn main() {
let res = "forty two";
println!("12 + 30 = {}", res);
}
will display:
12 + 30 = quarante-deux
Only integer strings within an arbitrary range (e.g., `0..=100`) will be translated.
We will use the following two crates to implement this functionality:
- `english_numbers` for English numbers
- `french_numbers` for French numbers
Exercise 5.a: Add these crates as dependencies to the `macros` crate.
Preloading Strings
The `english_numbers` crate does not provide a way to recognize an English number and retrieve its numeric value. Therefore, we will build a dictionary to store the string representation and its associated numeric value.
Exercise 5.b: Create a `Translate` struct that contains a dictionary associating a string with an `i64`, the type used by the `english_numbers` crate.
Exercise 5.c: Create an associated function `new()` that returns a `Translate` object with a preloaded dictionary. We will only enable the `spaces` formatting option and leave the other options disabled.
Choosing the String Replacement Technique
We could choose to use a mutable visitor to rewrite `LitStr` nodes that correspond to an English number and replace them with the corresponding French term. However, this technique, which seems to work at first glance, will fail on simple tests like:
#[test]
#[translate]
fn test_translate() {
assert_eq!("trois", "three");
}
The visitor will visit the `Macro` node when analyzing this function and encountering `assert_eq!`. It will correctly visit the `path` and `delimiter` fields, but it will not visit the `tokens` field (available as a `proc_macro2::TokenStream`), which is the content of the macro, as it may not be valid Rust code at this stage.
Therefore, we need to also intercept the visit of `Macro` nodes to replace the literal tokens we are interested in. Since our procedural macro already works with a `TokenStream`, why not directly implement this solution? We don't need a visitor.
Transforming the Token Stream
Exercise 5.d: Write a method that substitutes the tokens corresponding to a string literal representing an English number in our dictionary with the corresponding French number. Be sure to recursively call this method when encountering a delimited group of tokens.
impl Translate {
fn substitute_tokens(stream: proc_macro2::TokenStream) -> proc_macro2::TokenStream {
todo!()
}
}
Note that the literal representation we have access to is the one in the source code, enclosed in double quotes (we can ignore string literals using other delimiters like `r#""#`). Instead of removing these quotes, it may be easier to add them to the dictionary for direct comparison.
Exercise 5.e: Write a procedural macro `#[translate]` that constructs a `Translate` object and uses it to transform the `TokenStream`. Remember that conversions with `From` and `Into` are implemented between `proc_macro::TokenStream` (at the macro interface) and `proc_macro2::TokenStream` (used inside the macro).
Exercise 5.f: Write tests for your macro. It may be useful to define a `str!(a, b)` macro with `macro_rules!` that dynamically constructs a string from `a` and `b`, without having the `ab` string appearing in the source code:
// Check that out-of-range (1..=100) values are not translated
assert_eq!(str!("one h", "undred and one"), "one hundred and one");
Determining the Positive or Zero Bounds
We want to optionally specify the bounds for the numbers to be translated using an attribute. The following notations should be accepted:
#[translate] fn f() { ... } // Default bounds (0..=100)
#[translate(0..10)] fn f() { ... }
#[translate(0..=10)] fn f() { ... }
However, we want to reject incorrect constructions with clear error messages:
error: unexpected end of input, expected `..=` or `..`
--> tests/ui/translate.rs:3:1
|
3 | #[translate(10)]
| ^^^^^^^^^^^^^^^^
|
= note: this error originates in the attribute macro `translate` (in Nightly builds, run with -Z macro-backtrace for more info)
error: expected integer literal
--> tests/ui/translate.rs:6:13
|
6 | #[translate(..10)]
| ^^
error: unexpected end of input, expected integer literal
--> tests/ui/translate.rs:9:1
|
9 | #[translate(10..)]
| ^^^^^^^^^^^^^^^^^^
|
= note: this error originates in the attribute macro `translate` (in Nightly builds, run with -Z macro-backtrace for more info)
error: expected integer literal
--> tests/ui/translate.rs:12:13
|
12 | #[translate(x)]
| ^
To achieve this, we will build a structure on which we can implement the `syn::parse::Parse` trait:
struct Bounds { low: i64, high: i64 }
Exercise 5.g: Implement the `Parse` trait on `Bounds`. You have to read an integer with type `LitInt` (`syn` handles the unary minus sign), look for one of `..=` and `..`, read the higher bound, and build the `Bounds` object. You might want to use `Lookahead1` to make things easier.
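Stripped of the `syn` machinery, the parsing logic boils down to: read an integer, expect `..=` or `..`, read another integer. Here is a plain-string sketch of that logic (the real implementation works on token streams with `LitInt` and `Lookahead1`, not on strings):

```rust
// Plain-string analogue of parsing `LOW..=HIGH` or `LOW..HIGH` into
// inclusive bounds, mirroring what the syn-based parser must do.
fn parse_bounds(s: &str) -> Result<(i64, i64), String> {
    // Try the longer separator first, as a lookahead would.
    let (low, high, inclusive) = if let Some((l, h)) = s.split_once("..=") {
        (l, h, true)
    } else if let Some((l, h)) = s.split_once("..") {
        (l, h, false)
    } else {
        return Err("expected `..=` or `..`".to_string());
    };
    let low: i64 = low
        .trim()
        .parse()
        .map_err(|_| "expected integer literal".to_string())?;
    let high: i64 = high
        .trim()
        .parse()
        .map_err(|_| "expected integer literal".to_string())?;
    // Normalize to inclusive bounds.
    Ok((low, if inclusive { high } else { high - 1 }))
}

fn main() {
    assert_eq!(parse_bounds("0..=10"), Ok((0, 10)));
    assert_eq!(parse_bounds("0..10"), Ok((0, 9)));
    assert_eq!(parse_bounds("-10..=10"), Ok((-10, 10)));
    assert!(parse_bounds("10").is_err());
    assert!(parse_bounds("..10").is_err());
    println!("bounds parsing sketch ok");
}
```

In the `syn` version, the error cases shown in the expected compiler output above come for free from `LitInt` parsing and `Lookahead1::error()`.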
Exercise 5.h: Add specific tests to check that you can read the various intervals. To avoid exporting private types, you may add the tests in a submodule which is defined only in testing mode:
#[cfg(test)]
mod tests {
…
}
You can parse strings with parser `T` using `syn::parse_str::<T>(s)`; this might be handy in your tests.
Exercise 5.i: Update the `translate` macro so that it reads the bounds from its attribute if it is not empty, and initializes the `Translate` object appropriately.
Exercise 5.j: Add tests. For example, this test must pass:
#[test]
#[translate(-10..=10)]
fn test_negative_bounds() {
assert_eq!("moins dix", "negative ten");
assert_eq!("dix", "ten");
assert_eq!(str!("neg", "ative eleven"), "negative eleven");
assert_eq!(str!("ele", "ven"), "eleven");
}
Conclusion
We have seen that several methods might be combined to implement a macro. Here, we wrote a dedicated parser to read bounds, and also worked with the token stream directly.
Chaining Computations
The Elixir language allows chaining computations using its `|>` operator, which inserts the left-hand argument as the first argument of the function call on the right-hand side:
iex> "Elixir" |> String.graphemes() |> Enum.frequencies()
%{"E" => 1, "i" => 2, "l" => 1, "r" => 1, "x" => 1}
We would like to achieve the same thing in Rust using a visitor. Since we want to rely on `syn`'s built-in parsing capabilities, we will take a valid Rust tree as input. Therefore, we will sacrifice the `|` character (pipe, which represents "bitwise or") for this purpose.
#[pipe]
fn pipe_example() {
let f = "Rust" | str::chars | Vec::from_iter;
assert_eq!(f, vec!['R', 'u', 's', 't']);
}
This code is equivalent to:
fn pipe_example() {
let f = Vec::from_iter(str::chars("Rust"));
assert_eq!(f, vec!['R', 'u', 's', 't']);
}
Our `#[pipe]` macro and its `|` operator should also support function calls with multiple arguments. The expression `a | f(b, c)` is equivalent to `f(a, b, c)`.
Implementation
Exercise 6.a: Use a `syn::visit_mut::VisitMut` visitor to intercept the analysis of expressions, and replace a binary expression using the `|` operator followed by a function call or a path with a function call that uses the left-hand argument as the first argument of the call. Remember to recurse to visit the new node (or the sub-nodes if no transformation was made).
Consider what type of node you want to visit in order to modify it. For example, if you are modifying an ExprBinary node, which represents a binary expression, you cannot replace it with a node that is not a binary expression. If you are modifying an Expr node, which is less specific, you can replace one expression with another.
Also, remember that you can use syn::parse_quote!()
to generate nodes of any type from a template.
Exercise 6.b: Create a procedural macro attribute pipe
that uses this visitor to transform any item using the visitor.
Exercise 6.c: Add tests to verify the correct behavior of this macro, which should be applicable to functions, modules, and implementations.
Protected Field Access
For pedagogical purposes rather than practical ones, we want to implement a derive-like macro called Opaque
. This macro allows us to define secure accessors for the specified fields of a structure:
#[derive(Opaque)]
struct SecureSettings {
#[key] secret_key: u64,
regular_field: String,
#[opaque] protected_field: String,
#[opaque] other_protected_field: u32,
}
Our macro will automatically add an accessor for fields marked with #[opaque]
. This accessor will take a key
parameter, which should be of the same type as the field marked with the #[key]
attribute, and will return an Option
containing the requested field only if the passed key is correct. The generated code will look like this:
impl SecureSettings {
fn get_protected_field(&self, key: &u64) -> Option<&String> {
(key == &self.secret_key).then(|| &self.protected_field)
}
fn get_other_protected_field(&self, key: &u64) -> Option<&u32> {
(key == &self.secret_key).then(|| &self.other_protected_field)
}
}
Implementation
Exercise 7.a: Write a skeleton for the Opaque
derive-like macro in the macros
crate. This macro should take additional key
and opaque
attributes to mark fields. For now, don't return anything useful.
Exercise 7.b: Verify that the argument passed to the macro is indeed a structure containing named fields, and provide an appropriate error message if not.
Exercise 7.c: Identify the field marked with #[key]
, which should be unique, as well as the fields marked with #[opaque]
. The field containing the key cannot also be #[opaque]
.
Exercise 7.d: After storing the name, type, and identifier to be used for each opaque field in vectors, generate the code. You will also need to have variables accessible during expansion for the name and type of the field containing the key.
Exercise 7.e: Test, including tests with trybuild
to verify the accuracy of error messages.
Lab 08
In this lab session, you will learn about:
- Dividing a problem into smaller pieces.
- Parallelizing each subproblem.
- Benchmarking your solution.
Don't spend too much time on the first two parts, which don't use threads. Don't hesitate to ask faculty or other students for help. The interesting part is when you start using threads ("Parallel computing").
Counting characters
Today, we are interested in counting the number of occurrences of every character in a text. For example, in the string "foobar!!"
, the characters 'f'
, 'b'
, 'a'
, and 'r'
appear once each, while the characters 'o'
and '!'
each appear twice.
Exercise 0.a: Create a new Rust binary project named char-counter
in which our code will be placed.
The count_chars()
function
A text is usually made of several lines. We need a function which takes those lines and returns the occurrences of the various characters.
Exercise 0.b: Write a function with the following prototype:
fn count_chars(input: &[&str]) -> HashMap<char, usize>
This function will build a HashMap
whose keys are the various characters and the values are the number of occurrences of each character. To do so, it will loop over the &str
of input
, and over the characters of each &str
.
Tips for implementing count_chars()
The str::chars()
method may be useful: it returns an iterator over the characters of a string.
Also, if h
is a HashMap<K, V>
and V
implements Default
, the following idiomatic construct may be useful:
h.entry(some_key).or_default()
It returns a &mut value
where value
is the existing value for key some_key
; if the entry for some_key
does not exist in the map, it is created with V::default()
, the default value for the type V
, and a mutable reference to this new inserted value is returned.
So when implementing a counter, the following construct can often be found, since all integer types have a default value of 0:
*h.entry(some_key).or_default() += 1;
This increments the value corresponding to some_key, or sets it to 1 if the key didn't exist in the map before (0, the default value, plus 1).
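Putting these pieces together on the "foobar!!" string from the introduction, a minimal sketch (not the full exercise) could look like this:

```rust
use std::collections::HashMap;

fn main() {
    let mut h: HashMap<char, usize> = HashMap::new();
    for c in "foobar!!".chars() {
        // Create the entry with the default value 0 if needed, then increment.
        *h.entry(c).or_default() += 1;
    }
    assert_eq!(h[&'o'], 2);
    assert_eq!(h[&'f'], 1);
    println!("{h:?}");
}
```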
Note: do not stay stuck on this one. If after 15 minutes you feel that you will take too much time to write this function, you can get inspiration from this code and go on with the lab.
Testing the function
Exercise 0.b: Print the occurrences of every character in the text made of the two lines "Hello world!" and "I am happy to be here.".
Check by hand that you get the expected result.
Given that HashMap<K, V>
implements the Debug
trait if K
and V
do, you can use (:#?
pretty-prints the debug information):
fn main() {
let text = ["Hello world!", "I am happy to be here."];
let freq = count_chars(&text);
println!("{freq:#?}");
}
You should obtain something like:
{
'H': 1,
'b': 1,
'y': 1,
'a': 2,
' ': 6,
'!': 1,
't': 1,
'l': 3,
'd': 1,
'I': 1,
'p': 2,
'o': 3,
'r': 2,
'w': 1,
'm': 1,
'e': 4,
'.': 1,
'h': 2,
}
You will probably agree that this is not very pleasant to read. We can do better.
Prettier display
Note: if you feel that you do not have the time to implement this part, feel free to get some code from here (this is not the point of this lab).
In order to make our program useful, we want to:
- sort the characters by occurrences;
- display only the 10 most frequent characters (with these two lines the choice will be somewhat arbitrary; with a larger text it will get interesting).
How can we do that? Learn or remember that:
- Iterating over a HashMap<K, V> gives you an iterator over pairs (K, V). Those pairs can be collected into a Vec<(K, V)>.
- Vectors can be sorted. By using the sort_by_key() method, we can sort by the key we choose. For example, vec.sort_by_key(|&(_, v)| v) will sort the vector elements according to the second element v of each pair.
- Iterating over a vector can be reversed: vec.into_iter().rev() gives an iterator over the vector from the end to the beginning.
- An iterator can be limited to its first N elements using the .take(N) method.
Exercise 0.c: display only the 10 most frequent characters in the text. Keeping our example, it should print something like:
- ' ': 6 occurrences
- 'e': 4 occurrences
- 'l': 3 occurrences
- 'o': 3 occurrences
- 'a': 2 occurrences
- 'r': 2 occurrences
- 'h': 2 occurrences
- 'p': 2 occurrences
- 'y': 1 occurrences
- 'w': 1 occurrences
And no, we do not care about the extra "s" in "1 occurrences". Fix it if you want but this has 0 importance.
Getting big
Now that our program looks fine, we want to go bigger and consider larger texts.
Exercise 1.a: Download the novel Moby-Dick; or The Whale, by Herman Melville, and place it in the same directory as your Cargo.toml
.
This novel, which totals 22316 lines, will be more interesting than our hand-crafted two lines.
Reading the file
Exercise 1.b: Since this lab is not about how to read files, copy the following use
statements and function into your program (the easiest exercise ever):
use std::fs::File;
use std::io::{self, BufRead};
fn load_file(name: &str) -> Result<Vec<String>, io::Error> {
io::BufReader::new(File::open(name)?).lines().collect()
}
Have you noticed that load_file() returns a Vec<String>? This is not directly convertible to the &[&str] that we need to count characters, so some adapting will be needed.
Adapting count_chars()
We want to adapt count_chars()
so that it accepts a slice of &str
, as it did before, but also a slice of String
. In fact, we would like to accept a slice of any type which can be easily seen as a &str
.
The trait AsRef<T>
means exactly that: when implemented on a type U
, it means that without doing any extra copy, an object of type U
can be viewed in memory as an object of type &T
. For example, String
implements AsRef<str>
: calling .as_ref()
on a String
will return a &str
pointing to data owned by the String
.
Also, str itself implements AsRef<str>, as seeing a str as a &str is trivial; combined with the blanket implementation of AsRef for references, this lets a &str be used wherever AsRef<str> is required.
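To see AsRef<str> in action, here is a small generic sketch (the helper name total_len is invented for illustration) that accepts both a slice of &str and a Vec<String>:

```rust
// Accept a slice of anything that can be viewed as a &str.
fn total_len<S: AsRef<str>>(input: &[S]) -> usize {
    input.iter().map(|s| s.as_ref().len()).sum()
}

fn main() {
    // A slice of &str works…
    assert_eq!(total_len(&["Hello world!", "I am happy to be here."]), 34);
    // …and so does a Vec<String>, thanks to Deref coercion to a slice.
    let lines = vec!["Hello world!".to_string(), "I am happy to be here.".to_string()];
    assert_eq!(total_len(&lines), 34);
}
```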
Exercise 1.c: Change the signature of count_chars()
to the following one, accepting a slice of any type that can be seen as a &str
. Also, use .as_ref()
on the provided data (in the inner loop) to convert the real type S
to a &str
.
fn count_chars<S: AsRef<str>>(input: &[S]) -> HashMap<char, usize>
As soon as you have done that, you are able to pass either a &[&str]
or a &[String]
to count_chars()
, and of course a &Vec<String>
thanks to Deref
which allows a reference to a vector to be seen as a slice.
Exercise 1.d: Change the main()
function signature so that it returns Result<(), io::Error>
, and make it load and analyze the character frequency of Moby Dick.
Have you noticed that it takes more time than when using our two lines? Let's parallelize this!
Parallel computing
On our computers, we usually have multiple cores which can work in parallel. In this section, we will use several threads. Each thread will analyze a subset of the input data, and we will later aggregate the results.
Here is how it will work for n
threads:
- The number of lines (let's call it l) will be divided into n chunks. The goal is to give roughly l/n lines to each thread.
- An MPSC (multiple producers, single consumer) channel will be created.
- Each thread will compute the statistics on the lines it got and will put the map containing the result on the channel.
- The main thread will aggregate the various statistics into a single map and return it.
We will use scoped threads to alleviate lifetime constraints.
Creating the parallel version
Exercise 2.a: Create a function with the following signature which uses up to n threads to compute the number of occurrences of each character:
fn count_chars_parallel<S: AsRef<str> + Sync>(input: &[S], n: usize) ->
HashMap<char, usize>
You will notice that we have an extra constraint on S: since objects of type S will be referenced from other threads (the ones we will be creating), it must be Sync in addition to being AsRef<str>.
Breakdown of the count_chars_parallel()
function
This function must do the following things, in order.
First it must create a (tx, rx)
pair corresponding to a MPSC channel.
It must then create a scope
in which new threads will be spawned.
For each chunk of roughly l/n lines (the last chunk may be smaller):
- tx must be cloned into a new variable that will be captured by the new thread;
- a new thread should be spawned from the current scope, capturing tx and the current chunk.
This thread will compute the characters' occurrences (using count_chars()
) and send the result on the MPSC channel using its captured tx
clone.
It is now time to close the thread scope and to concentrate on receiving data from the spawned threads. Since we are after the thread::scope() call, at this stage we know that all analyses have been sent on the MPSC channel.
How does the reception part work?
- rx.recv() will wait until data is available. As long as there is data available on the channel, it will return Ok(some_data).
- If all senders are dead (tx and all its clones have been dropped) and all data has been read, rx.recv() will return Err(…).
At this stage, all threads have terminated their work. We can read data from the MPSC channel through rx
: we will get a bunch of Ok(…)
containing data from the threads, then Err(…)
saying that everything has terminated. Right? No… We have to get rid of the tx
variable which is still present. We have cloned it each time we have created a thread, but since tx
still exists the use of the channel is never considered complete.
How do we get rid of tx
? std::mem::drop(tx)
is intended for that usage.
Now that we have done that, we can collect every map which comes from the MPSC channel. Since you already know how to do that, let us share the code to build the aggregated hash map and save some time:
let mut freq = HashMap::default();
while let Ok(fr) = rx.recv() {
for (c, n) in fr {
*freq.entry(c).or_default() += n;
}
}
Returning freq
is the last thing to do.
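The same scope-plus-channel pattern, applied to a simpler problem (summing integers in parallel; the names and chunking are illustrative, not the solution to the exercise), might look like this:

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let data: Vec<u64> = (1..=100).collect();
    let n = 4;
    let (tx, rx) = mpsc::channel();
    thread::scope(|s| {
        // Give each thread a chunk of roughly data.len()/n elements.
        let chunk_size = (data.len() + n - 1) / n;
        for chunk in data.chunks(chunk_size) {
            let tx = tx.clone();
            s.spawn(move || {
                // Each thread computes a partial result and sends it.
                tx.send(chunk.iter().sum::<u64>()).unwrap();
            });
        }
    });
    // Drop the original sender so that rx.recv() returns Err
    // once all clones have been dropped and all data read.
    drop(tx);
    let mut total = 0;
    while let Ok(partial) = rx.recv() {
        total += partial;
    }
    assert_eq!(total, 5050);
    println!("{total}");
}
```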
Calling the parallel version
Exercise 2.b: Update your main()
function so that it uses count_chars_parallel()
with, for example, 4 threads, so that it processes Moby Dick text faster.
Benchmarking
Is it really faster to use multiple threads? How many threads should we be using? Rather than guessing, let us benchmark the performance.
Benchmarking consists in executing a task a certain number of times and averaging its execution time over the multiple executions.
Rust has two handy types for dealing with time: Instant, which represents a point in time, and Duration, which represents, well, a duration. Instant::now() gives the current instant in time, and start.elapsed() returns the Duration elapsed since the instant start.
Also, a Duration
can be divided by an integer (of type u32
) and return a Duration
. It means that if you do something like:
let start = Instant::now();
for _ in 0..times { // Here times is a u32
    do_something();
}
let mean_duration = start.elapsed() / times;
you will get the average time spent executing do_something() in mean_duration. Also, Duration implements Debug and will print as something like "100ms" or "3.7s".
Now that you know all this…
Exercise 3.a: Implement a benchmark()
function which counts the characters' occurrences over n
threads by executing the count_chars_parallel()
function times
times and returns the result:
fn benchmark<S: AsRef<str> + Sync>(input: &[S], n: usize, times: u32) -> Duration
Exercise 3.b: Implement a benchmark_all()
function which calls benchmark()
with 1 thread to max
threads and prints the results:
fn benchmark_all<S: AsRef<str> + Sync>(input: &[S], max: usize, times: u32)
Exercise 3.c: Call the benchmark_all()
function in your main program and look at the result.
See how the times stop decreasing (or even increase) when you go over the number of cores on your machine.
If you have more time
If you happen to have more time, you can use the clap
crate to configure the file to read, the maximum number of threads, and the number of times you will run the benchmarks. Also, you might want to print a summary of the most frequent characters used, with a configurable number of characters printed:
$ cargo run -- --help
Usage: char-counter [OPTIONS] <FILE>
Arguments:
<FILE> The file to count characters from
Options:
-m, --max <MAX> The maximum level of parallelism to achieve [default: 8]
-t, --times <TIMES> The number of times to run each test [default: 100]
-s, --stats <rank> Display statistics
-h, --help Print help
Exercise 4.a: Do that, and make sure you send a mail to the faculty telling them you did it so that you may earn extra points (and put the code in your git repository so that they can check).
Lab 09
In this lab session you will
- learn how to build an asynchronous HTTP client
- see the benefit of doing asynchronous requests
- learn how to build an asynchronous HTTP server
An asynchronous client
The setup
A proxy server reproducing Wikipedia page data for programming languages has been installed on https://net7212.rfc1149.net. For example, by requesting https://net7212.rfc1149.net/wikipedia/Rust, you can get the raw HTML page corresponding to the Rust page on Wikipedia.
The issue with this server is its slowness: requesting a page takes at least 2 seconds (on purpose). Since in this lab we will make several requests to this server, we want them to be done in parallel (asynchronously) rather than sequentially.
Using reqwest
reqwest
is a popular crate used to build HTTP clients. It supports asynchronous requests, and we will build on this. It uses tokio
as its runtime.
Exercise 0.a: Create a new binary Cargo project client
with reqwest
as a dependency, as well as tokio
with the feature flag full
. We will also add anyhow
to handle errors, or color_eyre
(don't forget to initialize it) as you prefer.
Make an asynchronous reqwest to get the Rust page
Exercise 0.b: In your main()
async function, build a Client
. With this object, do a get()
request to get the "Rust" wikipedia page through the proxy server and retrieve the body text (see the reqwest
documentation for how to do that). Display the number of bytes in the body.
⚠️ Don't forget to use the tokio::main
attribute on your main()
async function so that it gets executed in an asynchronous context:
#[tokio::main]
async fn main() -> anyhow::Result<()> { // or color_eyre::Result<()>
…
}
Time your main program
We want to see how long our main program takes to execute.
Exercise 0.c: Using the Instant
type, measure and display the time taken by your main()
program.
You may be interested in Instant::now()
as well as Instant::elapsed()
. Also, note that a Duration
can be printed in Debug
mode and print something like "29.8ms" or "2s". By using the floating point notation, you can even limit the number of digits after the decimal point with a format string like {:.3?}
.
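For instance, on a fixed Duration (so that the output is predictable), the precision-limited Debug format looks like this:

```rust
use std::time::Duration;

fn main() {
    let d = Duration::from_millis(2008);
    // Debug formatting, limited to 3 digits after the decimal point.
    println!("Elapsed time: {d:.3?}"); // prints "Elapsed time: 2.008s"
    assert_eq!(format!("{d:.3?}"), "2.008s");
}
```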
Make several asynchronous requests
Exercise 0.d: Reorganize your code to create this function which returns information about a programming language and use it from main()
:
async fn retrieve(client: &Client, language: &str) -> anyhow::Result<String>
(or eyre::Result<String>
if you used eyre
)
For example, calling retrieve(client, "Rust")
should return the Rust page.
Exercise 0.d: In addition to information about the number of bytes in the "Rust" page body, print information about the "C" programming language as well.
You should notice that it takes twice the time, as the requests are serialized. We will now make them parallel.
Exercise 0.e: Import the futures
crate as a dependency. Using the join!
macro, make both requests in parallel then print the results.
Note how the time is back to around 2s.
Take requests from the command line
Exercise 0.f: For every programming language name found on the command line, print the number of bytes in its page body.
For example, running cargo run --release -- C Rust Java
should print something like
- C: 401467 bytes
- Rust: 461307 bytes
- Java: 362262 bytes
Elapsed time: 2.008s
You can either use clap
or more simply std::env::args()
to get the list of languages (in which case, do not forget to skip the first item). The join_all()
function from the futures
crate may be useful to do this.
You can use zip
on an iterator to unite items of two iterators by pairs.
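As a sketch, zip can pair each language name with its corresponding result (the sizes below are just the sample values shown above):

```rust
fn main() {
    let languages = ["C", "Rust", "Java"];
    let sizes = [401467u64, 461307, 362262];
    // zip yields (language, size) pairs until either iterator runs out.
    for (lang, size) in languages.iter().zip(sizes.iter()) {
        println!("- {lang}: {size} bytes");
    }
}
```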
An asynchronous server
This part builds a proxy server similar to the one you have been using for this lab.
This part may take a little more time, but we are talking about some 60-70 lines of code tops. Do not worry if you don't finish in the allotted time.
The project
We will use the axum
and axum-server
crates to build a HTTP server. We will accept requests with a path like "/wikipedia/LANGUAGE", and forward them to "https://en.wikipedia.org/wiki/LANGUAGE_(programming_language)". When the response arrives, we will forward it to the caller, along with the status code and most headers.
Exercise 1.a: Create a new cargo binary project server
, with dependencies on crates axum
, axum-server
, tokio
with feature flag full
, and reqwest
.
Answering a simple HTTP request
Exercise 1.b: Create an axum
server, listening on port 3000, accepting a route of /wikipedia/:language
and redirecting it to a wikipedia()
function. For now, return just the requested language name as a String
.
async fn wikipedia(Path(language): Path<String>) -> String
Returning the Wikipedia page
Exercise 1.c: Change the signature of the wikipedia()
function to return a tuple of StatusCode
, HeaderMap
and a String
, or just a StatusCode
if there is an internal error:
async fn wikipedia(Path(language): Path<String>) -> Result<(StatusCode, HeaderMap, String), StatusCode>
In this function, query the right Wikipedia page using reqwest
, and return the status code, headers and body from the wikipedia response. The "content-length" header must be removed from the headers as it will be recomputed by HTTP middleware used by axum
1.
If there is any error, remap it to StatusCode::INTERNAL_SERVER_ERROR (using .map_err()) and exit prematurely from the function.
Exercise 1.d: Run your server, it should listen on port 3000. Modify your client so that it uses http://localhost:3000/wikipedia/LANGUAGE
as a reference, and check that everything works well.
If it does, it means that your server is properly running things in parallel. However it probably answers too fast and does not add the extra delay.
Slowing down the proxy server
Exercise 1.e:
Before returning the answer, sleep for 2 seconds. Of course, an asynchronous sleep function must be used here: using std::thread::sleep() would block the whole server thread and prevent other requests from progressing in parallel.
Sharing resources
The same Client
can be used to perform concurrent requests. Using the same client for multiple requests to the same site (here Wikipedia) will automatically reuse the connection and reduce the load on the server.
The Extension
lets you add a layer to your application. This layer will be available in any request handler.
Exercise 1.f: Add a Arc<Client>
extension to your application, and use this client in your wikipedia()
function.
Adding some cache
Every time you want to access a programming language page, you reload it from Wikipedia. Even though you reduced the load by sharing the client through an extension, if a client requests the same page several times you may end up requesting a page from Wikipedia several times.
You can add a cache, whose type would be:
type Cache = HashMap<String, (StatusCode, HeaderMap, String)>;
The idea here is to share a mutable Cache
object and look for the result here first. If it is not available, Wikipedia will be queried, and the result will be stored in the cache then returned to the client.
You will need some synchronization here. Rust's RwLock
seems like a good candidate.
Exercise 1.g: Share a Arc<RwLock<Cache>>
resource, in order to avoid doing multiple identical requests to Wikipedia.
Be cautious though: you must never keep a resource locked (be it in read or write mode) across an await point. Limit locking the cache to the shortest possible period.
The communication with Wikipedia server through reqwest
uses content negotiation and retrieves compressed pages. The Content-Length
corresponds to the compressed data, which is neither the original data size nor necessarily the size of the data we would obtain if we used compression ourselves (unless we used the very same algorithm with the very same parameters). It is best to leave the Content-Length
computation to the web framework.