Introduction
This website contains teaching material for the laboratory sessions of the course Safe System Programming (in Rust) (SSP-RS), at Télécom Paris / Polytechnique Institute of Paris.
ⓒ 2022-2024 Samuel Tardieu and Stefano Zacchiroli
Lab 01 — Unsafe System Programming
In the first lab session of the course, you will mainly work on:
- Setting up your development environment for the rest of the course, and
- Experimenting with unsafe system programming, leading to buffer overflows and other nasty stuff.
Let's get to it.
Development setup
For the duration of the entire lab sessions of this course you are going to use a number of tools, for the most part related to development in the C and Rust programming languages. As the first task for this lab session you are hence going to install the relevant toolchains and a comfortable (for you) development environment.
Operating system requirements: we are going to assume that your work machine runs Linux as operating system (OS). If that is not the case, you should fix this before proceeding. Any Linux-based distribution would do, but in the following we are going to rely on package names from Debian and Debian-based distributions (e.g., Ubuntu); you should translate them to your distro if need be.
Work environment
You have various options in terms of which "machine" to work on:
- work on your real (hardware) machine: laptop or desktop computer;
- work in a Docker container;
- work in a virtual machine.
Option 1: Work on your machine
We recommend this option as it is the most flexible, but it requires more manual setup on your side—setup which we are not going to detail here as it heavily depends on your OS choice.
We only list below the tools (and corresponding Debian package names) that you should install on your machine to be able to complete the exercises of this course.
Option 2: Work in a Docker container
If you would like to work in a Docker container, we provide a Dockerfile that you can use to create locally a Docker image that contains all the tools used for lab sessions. Download the Dockerfile locally and build a matching image like this:
$ docker build -t net7212 - < Dockerfile
(It will take a few minutes to complete, depending mostly on download and I/O speeds.)
Then, create a shell script named run.sh
to easily create and run a fresh container on that image every time you need it.
It can be something like the following:
#!/bin/bash
host_labdir="${HOME}/net7212-lab"
cont_labdir="/home/net7212/lab"
docker run -ti \
--net host \
-e DISPLAY="unix${DISPLAY}" \
-v /dev/shm:/dev/shm \
-v /etc/hosts:/etc/hosts \
-v /etc/localtime:/etc/localtime:ro \
-v /tmp/.X11-unix:/tmp/.X11-unix \
-v /var/run/dbus/system_bus_socket:/var/run/dbus/system_bus_socket \
-v $HOME/.Xauthority:/home/net7212/.Xauthority \
-v "${host_labdir}:${cont_labdir}" \
--rm \
--name net7212 \
net7212
note that host_labdir
should point to a directory on your real machine, that will be mounted within the container so that you can share files between the two worlds.
All your lab session material will need to be in this directory if you want to access it from within the container.
You can then enter and use a fresh container every time you need it like this:
./run.sh
net7212@machine:~$ rustc --version
rustc 1.72.0 (5680fa18f 2023-08-23)
and that if you create a file in your home directory under ~/net7212-lab
you can also access it from within the container, e.g.:
# outside the container
$ echo "// hello.c" > ~/net7212-lab/hello.c
./run.sh
net7212@machine:~$ cat ~/lab/hello.c
// hello.c
net7212@machine:~$
Option 3: Work in a virtual machine
If you prefer to fully control your work environment (like with option (1)), but not impact your day-to-day work machine, you have the option to install a virtual machine and configure it manually. Just install in there the same tools that you would have installed, following the instructions below, on your real machine. We are not going to detail this step further, as it heavily depends on your choice of virtualization technology.
C development toolchain
If you have chosen to work on your (virtual) machine, you should install the following tools:
Tool | Debian package name |
---|---|
bats | bats |
check | check |
clang-tidy | clang-tidy |
clang | clang |
gcc | gcc |
gdb | gdb |
ltrace | ltrace |
make | make |
strace | strace |
valgrind | valgrind |
If you have chosen the Docker option, they are all already installed in the container.
Exercise 0.a: Install all the above dependencies in the work environment of your choice.
When ready, make sure you can compile and run the following Hello World program with both gcc
and clang
:
#include <stdio.h>
int main(void) {
printf("Hello, World!\n");
return 0;
} // end of hello.c
like this:
$ gcc -Wall hello.c
$ ./a.out
Hello, World!
$ clang -Wall hello.c
$ ./a.out
Hello, World!
Now make sure that all of the following commands execute properly without failing on your setup (the exact output might very depending on version, configuration, etc.; that's OK):
$ bats -v
Bats 1.8.2
$ clang-tidy --version
Debian LLVM version 14.0.6
Optimized build.
Default target: x86_64-pc-linux-gnu
Host CPU: skylake
$ gdb --version
GNU gdb (Debian 12.1-4) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ ltrace --version
ltrace version 0.7.3.
Copyright (C) 1997-2009 Juan Cespedes <cespedes@debian.org>.
This is free software; see the GNU General Public Licence
version 2 or later for copying conditions. There is NO warranty.
$ make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ strace --version
strace -- version 5.10
Copyright (c) 1991-2020 The strace developers <https://strace.io>.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Optional features enabled: stack-trace=libunwind stack-demangle m32-mpers mx32-mpers
$ valgrind --version
valgrind-3.19.0
Rust development toolchain
If you have chosen to work on your (virtual) machine, you should install the following tools:
Tool | Debian package |
---|---|
rustup | none (install by hand) |
If you have chosen the Docker option, they are all already installed in the container.
Exercise 0.b: Install all the above dependencies in the work environment of your choice.
Once installed, verify that you can successfully run the following basic tools of the Rust toolchain, like this (your output might very, depending on the installed versions):
$ rustup --version
rustup 1.26.0 (5af9b9484 2023-04-05)
info: This is the version for the rustup toolchain manager, not the rustc compiler.
info: The currently active `rustc` version is `rustc 1.72.0 (5680fa18f 2023-08-23)`
$ cargo --version
cargo 1.72.0 (103a7ff2e 2023-08-15)
$ rustc --version
rustc 1.72.0 (5680fa18f 2023-08-23)
Editor / IDE
You are also going to need a development editor or an IDE (Integrated Development Environment). As developers tend to be opinionated about this stuff we are not going to force you to use any specific one: pick your own!
We recommend either a flexible editor with IDE-like features (e.g., Emacs or Vim with LSP support for Rust) or a full-featured IDE like Visual Studio Code or NetBeans (do make sure it has support for Rust, though, as that will be needed later in the semester).
Exercise 0.c: Install the editor or IDE you want in the work environment of your choice.
When done, open the hello.c
file you used previously and make sure you can comfortably edit/compile/run it, ideally from within the editor/IDE itself.
Done?
Good! You are now all set with the developer setup!
Let's now move to the other matters of the day.
Memory unsafety
Out-of-bounds write
Exercise 1.a: write C programs (new, different from those seen in lecture 01) that exhibit weakness CWE 787: Out-of-bounds write. Your goal is to make your program segfault due to this CWE! Try out-of-bound writes in various memory segments, writing one program for each of the following cases:
- out-of-bound write in dynamically allocated memory on the heap (e.g., with
malloc
), - out-of-bound write in statically allocated memory (i.e., global variable, preinitialized or not),
- out-of-bound write in automatic memory (i.e., variables allocated on the stack).
Bonus point: make sure your programs do not emit any warnings when compiled invoking the compiler with the --Wall
flag (which asks to enable all compile-time warnings and that you should always use anyway!).
Try to determine experimentally what each program was doing just before the segfault (tip: try with gdb, nm, ltrace, strace).
What helped you determine this?
Exercise 1.b: consider the following local variable declarations at the beginning of a function (possibly main
):
char s1[16];
char s2[16];
write a program that does not segfault, but that with a single memory (or string) copy operation that writes to only one of these variables (s1
, s2
) modifies the content of both variables, with data of your choice.
Verify the result experimentally (e.g., with printf
or strcmp
).
Warning: do not forget about string \0
terminators!
Out-of-bounds read
Exercise 1.c: continuing from the idea from the previous exercise, write a program that exhibits at least one variant of CWE 125: Out-of-bounds read. The program should not segfault, but by performing a memory ready operation on a given variable, it should be able to read the content of another variable, correctly and in full.
Bonus point: make your program read the content of an out-of-scope local variable, e.g., one declared in a calling function.
Smashing the stack
Segmentation fault
Considering the following program:
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
}
int main() {
function(1, 2, 3);
return 0;
}
Exercise 2.a: compile the program to assembly language like this:
$ gcc -S -o example1.s example1.c
and identify in the generated assembly code where function
is called. How are arguments passed to the function? (The answer might vary depending on the compiler and architecture, but the assembly code will tell you!)
Exercise 2.b: compile, execute, and debug the following program:
#include <string.h>
void function(char *str) {
char buffer[16];
strcpy(buffer, str);
}
int main() {
char large_string[256];
int i;
for(i = 0; i < 255; i++) {
large_string[i] = 'A';
}
function(large_string);
return 0;
}
Explain what's happening and why.
Subverting the control flow
Historically (on 32 bit architectures and with less built-in defenses in C compilers) subverting the control flow via stack overflows was pretty easy. You can get an idea of how easy it was by reading the famous tutorial article Smashing The Stack For Fun And Profit by Aleph One (1996).
Nowadays it has become a little bit more complicated, but still very doable (and regularly done!, as we have seen in the course examples).
Exercise 2.c: go through the modern incarnation of Aleph One's tutorial, namely: Smashing the Stack in the 21st Century by Jon Gjengset (archived copy), reproducing the reported experiences on your machine. (Next week we will go through some of the defenses that you will have to manually disable to exploit a stack overflow, and discuss what they are good for and what they are not.)
Lab 02 — Hardening System Programming
In this lab session we will take a practical tour of existing techniques to secure system programs implemented in traditional system languages. We will experiment with: testing, fuzzing, code instrumentation (of both binary and source code), and static analysis.
By the end of the session you will have a better understanding of the state-of-the-art of hardening system programming, including its limitations.
Credits
The exercises of this lab session contain material and ideas reused with permission from week 1 assignments of Stanford's course CS 110L (2022) by Ryan Eberhardt, Armin Namavari, Will Crichton, Julio Ballista, and Thea Rossman.
Unit testing
Setup
Consider the initial C implementation of a simple currency type (a pair of a currency name and a currency amount) given in money.h and money.c.
We want to reassure ourselves that the code thus far is correct, using unit testing.
In check_money_tiny.c you can find the skeleton of a C program that uses money.c
as a library and unit tests it with a single test case.
It uses the Check unit testing framework for C, which you should have already installed as part of the first lab session.
It can be built and run like this:
$ gcc -Wall -c -o money.o money.c
$ gcc -Wall -c -o check_money_tiny.o check_money_tiny.c
$ gcc -Wall -o check_money_tiny check_money_tiny.o money.o `pkg-config --libs check`
$ ./check_money_tiny
Running suite(s): Money
100%: Checks: 1, Failures: 0, Errors: 0
check_money_tiny.c:17:P:Core:test_money_create_amount:0: Passed
Exercise 1.a: read through the implementation of money.c
and check_money_tiny.c
to make sure you understand what the code does.
Help yourself with the Check documentation as needed.
Code coverage
The notion of code coverage is a measure of how much of your implementation code is exercised during the execution of your test suite. It can be measured in various ways, we will focus on the simplest metric: how many lines of code are (not) executed during test suite running (i.e., line coverage). You can verify the line coverage of your current test suite using gcov, which is part of gcc by:
- compile/link your project passing the
-fprofile-arcs -ftest-coverage
options to compiler and linker, - execute your test suite (
./check_money_tiny
in this case), - inspect code coverage of a source code file you care about, e.g.:
$ gcov money.c
File 'money.c'
Lines executed:75.00% of 16
Creating 'money.c.gcov'
Lines executed:75.00% of 16
The summary tells you the percentage of lines of code exercised during execution.
The mentioned file money.c.gcov
is an annotated version of money.c
that shows precisely which lines have been executed (or not).
Exercise 1.b: measure the code coverage of the current test suite for money.c
.
Then, add all the test cases you need to maximize code coverage.
Can you reach 100% line coverage?
Why or why not?
Exercise 1.c: (optional and tricky, feel free to postpone this one to the end of the session) add a memory unsafety issue to the money.c
implementation, and specifically one that crashes the execution of your test case.
For example, you can implement a new money_add
method that performs an out-of-bound read or write.
Then add a test that run the buggy code, crashing the execution of the test case.
Does the test case crash cause the crash of the entire test suite? (i.e., do other test cases supposed to run after it keep running or not?)
Why or why not?
Stack overflow detection
Setup
Retrieve the program uppercase.c, compile it (recommended compiler options for this session are: -g -O0 -Wall -Wextra
), and run it, e.g., like this:
$ gcc -g -O0 -Wall -Wextra -o uppercase uppercase.c
$ ./uppercase 'Hello, World!'
HELLO, WORLD!
Now take some time to study the code of the program.
Exercise 2.a: the program is affected by a memory-related bug (EEEK!). Explain where the bug is, what its consequences might be, and how to fix it.
Static analysis with Clang-Tidy
Clang-Tidy (which you have installed already, as part of last week's lab session) performs some simple static analysis checks, ranging from basic linting to more advanced techniques. From the tool documentation:
clang-tidy is a clang-based C++ “linter” tool. Its purpose is to provide an extensible framework for diagnosing and fixing typical programming errors, like style violations, interface misuse, or bugs that can be deduced via static analysis.
Exercise 2.b: go (briefly) through the documentation of clang-tidy
and learn how to use it to analyze C source code files.
Then use clang-tidy
on uppercase.c
to check if it detects the bug you have identified in the previous exercise.
Does clang-tidy
catch the problem?
Based on what you have learned in last lecture, explain why clang-tidy
is capable to identify the problem or why it isn't.
Dynamic analysis with Valgrind
Clang-Tidy performs static analysis, Valgrind on the other hand performs dynamic analysis, based on dynamic binary re-compilation, as we have seen in last lecture. Let's see how Valgrind fares in detecting the bug with this program. After compiling your program, we can give it a try like this:
$ valgrind [--tool=memcheck] ./uppercase 'Hello, World!'
(Note: --tool=memcheck
is optional because Memcheck is the default tool that Valgrind uses unless otherwise requested.)
Exercise 2.c: does Valgrind detect the issue? Based on what you have learned in last lecture, explain why or why not.
Exercise 2.d: benchmark the time required to execute the program with and without Valgrind.
(Tip: as a simple way to do that you can use the standard UNIX time
command.
If you want to be fancier, this is your chance to learn about the amazing benchmarking tool Hyperfine and use it for a more scientifically accurate comparison.)
Dynamic analysis with LLVM sanitizers
As we have seen in last lecture, dynamic analysis can also be performed via source code instrumentation, which happens at compile time to add runtime checks. As we have seen LLVM sanitizers (look for the various entries with "Sanitizer" in their name in the Clang documentation) are a set of plugins for the Clang compiler that do that.
Exercise 2.e: change the compiler you are using to build from gcc
to clang
and first rebuild the program with it. (You can pass the same options suggested above, as they are compatible between the two compilers.)
Then, add all of the following sanitizers (check the documentation for how to do that) and rebuild the program: AddressSanitizer, LeakSanitizer, UndefinedBehaviorSanitizer.
Compare the size of the executables built with and without sanitizers. How different are they and why?
Exercise 2.f: run the instrumented program. Do sanitizers detect the issue? Based on what you have learned in last lecture, explain why or why not.
Let's streamline our build process a bit, as it is becoming a bit of a mess.
Retrieve the generic Makefile we make available for building the programs associated to this lab session.
If you are not familiar with Makefiles, go (briefly) through the GNU make documentation to make sure you know the basics.
In particular, read through the introductory material and make sure you understand the parts of Makefile syntax used in the given Makefile
, in particular: rules, variables, and static pattern rules.
(If you know about Makefiles already, feel free to skip this documentation study part.)
Exercise 2.g: change the retrieved Makefile so that:
- it only builds the
uppercase.c
programs (other programs mentioned in there will arrive later in this lab session!), and - passes the sanitizer flags to the compiler when compiling.
Recheck the result of the previous exercise about sanitizers to make sure they are there when building with make
.
Memory leak detection
Linked lists
Now retrieve the program linkedlist.c, add it to the Makefile, and compile it as before.
Take some time to study the program and understand what it does.
Exercise 3.a: this program is affected by (at least) two memory-related bugs this time, but of a different nature than before. Identify both of them, explain what the consequences might be, and how to fix them.
Exercise 3.b: try linting/static analysis with Clang-Tidy on this program, as you did before.
Does it detect the (right) issues? (Tip: beware of false positives!)
Exercise 3.c: try source code instrumentation with LLVM sanitizers on this program, as you did before.
Do they detect the issues?
In both cases feel free to speculate about why the used approaches did or did not identify the memory issues affecting this program.
String parsing
You know the drill!
Next program is bracket-parser.c, that extracts a "substring [within brackets]"
from a larger string.
Exercise 4.a: read the program, find the memory problem with it, try Clang-Tidy and LLVM sanitizers on it, explaining what you are seeing.
Exercise 4.b: on most inputs the LLVM sanitizers will not detect a memory issue.
Based on your understanding of the bug and your reading of the code (= white-box inspection), can you craft a specific input for bracket-parser
that will make the LLVM sanitizer spot the bug at runtime?
(And while we are at it: what does this tell you about the reliability of dynamic analysis to detect memory issues?)
Fuzzing with LibFuzzer
Let's now see if fuzzing can help us finding automatically both the memory issue affecting bracket-parser
and a test input that proves the issue is there.
Read the introductory documentation of LibFuzzer.
Make sure to understand how to add fuzz targets to your code, what the required API is, and the fact you should not have a main
function when using LibFuzzer (because it adds its own main
).
Exercise 4.c: add a fuzz target to your code, that allows you to fuzz the input of the parse
function of bracket-parser.c
.
Tip: you can take inspiration from the example seen in lecture, but remember that in this case you should pass well-formed C strings (trailing \0
!) to avoid triggering bugs other than the one you are looking for.
Exercise 4.d: build the fuzzer (adding fuzzer
to the list of -fsanitize=...
sanitizers) and run it.
Does it find the issue?
Does it produce a test case with a sample input that proves the issue is there?
Explain why.
(Pretty cool, huh?)
Lab 03: Rust and references
In this lab session you will build and run your first Rust program. You will manipulate references and objects lifetimes.
By the end of the session you will have a better understanding of the lifecycle of Rust objects.
Your first Rust program
In this section, you will learn how to create and compile your first Rust program using tools such as cargo
.
Anatomy of a Rust program
A basic Rust program is organized as follows:
Cargo.toml
Cargo.lock
src
└── main.rs
target
└── debug
└── myprogram
- The file
Cargo.toml
describes your program (name, version, authors, …). - The file
Cargo.lock
contains information about the version of the other packages used when compiling your program. src
contains the source of your program —main.rs
is the file containing themain
function.target
is a directory created by the compiler; it will contain the output of the compilation, including the executable produced from your source files, indebug/myprogram
(ifmyprogram
is the name of your program as described inCargo.toml
). Thetarget
directory should never be checked in your version control software. It should be ignored completely (e.g., by adding it to your.gitignore
if you are usinggit
).
Creating your first Rust program: intro
Exercise 0.a: Create the skeleton of your first Rust program by using the command cargo new
:
$ cargo new intro
Created binary (application) `intro` package
This will create a intro
directory with a complete skeleton of a Rust program:
$ tree intro
intro
├── Cargo.toml
└── src
└── main.rs
(you might not have the tree
on your computer – never mind, we will not need it)
Change your current directory to intro
and display the content of main.rs
:
$ cd intro
$ cat src/main.rs
fn main() {
println!("Hello, world!");
}
As you can see, the program is pretty basic. You can compile it by using cargo build
:
$ cargo build
Compiling intro v0.1.0 (/tmp/intro)
Finished dev [unoptimized + debuginfo] target(s) in 0.54s
Your program has compiled without errors. You can execute it by using cargo run
:
$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/intro`
Hello, world!
Note that you can execute cargo run
without first executing cargo build
: the application will be built automatically.
Editing the Rust program
Exercise 0.b: Using your editor of choice (for example Visual Studio code or NetBeans), edit main.rs
so that the main program now displays a count from 1 to 5 inclusive:
fn main() { for i in 1..=5 { println!("i = {i}"); } }
Ensure that your editor is Rust-aware, or install the necessary extensions if needed. For Visual Studio Code, we suggest the installation of the "rust-analyzer" extension.
You can now execute your code. If everything went well, you should see the following output:
$ cargo run
Compiling intro v0.1.0 (/tmp/intro)
Finished dev [unoptimized + debuginfo] target(s) in 0.20s
Running `target/debug/intro`
i = 1
i = 2
i = 3
i = 4
i = 5
You can also compile your program and run the executable directly. As cargo run
output indicates it is located in target/debug/intro
:
$ cargo build
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
$ ls -l target/debug/intro
-rwxr-xr-x 2 sam sam 4203904 Jan 16 15:02 target/debug/intro
$ target/debug/intro
i = 1
i = 2
i = 3
i = 4
i = 5
Using a Rust tutorial
There exists an interactive tutorial at https://rust-book.cs.brown.edu/. While reading this tutorial, you will be able to take quizzes and to let annotations that can be re-read later on.
You will not have to read the full tutorial at this time. We suggest that you ensure that you understand the concepts (and pass the quizzes) in the following chapters:
- 3.1 Variables and Mutability
- 3.3 Functions
- 4.1 What is Ownership?
- 4.2 References and Borrowing
- 4.3 The Slice Type
Of course, you are invited to read more chapters and take more quizzes as the lecturers describe new Rust concepts.
If you want to do more
Another way to familiarize yourself with Rust programming is to use the rustlings repository and program. Start by installing the rustings
program and clone the rustlings
repository containing the exercises:
$ cargo install rustlings
$ git clone https://github.com/rust-lang/rustlings/
$ cd rustlings
$ rustlings watch
The source for this exercise is in `exercises/intro/intro1.rs`. Have a look!
Going forward, the source of the exercises will always be in the success/failure output.
====================
You can keep working on this exercise,
or jump into the next one by removing the `I AM NOT DONE` comment
[…waits forever…]
The latest command rustlings watch
will watch your progress while editing the exercises in the exercises
directory of the repository.
For example, following the instructions given by rustlings watch
, edit the file exercises/intro/intro1.rs
with your favourite editor. As an exception, this exercise already compiles succesfully. All you have to do is remove the line // I AM NOT DONE
so that rustlings watch
will go to the next exercise in the series, exercises/intro/intro2.rs
.
This one will not compile. You have to edit the code so that it compiles successfully, then when you see that it does you can remove the // I AM NOT DONE
line, and so on.
You can stop doing the exercises whenever you want (just stop the rustlings watch
program). When you run rustlings watch
again it will start from where you left off.
References
Mutable references
The function std::mem::swap()
takes two mutable references on the same type as parameters and swap them.
Swapping values
Exercise 1.a: Complete the following function (and call it from main()
to see its effect) so that it has the expected effect:
#![allow(unused)] fn main() { fn do_swap() { let mut a = String::from("apple"); let mut b = String::from("bread"); println!("Before swap: a = {a}, b = {b}"); todo!(); // REPLACE ME with something using std::mem::swap() println!("After swap: a = {a}, b = {b}"); } }
The output should look like:
Before swap: a = apple, b = bread
After swap: a = bread, b = apple
You may realize that swapping two String
values do not move the various characters composing the strings in the heap. Only the three fields making up a String
(capacity, pointer, and length) are swapped.
Smallest first
Exercise 1.b: Write a function "smallest_first" with the following prototype which ensures that its first argument is smaller than its second argument and swap them otherwise:
fn smallest_first(a: &mut i32, b: &mut i32) { todo!(); // REPLACE ME } fn main() { let mut a = 42; let mut b = 10; println!("Before calling smallest_first: a = {a}, b = {b}"); smallest_first(&mut a, &mut b); println!("After calling smallest_first: a = {a}, b = {b}"); smallest_first(&mut a, &mut b); println!("After calling smallest_first again: a = {a}, b = {b}"); }
should display
Before calling smallest first: a = 42, b = 10
After calling smallest first: a = 10, b = 42
After calling smallest first again: a = 10, b = 42
Lifetimes
Sorting an immutable collection of strings
Let us assume that we want to write a function which sorts an immutable collection of String
elements. Here, "immutable" means that we cannot modify either the strings or the collection itself. We will have to return a Vec
containing the result.
Exercise 2.a: write a function sorted
which takes a reference to a slice of String
instances and return a Vec<String>
with the sorted strings.
The function will have the following signature:
#![allow(unused)] fn main() { fn sorted(names: &[String]) -> Vec<String> { todo!() // Write the code here } }
You can use the following constructs to help you to so:
vec![]
creates an emptyVec
(the type of the elements will be deduced from the context).- If
s
is aString
(or a&String
),s.clone()
will produce a newString
with the same content ass
. - You can iterate over the content of a slice
names
of type&[String]
withfor name in names { … }
.name
will have type&String
within the loop. - You can use
.sort()
on a mutable slice: it will sort your elements in the natural order.
You can test your function with the following main
program:
fn main() { let people = [String::from("John Doe"), String::from("Jane Eyre"), String::from("Jane Doe")]; let in_order = sorted(&people); for n in in_order { println!("- {n}"); } }
This should print "Jane Doe", "Jane Eyre", and "John Doe" in order.
Avoiding cloning the strings
While the previous program works, you had to clone the immutable strings. This is unfortunate as this consumes more memory. Why not work directly with a Vec
of &String
and sort it?
Exercise 2.b: modify your sorted
function so that it returns a Vec<&String>
instead of a Vec<String>
.
Note that .sort()
will work on a collection of &String
: it will sort them the same way it sorted String
elements.
Testing the guarantees of lifetimes
The signature of your sorted
function is:
#![allow(unused)] fn main() { fn sorted(names: &[String]) -> Vec<&String> }
Since you have not indicated any explicit lifetime (remember, lifetimes start with a single tick character '
), lifetimes elision kicked in: the lifetime of the &[String]
slice was copied over the &String
elements inside the Vec
.
It means that the result of sorted()
cannot live longer than its argument names
.
Exercise 2.c: experiment with the following modified main program, which will not compile, and make sure you understand why the compiler is telling you that you cannot store the output of sorted(&people)
into in_order
. Note how it prevents you from referencing strings that have been destroyed when people
went out of scope. You would not have had any such guarantee in C.
fn main() { let in_order = { let people = [String::from("John Doe"), String::from("Jane Eyre"), String::from("Jane Doe")]; sorted(&people) }; for n in in_order { println!("- {n}"); } }
Lifetimes: a complex case
Warning: in this page, we will touch several Rust areas that have not yet been explained in class. Do not hesitate to read the Rust documentation or to take things as granted for now.
Exercise 3.a: follow the code described below, implement it, make sure you understand how it works.
Problem statement
Sometimes, we would like to manipulate a string, for example to make it lowercase. However, the string we manipulate might already be in lowercase: in this situation, we would like to avoid copying the string and return a reference to the string we got as a parameter.
In short, we will need a type which can hold:
- either a proper
String
if we had to build it; - or a reference to another string (as a
&str
) if we could just reuse an existing one.
Building such a type
We will build a type, named StringOrRef
, which can either hold an owned String
or store a borrowed &str
reference.
Enumerated types
Rust has support for enumerated types with values, which means that we can write something like (incorrect for now):
#![allow(unused)] fn main() { pub enum StringOrRef { Owned(String), Borrowed(&str), } }
A StringOrRef
object will be either a StringOrRef::Owned
or a StringOrRef::Borrowed
variant. When such an object is destroyed, it will destroy its content:
- If it is a
StringOrRef::Owned
, it will destroy theString
it owned (and thus free the heap memory associated with theString
). - If it is a
StringOrRef::Borrowed
, it will destroy the&str
it owned, which does nothing since destroying a reference does not destroy the object it points to (as the reference does not own it).
Fixing the type
Our type will not compile because we haven't told the compiler how long the reference in StringOrRef::Borrowed
is supposed to be valid. Since we do not know that ourselves in advance, we have to make it a generic parameter of the type:
#![allow(unused)] fn main() { pub enum StringOrRef<'a> { Owned(String), Borrowed(&'a str), } }
We say that the type StringOrRef
is parameterized by the generic lifetime parameter 'a
.
We can create an owned value by using StringOrRef::Owned(String::from("Hello"))
, or make a reference to a string s
with StringOrRef::Borrowed(&s)
. In the former case, the lifetime 'a
can be anything since the StringOrRef
object owns the string. In the later case, 'a
will be set to the lifetime of the referenced string s
:
Important note: when a type is parameterized by a generic lifetime parameter, an object of this type can never live longer than this lifetime. For example, if s
has a lifetime 's
, StringOrRef::Borrowed(s)
is an object that cannot live longer than 's
. This is intuitively sound: since we store a reference to s
(wich has lifetime 's
) inside our StringOrRef
, the StringOrRef
cannot survive the disappearance of s
as that would leave us with a dangling pointer.
Exploring the type
Our type can be used through pattern-matching:
#![allow(unused)] fn main() { fn display(x: &StringOrRef<'_>) { // '_ means that the lifetime has no importance here match x { StringOrRef::Owned(s) => println!("owned string: {s}"), StringOrRef::Borrowed(s) => println!("borrowed string: {s}"), } } }
We can also write a function which returns a &str
from our object:
#![allow(unused)] fn main() { pub fn as_str<'a>(x: &'a StringOrRef<'_>) -> &'a str { match x { StringOrRef::Owned(s) => &s, StringOrRef::Borrowed(s) => s, } } }
Note how we didn't have to give the lifetime of the StringOrRef
generic parameter 'a
and used '_
which means "any lifetime": since the StringOrRef
reference has a lifetime of 'a
which is necessarily shorter or equal than the generic lifetime parameter (see the "Important note" above), we now that the returned reference is shorter than the one used as a generic parameter.
Implementing as_str()
as a method
Rather than using a standalone function, we can implement as_str()
as a method on StringOrRef
objects. Methods are implemented in a impl
block. In an impl
block, Self
designates the type itself. In methods parameters, self
in first position designates receiving the current object (it is a shortcut for self: Self
), &self
is a shortcut for self: &Self
and &mut self
is a shortcut for self: &mut Self
.
Let us rewrite as_str()
as a method:
#![allow(unused)] fn main() { impl StringOrRef<'_> { pub fn as_str(&self) -> &str { match self { StringOrRef::Owned(s) => &s, StringOrRef::Borrowed(s) => s, } } } }
You can note some interesting points about lifetimes:
- We used
<'_>
in ourimpl
block: our method defined in this block works with any generic lifetime parameter, as explained below. - We didn't explicitely write in the
as_str()
signature that the returned&str
has the same lifetime as&self
. This mecanism is called "lifetime elision": when a method has a&self
parameter, by default all outputs lifetime which are not explicit will have the same lifetime as&self
. This is a shortcut forpub fn as_str<'a>(&'a self) -> &'a str
.
Using or StringOrRef
type
We can now use our StringOrRef
type. For example, let us write a function which returns a lowercase version of a string, but allocates memory on the heap only when the string is not lowercase already:
#![allow(unused)] fn main() { // The lifetime of s will be copied into the generic lifetime parameter of StringOrRef. // Again, this is because of elision rules: if there is only one lifetime parameter in // the input, it will be copied into all non-explicit lifetime parameters in the output. pub fn to_lowercase(s: &str) -> StringOrRef<'_> { if s.chars().all(|c| c.is_lowercase()) { // All characters in the string are lowercase already, return a reference StringOrRef::Borrowed(s) } else { // We need to create a new String with a lowercase version StringOrRef::Owned(s.to_lowercase()) } } }
We can now use it in our main program and see that it works:
fn main() { let s1 = to_lowercase("HeLlO"); let s2 = to_lowercase("world"); println!("s1 = {}, s2 = {}", s1.as_str(), s2.as_str()); }
This will display "s1 = hello, s2 = world". Nothing indicates that "world" has not been copied. Let's enhance the program with the matches!
macro which can test if some expression matches a pattern, as in a match
expression:
fn variant(x: &StringOrRef<'_>) -> &'static str { if matches!(x, StringOrRef::Owned(_)) { "owned" } else { "borrowed" } } fn main() { let s1 = to_lowercase("HeLlO"); let s2 = to_lowercase("world"); println!("s1 = {}, s2 = {}", s1.as_str(), s2.as_str()); println!("s1 is {}, s2 is {}", variant(&s1), variant(&s2)); }
The output is now
s1 = hello, s2 = world
s1 is owned, s2 is borrowed
as expected. Neat eh?
Adding a destructor
When an StringOrRef
object is dropped (goes out of scope), it will get destroyed: the destructor for every field will be called (if any). For example, if it holds a StringOrRef::Owned
variant, the String
contained in this variant will be dropped and its destructor will be called, freeing memory on the heap.
We can visualize what happens by adding a destructor on StringOrRef
. It is done by implementing the Drop
trait:
#![allow(unused)] fn main() { impl Drop for StringOrRef<'_> { fn drop(&mut self) { print!( "Destroying the StringOrRef containing {} which is {}: ", self.as_str(), variant(self), ); if matches!(self, StringOrRef::Owned(_)) { println!("memory on the heap will be freed"); } else { // Dropping a reference doesn't free memory on the heap println!("no memory on the heap will be freed"); } } } }
If we execute our program, we will now read:
s1 = hello, s2 = world
s1 is owned, s2 is borrowed
Destroying the StringOrRef containing world which is borrowed: no memory on the heap will be freed
Destroying the StringOrRef containing hello which is owned: memory on the heap will be freed
s2
and s1
are destroyed in the reverse order of their creation when they go out of scope. No memory on the heap was ever allocated for string "world", which comes from a read-only memory area and has only been referenced. However, the string "hello" has been built into the heap while lowercasing the string "HeLlO" and needs to be freed: this
happens automatically when dropping s1
.
Conclusion
Types in Rust are powerful and allow easy memory management without needing a garbage collector. Doing the same thing in C would require extra fields (is the string owner or borrowed, code the deallocation by hand). We will later see even more powerful type manipulation in Rust.