Introduction

This website contains teaching material for the laboratory sessions of the course Safe System Programming (in Rust) (SSP-RS), at Télécom Paris / Polytechnique Institute of Paris.

ⓒ 2022-2024 Samuel Tardieu and Stefano Zacchiroli

Lab 01 — Unsafe System Programming

In the first lab session of the course, you will mainly work on:

  1. Setting up your development environment for the rest of the course, and
  2. Experimenting with unsafe system programming, leading to buffer overflows and other nasty stuff.

Let's get to it.

Development setup

For the duration of the entire lab sessions of this course you are going to use a number of tools, for the most part related to development in the C and Rust programming languages. As the first task for this lab session you are hence going to install the relevant toolchains and a comfortable (for you) development environment.

Operating system requirements: we are going to assume that your work machine runs Linux as operating system (OS). If that is not the case, you should fix this before proceeding. Any Linux-based distribution would do, but in the following we are going to rely on package names from Debian and Debian-based distributions (e.g., Ubuntu); you should translate them to your distro if need be.

Work environment

You have various options in terms of which "machine" to work on:

  1. work on your real (hardware) machine: laptop or desktop computer;
  2. work in a Docker container;
  3. work in a virtual machine.

Option 1: Work on your machine

We recommend this option as it is the most flexible, but it requires more manual setup on your side—setup which we are not going to detail here as it heavily depends on your OS choice.

We only list below the tools (and corresponding Debian package names) that you should install on your machine to be able to complete the exercises of this course.

Option 2: Work in a Docker container

If you would like to work in a Docker container, we provide a Dockerfile that you can use to create locally a Docker image that contains all the tools used for lab sessions. Download the Dockerfile locally and build a matching image like this:

$ docker build -t net7212 - < Dockerfile

(It will take a few minutes to complete, depending mostly on download and I/O speeds.)

Then, create a shell script named run.sh to easily create and run a fresh container on that image every time you need it. It can be something like the following:

#!/bin/bash
host_labdir="${HOME}/net7212-lab"
cont_labdir="/home/net7212/lab"
docker run -ti \
       --net host \
       -e DISPLAY="unix${DISPLAY}" \
       -v /dev/shm:/dev/shm \
       -v /etc/hosts:/etc/hosts \
       -v /etc/localtime:/etc/localtime:ro \
       -v /tmp/.X11-unix:/tmp/.X11-unix \
       -v /var/run/dbus/system_bus_socket:/var/run/dbus/system_bus_socket \
       -v $HOME/.Xauthority:/home/net7212/.Xauthority \
       -v "${host_labdir}:${cont_labdir}" \
       --rm \
       --name net7212 \
       net7212

note that host_labdir should point to a directory on your real machine, that will be mounted within the container so that you can share files between the two worlds. All your lab session material will need to be in this directory if you want to access it from within the container.

You can then enter and use a fresh container every time you need it like this:

./run.sh 
net7212@machine:~$ rustc --version
rustc 1.72.0 (5680fa18f 2023-08-23)

and that if you create a file in your home directory under ~/net7212-lab you can also access it from within the container, e.g.:

# outside the container
$ echo "// hello.c" > ~/net7212-lab/hello.c
./run.sh
net7212@machine:~$ cat ~/lab/hello.c
// hello.c
net7212@machine:~$ 

Option 3: Work in a virtual machine

If you prefer to fully control your work environment (like with option (1)), but not impact your day-to-day work machine, you have the option to install a virtual machine and configure it manually. Just install in there the same tools that you would have installed, following the instructions below, on your real machine. We are not going to detail this step further, as it heavily depends on your choice of virtualization technology.

C development toolchain

If you have chosen to work on your (virtual) machine, you should install the following tools:

ToolDebian package name
batsbats
checkcheck
clang-tidyclang-tidy
clangclang
gccgcc
gdbgdb
ltraceltrace
makemake
stracestrace
valgrindvalgrind

If you have chosen the Docker option, they are all already installed in the container.

Exercise 0.a: Install all the above dependencies in the work environment of your choice.
When ready, make sure you can compile and run the following Hello World program with both gcc and clang:

#include <stdio.h>

int main(void) {
	printf("Hello, World!\n");
	return 0;
} // end of hello.c

like this:

$ gcc -Wall hello.c 
$ ./a.out 
Hello, World!

$ clang -Wall hello.c 
$ ./a.out 
Hello, World!

Now make sure that all of the following commands execute properly without failing on your setup (the exact output might very depending on version, configuration, etc.; that's OK):

$ bats -v
Bats 1.8.2
$ clang-tidy --version
Debian LLVM version 14.0.6
  Optimized build.
  Default target: x86_64-pc-linux-gnu
  Host CPU: skylake
$ gdb --version
GNU gdb (Debian 12.1-4) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ ltrace --version
ltrace version 0.7.3.
Copyright (C) 1997-2009 Juan Cespedes <cespedes@debian.org>.
This is free software; see the GNU General Public Licence
version 2 or later for copying conditions.  There is NO warranty.
$ make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ strace --version
strace -- version 5.10
Copyright (c) 1991-2020 The strace developers <https://strace.io>.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Optional features enabled: stack-trace=libunwind stack-demangle m32-mpers mx32-mpers
$ valgrind --version
valgrind-3.19.0

Rust development toolchain

If you have chosen to work on your (virtual) machine, you should install the following tools:

ToolDebian package
rustupnone (install by hand)

If you have chosen the Docker option, they are all already installed in the container.

Exercise 0.b: Install all the above dependencies in the work environment of your choice.
Once installed, verify that you can successfully run the following basic tools of the Rust toolchain, like this (your output might very, depending on the installed versions):

$ rustup --version
rustup 1.26.0 (5af9b9484 2023-04-05)
info: This is the version for the rustup toolchain manager, not the rustc compiler.
info: The currently active `rustc` version is `rustc 1.72.0 (5680fa18f 2023-08-23)`
$ cargo --version
cargo 1.72.0 (103a7ff2e 2023-08-15)
$ rustc --version
rustc 1.72.0 (5680fa18f 2023-08-23)

Editor / IDE

You are also going to need a development editor or an IDE (Integrated Development Environment). As developers tend to be opinionated about this stuff we are not going to force you to use any specific one: pick your own!

We recommend either a flexible editor with IDE-like features (e.g., Emacs or Vim with LSP support for Rust) or a full-featured IDE like Visual Studio Code or NetBeans (do make sure it has support for Rust, though, as that will be needed later in the semester).

Exercise 0.c: Install the editor or IDE you want in the work environment of your choice.
When done, open the hello.c file you used previously and make sure you can comfortably edit/compile/run it, ideally from within the editor/IDE itself.

Done?
Good! You are now all set with the developer setup!
Let's now move to the other matters of the day.

Memory unsafety

Out-of-bounds write

Exercise 1.a: write C programs (new, different from those seen in lecture 01) that exhibit weakness CWE 787: Out-of-bounds write. Your goal is to make your program segfault due to this CWE! Try out-of-bound writes in various memory segments, writing one program for each of the following cases:

  • out-of-bound write in dynamically allocated memory on the heap (e.g., with malloc),
  • out-of-bound write in statically allocated memory (i.e., global variable, preinitialized or not),
  • out-of-bound write in automatic memory (i.e., variables allocated on the stack).

Bonus point: make sure your programs do not emit any warnings when compiled invoking the compiler with the --Wall flag (which asks to enable all compile-time warnings and that you should always use anyway!).

Try to determine experimentally what each program was doing just before the segfault (tip: try with gdb, nm, ltrace, strace).
What helped you determine this?

Exercise 1.b: consider the following local variable declarations at the beginning of a function (possibly main):

char s1[16];
char s2[16];

write a program that does not segfault, but that with a single memory (or string) copy operation that writes to only one of these variables (s1, s2) modifies the content of both variables, with data of your choice. Verify the result experimentally (e.g., with printf or strcmp).

Warning: do not forget about string \0 terminators!

Out-of-bounds read

Exercise 1.c: continuing from the idea from the previous exercise, write a program that exhibits at least one variant of CWE 125: Out-of-bounds read. The program should not segfault, but by performing a memory ready operation on a given variable, it should be able to read the content of another variable, correctly and in full.

Bonus point: make your program read the content of an out-of-scope local variable, e.g., one declared in a calling function.

Smashing the stack

Segmentation fault

Considering the following program:

void function(int a, int b, int c) {
    char buffer1[5];
    char buffer2[10];
}
int main() {
    function(1, 2, 3);
    return 0;
}

Exercise 2.a: compile the program to assembly language like this:

$ gcc -S -o example1.s example1.c

and identify in the generated assembly code where function is called. How are arguments passed to the function? (The answer might vary depending on the compiler and architecture, but the assembly code will tell you!)

Exercise 2.b: compile, execute, and debug the following program:

#include <string.h>

void function(char *str) {
    char buffer[16];
    strcpy(buffer, str);
}

int main() {
    char large_string[256];
    int i;

    for(i = 0; i < 255; i++) {
        large_string[i] = 'A';
    }
    function(large_string);

    return 0;
}

Explain what's happening and why.

Subverting the control flow

Historically (on 32 bit architectures and with less built-in defenses in C compilers) subverting the control flow via stack overflows was pretty easy. You can get an idea of how easy it was by reading the famous tutorial article Smashing The Stack For Fun And Profit by Aleph One (1996).

Nowadays it has become a little bit more complicated, but still very doable (and regularly done!, as we have seen in the course examples).

Exercise 2.c: go through the modern incarnation of Aleph One's tutorial, namely: Smashing the Stack in the 21st Century by Jon Gjengset (archived copy), reproducing the reported experiences on your machine. (Next week we will go through some of the defenses that you will have to manually disable to exploit a stack overflow, and discuss what they are good for and what they are not.)

Lab 02 — Hardening System Programming

In this lab session we will take a practical tour of existing techniques to secure system programs implemented in traditional system languages. We will experiment with: testing, fuzzing, code instrumentation (of both binary and source code), and static analysis.

By the end of the session you will have a better understanding of the state-of-the-art of hardening system programming, including its limitations.

Credits

The exercises of this lab session contain material and ideas reused with permission from week 1 assignments of Stanford's course CS 110L (2022) by Ryan Eberhardt, Armin Namavari, Will Crichton, Julio Ballista, and Thea Rossman.

Unit testing

Setup

Consider the initial C implementation of a simple currency type (a pair of a currency name and a currency amount) given in money.h and money.c.

We want to reassure ourselves that the code thus far is correct, using unit testing.

In check_money_tiny.c you can find the skeleton of a C program that uses money.c as a library and unit tests it with a single test case. It uses the Check unit testing framework for C, which you should have already installed as part of the first lab session. It can be built and run like this:

$ gcc -Wall -c -o money.o money.c
$ gcc -Wall -c -o check_money_tiny.o check_money_tiny.c
$ gcc -Wall -o check_money_tiny check_money_tiny.o money.o `pkg-config --libs check`
$ ./check_money_tiny 
Running suite(s): Money
100%: Checks: 1, Failures: 0, Errors: 0
check_money_tiny.c:17:P:Core:test_money_create_amount:0: Passed

Exercise 1.a: read through the implementation of money.c and check_money_tiny.c to make sure you understand what the code does. Help yourself with the Check documentation as needed.

Code coverage

The notion of code coverage is a measure of how much of your implementation code is exercised during the execution of your test suite. It can be measured in various ways, we will focus on the simplest metric: how many lines of code are (not) executed during test suite running (i.e., line coverage). You can verify the line coverage of your current test suite using gcov, which is part of gcc by:

  1. compile/link your project passing the -fprofile-arcs -ftest-coverage options to compiler and linker,
  2. execute your test suite (./check_money_tiny in this case),
  3. inspect code coverage of a source code file you care about, e.g.:
$ gcov money.c 
File 'money.c'
Lines executed:75.00% of 16
Creating 'money.c.gcov'

Lines executed:75.00% of 16

The summary tells you the percentage of lines of code exercised during execution. The mentioned file money.c.gcov is an annotated version of money.c that shows precisely which lines have been executed (or not).

Exercise 1.b: measure the code coverage of the current test suite for money.c. Then, add all the test cases you need to maximize code coverage. Can you reach 100% line coverage? Why or why not?

Exercise 1.c: (optional and tricky, feel free to postpone this one to the end of the session) add a memory unsafety issue to the money.c implementation, and specifically one that crashes the execution of your test case. For example, you can implement a new money_add method that performs an out-of-bound read or write. Then add a test that run the buggy code, crashing the execution of the test case. Does the test case crash cause the crash of the entire test suite? (i.e., do other test cases supposed to run after it keep running or not?) Why or why not?

Stack overflow detection

Setup

Retrieve the program uppercase.c, compile it (recommended compiler options for this session are: -g -O0 -Wall -Wextra), and run it, e.g., like this:

$ gcc -g -O0 -Wall -Wextra -o uppercase uppercase.c
$ ./uppercase 'Hello, World!'
HELLO, WORLD!

Now take some time to study the code of the program.

Exercise 2.a: the program is affected by a memory-related bug (EEEK!). Explain where the bug is, what its consequences might be, and how to fix it.

Static analysis with Clang-Tidy

Clang-Tidy (which you have installed already, as part of last week's lab session) performs some simple static analysis checks, ranging from basic linting to more advanced techniques. From the tool documentation:

clang-tidy is a clang-based C++ “linter” tool. Its purpose is to provide an extensible framework for diagnosing and fixing typical programming errors, like style violations, interface misuse, or bugs that can be deduced via static analysis.

Exercise 2.b: go (briefly) through the documentation of clang-tidy and learn how to use it to analyze C source code files. Then use clang-tidy on uppercase.c to check if it detects the bug you have identified in the previous exercise.
Does clang-tidy catch the problem? Based on what you have learned in last lecture, explain why clang-tidy is capable to identify the problem or why it isn't.

Dynamic analysis with Valgrind

Clang-Tidy performs static analysis, Valgrind on the other hand performs dynamic analysis, based on dynamic binary re-compilation, as we have seen in last lecture. Let's see how Valgrind fares in detecting the bug with this program. After compiling your program, we can give it a try like this:

$ valgrind [--tool=memcheck] ./uppercase 'Hello, World!'

(Note: --tool=memcheck is optional because Memcheck is the default tool that Valgrind uses unless otherwise requested.)

Exercise 2.c: does Valgrind detect the issue? Based on what you have learned in last lecture, explain why or why not.

Exercise 2.d: benchmark the time required to execute the program with and without Valgrind. (Tip: as a simple way to do that you can use the standard UNIX time command. If you want to be fancier, this is your chance to learn about the amazing benchmarking tool Hyperfine and use it for a more scientifically accurate comparison.)

Dynamic analysis with LLVM sanitizers

As we have seen in last lecture, dynamic analysis can also be performed via source code instrumentation, which happens at compile time to add runtime checks. As we have seen LLVM sanitizers (look for the various entries with "Sanitizer" in their name in the Clang documentation) are a set of plugins for the Clang compiler that do that.

Exercise 2.e: change the compiler you are using to build from gcc to clang and first rebuild the program with it. (You can pass the same options suggested above, as they are compatible between the two compilers.) Then, add all of the following sanitizers (check the documentation for how to do that) and rebuild the program: AddressSanitizer, LeakSanitizer, UndefinedBehaviorSanitizer.
Compare the size of the executables built with and without sanitizers. How different are they and why?

Exercise 2.f: run the instrumented program. Do sanitizers detect the issue? Based on what you have learned in last lecture, explain why or why not.

Let's streamline our build process a bit, as it is becoming a bit of a mess. Retrieve the generic Makefile we make available for building the programs associated to this lab session. If you are not familiar with Makefiles, go (briefly) through the GNU make documentation to make sure you know the basics. In particular, read through the introductory material and make sure you understand the parts of Makefile syntax used in the given Makefile, in particular: rules, variables, and static pattern rules. (If you know about Makefiles already, feel free to skip this documentation study part.)

Exercise 2.g: change the retrieved Makefile so that:

  1. it only builds the uppercase.c programs (other programs mentioned in there will arrive later in this lab session!), and
  2. passes the sanitizer flags to the compiler when compiling.

Recheck the result of the previous exercise about sanitizers to make sure they are there when building with make.

Memory leak detection

Linked lists

Now retrieve the program linkedlist.c, add it to the Makefile, and compile it as before.

Take some time to study the program and understand what it does.

Exercise 3.a: this program is affected by (at least) two memory-related bugs this time, but of a different nature than before. Identify both of them, explain what the consequences might be, and how to fix them.

Exercise 3.b: try linting/static analysis with Clang-Tidy on this program, as you did before.
Does it detect the (right) issues? (Tip: beware of false positives!)

Exercise 3.c: try source code instrumentation with LLVM sanitizers on this program, as you did before.
Do they detect the issues?

In both cases feel free to speculate about why the used approaches did or did not identify the memory issues affecting this program.

String parsing

You know the drill!

Next program is bracket-parser.c, that extracts a "substring [within brackets]" from a larger string.

Exercise 4.a: read the program, find the memory problem with it, try Clang-Tidy and LLVM sanitizers on it, explaining what you are seeing.

Exercise 4.b: on most inputs the LLVM sanitizers will not detect a memory issue. Based on your understanding of the bug and your reading of the code (= white-box inspection), can you craft a specific input for bracket-parser that will make the LLVM sanitizer spot the bug at runtime?
(And while we are at it: what does this tell you about the reliability of dynamic analysis to detect memory issues?)

Fuzzing with LibFuzzer

Let's now see if fuzzing can help us finding automatically both the memory issue affecting bracket-parser and a test input that proves the issue is there.

Read the introductory documentation of LibFuzzer. Make sure to understand how to add fuzz targets to your code, what the required API is, and the fact you should not have a main function when using LibFuzzer (because it adds its own main).

Exercise 4.c: add a fuzz target to your code, that allows you to fuzz the input of the parse function of bracket-parser.c. Tip: you can take inspiration from the example seen in lecture, but remember that in this case you should pass well-formed C strings (trailing \0!) to avoid triggering bugs other than the one you are looking for.

Exercise 4.d: build the fuzzer (adding fuzzer to the list of -fsanitize=... sanitizers) and run it.
Does it find the issue? Does it produce a test case with a sample input that proves the issue is there? Explain why.

(Pretty cool, huh?)

Lab 03: Rust and references

In this lab session you will build and run your first Rust program. You will manipulate references and objects lifetimes.

By the end of the session you will have a better understanding of the lifecycle of Rust objects.

Your first Rust program

In this section, you will learn how to create and compile your first Rust program using tools such as cargo.

Anatomy of a Rust program

A basic Rust program is organized as follows:

Cargo.toml
Cargo.lock
src
└── main.rs
target
└── debug
    └── myprogram
  • The file Cargo.toml describes your program (name, version, authors, …).
  • The file Cargo.lock contains information about the version of the other packages used when compiling your program.
  • src contains the source of your program — main.rs is the file containing the main function.
  • target is a directory created by the compiler; it will contain the output of the compilation, including the executable produced from your source files, in debug/myprogram (if myprogram is the name of your program as described in Cargo.toml). The target directory should never be checked in your version control software. It should be ignored completely (e.g., by adding it to your .gitignore if you are using git).

Creating your first Rust program: intro

Exercise 0.a: Create the skeleton of your first Rust program by using the command cargo new:

$ cargo new intro
     Created binary (application) `intro` package

This will create a intro directory with a complete skeleton of a Rust program:

$ tree intro
intro
├── Cargo.toml
└── src
    └── main.rs

(you might not have the tree on your computer – never mind, we will not need it)

Change your current directory to intro and display the content of main.rs:

$ cd intro
$ cat src/main.rs
fn main() {
    println!("Hello, world!");
}

As you can see, the program is pretty basic. You can compile it by using cargo build:

$ cargo build
   Compiling intro v0.1.0 (/tmp/intro)
    Finished dev [unoptimized + debuginfo] target(s) in 0.54s

Your program has compiled without errors. You can execute it by using cargo run:

$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/intro`
Hello, world!

Note that you can execute cargo run without first executing cargo build: the application will be built automatically.

Editing the Rust program

Exercise 0.b: Using your editor of choice (for example Visual Studio code or NetBeans), edit main.rs so that the main program now displays a count from 1 to 5 inclusive:

fn main() {
    for i in 1..=5 {
        println!("i = {i}");
    }
}

Ensure that your editor is Rust-aware, or install the necessary extensions if needed. For Visual Studio Code, we suggest the installation of the "rust-analyzer" extension.

You can now execute your code. If everything went well, you should see the following output:

$ cargo run
   Compiling intro v0.1.0 (/tmp/intro)
    Finished dev [unoptimized + debuginfo] target(s) in 0.20s
     Running `target/debug/intro`
i = 1
i = 2
i = 3
i = 4
i = 5

You can also compile your program and run the executable directly. As cargo run output indicates it is located in target/debug/intro:

$ cargo build
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
$ ls -l target/debug/intro
-rwxr-xr-x 2 sam sam 4203904 Jan 16 15:02 target/debug/intro
$ target/debug/intro
i = 1
i = 2
i = 3
i = 4
i = 5

Using a Rust tutorial

There exists an interactive tutorial at https://rust-book.cs.brown.edu/. While reading this tutorial, you will be able to take quizzes and to let annotations that can be re-read later on.

You will not have to read the full tutorial at this time. We suggest that you ensure that you understand the concepts (and pass the quizzes) in the following chapters:

Of course, you are invited to read more chapters and take more quizzes as the lecturers describe new Rust concepts.

If you want to do more

Another way to familiarize yourself with Rust programming is to use the rustlings repository and program. Start by installing the rustings program and clone the rustlings repository containing the exercises:

$ cargo install rustlings
$ git clone https://github.com/rust-lang/rustlings/
$ cd rustlings
$ rustlings watch
The source for this exercise is in `exercises/intro/intro1.rs`. Have a look!
Going forward, the source of the exercises will always be in the success/failure output.

====================

You can keep working on this exercise,
or jump into the next one by removing the `I AM NOT DONE` comment
[…waits forever…]

The latest command rustlings watch will watch your progress while editing the exercises in the exercises directory of the repository.

For example, following the instructions given by rustlings watch, edit the file exercises/intro/intro1.rs with your favourite editor. As an exception, this exercise already compiles succesfully. All you have to do is remove the line // I AM NOT DONE so that rustlings watch will go to the next exercise in the series, exercises/intro/intro2.rs.

This one will not compile. You have to edit the code so that it compiles successfully, then when you see that it does you can remove the // I AM NOT DONE line, and so on.

You can stop doing the exercises whenever you want (just stop the rustlings watch program). When you run rustlings watch again it will start from where you left off.

References

Mutable references

The function std::mem::swap() takes two mutable references on the same type as parameters and swap them.

Swapping values

Exercise 1.a: Complete the following function (and call it from main() to see its effect) so that it has the expected effect:

#![allow(unused)]
fn main() {
fn do_swap() {
    let mut a = String::from("apple");
    let mut b = String::from("bread");
    println!("Before swap: a = {a}, b = {b}");
    todo!(); // REPLACE ME with something using std::mem::swap()
    println!("After swap: a = {a}, b = {b}");
}
}

The output should look like:

Before swap: a = apple, b = bread
After swap: a = bread, b = apple

You may realize that swapping two String values do not move the various characters composing the strings in the heap. Only the three fields making up a String (capacity, pointer, and length) are swapped.

Smallest first

Exercise 1.b: Write a function "smallest_first" with the following prototype which ensures that its first argument is smaller than its second argument and swap them otherwise:

fn smallest_first(a: &mut i32, b: &mut i32) {
    todo!(); // REPLACE ME
}

fn main() {
    let mut a = 42;
    let mut b = 10;
    println!("Before calling smallest_first: a = {a}, b = {b}");
    smallest_first(&mut a, &mut b);
    println!("After calling smallest_first: a = {a}, b = {b}");
    smallest_first(&mut a, &mut b);
    println!("After calling smallest_first again: a = {a}, b = {b}");
}

should display

Before calling smallest first: a = 42, b = 10
After calling smallest first: a = 10, b = 42
After calling smallest first again: a = 10, b = 42

Lifetimes

Sorting an immutable collection of strings

Let us assume that we want to write a function which sorts an immutable collection of String elements. Here, "immutable" means that we cannot modify either the strings or the collection itself. We will have to return a Vec containing the result.

Exercise 2.a: write a function sorted which takes a reference to a slice of String instances and return a Vec<String> with the sorted strings.

The function will have the following signature:

#![allow(unused)]
fn main() {
fn sorted(names: &[String]) -> Vec<String> {
    todo!() // Write the code here
}
}

You can use the following constructs to help you to so:

  • vec![] creates an empty Vec (the type of the elements will be deduced from the context).
  • If s is a String (or a &String), s.clone() will produce a new String with the same content as s.
  • You can iterate over the content of a slice names of type &[String] with for name in names { … }. name will have type &String within the loop.
  • You can use .sort() on a mutable slice: it will sort your elements in the natural order.

You can test your function with the following main program:

fn main() {
    let people = [String::from("John Doe"), String::from("Jane Eyre"), String::from("Jane Doe")];
    let in_order = sorted(&people);
    for n in in_order {
        println!("- {n}");
    }
}

This should print "Jane Doe", "Jane Eyre", and "John Doe" in order.

Avoiding cloning the strings

While the previous program works, you had to clone the immutable strings. This is unfortunate as this consumes more memory. Why not work directly with a Vec of &String and sort it?

Exercise 2.b: modify your sorted function so that it returns a Vec<&String> instead of a Vec<String>.

Note that .sort() will work on a collection of &String: it will sort them the same way it sorted String elements.

Testing the guarantees of lifetimes

The signature of your sorted function is:

#![allow(unused)]
fn main() {
fn sorted(names: &[String]) -> Vec<&String>
}

Since you have not indicated any explicit lifetime (remember, lifetimes start with a single tick character '), lifetimes elision kicked in: the lifetime of the &[String] slice was copied over the &String elements inside the Vec.

It means that the result of sorted() cannot live longer than its argument names.

Exercise 2.c: experiment with the following modified main program, which will not compile, and make sure you understand why the compiler is telling you that you cannot store the output of sorted(&people) into in_order. Note how it prevents you from referencing strings that have been destroyed when people went out of scope. You would not have had any such guarantee in C.

fn main() {
    let in_order = {
        let people = [String::from("John Doe"), String::from("Jane Eyre"), String::from("Jane Doe")];
        sorted(&people)
    };
    for n in in_order {
        println!("- {n}");
    }
}

Lifetimes: a complex case

Warning: in this page, we will touch several Rust areas that have not yet been explained in class. Do not hesitate to read the Rust documentation or to take things as granted for now.

Exercise 3.a: follow the code described below, implement it, make sure you understand how it works.

Problem statement

Sometimes, we would like to manipulate a string, for example to make it lowercase. However, the string we manipulate might already be in lowercase: in this situation, we would like to avoid copying the string and return a reference to the string we got as a parameter.

In short, we will need a type which can hold:

  • either a proper String if we had to build it;
  • or a reference to another string (as a &str) if we could just reuse an existing one.

Building such a type

We will build a type, named StringOrRef, which can either hold an owned String or store a borrowed &str reference.

Enumerated types

Rust has support for enumerated types with values, which means that we can write something like (incorrect for now):

#![allow(unused)]
fn main() {
pub enum StringOrRef {
    Owned(String),
    Borrowed(&str),
}
}

A StringOrRef object will be either a StringOrRef::Owned or a StringOrRef::Borrowed variant. When such an object is destroyed, it will destroy its content:

  • If it is a StringOrRef::Owned, it will destroy the String it owned (and thus free the heap memory associated with the String).
  • If it is a StringOrRef::Borrowed, it will destroy the &str it owned, which does nothing since destroying a reference does not destroy the object it points to (as the reference does not own it).

Fixing the type

Our type will not compile because we haven't told the compiler how long the reference in StringOrRef::Borrowed is supposed to be valid. Since we do not know that ourselves in advance, we have to make it a generic parameter of the type:

#![allow(unused)]
fn main() {
pub enum StringOrRef<'a> {
    Owned(String),
    Borrowed(&'a str),
}
}

We say that the type StringOrRef is parameterized by the generic lifetime parameter 'a.

We can create an owned value by using StringOrRef::Owned(String::from("Hello")), or make a reference to a string s with StringOrRef::Borrowed(&s). In the former case, the lifetime 'a can be anything since the StringOrRef object owns the string. In the later case, 'a will be set to the lifetime of the referenced string s:

Important note: when a type is parameterized by a generic lifetime parameter, an object of this type can never live longer than this lifetime. For example, if s has a lifetime 's, StringOrRef::Borrowed(s) is an object that cannot live longer than 's. This is intuitively sound: since we store a reference to s (wich has lifetime 's) inside our StringOrRef, the StringOrRef cannot survive the disappearance of s as that would leave us with a dangling pointer.

Exploring the type

Our type can be used through pattern-matching:

#![allow(unused)]
fn main() {
fn display(x: &StringOrRef<'_>) {  // '_ means that the lifetime has no importance here
    match x {
        StringOrRef::Owned(s) => println!("owned string: {s}"),
        StringOrRef::Borrowed(s) => println!("borrowed string: {s}"),
    }
}
}

We can also write a function which returns a &str from our object:

#![allow(unused)]
fn main() {
pub fn as_str<'a>(x: &'a StringOrRef<'_>) -> &'a str {
    match x {
        StringOrRef::Owned(s) => &s,
        StringOrRef::Borrowed(s) => s,
    }
}
}

Note how we didn't have to give the lifetime of the StringOrRef generic parameter 'a and used '_ which means "any lifetime": since the StringOrRef reference has a lifetime of 'a which is necessarily shorter or equal than the generic lifetime parameter (see the "Important note" above), we now that the returned reference is shorter than the one used as a generic parameter.

Implementing as_str() as a method

Rather than using a standalone function, we can implement as_str() as a method on StringOrRef objects. Methods are implemented in a impl block. In an impl block, Self designates the type itself. In methods parameters, self in first position designates receiving the current object (it is a shortcut for self: Self), &self is a shortcut for self: &Self and &mut self is a shortcut for self: &mut Self.

Let us rewrite as_str() as a method:

#![allow(unused)]
fn main() {
impl StringOrRef<'_> {
    pub fn as_str(&self) -> &str {
        match self {
            StringOrRef::Owned(s) => &s,
            StringOrRef::Borrowed(s) => s,
        }
    }
}
}

You can note some interesting points about lifetimes:

  • We used <'_> in our impl block: our method defined in this block works with any generic lifetime parameter, as explained below.
  • We didn't explicitely write in the as_str() signature that the returned &str has the same lifetime as &self. This mecanism is called "lifetime elision": when a method has a &self parameter, by default all outputs lifetime which are not explicit will have the same lifetime as &self. This is a shortcut for pub fn as_str<'a>(&'a self) -> &'a str.

Using or StringOrRef type

We can now use our StringOrRef type. For example, let us write a function which returns a lowercase version of a string, but allocates memory on the heap only when the string is not lowercase already:

#![allow(unused)]
fn main() {
// The lifetime of s will be copied into the generic lifetime parameter of StringOrRef.
// Again, this is because of elision rules: if there is only one lifetime parameter in
// the input, it will be copied into all non-explicit lifetime parameters in the output.
pub fn to_lowercase(s: &str) -> StringOrRef<'_> {
    if s.chars().all(|c| c.is_lowercase()) {
        // All characters in the string are lowercase already, return a reference
        StringOrRef::Borrowed(s)
    } else {
        // We need to create a new String with a lowercase version
        StringOrRef::Owned(s.to_lowercase())
    }
}
}

We can now use it in our main program and see that it works:

fn main() {
    let s1 = to_lowercase("HeLlO");
    let s2 = to_lowercase("world");
    println!("s1 = {}, s2 = {}", s1.as_str(), s2.as_str());
}

This will display "s1 = hello, s2 = world". Nothing indicates that "world" has not been copied. Let's enhance the program with the matches! macro which can test if some expression matches a pattern, as in a match expression:

fn variant(x: &StringOrRef<'_>) -> &'static str {
    if matches!(x, StringOrRef::Owned(_)) {
        "owned"
    } else {
        "borrowed"
    }
}

fn main() {
    let s1 = to_lowercase("HeLlO");
    let s2 = to_lowercase("world");
    println!("s1 = {}, s2 = {}", s1.as_str(), s2.as_str());
    println!("s1 is {}, s2 is {}", variant(&s1), variant(&s2));
}

The output is now

s1 = hello, s2 = world
s1 is owned, s2 is borrowed

as expected. Neat eh?

Adding a destructor

When an StringOrRef object is dropped (goes out of scope), it will get destroyed: the destructor for every field will be called (if any). For example, if it holds a StringOrRef::Owned variant, the String contained in this variant will be dropped and its destructor will be called, freeing memory on the heap.

We can visualize what happens by adding a destructor on StringOrRef. It is done by implementing the Drop trait:

#![allow(unused)]
fn main() {
impl Drop for StringOrRef<'_> {
    fn drop(&mut self) {
        print!(
            "Destroying the StringOrRef containing {} which is {}: ",
            self.as_str(),
            variant(self),
        );
        if matches!(self, StringOrRef::Owned(_)) {
            println!("memory on the heap will be freed");
        } else {
            // Dropping a reference doesn't free memory on the heap
            println!("no memory on the heap will be freed");
        }
    }
}
}

If we execute our program, we will now read:

s1 = hello, s2 = world
s1 is owned, s2 is borrowed
Destroying the StringOrRef containing world which is borrowed: no memory on the heap will be freed
Destroying the StringOrRef containing hello which is owned: memory on the heap will be freed

s2 and s1 are destroyed in the reverse order of their creation when they go out of scope. No memory on the heap was ever allocated for string "world", which comes from a read-only memory area and has only been referenced. However, the string "hello" has been built into the heap while lowercasing the string "HeLlO" and needs to be freed: this happens automatically when dropping s1.

Conclusion

Types in Rust are powerful and allow easy memory management without needing a garbage collector. Doing the same thing in C would require extra fields (is the string owner or borrowed, code the deallocation by hand). We will later see even more powerful type manipulation in Rust.