Lifetimes: a complex case
Warning: on this page, we will touch several Rust areas that have not yet been explained in class. Do not hesitate to read the Rust documentation or to take things for granted for now.
Exercise 3.a: follow the code described below, implement it, make sure you understand how it works.
Problem statement
Sometimes, we would like to manipulate a string, for example to make it lowercase. However, the string we manipulate might already be in lowercase: in this situation, we would like to avoid copying the string and return a reference to the string we got as a parameter.
In short, we will need a type which can hold:
- either a proper String if we had to build it;
- or a reference to another string (as a &str) if we could just reuse an existing one.
Building such a type
We will build a type, named StringOrRef, which can either hold an owned String or store a borrowed &str reference.
Enumerated types
Rust has support for enumerated types with values, which means that we can write something like (incorrect for now):
pub enum StringOrRef {
    Owned(String),
    Borrowed(&str),
}
A StringOrRef object will be either a StringOrRef::Owned or a StringOrRef::Borrowed variant. When such an object is destroyed, it will destroy its content:
- If it is a StringOrRef::Owned, it will destroy the String it owns (and thus free the heap memory associated with the String).
- If it is a StringOrRef::Borrowed, it will destroy the &str it holds, which does nothing since destroying a reference does not destroy the object it points to (as the reference does not own it).
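The second point can be checked with a plain String and a plain &str, without going through StringOrRef. The following is a minimal sketch; the function name drop_reference_only is made up for this illustration.

```rust
/// Hypothetical helper: lets a reference go out of scope and shows that
/// the String it pointed to is untouched.
fn drop_reference_only() -> String {
    let owner = String::from("hello");
    {
        let reference: &str = &owner;
        println!("borrowed: {reference}");
    } // `reference` is destroyed here; no heap memory is freed
    // `owner` is still valid and still owns its heap buffer.
    owner
}

fn main() {
    println!("still alive: {}", drop_reference_only());
}
```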
Fixing the type
Our type will not compile because we haven't told the compiler how long the reference in StringOrRef::Borrowed is supposed to be valid. Since we do not know that ourselves in advance, we have to make it a generic parameter of the type:
pub enum StringOrRef<'a> {
    Owned(String),
    Borrowed(&'a str),
}
We say that the type StringOrRef is parameterized by the generic lifetime parameter 'a.
We can create an owned value by using StringOrRef::Owned(String::from("Hello")), or make a reference to a string s with StringOrRef::Borrowed(&s). In the former case, the lifetime 'a can be anything since the StringOrRef object owns the string. In the latter case, 'a will be set to the lifetime of the referenced string s.
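As an illustration, here is a small sketch building one value of each kind. The describe helper is hypothetical and only serves to observe which variant was built; it relies on a match expression, which is presented in the next section.

```rust
pub enum StringOrRef<'a> {
    Owned(String),
    Borrowed(&'a str),
}

/// Hypothetical helper used only to observe which variant we built.
fn describe(x: &StringOrRef<'_>) -> String {
    match x {
        StringOrRef::Owned(s) => format!("owned: {s}"),
        StringOrRef::Borrowed(s) => format!("borrowed: {s}"),
    }
}

fn main() {
    // This value owns its data: 'a is not constrained by anything.
    let owned = StringOrRef::Owned(String::from("Hello"));
    // This value borrows from `s`: 'a is tied to the borrow of `s`.
    let s = String::from("World");
    let borrowed = StringOrRef::Borrowed(&s);
    println!("{}", describe(&owned));
    println!("{}", describe(&borrowed));
}
```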
Important note: when a type is parameterized by a generic lifetime parameter, an object of this type can never live longer than this lifetime. For example, if s has a lifetime 's, StringOrRef::Borrowed(s) is an object that cannot live longer than 's. This is intuitively sound: since we store a reference to s (which has lifetime 's) inside our StringOrRef, the StringOrRef cannot survive the disappearance of s as that would leave us with a dangling pointer.
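The sketch below illustrates this constraint under a simple assumption: a borrowed value is built inside a block in which s lives, and only an owned copy is allowed to leave that block. The helpers content and escape_scope are hypothetical names introduced for the example.

```rust
pub enum StringOrRef<'a> {
    Owned(String),
    Borrowed(&'a str),
}

/// Hypothetical helper: copies the content out, whatever the variant.
fn content(x: &StringOrRef<'_>) -> String {
    match x {
        StringOrRef::Owned(s) => s.clone(),
        StringOrRef::Borrowed(s) => s.to_string(),
    }
}

/// Hypothetical helper showing which values may leave the scope of `s`.
fn escape_scope() -> String {
    let escaped;
    {
        let s = String::from("hello");
        let borrowed = StringOrRef::Borrowed(&s);
        // `escaped = borrowed;` should be rejected here: `borrowed` holds a
        // reference to `s`, and `s` is destroyed at the end of this block.
        // An owned copy carries no such constraint and may leave the block.
        escaped = StringOrRef::Owned(content(&borrowed));
    }
    content(&escaped)
}

fn main() {
    println!("{}", escape_scope());
}
```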
Exploring the type
Our type can be used through pattern-matching:
// '_ means that the lifetime has no importance here
fn display(x: &StringOrRef<'_>) {
    match x {
        StringOrRef::Owned(s) => println!("owned string: {s}"),
        StringOrRef::Borrowed(s) => println!("borrowed string: {s}"),
    }
}
We can also write a function which returns a &str from our object:
pub fn as_str<'a>(x: &'a StringOrRef<'_>) -> &'a str {
    match x {
        StringOrRef::Owned(s) => &s,
        StringOrRef::Borrowed(s) => s,
    }
}
Note how we didn't have to name the generic lifetime parameter of StringOrRef and used '_ instead, which means "any lifetime": since the reference to the StringOrRef has a lifetime 'a which is necessarily shorter than or equal to the generic lifetime parameter (see the "Important note" above), we know that the returned reference does not outlive the one used as a generic parameter.
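To make the two lifetimes visible, the same function can be sketched with both of them named. The name as_str_explicit and the lifetime name 'b are choices made for this example only. The compiler accepts the body because a reference of type &'a StringOrRef<'b> can only exist if 'b lasts at least as long as 'a.

```rust
pub enum StringOrRef<'a> {
    Owned(String),
    Borrowed(&'a str),
}

/// Hypothetical variant of as_str() where '_ is replaced by a named lifetime:
/// 'a is the lifetime of the reference to the object,
/// 'b is the generic lifetime parameter of the object itself.
pub fn as_str_explicit<'a, 'b>(x: &'a StringOrRef<'b>) -> &'a str {
    match x {
        // Here `s` is a &'a String: the result borrows from the object.
        StringOrRef::Owned(s) => s.as_str(),
        // Here the inner &'b str is shortened to 'a, which is allowed
        // because 'b lasts at least as long as 'a.
        StringOrRef::Borrowed(s) => s,
    }
}

fn main() {
    let s = String::from("borrowed text");
    let x = StringOrRef::Borrowed(&s);
    let y = StringOrRef::Owned(String::from("owned text"));
    println!("{} / {}", as_str_explicit(&x), as_str_explicit(&y));
}
```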
Implementing as_str() as a method
Rather than using a standalone function, we can implement as_str() as a method on StringOrRef objects. Methods are implemented in an impl block. In an impl block, Self designates the type itself. In method parameters, self in first position means that the method receives the current object by value (it is a shortcut for self: Self), &self is a shortcut for self: &Self and &mut self is a shortcut for self: &mut Self.
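The three receiver forms can be compared side by side in a short sketch. The methods len, clear and into_string below are hypothetical: they are not part of the exercise and exist only to show one method per form.

```rust
pub enum StringOrRef<'a> {
    Owned(String),
    Borrowed(&'a str),
}

impl StringOrRef<'_> {
    /// `&self`: reads the object without taking ownership.
    pub fn len(&self) -> usize {
        match self {
            StringOrRef::Owned(s) => s.len(),
            StringOrRef::Borrowed(s) => s.len(),
        }
    }

    /// `&mut self`: modifies the object in place.
    pub fn clear(&mut self) {
        // An empty string literal lives forever, so it fits any lifetime.
        *self = StringOrRef::Borrowed("");
    }

    /// `self`: consumes the object, which cannot be used afterwards.
    pub fn into_string(self) -> String {
        match self {
            StringOrRef::Owned(s) => s,
            StringOrRef::Borrowed(s) => s.to_string(),
        }
    }
}

fn main() {
    let mut x = StringOrRef::Owned(String::from("Hello"));
    println!("length: {}", x.len()); // borrows x
    x.clear(); // mutably borrows x
    println!("length after clear: {}", x.len());
    let s = x.into_string(); // moves x
    println!("final string: {s:?}");
    // x.len(); // would not compile: x has been moved
}
```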
Let us rewrite as_str() as a method:
impl StringOrRef<'_> {
    pub fn as_str(&self) -> &str {
        match self {
            StringOrRef::Owned(s) => &s,
            StringOrRef::Borrowed(s) => s,
        }
    }
}
You can note some interesting points about lifetimes:
- We used <'_> in our impl block: the methods defined in this block work with any generic lifetime parameter.
- We didn't explicitly write in the as_str() signature that the returned &str has the same lifetime as &self. This mechanism is called "lifetime elision": when a method has a &self parameter, all output lifetimes which are not explicit default to the lifetime of &self. Our signature is a shortcut for pub fn as_str<'a>(&'a self) -> &'a str.
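The elided and the explicit signatures can be placed next to each other to check that they are interchangeable. In this sketch, as_str_explicit is a hypothetical second method added only for the comparison.

```rust
pub enum StringOrRef<'a> {
    Owned(String),
    Borrowed(&'a str),
}

impl StringOrRef<'_> {
    /// Elided form: the output lifetime is the one of `&self`.
    pub fn as_str(&self) -> &str {
        match self {
            StringOrRef::Owned(s) => s.as_str(),
            StringOrRef::Borrowed(s) => s,
        }
    }

    /// Same signature with the elided lifetime written out.
    pub fn as_str_explicit<'a>(&'a self) -> &'a str {
        // Accepted without any conversion: both signatures denote the same type.
        self.as_str()
    }
}

fn main() {
    let x = StringOrRef::Owned(String::from("hello"));
    let y = StringOrRef::Borrowed("world");
    // Both forms return a &str tied to the borrow of the object.
    println!("{} {}", x.as_str(), y.as_str_explicit());
}
```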
Using our StringOrRef type
We can now use our StringOrRef type. For example, let us write a function which returns a lowercase version of a string, but allocates memory on the heap only when the string is not lowercase already:
// The lifetime of s will be copied into the generic lifetime parameter of StringOrRef.
// Again, this is because of elision rules: if there is only one lifetime parameter in
// the input, it will be copied into all non-explicit lifetime parameters in the output.
pub fn to_lowercase(s: &str) -> StringOrRef<'_> {
    if s.chars().all(|c| c.is_lowercase()) {
        // All characters in the string are lowercase already, return a reference
        StringOrRef::Borrowed(s)
    } else {
        // We need to create a new String with a lowercase version
        StringOrRef::Owned(s.to_lowercase())
    }
}
We can now use it in our main program and see that it works:
fn main() {
    let s1 = to_lowercase("HeLlO");
    let s2 = to_lowercase("world");
    println!("s1 = {}, s2 = {}", s1.as_str(), s2.as_str());
}
This will display "s1 = hello, s2 = world". Nothing indicates that "world" has not been copied. Let's enhance the program with the matches! macro, which tests whether an expression matches a pattern, as in a match expression:
fn variant(x: &StringOrRef<'_>) -> &'static str {
    if matches!(x, StringOrRef::Owned(_)) {
        "owned"
    } else {
        "borrowed"
    }
}

fn main() {
    let s1 = to_lowercase("HeLlO");
    let s2 = to_lowercase("world");
    println!("s1 = {}, s2 = {}", s1.as_str(), s2.as_str());
    println!("s1 is {}, s2 is {}", variant(&s1), variant(&s2));
}
The output is now
s1 = hello, s2 = world
s1 is owned, s2 is borrowed
as expected. Neat eh?
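Another way to convince ourselves that "world" has not been copied is to compare memory addresses: a borrowed result should point to the very same bytes as the input. The following is a sketch under that assumption; same_buffer is a hypothetical helper, and the other definitions are repeated so that the example is self-contained.

```rust
pub enum StringOrRef<'a> {
    Owned(String),
    Borrowed(&'a str),
}

impl StringOrRef<'_> {
    pub fn as_str(&self) -> &str {
        match self {
            StringOrRef::Owned(s) => s.as_str(),
            StringOrRef::Borrowed(s) => s,
        }
    }
}

pub fn to_lowercase(s: &str) -> StringOrRef<'_> {
    if s.chars().all(|c| c.is_lowercase()) {
        StringOrRef::Borrowed(s)
    } else {
        StringOrRef::Owned(s.to_lowercase())
    }
}

/// Hypothetical helper: true if both slices designate the same bytes in memory.
fn same_buffer(a: &str, b: &str) -> bool {
    a.as_ptr() == b.as_ptr() && a.len() == b.len()
}

fn main() {
    let (mixed, lower) = ("HeLlO", "world");
    let s1 = to_lowercase(mixed);
    let s2 = to_lowercase(lower);
    println!("s1 shares its input buffer: {}", same_buffer(s1.as_str(), mixed));
    println!("s2 shares its input buffer: {}", same_buffer(s2.as_str(), lower));
}
```

Only s2 shares its buffer with its input, since s1 had to be rebuilt on the heap.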
Adding a destructor
When a StringOrRef object is dropped (goes out of scope), it will get destroyed: the destructor for every field will be called (if any). For example, if it holds a StringOrRef::Owned variant, the String contained in this variant will be dropped and its destructor will be called, freeing memory on the heap.
We can visualize what happens by adding a destructor on StringOrRef. It is done by implementing the Drop trait:
impl Drop for StringOrRef<'_> {
    fn drop(&mut self) {
        print!(
            "Destroying the StringOrRef containing {} which is {}: ",
            self.as_str(),
            variant(self),
        );
        if matches!(self, StringOrRef::Owned(_)) {
            println!("memory on the heap will be freed");
        } else {
            // Dropping a reference doesn't free memory on the heap
            println!("no memory on the heap will be freed");
        }
    }
}
If we execute our program, we will now read:
s1 = hello, s2 = world
s1 is owned, s2 is borrowed
Destroying the StringOrRef containing world which is borrowed: no memory on the heap will be freed
Destroying the StringOrRef containing hello which is owned: memory on the heap will be freed
s2 and s1 are destroyed in the reverse order of their creation when they go out of scope. No memory on the heap was ever allocated for the string "world", which comes from a read-only memory area and has only been referenced. However, the string "hello" has been built on the heap while lowercasing the string "HeLlO" and needs to be freed: this happens automatically when dropping s1.
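The reverse destruction order can also be observed without reading printed messages, by recording each drop in a shared log. This is only a sketch: the Noisy type and the drop_order function are hypothetical and unrelated to StringOrRef.

```rust
use std::cell::RefCell;

/// Hypothetical type which records its name in a shared log when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Creates two values in order and returns the order in which they were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _first = Noisy { name: "first", log: &log };
        let _second = Noisy { name: "second", log: &log };
    } // `_second` is dropped here, then `_first`
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order());
}
```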
Conclusion
Types in Rust are powerful and allow easy memory management without needing a garbage collector. Doing the same thing in C would require an extra field (recording whether the string is owned or borrowed) and deallocation code written by hand. We will later see even more powerful type manipulation in Rust.