LifetimeKata
Welcome to LifetimeKata, a set of exercises which you can use to improve your understanding of lifetimes in Rust. While many tasks involve writing compiling code, some will also involve creating specific errors.
You should complete the kata in order, as they increase in difficulty, and depend on previous kata.
Getting Started
Clone this repository:
$ git clone https://www.github.com/tfpk/lifetimekata/
Most exercises are run in two steps:
$ cargo build --package ex04
And then either:
$ cargo test --package ex04
or:
$ cargo run --package ex04
depending on whether it's a binary or a library.
Chapter 0: References and Lifetimes Review
(This section is review, and may be familiar to some readers. If you already know what a reference is, you can skip it.)
If you've been using Rust for a while, you've likely already encountered
references. For every type T
, there are two types of references to it:
&T
: A shared reference (often called a shared borrow) ofT
. You can have as many of these as you'd like, but they do not allow you to modify the data they refer to.&mut T
: A mutable reference (often called an exclusive borrow) ofT
. You can only have one of these at a time, but they allow you to modify the data they refer to.
References make it easy to call a function on data without needing to copy that data.
The powerful thing about Rust's references is that they are guaranteed to always refer to something that still exists (i.e. has not been dropped/freed/gone out of scope). A reference to something that no longer exists is called a "dangling reference", and Rust guarantees you will never have one. Therefore, this example will not compile:
fn main() {
let x_ref = {
let x = 3;
&x
};
// x_ref would now refer to `x`, but `x` is out of scope, so x_ref is dangling.
println!("{}", x_ref)
}
Most modern languages (Python, Java, etc.) avoid the problem of dangling references by constantly checking at runtime whether you have any references to something, and dropping only when you have no references left. This is called "garbage collection", and the advantage of it is that you never need to think about when objects dropped. The language just does it for you. The disadvantage is performance -- garbage collection requires stopping your program occasionally for the language to scan every reference you have.
Some languages (notably C and Assembly) give you access to a "pointer" type. Since pointers are raw addresses in memory, the compiler leaves it to the programmer to ensure that they do not have dangling references. This allows them to be used in memory constrained or performance critical environments, but unfortunately means that a bug can access memory after it's destroyed; resulting in a crash, or worse, a security issue.
Rust is powerful because it gives you the convenience of knowing at run-time that you will never access freed memory; but the price you pay for this is the compiler needing to be convinced at compile-time that you've correctly used references.
An Example of an Unconvinced Compiler
You've undoubtedly come across errors like the one below before:
fn main() {
let mut my_reference: Option<&String> = None;
// Starting a scope.
{
// my_variable created // \ \
let my_variable: String = "hello".to_string(); // | |
my_reference = Some(&my_variable); // | |- my_variable exists here. ('variable)
// At the end of the scope, `my_variable` is dropped // | |
drop(my_variable); // | |
// my variable destroyed // | /
} // | - my_reference needs to exist here. ('reference)
// |
if let Some(reference) = my_reference { // |
println!("{}", reference); // |
} // /
}
error[E0597]: `my_variable` does not live long enough
--> bad_lifetimes.rs:7:29
|
7 | my_reference = Some(&my_variable);
| ^^^^^^^^^^^^ borrowed value does not live long enough
8 | }
| - `my_variable` dropped here while still borrowed
9 |
10 | if let Some(reference) = my_reference {
| ------------ borrow later used here
error: aborting due to previous error; 1 warning emitted
Clearly in this example, since my_variable
goes out of scope before my_reference
,
it's possible that the if let
could try and access my_reference
, and find that
it's referencing a variable that no longer exists.
Rust says that this variable "does not live long enough". It notices that
"it's possible that my_variable
is dropped before the reference stored in my_reference
".
Formally, we can see this by noticing the regions of code where these two things need to exist. The region of code where the reference needs to exist is larger than the region of code where the variable exists. This indicates that there must be part of the time the reference exists where the variable has been dropped, and therefore a dangling reference might exist.
We call a region of code where a reference must be valid a "lifetime". We can
give lifetimes names using the syntax 'name
. So let us say that 'variable
is the region of code where a reference to the variable is valid.
Also, let's say that 'reference
is the region of code where the reference
could be used. We can formally say that 'variable
must be larger than 'reference
.
This is obviously true, it is a shorthand for saying "the region of code where the reference is valid must be larger than the region of code where the reference is actually usable". Consider the opposite: if the reference was usable somewhere where the reference wasn't valid, you'd have something that was invalid: unsound code, or in other words, a bug.
So what's this book about then?
There are places where the Rust compiler is unable to figure out lifetimes, and
needs the programmer to explicitly specify this. This book is to help you
improve at writing explicit lifetimes (things like &'a str
). And that starts
with the next chapter!
Exercise: Complete the Rustlings Exercises on Lifetimes
If you aren't sure you understand the above, before reading on, complete the rustlings exercises about lifetimes.
What are Lifetime Annotations?
In the last section, we discussed the concept of lifetimes within a single function. In all those examples, it was clear what region of code of a variable or reference existed in, based on the curly brackets. Lifetime Annotations are used to help the compiler understand what's going on when it can't rely on scope brackets (i.e. across function boundaries; and within structs and enums).
A good place to understand lifetime annotations is to start by understanding why we actually need them. Let's work through some examples to see why they exist:
The simplest possible example of a function that needs you to be explicit about lifetimes is this one, which returns a reference to the larger of two integers.
fn max_of_refs(a: &i32, b: &i32) -> &i32 {
if *a > *b {
a
} else {
b
}
}
Imagine we call this function as follows:
fn complex_function(a: &i32) -> &i32 {
let b = 2;
max_of_refs(a, &b)
}
fn main() {
let a = 1;
let my_num = complex_function(&a);
println!("{my_num}");
}
If you work through this example, you will find that my_num would be a reference to a variable
from complex_function
(which no longer exists). In other words, the lifetime of the return
value of complex_function
will be longer than the lifetime of b
.
Now, you might say, "but can't the compiler see at runtime that clearly this program won't work"? Well, because we're using constants, yes the compiler probably could tell that this program won't work.
But what if we said let a = rand::rand()
or let b = read_number_from_stdin()
?
It's impossible for a compiler to tell whether this reference should be valid.
Okay, why can't we just ban that case?
Your next thought might be "OK, surely all references of this type are unsound; lets just disallow them". It would be worth being specific about what this ban is. The simplest ban would be "no references in function parameters", but that might just be a little excessive (and entirely destructive to how useful Rust is).
A more sensible ban which would cover this case would be: "Any function with more than one reference input may not return a reference (or something containing a reference)". This avoids the problem we've seen of being unclear on where a reference is coming from. It would ban the above example.
But, how ergonomic would that be? What if you wanted a function like:
fn only_if_greater(number: &i32, greater_than: &i32) -> Option<&i32> {
if number > greater_than {
Some(number)
} else {
None
}
}
No matter the way in which you call this function, we always know that if our
return value is Some
, it refers to number
. It will never return a reference
to greater_than
.
A more interesting example of this is a split
function, which takes a string,
and returns a vector of slices of that string, split by some other string.
fn split(text: &str, delimiter: &str) -> Vec<&str> {
let mut last_split = 0;
let mut matches: Vec<&str> = vec![];
for i in 0..text.len() {
if i < last_split {
continue
}
if text[i..].starts_with(delimiter) {
matches.push(&text[last_split..i]);
last_split = i + delimiter.len();
}
}
if last_split < text.len() {
matches.push(&text[last_split..]);
}
matches
}
No matter how you call this function, it will always return a vector of slices from text
,
never from delimiter
.
Ugh, but can't the compiler just figure this out?
At this point, you can probably notice that matches.push
is only ever called with text
slices.
So you might reasonably expect that the compiler could infer lifetimes automatically in this case.
It's possible that in simple cases it could. But your compiler might decide that it can't infer lifetimes. Or it could succeed in inferring them... after 6 months.
So, the compiler needs more information. That information is provided by lifetime annotations. Before we discuss them in detail, here is an exercise that will hopefully re-inforce the concepts, before we deal with syntax.
Exercise: Identify which programs work, and which break
Without using any lifetime syntax, answer the following questions for each of the code examples:
- Which inputs are references? Which could the function return?
- Which examples could have dangling references?
NOTE: the code examples do not compile; you will need to read them and think about them.
Once you've decided your answers, the "eyeball" button in the top-right hand corner of the code block will reveal the answers.
// a is the only input reference.
// the only thing the function can return is a
fn identity(a: &i32) -> &i32 {
a
}
// This does not have any dangling references.
fn example_1() {
let x = 4;
let x_ref = identity(&x);
assert_eq!(*x_ref, 4);
}
// This is always going to cause a dangling reference.
fn example_2() {
let mut x_ref: Option<&i32> = None;
{
let x = 7;
x_ref = Some(identity(&x));
}
assert_eq!(*x_ref.unwrap(), 7);
}
// the contents of `opt` and `otherwise` are both references
// either of them could be returned.
fn option_or(opt: Option<&i32>, otherwise: &i32) -> &i32 {
opt.unwrap_or(otherwise)
}
// No possibility for a dangling reference here.
fn example_1() {
let x = 8;
let y = 10;
let my_number = Some(&x);
assert_eq!(&x, option_or(my_number, &y));
}
// This is always a dangling reference.
fn example_2() {
let answer = {
let y = 4;
option_or(None, &y)
};
assert_eq!(answer, &4);
}
// This is never a dangling reference.
fn example_3() {
let y = 4;
let answer = {
option_or(None, &y)
};
assert_eq!(answer, &4);
}
// This is always a dangling reference.
fn example_4() {
let y = 4;
let answer = {
let x = 7;
option_or(Some(&x), &y)
};
assert_eq!(answer, &7);
}
Lifetimes: The Solution to Dangling References
What we can see here is that dangling references are caused by functions losing information about where references come from.
The compiler can only decide if the function is correct if it knows how the lifetimes of its inputs and outputs interact. So, we need to tell the compiler when lifetimes of inputs and outputs will be the same.
What we can do is tell the computer "my function works for any lifetimes, as long as the lifetimes of these inputs/outputs are the same". Let's take a look at the syntax of this:
fn some_if_greater<'lifetime1, 'lifetime2>(number: &'lifetime1 i32, greater_than: &'lifetime2 i32) -> Option<&'lifetime1 i32> { if number > greater_than { Some(number) } else { None } } fn main() { let (n, gt) = (7, 4); let test = some_if_greater(&n, >); }
Let's walk through what this does:
fn my_function<'lifetime1, 'lifetime2>(...)
: what we're doing here is choosing some names for the lifetimes our program requires.number: &'lifetime1 i32
: this is us telling the compiler that this reference must live for some region of code called'lifetime1
.greater_than: &'lifetime2 i32
: this is us telling the compiler that this reference must live for some region of code called'lifetime2
. This means that the lifetimes ofgreater_than
andnumber
don't have to relate at all.-> Option<&'lifetime1 i32>
: this is where lifetimes are important. what we're saying is thatnumber
and our return value must be valid for exactly the same region of code.
So, what we've done is told the compiler that our function can only be called
if number
and the return are valid in the same region of code.
Exercise: Annotate lifetimes
Just to get some initial practice, the exercise in this section is to annotate lifetimes on some of the examples of the last two chapters.
You will need to:
- decide how many lifetime parameters are necessary
- name each of those lifetime parameters, and put them inside
<
angle brackets>
after the function's name. - annotate every reference with the appropriate lifetime
- check the code compiles
- think about what region of code each lifetime could be
You will notice each function has the #[lifetimes_required(!)]
annotation. You will need
to leave it there to complete this exercise. This instructs the compiler to throw an
error whenever you miss a lifetime; even if the compiler doesn't need the lifetime.
Why Lifetime Annotations Are Mostly Never Used
In the last chapter, we saw why we needed lifetimes. We saw that the compiler was unable to automatically tell how references in the arguments or return values might relate to each other. This is why we needed to tell the compiler that the references related to each other.
This said, you've probably written a function in Rust that needed a reference
(likely a &str
), without writing lifetimes. Why didn't you have to annotate
lifetimes then? There are some common patterns in Rust that make it obvious to
the compiler what the lifetimes should be. Let's explore some of them:
Example 1: No Output References
fn add(a: &i32, b: &i32) -> i32 { *a + *b } fn main() { assert_eq!(add(&3, &4), 7); }
The lifetimes of a
and b
in this function don't need to relate to each other. Assuming there's only one thread,
and assuming safe code, there's no way that the variable they are referencing could possibly be dropped during the
function, and after the function, they can live for as long as they like.
Example 2: Only one reference in the input
fn identity(a: &i32) -> &i32 { a } fn main() { let x = 52; assert_eq!(&x, identity(&x)); }
It's important to note that it (generally 1 ) isn't possible to create a reference and pass it out of a function if it wasn't given to you. This is because a reference must refer to something you own. Anything you own is dropped at the end of your function. Therefore, anything you own can't be referenced; and the only way you can return a reference is if you were passed a reference.
For this reason, if you only have one reference in your parameters, the only reference you could return is that one -- so the lifetime of your parameter has to be the same as the reference you return.
What to do
Rust could have specifically encoded these examples as exceptions; but then there may have been many cases which were excepted and they would have ended up with confusing rules.
Instead, the Rust project has settled on a procedure the compiler will follow to try and guess lifetimes.
The compiler first splits all the references in a function signature into two types: 'input' and 'output'. 'Input' references are those in the parameters of the function (i.e. it's arguments). 'Output' references are those in the return type of the function.
The two rules that we'll learn in this chapter are:
- Each place that an input lifetime is left out (a.k.a 'elided') is filled in with its own lifetime.
- If there's exactly one lifetime on all the input references, that lifetime is assigned to every output lifetime.
Let's see how those rules affect the above two examples, and an example from the last chapter:
Example 1: No Output References
We had:
fn add(a: &mut i32, b: &mut i32) -> i32 {
*a + *b
}
There are two input lifetimes, the ones needed in the types of a
and b
. Each get allocated their own lifetime:
fn add<'elided1, 'elided2>(a: &'elided1 i32, b: &'elided2 i32) -> i32 { *a + *b } fn main() { assert_eq!(add(&3, &4), 7); }
There are no output lifetimes, so we end there.
This example is now correct -- the two lifetimes of a
and b
can be entirely unrelated;
and the output is an owned value, so it doesn't depend on any lifetimes at all.
Example 2: Only one reference in the input
We had:
fn identity(a: &i32) -> &i32 { a } fn main() { let x = 52; assert_eq!(&x, identity(&x)); }
There is only one input lifetime (needed for the type of a
):
fn identity<'elided1>(a: &'elided1 i32) -> &i32 { a } fn main() { let x = 52; assert_eq!(&x, identity(&x)); }
There is only one output lifetime; and all the input lifetimes share the same lifetime ('elided1
);
therefore we can allocate all output lifetimes that lifetime:
fn identity<'elided1>(a: &'elided1 i32) -> &'elided1 i32 { a } fn main() { let x = 52; assert_eq!(&x, identity(&x)); }
This now makes sense: the only possible way you could return a &i32
is if you got it from a parameter;
and we can see that the input and output share a lifetime.
Example 3: The Limits of Elision
Let's have another look at this example from the last chapter.
fn max_of_refs(a: &i32, b: &i32) -> &i32 {
if *a > *b {
a
} else {
b
}
}
Just like Example 1, there are two input lifetimes, so we give them distinct lifetimes:
fn max_of_refs<'elided1, 'elided2>(a: &'elided1 i32, b: &'elided2 i32) -> &i32 {
if *a > *b {
a
} else {
b
}
}
But unlike Example 1, we need an output lifetime! By the second rule, we can only elide the output lifetime if we have exactly one input lifetime. Therefore, Rust considers it an error to elide lifetimes here -- the user has to give more information!
Exercise: Apply These Rules
In this exercise, there are four functions which are missing some lifetime annotations. Your task is to manually follow the lifetime elision rules, and give these functions lifetimes in the same way that the compiler would.
In a future release of lifetimekata, these will be checked automatically. For now, when you're done, compare your answer to the solutions.
1: this is possible with static types, like string literals, but we'll cover those later
Mutable References and Containers
Mutable References work exactly the same way as regular references, with regards to lifetime elision. The reason we have a chapter about them, however, is that if you have a mutable reference, you might need to tell the compiler about lifetimes even without a return value.
For example, let's take a look at this example:
fn insert_value(my_vec: &mut Vec<&i32>, value: &i32) {
my_vec.push(value);
}
We're not returning anything; so lifetimes don't matter, right?
Unfortunately, lifetimes are still important. The reference value
actually needs to
live for the same time as the contents of the vector. If they didn't,
the vector might contain an invalid reference. For example, what would happen
in this scenario?
fn insert_value(my_vec: &mut Vec<&i32>, value: &i32) {
my_vec.push(value);
}
fn main() {
let x = 1;
let my_vec = vec![&x];
{
let y = 2;
insert_value(&mut my_vec, &y);
}
println!("{my_vec:?}");
}
The reference to y
in the above example is dangling when we try to print the vector!
We can use lifetimes to ensure that the two references live for the same amount of time:
fn insert_value<'vec_lifetime, 'contents_lifetime>(my_vec: &'vec_lifetime mut Vec<&'contents_lifetime i32>, value: &'contents_lifetime i32) { my_vec.push(value) } fn main(){ let mut my_vec = vec![]; let val1 = 1; let val2 = 2; insert_value(&mut my_vec, &val1); insert_value(&mut my_vec, &val2); println!("{my_vec:?}"); }
This signature indicates that there are two lifetimes:
'vec_lifetime
: The vector we've passed the function will need to live for a certain period of time.'contents_lifetime
: The contents of the vector need to live for a certain period of time. Importantly, the newvalue
we're inserting needs to live for just as long as the contents of the vector. If they didn't, you would end up with a vector that contains an invalid reference.
Do We Even Need Two Lifetimes?
You might wonder what happens if we don't provide two lifetimes. Does just one lifetime work?
fn insert_value<'one_lifetime>(my_vec: &'one_lifetime mut Vec<&'one_lifetime i32>, value: &'one_lifetime i32) {
my_vec.push(value)
}
fn main(){
let mut my_vec: Vec<&i32> = vec![];
let val1 = 1;
let val2 = 2;
insert_value(&mut my_vec, &val1);
insert_value(&mut my_vec, &val2);
println!("{my_vec:?}");
}
No, it doesn't. We get two errors. Let's look at the first one:
error[E0499]: cannot borrow `my_vec` as mutable more than once at a time
--> /tmp/rust.rs:11:18
|
10 | insert_value(&mut my_vec, &val1);
| ----------- first mutable borrow occurs here
11 | insert_value(&mut my_vec, &val2);
| ^^^^^^^^^^^
| |
| second mutable borrow occurs here
| first borrow later used here
This seems strange -- why can't you borrow my_vec
?
Well, let's walk through what the compiler sees:
&val
needs to last for as long as my_vec
exists:
fn insert_value<'one_lifetime>(my_vec: &'one_lifetime mut Vec<&'one_lifetime i32>, value: &'one_lifetime i32) {
my_vec.push(value)
}
fn main(){
let mut my_vec: Vec<&i32> = vec![];
let val1 = 1;
let val2 = 2;
insert_value(&mut my_vec, &val1); // \
insert_value(&mut my_vec, &val2); // | - &val1 needs to last this long.
// |
println!("{my_vec:?}"); // /
}
Whereas &mut my_vec
only needs to last for the duration of insert_value
.
fn insert_value<'one_lifetime>(my_vec: &'one_lifetime mut Vec<&'one_lifetime i32>, value: &'one_lifetime i32) {
my_vec.push(value)
}
fn main(){
let mut my_vec: Vec<&i32> = vec![];
let val1 = 1;
let val2 = 2;
insert_value(&mut my_vec, &val1); // <- &mut my_vec only needs to last this long.
insert_value(&mut my_vec, &val2);
println!("{my_vec:?}");
}
But, we've told the compiler that it needs the borrows of both &val1
and
&mut my_vec
to share the same lifetime. So the compiler extends the borrow
of &mut my_vec
to ensure they do share a lifetime:
It sees that if it let &mut my_vec
live as long as &val1
, it would
have that single region of code:
fn insert_value<'one_lifetime>(my_vec: &'one_lifetime mut Vec<&'one_lifetime i32>, value: &'one_lifetime i32) {
my_vec.push(value)
}
fn main(){
let mut my_vec: Vec<&i32> = vec![];
let val1 = 1;
let val2 = 2;
insert_value(&mut my_vec, &val1); // \
insert_value(&mut my_vec, &val2); // | - 'one_lifetime must be this region of code.
// |
println!("{my_vec:?}"); // /
}
And that's fine. But now the compiler gets to the next line, and it sees you're
trying to borrow &mut my_vec
again.
The compiler already decided &mut my_vec
has to exist until the end of the function.
So now, you're asking it to create two mutable references... and that's not allowed.
So the compiler throws an error -- you're not allowed to borrow &mut my_vec
again.
Why does having two lifetimes fix this error?
Have a think before reading this section -- why does having two lifetimes solve this bug?
Before, the compiler had to decide that &mut my_vec
and &val1
shared a lifetime.
In other words, that they lived as long as each-other.
By using two lifetimes, we've told the compiler that &mut my_vec
and &val1
don't necessarily have to live for the same amount of time. And so,
it finds the following lifetimes:
fn insert_value<'vec_lifetime, 'contents_lifetime>(my_vec: &'vec_lifetime mut Vec<&'contents_lifetime i32>, value: &'contents_lifetime i32) {
my_vec.push(value)
}
fn main(){
let mut my_vec: Vec<&i32> = vec![];
let val1 = 1;
let val2 = 2;
insert_value(&mut my_vec, &val1); // <- 'vec_lifetime \
insert_value(&mut my_vec, &val2); // | 'contents_lifetime
// |
println!("{my_vec:?}"); // /
}
Exercise Part 1: The Other Error
First, let's look at the other error we got in the last section:
error[E0502]: cannot borrow `my_vec` as immutable because it is also borrowed as mutable
--> /tmp/rust.rs:13:16
|
10 | insert_value(&mut my_vec, &val1);
| ----------- mutable borrow occurs here
...
13 | println!("{my_vec:?}");
| ^^^^^^
| |
| immutable borrow occurs here
| mutable borrow later used here
|
Can you explain why this error occurs? Write it out in 50 words or less.
Exercise Part 2: Writing Our Own
Add appropriate lifetimes to the function in the exercise.
Lifetimes on Types
So far, we've only discussed lifetimes as applied to functions. Functions are not the only place where you will need explicit lifetimes. Types (structs and enums) can have lifetimes too.
This is because if a struct contains a reference, the user needs to clarify how long it lasts for.
Imagine we wanted to split a &str
in two, and create a
struct with a start
and end
field?
Well, we could write a function like this:
struct SplitStr {
start: &str,
end: &str
}
fn split<'text, 'delim>(text: &'text str, delimiter: &'delim str) -> Option<SplitStr> {
let (start, end) = text.split_once(delimiter)?;
Some(SplitStr {
start,
end
})
}
fn main() {}
And we're done! Right?
Well, how long do those string references live?
What if we called the function like this:
struct SplitStr {
start: &str,
end: &str
}
fn split<'text, 'delim>(text: &'text str, delimiter: &'delim str) -> Option<SplitStr> {
let (start, end) = text.split_once(delimiter)?;
Some(SplitStr {
start,
end
})
}
fn main() {
let mut parts_of_string: Option<SplitStr> = None;
{
let my_string = String::from("First line;Second line");
parts_of_string = split(&my_string, ";");
}
println!("{parts_of_string:?}");
}
Well, the references inside the SplitStr
struct are now dangling,
since they both pointed to my_string
; but that only existed inside the curly brackets.
So, Rust forces us to specify the lifetime of all references inside a struct. Here's how we'd fix our code:
struct SplitStr<'str_lifetime> { start: &'str_lifetime str, end: &'str_lifetime str } fn split<'text, 'delim>(text: &'text str, delimiter: &'delim str) -> Option<SplitStr<'text>> { let (start, end) = text.split_once(delimiter)?; Some(SplitStr { start, end }) } fn main() {}
Now, when we return an Option<SplitStr<'text>>
the compiler knows that references inside the struct
must all last for the same lifetime as 'text
. If we try to return a SplitStr
where the references
can't last for 'text
, that will be a compiler error.
A Note on Enums
References work exactly the same way in enums as they do in structs. We don't go into detail on them here because they are interchangeable.
enum StringOption<'a> { Some(&'a str), None } fn main() {}
Two Lifetimes
Occasionally, structs will have more than one lifetime on them. This happens when the data inside them comes from two different places, with two lifetimes.
Take the example of a program to find unique words among two sentences.
You might have the first sentence be "I love to swim and surf."
, and the
second be "I love to ski and snowboard."
. The words unique to the first
sentence are "swim"
and "surf"
. The words unique to the second sentence are
"ski"
and "snowboard"
.
If you said that the two sentences had to share a lifetime, you would be forcing the user to ensure that the two sentences came from the same place, and therefore had the same lifetime. But what if one came from a file that was open for the whole running of the program, but the second was scanned in inside a loop?
In that case, the compiler would insist that the scanned in value was saved for the whole of the program, which would not be ergonomic.
Exercise: Two Lifetimes on a Struct
In this exercise, we will be modifying a small program which finds the unique words between two strings. At the moment, it does not have any lifetime annotations, and therefore does not compile.
Our goal is to return a struct that contains all the unique words from the first string, and all the unique words from the second string. They should have separate lifetimes.
Lifetimes on Impls
When structs or enums have lifetimes on them, the way that impl
blocks
work also changes slightly.
For example, say we want to create a struct which lets the user iterate over a sentence. You might start off something like this:
// First, the struct:
/// This struct keeps track of where we're up to in the string.
struct WordIterator<'s> {
position: usize,
string: &'s str
}
impl WordIterator {
/// Creates a new WordIterator based on a string.
fn new(string: &str) -> WordIterator {
WordIterator {
position: 0,
string
}
}
/// Gives the next word. `None` if there aren't any words left.
fn next_word(&mut self) -> Option<&str> {
let start_of_word = &self.string[self.position..];
let index_of_next_space = start_of_word.find(' ').unwrap_or(start_of_word.len());
if start_of_word.len() != 0 {
self.position += index_of_next_space + 1;
Some(&start_of_word[..index_of_next_space])
} else {
None
}
}
}
fn main() {
let text = String::from("Twas brillig, and the slithy toves // Did gyre and gimble in the wabe: // All mimsy were the borogoves, // And the mome raths outgrabe. ");
let mut word_iterator = WordIterator::new(&text);
assert_eq!(word_iterator.next_word(), Some("Twas"));
assert_eq!(word_iterator.next_word(), Some("brillig,"));
}
When defining our WordIterator
struct, we said it requires a lifetime to be specified.
But when we then wrote the impl block, we didn't specify one. Rust requires that we do this.
The way we do this is by telling Rust about a lifetime, and then putting that lifetime onto our struct. Let's see how we do that:
impl<'lifetime> for WordIterator<'lifetime> {
// ...
}
It's useful to note that we've done this in two parts -- impl<'lifetime>
defines a lifetime 'lifetime
.
It doesn't make any promises about what that lifetime is, it just says it exists.
WordIterator<'lifetime>
then uses the lifetime we created, and says "the references in WordIterator
must live for lifetime
".
Now, anywhere in the impl block, we can choose to use that lifetime. Any reference we annotate with 'lifetime'
must have the same lifetime as any other reference annotated with 'lifetime'
.
/// This struct keeps track of where we're up to in the string.
struct WordIterator<'s> {
position: usize,
string: &'s str
}
impl<'lifetime> WordIterator<'lifetime> {
/// Creates a new WordIterator based on a string.
fn new(string: &'lifetime str) -> WordIterator<'lifetime> {
WordIterator {
position: 0,
string
}
}
/// Gives the next word. `None` if there aren't any words left.
fn next_word(&mut self) -> Option<&str> {
let start_of_word = &self.string[self.position..];
let index_of_next_space = start_of_word.find(' ').unwrap_or(start_of_word.len());
if start_of_word.len() != 0 {
self.position += index_of_next_space + 1;
Some(&start_of_word[..index_of_next_space])
} else {
None
}
}
}
fn main() {
let text = String::from("Twas brillig, and the slithy toves // Did gyre and gimble in the wabe: // All mimsy were the borogoves, // And the mome raths outgrabe. ");
let mut word_iterator = WordIterator::new(&text);
assert_eq!(word_iterator.next_word(), Some("Twas"));
assert_eq!(word_iterator.next_word(), Some("brillig,"));
}
Lifetime Elision, Redux
We previously discussed two rules for lifetime elision. They are:
- Each place that an input lifetime is left out (a.k.a 'elided') is filled in with its own lifetime.
- If there's exactly one lifetime on all the input references, that lifetime is assigned to every output lifetime.
Now that we've seen impl
blocks that have lifetimes, let's discuss one more:
- If there are multiple input lifetime positions, but one of them is
&self
or&mut self
, the lifetime of the borrow ofself
is assigned to all elided output lifetimes.
This means that even if you take in many references in your arguments, Rust will assume that any references you return
come from self
, not any of those other references.
Exercise
In the following code, we annotate the function using the 'borrow
lifetime, not only the 'lifetime
lifetime.
The 'borrow
lifetime only exists inside this function, and only affects the borrows of its arguments and return
value. The 'lifetime
value, as we saw before, also constrains the lifetime of the string inside the struct.
There are four ways we could implement this code. Describe the effect of each of these implementations.
Specifically:
- Do they compile?
- Are any of them identical to another one?
- Are there any circumstances where their lifetimes are not general enough?
- Which would be the "most" correct to write?
Example 1
/// Gives the next word. `None` if there aren't any words left.
/// This compiles. It's the exact same as Example 4.
/// This function is problematic because the next word lives as long
/// as your borrow of the iterator. In order to get the next word, you
/// must drop all references to the current one.
fn next_word<'borrow>(&'borrow mut self) -> Option<&'borrow str> {
// ...
}
Example 2
/// Gives the next word. `None` if there aren't any words left.
/// This compiles. It's the exact same as Example 3.
fn next_word<'borrow>(&'borrow mut self) -> Option<&'lifetime str> {
// ...
}
Example 3
/// Gives the next word. `None` if there aren't any words left.
/// This compiles. It's probably the "most" correct, because it's the shortest
/// to write, but also ensures you can retain the returned strings, even if
/// you call this function multiple times.
fn next_word(&mut self) -> Option<&'lifetime str> {
// ...
}
Example 4
/// Gives the next word. `None` if there aren't any words left.
/// This compiles. If expanded, it would be the same as Example 1.
fn next_word(&mut self) -> Option<&str> {
// ...
}
Special Lifetimes and Bounds
There are two special lifetimes in Rust. It's worth discussing them both:
'static
'_
(the implicit lifetime)
The static
lifetime
Some things in your program are guaranteed to last forever. The most common reason for this is when they're information bundled inside your binary. For example, when you write a program like this:
fn main() { let my_text = "Hello World"; }
The text "Hello World"
is actually somewhere inside the compiled binary. This means that
the reference to it is always valid, since the text is always there as long as the program
is running.
Therefore, if we were to talk about the type of the text, we'd say it's a &'static str
.
Similarly, any references to a constant can also be &'static
. For example:
const SOME_COORDINATE: (i32, i32) = (7, 4); fn main() { let static_reference: &'static (i32, i32) = &SOME_COORDINATE; }
The '_
lifetime (Anonymous Lifetimes, Placeholder Lifetimes)
The implicit lifetime tells Rust to figure out the lifetime itself. There are three places where this lifetime is useful:
- To simplify
impl
blocks - When consuming/returning a type that needs a lifetime
- To write trait objects that contain references.
Simplifying Impl Blocks
Say you're implementing a counter struct, that looks like this:
struct Counter<'a> { counter: &'a mut i32 } impl<'a> Counter<'a> { fn increment(&mut self) { *self.counter += 1; } } fn main() { let mut num = 0; let mut counter = Counter { counter: &mut num }; counter.increment(); println!("{num}"); // prints 1 }
That's fine, but you'll notice that the impl
block doesn't actually use the 'a
lifetime anywhere.
Therefore, we can simplify things by writing the following instead:
impl Counter<'_> {
fn increment(&mut self) {
self.counter += 1;
}
}
The two impl
blocks above mean the same thing, but just take slightly fewer arguments.
Returning Structs and Enums
This is recommended for the situation where you are returning a struct/enum that contains a reference. You could write something like this:
struct StrWrap<'a>(&'a str); fn make_wrapper(string: &str) -> StrWrap { StrWrap(string) } fn main() {}
But that syntax is no longer recommended, as you will see when you add the
#![deny(rust_2018_idioms)]
annotation, where you get the error:
error: hidden lifetime parameters in types are deprecated
--> src/main.rs:8:34
|
_ | fn make_wrapper(string: &str) -> StrWrap {
| ^^^^^^^ expected lifetime parameter
|
note: the lint level is defined here
--> src/main.rs:1:9
|
_ | #![deny(rust_2018_idioms)]
| ^^^^^^^^^^^^^^^^
= note: `#[deny(elided_lifetimes_in_paths)]` implied by `#[deny(rust_2018_idioms)]`
help: indicate the anonymous lifetime
|
_ | fn make_wrapper(string: &str) -> StrWrap<'_> {
| ++++
By following the hint, it becomes clearer that StrWrap
does contain a reference,
but that the compiler should just figure it out.
Lifetimes on Trait Objects
See Chapter 10: Footnote on Trait Lifetime Bounds for the gory details.
Lifetime Bounds
Lifetime bounds are not widely used, so we don't devote a large section of these exercises to them. You can probably skip this section unless you really want to know the details.
In short, they allow you to specify that one lifetime should outlive another. To specify one, use a where clause, such as
where 'a: 'b
.
To quote the Rust Reference:
Lifetime bounds can be applied to types or to other lifetimes. The bound
'a: 'b
is usually read as'a
outlives'b
.'a: 'b
means that'a
lasts at least as long as'b
, so a reference&'a ()
is valid whenever&'b ()
is valid.
fn f<'a, 'b>(x: &'a i32, mut y: &'b i32) where 'a: 'b { y = x; // &'a i32 is a subtype of &'b i32 because 'a: 'b let r: &'b &'a i32 = &&0; // &'b &'a i32 is well formed because 'a: 'b }
T: 'a
means that all lifetime parameters ofT
outlive'a
. For example, if'a
is an unconstrained lifetime parameter, theni32: 'static
and&'static str: 'a
are satisfied, butVec<&'a ()>: 'static
is not.
Exercise
You have been given code which contains many uses of the lifetimes 'a
and 'b'
.
All of these lifetimes can be replaced with either '_
or 'static
.
Your task is to replace every occurance of the lifetimes 'a
and 'b
with either
'_
or 'static
, to remove excessive lifetime declarations, and to ensure your
code still compiles.
Footnote on Out of Date Information
The Rust Edition Guide previously contained a section about anonymous lifetimes. The most popular google result is now this article but I recommend disregarding it, as it is out of date information.
Finale
Congratulations on getting to the end of LifetimeKata. There's more footnotes and extra reading in the next two chapters, but this chapter is a "finale" of sorts.
In this exercise, we will be building a very simple clone of a glob system. This allows someone to ask whether a piece of text matches some description.
It's worth noting that implementing the whole thing may take up to an hour. If you just want to work on the lifetimes,
you can copy code from the solution
, but this is a fun
and rewarding exercise to complete in its entirety.
For example, the glob ab(cd|ef|gh)
matches any of the following strings: abcd
, abef
, abgh
You will create a Matcher
struct, which has three fields:
- A
&str
, representing the textual representation of our regex. - A
Vec<MatcherTokens>
, representing the different parts of the regex, in order. - An integer, keeping track of what the longest match for our regex was.
To create this, you'll take a string that looks like this: hello.(town|world|universe).its.me
.
There are three components to this:
- regular text, like 'hello', 'its' or 'me', which should only match that exact text
- wildcards (the
.
character), which matches any single character. - optional text, like
(town|world|universe)
, which matches exactly one of a list of strings. So(town|world|universe)
matchestown
, ORworld
, ORuniverse
.
These can be mixed and matched in any order (but you will never have one inside the other). With this string, you should create a vector of MatcherTokens which refer to the relevant parts of that string.
You will then write a function which takes another string, and sees how much of the Matcher
that particular string matches. You will return a vector of (MatcherToken, &str)
, where the
MatcherToken
is the token that matched some text, and the &str
is the text that was matched.
An Example
Say you had the matcher (Black|Bridge)(rock|stone|water).company
. This can be broken down into four parts:
OneOfText(["Black", "Bridge"])
OneOfText(["rock", "stone", "water"])
Wildcard
RawText("company")
Now, let's imagine we're given the following text: BlackBridge
. Black
matches the first token,
but Bridge
does not match the second token.
So, we would return: vec![(OneOfText(["Black", "Bridge"]), "Black")"]
. The most tokens we've matched is 1.
For a different example, take Bridgestone_Tyres
.
Bridge
matches the first matcher, stone
matches
the second matcher, _
matches the third matcher,
but Tyres
doesn't match company
. So the most tokens
we've matched is 3. We'd return a vec containing:
- (
OneOfText(["Black", "Bridge"])
,Bridge
) - (
OneOfText(["rock", "stone", "water"])
,"stone"
) - (
Wildcard
,"_"
)
A Note On Unicode
Rust is able to deal with unicode characters (like emoji or Japanese Kanji) in its strings. Of course, this increases the amount of complexity that is required for simple operations like splitting a string into pieces, because it's possible to accidentally split a character in half.
The tests in the example do not use unicode, however if you want a "true" Rust experience, change the tests to include a unicode character (an example is in the comments).
Further Reading in the Rust Reference
The best sources of information about lifetimes are the Rust Reference and Rustonomicon. If you require a really advanced knowledge of lifetimes to complete a project, the reference will contain that knowledge. Most times, however, if you feel you need to understand something complex about lifetimes, you may find that there is a simpler alternative.
Other Useful Lifetimes Content
Variance and Subtyping
This guide does not cover the topic of "variance" at all, which is how lifetimes can be substituted for one-another. Variance, while theoretically important, is not useful in a day-to-day understanding of lifetimes, and so it was not included in the book.
You can read more about it in the Rustonomicon (subtyping).
Brain Teaser 1: Why doesn't this program work:
If you're interested in working through a really difficult exercise to test your understanding of lifetimes and generics, the following exercise may be interesting.
This should be another way of implementing the code in exercise 5. Unfortunately, it doesn't work. This took the author of this book 20 minutes to figure out (after having written five chapters), so he challenges you to do better!
use std::collections::HashSet;
struct Difference<'first, 'second> {
first_only: Vec<&'first str>,
second_only: Vec<&'second str>
}
fn find_difference<'fst, 'snd>(sentence1: &'fst str, sentence2: &'snd str) -> Difference<'fst, 'snd> {
let sentence_1_words: HashSet<&str> = sentence1.split(" ").collect();
let sentence_2_words: HashSet<&str> = sentence2.split(" ").collect();
Difference {
first_only: (&sentence_1_words - &sentence_2_words).into_iter().collect(),
second_only: (&sentence_2_words - &sentence_1_words).into_iter().collect(),
}
}
fn main() {
let first_sentence = String::from("I love the surf and the sand.");
let second_sentence = String::from("I hate the surf and the sand.");
let first_only = {
let third_sentence = String::from("I hate the snow and the sand.");
let diff = find_difference(&first_sentence, &third_sentence);
diff.first_only
};
assert_eq!(first_only, vec!["hate", "surf"]);
let second_only = {
let third_sentence = String::from("I hate the snow and the sand.");
let diff = find_difference(&third_sentence, &second_sentence);
diff.second_only
};
assert_eq!(second_only, vec!["snow"]);
}
For more information about this issue, read this Rust issue.
Footnote: Lifetimes on Trait Objects
In chapter 7, we discussed placeholder lifetimes ('_
). We said that
there were three ways you could use them:
- To simplify
impl
blocks - When consuming/returning a type that needs a lifetime
- To write trait objects that contain references.
In the first case, we saw that anonymous lifetimes just simplified what we needed to write.
In the second case, we saw that Rust recommends that we use it, but we don't need to -- lifetime elision will do what we want.
The one case where it looks like lifetime elision should do what we want,
but it actually doesn't unless we use the '_
is the case of trait objects.
This chapter walks through how trait objects and lifetimes work together.
Let's setup a simple example:
trait Bool { fn truthiness(&self) -> bool; } struct True(); impl Bool for True { fn truthiness(&self) -> bool { true } } struct False(); impl Bool for False { fn truthiness(&self) -> bool { false } } fn get_bool(b: bool) -> Box<dyn Bool> { if b == true { Box::new(True()) } else { Box::new(False()) } } fn main() { let my_bool = true; let my_dyn = get_bool(my_bool); println!("{}", my_dyn.truthiness()); }
To be clear, what we are doing here is creating two structs which represent
true
and false
. They both implement the Bool
trait, which has
the truthiness
function which returns true
or false
.
The get_bool
function returns a Boxed Bool
trait object, based on whether
get_bool
is passed true
or false
.
It's important to realise that since trait objects might or might not contain a reference (or any number of references), all trait objects have lifetimes. This is true, even if no implementors of the trait contain references.1
So, since we need to associate a lifetime with our trait object, we might
think we could rely on lifetime elision. But how would lifetime elision
work for our get_bool
function? There are no input references, so what
output lifetime should we give the trait object? Lifetime elision can't help
us here.
So, in RFC 599 and in RFC 1156, the rules for trait object lifetimes were changed.
The rules are complex, and best outlined in the reference,
but in the case of get_bool
, it means that the lifetime inferred for dyn Bool
is
'static
.
Let's change the example slightly now, such that the struct contains a reference to a bool:
trait Bool {
fn truthiness(&self) -> bool;
}
// CHANGE 1: added &'a bool here
struct True<'a>(&'a bool);
impl<'a> Bool for True<'a> {
fn truthiness(&self) -> bool {
true
}
}
// CHANGE 2: added &'a bool here
struct False<'a>(&'a bool);
impl<'a> Bool for False<'a> {
fn truthiness(&self) -> bool {
false
}
}
fn get_bool(b: &bool) -> Box<dyn Bool> {
if *b == true {
Box::new(True(b))
} else {
Box::new(False(b))
}
}
// CHANGE 3: Update the
fn main() {
let my_dyn = {
let my_bool = true;
get_bool(&my_bool)
// my_bool is dropped here, so the trait object we're returning
// has a dangling reference.
};
println!("{}", my_dyn.truthiness());
}
Now, we get an error:
error: lifetime may not live long enough
--> src/main.rs:22:5
|
21 | fn get_bool(b: &bool) -> Box<dyn Bool> {
| - let's call the lifetime of this reference `'1`
22 | / if *b == true {
23 | | Box::new(True(b))
24 | | } else {
25 | | Box::new(False(b))
26 | | }
| |_____^ returning this value requires that `'1` must outlive `'static`
|
help: to declare that the trait object captures data from argument `b`, you can add an explicit `'_` lifetime bound
|
21 | fn get_bool(b: &bool) -> Box<dyn Bool + '_> {
| ++++
error: could not compile __ due to previous error
Even though lifetime elision means that get_bool
should end up with a
signature like fn get_bool<'elided>(b: &'elided bool) -> Box<dyn Bool + 'elided>
, it doesn't. The special rules for trait objects mean that the
lifetime is: fn get_bool<'elided>(b: &'elided bool) -> Box<dyn Bool + 'static>
. That 'static
bound is incorrect.
Therefore, we need the '_
bound (as this error message tells us) to inform Rust that it
should use the normal lifetime elision rules; rather than the special trait
object rules.
1: https://doc.rust-lang.org/reference/types/trait-object.html#trait-object-lifetime-bounds