LifetimeKata

Welcome to LifetimeKata, a set of exercises which you can use to improve your understanding of lifetimes in Rust. While many tasks involve writing compiling code, some will also involve creating specific errors.

You should complete the kata in order, as they increase in difficulty, and depend on previous kata.

Getting Started

Clone this repository:

$ git clone https://www.github.com/tfpk/lifetimekata/

Most exercises are run in two steps:

$ cargo build --package ex04

And then either:

$ cargo test --package ex04

or:

$ cargo run --package ex04

depending on whether it's a binary or a library.

Chapter 0: References and Lifetimes Review

(This section is review, and may be familiar to some readers. If you already know what a reference is, you can skip it.)

If you've been using Rust for a while, you've likely already encountered references. For every type T, there are two types of references to it:

  • &T: A shared reference (often called a shared borrow) of T. You can have as many of these as you'd like, but they do not allow you to modify the data they refer to.
  • &mut T: A mutable reference (often called an exclusive borrow) of T. You can only have one of these at a time, but they allow you to modify the data they refer to.

References make it easy to call a function on data without needing to copy that data.

The powerful thing about Rust's references is that they are guaranteed to always refer to something that still exists (i.e. has not been dropped/freed/gone out of scope). A reference to something that no longer exists is called a "dangling reference", and Rust guarantees you will never have one. Therefore, this example will not compile:

fn main() {
    let x_ref = {
        let x = 3;
        &x
    };
    // x_ref would now refer to `x`, but `x` is out of scope, so x_ref is dangling.
   
    println!("{}", x_ref)
}

Most modern languages (Python, Java, etc.) avoid the problem of dangling references by constantly checking at runtime whether you have any references to something, and dropping only when you have no references left. This is called "garbage collection", and the advantage of it is that you never need to think about when objects dropped. The language just does it for you. The disadvantage is performance -- garbage collection requires stopping your program occasionally for the language to scan every reference you have.

Some languages (notably C and Assembly) give you access to a "pointer" type. Since pointers are raw addresses in memory, the compiler leaves it to the programmer to ensure that they do not have dangling references. This allows them to be used in memory constrained or performance critical environments, but unfortunately means that a bug can access memory after it's destroyed; resulting in a crash, or worse, a security issue.

Rust is powerful because it gives you the convenience of knowing at run-time that you will never access freed memory; but the price you pay for this is the compiler needing to be convinced at compile-time that you've correctly used references.

An Example of an Unconvinced Compiler

You've undoubtedly come across errors like the one below before:

fn main() {
    let mut my_reference: Option<&String> = None;

    // Starting a scope.
    {
        // my_variable created                               // \ \
        let my_variable: String = "hello".to_string();       // | |
        my_reference = Some(&my_variable);                   // | |- my_variable exists here. ('variable)
        // At the end of the scope, `my_variable` is dropped // | |
        drop(my_variable);                                   // | |
        // my variable destroyed                             // | /
    }                                                        // | - my_reference needs to exist here. ('reference)
                                                             // |
    if let Some(reference) = my_reference {                  // |
        println!("{}", reference);                           // |
    }                                                        // /

}
error[E0597]: `my_variable` does not live long enough
  --> bad_lifetimes.rs:7:29
   |
7  |         my_reference = Some(&my_variable);
   |                             ^^^^^^^^^^^^ borrowed value does not live long enough
8  |     }
   |     - `my_variable` dropped here while still borrowed
9  |
10 |     if let Some(reference) = my_reference {
   |                              ------------ borrow later used here

error: aborting due to previous error; 1 warning emitted

Clearly in this example, since my_variable goes out of scope before my_reference, it's possible that the if let could try and access my_reference, and find that it's referencing a variable that no longer exists.

Rust says that this variable "does not live long enough". It notices that "it's possible that my_variable is dropped before the reference stored in my_reference".

Formally, we can see this by noticing the regions of code where these two things need to exist. The region of code where the reference needs to exist is larger than the region of code where the variable exists. This indicates that there must be part of the time the reference exists where the variable has been dropped, and therefore a dangling reference might exist.

We call a region of code where a reference must be valid a "lifetime". We can give lifetimes names using the syntax 'name. So let us say that 'variable is the region of code where a reference to the variable is valid. Also, let's say that 'reference is the region of code where the reference could be used. We can formally say that 'variable must be larger than 'reference.

This is obviously true, it is a shorthand for saying "the region of code where the reference is valid must be larger than the region of code where the reference is actually usable". Consider the opposite: if the reference was usable somewhere where the reference wasn't valid, you'd have something that was invalid: unsound code, or in other words, a bug.

So what's this book about then?

There are places where the Rust compiler is unable to figure out lifetimes, and needs the programmer to explicitly specify this. This book is to help you improve at writing explicit lifetimes (things like &'a str). And that starts with the next chapter!

Exercise: Complete the Rustlings Exercises on Lifetimes

If you aren't sure you understand the above, before reading on, complete the rustlings exercises about lifetimes.

What are Lifetime Annotations?

In the last section, we discussed the concept of lifetimes within a single function. In all those examples, it was clear what region of code of a variable or reference existed in, based on the curly brackets. Lifetime Annotations are used to help the compiler understand what's going on when it can't rely on scope brackets (i.e. across function boundaries; and within structs and enums).

A good place to understand lifetime annotations is to start by understanding why we actually need them. Let's work through some examples to see why they exist:

The simplest possible example of a function that needs you to be explicit about lifetimes is this one, which returns a reference to the larger of two integers.

fn max_of_refs(a: &i32, b: &i32) -> &i32 {
    if *a > *b {
        a
    } else {
        b
    }
}

Imagine we call this function as follows:

fn complex_function(a: &i32) -> &i32 {
    let b = 2;
    max_of_refs(a, &b)
}

fn main() {
    let a = 1;
    let my_num = complex_function(&a);
    println!("{my_num}");
}

If you work through this example, you will find that my_num would be a reference to a variable from complex_function (which no longer exists). In other words, the lifetime of the return value of complex_function will be longer than the lifetime of b.

Now, you might say, "but can't the compiler see at runtime that clearly this program won't work"? Well, because we're using constants, yes the compiler probably could tell that this program won't work.

But what if we said let a = rand::rand() or let b = read_number_from_stdin()? It's impossible for a compiler to tell whether this reference should be valid.

Okay, why can't we just ban that case?

Your next thought might be "OK, surely all references of this type are unsound; lets just disallow them". It would be worth being specific about what this ban is. The simplest ban would be "no references in function parameters", but that might just be a little excessive (and entirely destructive to how useful Rust is).

A more sensible ban which would cover this case would be: "Any function with more than one reference input may not return a reference (or something containing a reference)". This avoids the problem we've seen of being unclear on where a reference is coming from. It would ban the above example.

But, how ergonomic would that be? What if you wanted a function like:

fn only_if_greater(number: &i32, greater_than: &i32) -> Option<&i32> {
    if number > greater_than {
        Some(number)
    } else {
        None
    }
}

No matter the way in which you call this function, we always know that if our return value is Some, it refers to number. It will never return a reference to greater_than.

A more interesting example of this is a split function, which takes a string, and returns a vector of slices of that string, split by some other string.

fn split(text: &str, delimiter: &str) -> Vec<&str> {
    let mut last_split = 0;
    let mut matches: Vec<&str> = vec![];
    for i in 0..text.len() {
        if i < last_split {
            continue
        }
        if text[i..].starts_with(delimiter) {
            matches.push(&text[last_split..i]);
            last_split = i + delimiter.len(); 
        }
    }
    if last_split < text.len() {
        matches.push(&text[last_split..]);
    }
    
    matches
}

No matter how you call this function, it will always return a vector of slices from text, never from delimiter.

Ugh, but can't the compiler just figure this out?

At this point, you can probably notice that matches.push is only ever called with text slices. So you might reasonably expect that the compiler could infer lifetimes automatically in this case.

It's possible that in simple cases it could. But your compiler might decide that it can't infer lifetimes. Or it could succeed in inferring them... after 6 months.

So, the compiler needs more information. That information is provided by lifetime annotations. Before we discuss them in detail, here is an exercise that will hopefully re-inforce the concepts, before we deal with syntax.

Exercise: Identify which programs work, and which break

Without using any lifetime syntax, answer the following questions for each of the code examples:

  1. Which inputs are references? Which could the function return?
  2. Which examples could have dangling references?

NOTE: the code examples do not compile; you will need to read them and think about them.

Once you've decided your answers, the "eyeball" button in the top-right hand corner of the code block will reveal the answers.


// a is the only input reference.
// the only thing the function can return is a
fn identity(a: &i32) -> &i32 {
    a
}

// This does not have any dangling references.
fn example_1() {
    let x = 4;
    let x_ref = identity(&x);
    assert_eq!(*x_ref, 4);
}

// This is always going to cause a dangling reference.
fn example_2() {
    let mut x_ref: Option<&i32> = None;
    {
        let x = 7;
        x_ref = Some(identity(&x));
    }
    assert_eq!(*x_ref.unwrap(), 7);
}
// the contents of `opt` and `otherwise` are both references
// either of them could be returned.
fn option_or(opt: Option<&i32>, otherwise: &i32) -> &i32 {
    opt.unwrap_or(otherwise)
}

// No possibility for a dangling reference here.
fn example_1() {
    let x = 8;
    let y = 10;
    let my_number = Some(&x);
    assert_eq!(&x, option_or(my_number, &y));
}

// This is always a dangling reference.
fn example_2() {
    let answer = {
        let y = 4;
        option_or(None, &y)
    };
    assert_eq!(answer, &4);
}

// This is never a dangling reference.
fn example_3() {
    let y = 4;
    let answer = {
        option_or(None, &y)
    };
    assert_eq!(answer, &4);
}

// This is always a dangling reference.
fn example_4() {
    let y = 4;
    let answer = {
        let x = 7;
        option_or(Some(&x), &y)
    };
    assert_eq!(answer, &7);
}

Lifetimes: The Solution to Dangling References

What we can see here is that dangling references are caused by functions losing information about where references come from.

The compiler can only decide if the function is correct if it knows how the lifetimes of its inputs and outputs interact. So, we need to tell the compiler when lifetimes of inputs and outputs will be the same.

What we can do is tell the computer "my function works for any lifetimes, as long as the lifetimes of these inputs/outputs are the same". Let's take a look at the syntax of this:

fn some_if_greater<'lifetime1, 'lifetime2>(number: &'lifetime1 i32, greater_than: &'lifetime2 i32) -> Option<&'lifetime1 i32> {
    if number > greater_than {
        Some(number)
    } else {
        None
    }
}
fn main() {
    let (n, gt) = (7, 4);
    let test = some_if_greater(&n, &gt);
}

Let's walk through what this does:

  • fn my_function<'lifetime1, 'lifetime2>(...): what we're doing here is choosing some names for the lifetimes our program requires.
  • number: &'lifetime1 i32: this is us telling the compiler that this reference must live for some region of code called 'lifetime1.
  • greater_than: &'lifetime2 i32: this is us telling the compiler that this reference must live for some region of code called 'lifetime2. This means that the lifetimes of greater_than and number don't have to relate at all.
  • -> Option<&'lifetime1 i32>: this is where lifetimes are important. what we're saying is that number and our return value must be valid for exactly the same region of code.

So, what we've done is told the compiler that our function can only be called if number and the return are valid in the same region of code.

Exercise: Annotate lifetimes

Just to get some initial practice, the exercise in this section is to annotate lifetimes on some of the examples of the last two chapters.

You will need to:

  • decide how many lifetime parameters are necessary
  • name each of those lifetime parameters, and put them inside < angle brackets > after the function's name.
  • annotate every reference with the appropriate lifetime
  • check the code compiles
  • think about what region of code each lifetime could be

You will notice each function has the #[lifetimes_required(!)] annotation. You will need to leave it there to complete this exercise. This instructs the compiler to throw an error whenever you miss a lifetime; even if the compiler doesn't need the lifetime.

Why Lifetime Annotations Are Mostly Never Used

In the last chapter, we saw why we needed lifetimes. We saw that the compiler was unable to automatically tell how references in the arguments or return values might relate to each other. This is why we needed to tell the compiler that the references related to each other.

This said, you've probably written a function in Rust that needed a reference (likely a &str), without writing lifetimes. Why didn't you have to annotate lifetimes then? There are some common patterns in Rust that make it obvious to the compiler what the lifetimes should be. Let's explore some of them:

Example 1: No Output References

fn add(a: &i32, b: &i32) -> i32 {
    *a + *b
}

fn main() {
    assert_eq!(add(&3, &4), 7);
}

The lifetimes of a and b in this function don't need to relate to each other. Assuming there's only one thread, and assuming safe code, there's no way that the variable they are referencing could possibly be dropped during the function, and after the function, they can live for as long as they like.

Example 2: Only one reference in the input

fn identity(a: &i32) -> &i32 {
    a
}

fn main() {
    let x = 52;
    assert_eq!(&x, identity(&x));
}

It's important to note that it (generally 1 ) isn't possible to create a reference and pass it out of a function if it wasn't given to you. This is because a reference must refer to something you own. Anything you own is dropped at the end of your function. Therefore, anything you own can't be referenced; and the only way you can return a reference is if you were passed a reference.

For this reason, if you only have one reference in your parameters, the only reference you could return is that one -- so the lifetime of your parameter has to be the same as the reference you return.

What to do

Rust could have specifically encoded these examples as exceptions; but then there may have been many cases which were excepted and they would have ended up with confusing rules.

Instead, the Rust project has settled on a procedure the compiler will follow to try and guess lifetimes.

The compiler first splits all the references in a function signature into two types: 'input' and 'output'. 'Input' references are those in the parameters of the function (i.e. it's arguments). 'Output' references are those in the return type of the function.

The two rules that we'll learn in this chapter are:

  1. Each place that an input lifetime is left out (a.k.a 'elided') is filled in with its own lifetime.
  2. If there's exactly one lifetime on all the input references, that lifetime is assigned to every output lifetime.

Let's see how those rules affect the above two examples, and an example from the last chapter:

Example 1: No Output References

We had:

fn add(a: &mut i32, b: &mut i32) -> i32 {
    *a + *b
}

There are two input lifetimes, the ones needed in the types of a and b. Each get allocated their own lifetime:

fn add<'elided1, 'elided2>(a: &'elided1 i32, b: &'elided2 i32) -> i32 {
    *a + *b
}

fn main() {
    assert_eq!(add(&3, &4), 7);
}

There are no output lifetimes, so we end there.

This example is now correct -- the two lifetimes of a and b can be entirely unrelated; and the output is an owned value, so it doesn't depend on any lifetimes at all.

Example 2: Only one reference in the input

We had:

fn identity(a: &i32) -> &i32 {
    a
}

fn main() {
    let x = 52;
    assert_eq!(&x, identity(&x));
}

There is only one input lifetime (needed for the type of a):

fn identity<'elided1>(a: &'elided1 i32) -> &i32 {
    a
}

fn main() {
    let x = 52;
    assert_eq!(&x, identity(&x));
}

There is only one output lifetime; and all the input lifetimes share the same lifetime ('elided1); therefore we can allocate all output lifetimes that lifetime:

fn identity<'elided1>(a: &'elided1 i32) -> &'elided1 i32 {
    a
}

fn main() {
    let x = 52;
    assert_eq!(&x, identity(&x));
}

This now makes sense: the only possible way you could return a &i32 is if you got it from a parameter; and we can see that the input and output share a lifetime.

Example 3: The Limits of Elision

Let's have another look at this example from the last chapter.

fn max_of_refs(a: &i32, b: &i32) -> &i32 {
    if *a > *b {
        a
    } else {
        b
    }
}

Just like Example 1, there are two input lifetimes, so we give them distinct lifetimes:

fn max_of_refs<'elided1, 'elided2>(a: &'elided1 i32, b: &'elided2 i32) -> &i32 {
    if *a > *b {
        a
    } else {
        b
    }
}

But unlike Example 1, we need an output lifetime! By the second rule, we can only elide the output lifetime if we have exactly one input lifetime. Therefore, Rust considers it an error to elide lifetimes here -- the user has to give more information!

Exercise: Apply These Rules

In this exercise, there are four functions which are missing some lifetime annotations. Your task is to manually follow the lifetime elision rules, and give these functions lifetimes in the same way that the compiler would.

In a future release of lifetimekata, these will be checked automatically. For now, when you're done, compare your answer to the solutions.


1: this is possible with static types, like string literals, but we'll cover those later

Mutable References and Containers

Mutable References work exactly the same way as regular references, with regards to lifetime elision. The reason we have a chapter about them, however, is that if you have a mutable reference, you might need to tell the compiler about lifetimes even without a return value.

For example, let's take a look at this example:

fn insert_value(my_vec: &mut Vec<&i32>, value: &i32) {
    my_vec.push(value);
}

We're not returning anything; so lifetimes don't matter, right?

Unfortunately, lifetimes are still important. The reference value actually needs to live for the same time as the contents of the vector. If they didn't, the vector might contain an invalid reference. For example, what would happen in this scenario?

fn insert_value(my_vec: &mut Vec<&i32>, value: &i32) {
    my_vec.push(value);
}

fn main() {
    let x = 1;
    let my_vec = vec![&x];
    {
        let y = 2;
        insert_value(&mut my_vec, &y);
    }
    println!("{my_vec:?}");
}

The reference to y in the above example is dangling when we try to print the vector!

We can use lifetimes to ensure that the two references live for the same amount of time:

fn insert_value<'vec_lifetime, 'contents_lifetime>(my_vec: &'vec_lifetime mut Vec<&'contents_lifetime i32>, value: &'contents_lifetime i32) {
    my_vec.push(value)
}
fn main(){
    let mut my_vec = vec![];
    let val1 = 1;
    let val2 = 2;
    
    insert_value(&mut my_vec, &val1);
    insert_value(&mut my_vec, &val2);
    
    println!("{my_vec:?}");
}

This signature indicates that there are two lifetimes:

  • 'vec_lifetime: The vector we've passed the function will need to live for a certain period of time.
  • 'contents_lifetime: The contents of the vector need to live for a certain period of time. Importantly, the new value we're inserting needs to live for just as long as the contents of the vector. If they didn't, you would end up with a vector that contains an invalid reference.

Do We Even Need Two Lifetimes?

You might wonder what happens if we don't provide two lifetimes. Does just one lifetime work?

fn insert_value<'one_lifetime>(my_vec: &'one_lifetime mut Vec<&'one_lifetime i32>, value: &'one_lifetime i32) {
    my_vec.push(value)
}

fn main(){
    let mut my_vec: Vec<&i32> = vec![];
    let val1 = 1;
    let val2 = 2;
    
    insert_value(&mut my_vec, &val1);
    insert_value(&mut my_vec, &val2);
    
    println!("{my_vec:?}");
}

No, it doesn't. We get two errors. Let's look at the first one:

error[E0499]: cannot borrow `my_vec` as mutable more than once at a time
  --> /tmp/rust.rs:11:18
   |
10 |     insert_value(&mut my_vec, &val1);
   |                  ----------- first mutable borrow occurs here
11 |     insert_value(&mut my_vec, &val2);
   |                  ^^^^^^^^^^^
   |                  |
   |                  second mutable borrow occurs here
   |                  first borrow later used here

This seems strange -- why can't you borrow my_vec?

Well, let's walk through what the compiler sees:

&val needs to last for as long as my_vec exists:

fn insert_value<'one_lifetime>(my_vec: &'one_lifetime mut Vec<&'one_lifetime i32>, value: &'one_lifetime i32) {
    my_vec.push(value)
}

fn main(){
    let mut my_vec: Vec<&i32> = vec![];
    let val1 = 1;
    let val2 = 2;
    
    insert_value(&mut my_vec, &val1); // \
    insert_value(&mut my_vec, &val2); // | - &val1 needs to last this long.
                                      // |
    println!("{my_vec:?}");           // /
}

Whereas &mut my_vec only needs to last for the duration of insert_value.

fn insert_value<'one_lifetime>(my_vec: &'one_lifetime mut Vec<&'one_lifetime i32>, value: &'one_lifetime i32) {
    my_vec.push(value)
}

fn main(){
    let mut my_vec: Vec<&i32> = vec![];
    let val1 = 1;
    let val2 = 2;
    
    insert_value(&mut my_vec, &val1); // <- &mut my_vec only needs to last this long.
    insert_value(&mut my_vec, &val2); 
    
    println!("{my_vec:?}");
}

But, we've told the compiler that it needs the borrows of both &val1 and &mut my_vec to share the same lifetime. So the compiler extends the borrow of &mut my_vec to ensure they do share a lifetime: It sees that if it let &mut my_vec live as long as &val1, it would have that single region of code:

fn insert_value<'one_lifetime>(my_vec: &'one_lifetime mut Vec<&'one_lifetime i32>, value: &'one_lifetime i32) {
    my_vec.push(value)
}

fn main(){
    let mut my_vec: Vec<&i32> = vec![];
    let val1 = 1;
    let val2 = 2;
    
    insert_value(&mut my_vec, &val1); // \
    insert_value(&mut my_vec, &val2); // | - 'one_lifetime must be this region of code.
                                      // |
    println!("{my_vec:?}");           // /
}

And that's fine. But now the compiler gets to the next line, and it sees you're trying to borrow &mut my_vec again.

The compiler already decided &mut my_vec has to exist until the end of the function. So now, you're asking it to create two mutable references... and that's not allowed.

So the compiler throws an error -- you're not allowed to borrow &mut my_vec again.

Why does having two lifetimes fix this error?

Have a think before reading this section -- why does having two lifetimes solve this bug?

Before, the compiler had to decide that &mut my_vec and &val1 shared a lifetime. In other words, that they lived as long as each-other.

By using two lifetimes, we've told the compiler that &mut my_vec and &val1 don't necessarily have to live for the same amount of time. And so, it finds the following lifetimes:

fn insert_value<'vec_lifetime, 'contents_lifetime>(my_vec: &'vec_lifetime mut Vec<&'contents_lifetime i32>, value: &'contents_lifetime i32) {
    my_vec.push(value)
}

fn main(){
    let mut my_vec: Vec<&i32> = vec![];
    let val1 = 1;
    let val2 = 2;
    
    insert_value(&mut my_vec, &val1); // <- 'vec_lifetime \
    insert_value(&mut my_vec, &val2); //                  | 'contents_lifetime
                                      //                  |
    println!("{my_vec:?}");           //                  /
}

Exercise Part 1: The Other Error

First, let's look at the other error we got in the last section:

error[E0502]: cannot borrow `my_vec` as immutable because it is also borrowed as mutable
  --> /tmp/rust.rs:13:16
   |
10 |     insert_value(&mut my_vec, &val1);
   |                  ----------- mutable borrow occurs here
...
13 |     println!("{my_vec:?}");
   |                ^^^^^^
   |                |
   |                immutable borrow occurs here
   |                mutable borrow later used here
   |

Can you explain why this error occurs? Write it out in 50 words or less.

Exercise Part 2: Writing Our Own

Add appropriate lifetimes to the function in the exercise.

Lifetimes on Types

So far, we've only discussed lifetimes as applied to functions. Functions are not the only place where you will need explicit lifetimes. Types (structs and enums) can have lifetimes too.

This is because if a struct contains a reference, the user needs to clarify how long it lasts for.

Imagine we wanted to split a &str in two, and create a struct with a start and end field?

Well, we could write a function like this:

struct SplitStr {
    start: &str,
    end: &str
}

fn split<'text, 'delim>(text: &'text str, delimiter: &'delim str) -> Option<SplitStr> {
    let (start, end) = text.split_once(delimiter)?;
    
    Some(SplitStr {
        start,
        end
    })
}

fn main() {}

And we're done! Right?

Well, how long do those string references live?

What if we called the function like this:

struct SplitStr {
    start: &str,
    end: &str
}

fn split<'text, 'delim>(text: &'text str, delimiter: &'delim str) -> Option<SplitStr> {
    let (start, end) = text.split_once(delimiter)?;
    
    Some(SplitStr {
        start,
        end
    })
}

fn main() {
    let mut parts_of_string: Option<SplitStr> = None;
    {
        let my_string = String::from("First line;Second line");
        parts_of_string = split(&my_string, ";");
    }
    
    println!("{parts_of_string:?}");
}

Well, the references inside the SplitStr struct are now dangling, since they both pointed to my_string; but that only existed inside the curly brackets.

So, Rust forces us to specify the lifetime of all references inside a struct. Here's how we'd fix our code:

struct SplitStr<'str_lifetime> {
    start: &'str_lifetime str,
    end: &'str_lifetime str
}

fn split<'text, 'delim>(text: &'text str, delimiter: &'delim str) -> Option<SplitStr<'text>> {
    let (start, end) = text.split_once(delimiter)?;
    
    Some(SplitStr {
        start,
        end
    })
}

fn main() {}

Now, when we return an Option<SplitStr<'text>> the compiler knows that references inside the struct must all last for the same lifetime as 'text. If we try to return a SplitStr where the references can't last for 'text, that will be a compiler error.

A Note on Enums

References work exactly the same way in enums as they do in structs. We don't go into detail on them here because they are interchangeable.

enum StringOption<'a> {
    Some(&'a str),
    None
}
fn main() {}

Two Lifetimes

Occasionally, structs will have more than one lifetime on them. This happens when the data inside them comes from two different places, with two lifetimes.

Take the example of a program to find unique words among two sentences.

You might have the first sentence be "I love to swim and surf.", and the second be "I love to ski and snowboard.". The words unique to the first sentence are "swim" and "surf". The words unique to the second sentence are "ski" and "snowboard".

If you said that the two sentences had to share a lifetime, you would be forcing the user to ensure that the two sentences came from the same place, and therefore had the same lifetime. But what if one came from a file that was open for the whole running of the program, but the second was scanned in inside a loop?

In that case, the compiler would insist that the scanned in value was saved for the whole of the program, which would not be ergonomic.

Exercise: Two Lifetimes on a Struct

In this exercise, we will be modifying a small program which finds the unique words between two strings. At the moment, it does not have any lifetime annotations, and therefore does not compile.

Our goal is to return a struct that contains all the unique words from the first string, and all the unique words from the second string. They should have separate lifetimes.

Lifetimes on Impls

When structs or enums have lifetimes on them, the way that impl blocks work also changes slightly.

For example, say we want to create a struct which lets the user iterate over a sentence. You might start off something like this:

// First, the struct:

/// This struct keeps track of where we're up to in the string.
struct WordIterator<'s> {
    position: usize,
    string: &'s str
}

impl WordIterator {
    /// Creates a new WordIterator based on a string.
    fn new(string: &str) -> WordIterator {
        WordIterator {
            position: 0,
            string
        }
    }
    
    /// Gives the next word. `None` if there aren't any words left.
    fn next_word(&mut self) -> Option<&str> {
        let start_of_word = &self.string[self.position..];
        let index_of_next_space = start_of_word.find(' ').unwrap_or(start_of_word.len());
        if start_of_word.len() != 0 {
            self.position += index_of_next_space + 1;
            Some(&start_of_word[..index_of_next_space]) 
        } else {
            None
        }
    }
}

fn main() {
    let text = String::from("Twas brillig, and the slithy toves // Did gyre and gimble in the wabe: // All mimsy were the borogoves, // And the mome raths outgrabe. ");
    let mut word_iterator = WordIterator::new(&text);
    
    assert_eq!(word_iterator.next_word(), Some("Twas"));
    assert_eq!(word_iterator.next_word(), Some("brillig,"));
    
}

When defining our WordIterator struct, we said it requires a lifetime to be specified. But when we then wrote the impl block, we didn't specify one. Rust requires that we do this.

The way we do this is by telling Rust about a lifetime, and then putting that lifetime onto our struct. Let's see how we do that:

impl<'lifetime> for WordIterator<'lifetime> {
    // ...
}

It's useful to note that we've done this in two parts -- impl<'lifetime> defines a lifetime 'lifetime. It doesn't make any promises about what that lifetime is, it just says it exists. WordIterator<'lifetime> then uses the lifetime we created, and says "the references in WordIterator must live for lifetime".

Now, anywhere in the impl block, we can choose to use that lifetime. Any reference we annotate with 'lifetime' must have the same lifetime as any other reference annotated with 'lifetime'.

/// This struct keeps track of where we're up to in the string.
struct WordIterator<'s> {
    position: usize,
    string: &'s str
}

impl<'lifetime> WordIterator<'lifetime> {
    /// Creates a new WordIterator based on a string.
    fn new(string: &'lifetime str) -> WordIterator<'lifetime> {
        WordIterator {
            position: 0,
            string
        }
    }
    
    /// Gives the next word. `None` if there aren't any words left.
    fn next_word(&mut self) -> Option<&str> {
        let start_of_word = &self.string[self.position..];
        let index_of_next_space = start_of_word.find(' ').unwrap_or(start_of_word.len());
        if start_of_word.len() != 0 {
            self.position += index_of_next_space + 1;
            Some(&start_of_word[..index_of_next_space]) 
        } else {
            None
        }
    }
}

fn main() {
    let text = String::from("Twas brillig, and the slithy toves // Did gyre and gimble in the wabe: // All mimsy were the borogoves, // And the mome raths outgrabe. ");
    let mut word_iterator = WordIterator::new(&text);
    
    assert_eq!(word_iterator.next_word(), Some("Twas"));
    assert_eq!(word_iterator.next_word(), Some("brillig,"));
    
}

Lifetime Elision, Redux

We previously discussed two rules for lifetime elision. They are:

  1. Each place that an input lifetime is left out (a.k.a 'elided') is filled in with its own lifetime.
  2. If there's exactly one lifetime on all the input references, that lifetime is assigned to every output lifetime.

Now that we've seen impl blocks that have lifetimes, let's discuss one more:

  1. If there are multiple input lifetime positions, but one of them is &self or &mut self, the lifetime of the borrow of self is assigned to all elided output lifetimes.

This means that even if you take in many references in your arguments, Rust will assume that any references you return come from self, not any of those other references.

Exercise

In the following code, we annotate the function using the 'borrow lifetime, not only the 'lifetime lifetime. The 'borrow lifetime only exists inside this function, and only affects the borrows of its arguments and return value. The 'lifetime value, as we saw before, also constrains the lifetime of the string inside the struct.

There are four ways we could implement this code. Describe the effect of each of these implementations.

Specifically:

  • Do they compile?
  • Are any of them identical to another one?
  • Are there any circumstances where their lifetimes are not general enough?
  • Which would be the "most" correct to write?

Example 1

    /// Gives the next word. `None` if there aren't any words left.
   /// This compiles. It's the exact same as Example 4.
   /// This function is problematic because the next word lives as long
   /// as your borrow of the iterator. In order to get the next word, you
   /// must drop all references to the current one.
    fn next_word<'borrow>(&'borrow mut self) -> Option<&'borrow str> {
        // ...
    }

Example 2

    /// Gives the next word. `None` if there aren't any words left.
   /// This compiles. It's the exact same as Example 3.
    fn next_word<'borrow>(&'borrow mut self) -> Option<&'lifetime str> {
        // ...
    }

Example 3

    /// Gives the next word. `None` if there aren't any words left.
   /// This compiles. It's probably the "most" correct, because it's the shortest
   /// to write, but also ensures you can retain the returned strings, even if
   /// you call this function multiple times.
    fn next_word(&mut self) -> Option<&'lifetime str> {
        // ...
    }

Example 4

    /// Gives the next word. `None` if there aren't any words left.
   /// This compiles. If expanded, it would be the same as Example 1.
    fn next_word(&mut self) -> Option<&str> {
        // ...
    }

Special Lifetimes and Bounds

There are two special lifetimes in Rust. It's worth discussing them both:

  • 'static
  • '_ (the implicit lifetime)

The static lifetime

Some things in your program are guaranteed to last forever. The most common reason for this is when they're information bundled inside your binary. For example, when you write a program like this:

fn main() {
    let my_text = "Hello World";
}

The text "Hello World" is actually somewhere inside the compiled binary. This means that the reference to it is always valid, since the text is always there as long as the program is running.

Therefore, if we were to talk about the type of the text, we'd say it's a &'static str.

Similarly, any references to a constant can also be &'static. For example:


const SOME_COORDINATE: (i32, i32) = (7, 4);

fn main() {
    let static_reference: &'static (i32, i32) = &SOME_COORDINATE;
}

The '_ lifetime (Anonymous Lifetimes, Placeholder Lifetimes)

The implicit lifetime tells Rust to figure out the lifetime itself. There are three places where this lifetime is useful:

  • To simplify impl blocks
  • When consuming/returning a type that needs a lifetime
  • To write trait objects that contain references.

Simplifying Impl Blocks

Say you're implementing a counter struct, that looks like this:

struct Counter<'a> {
    counter: &'a mut i32
}

impl<'a> Counter<'a> {
    fn increment(&mut self) {
        *self.counter += 1;
    }
}

fn main() {
    let mut num = 0;
    
    let mut counter = Counter { counter: &mut num };
    counter.increment();
    
    println!("{num}"); // prints 1
    
}

That's fine, but you'll notice that the impl block doesn't actually use the 'a lifetime anywhere. Therefore, we can simplify things by writing the following instead:

impl Counter<'_> {
    fn increment(&mut self) {
        self.counter += 1;
    }
}

The two impl blocks above mean the same thing, but just take slightly fewer arguments.

Returning Structs and Enums

This is recommended for the situation where you are returning a struct/enum that contains a reference. You could write something like this:


struct StrWrap<'a>(&'a str);

fn make_wrapper(string: &str) -> StrWrap {
    StrWrap(string)
}

fn main() {}

But that syntax is no longer recommended, as you will see when you add the #![deny(rust_2018_idioms)] annotation, where you get the error:

error: hidden lifetime parameters in types are deprecated
 --> src/main.rs:8:34
  |
_ | fn make_wrapper(string: &str) -> StrWrap {
  |                                  ^^^^^^^ expected lifetime parameter
  |
note: the lint level is defined here
 --> src/main.rs:1:9
  |
_ | #![deny(rust_2018_idioms)]
  |         ^^^^^^^^^^^^^^^^
  = note: `#[deny(elided_lifetimes_in_paths)]` implied by `#[deny(rust_2018_idioms)]`
help: indicate the anonymous lifetime
  |
_ | fn make_wrapper(string: &str) -> StrWrap<'_> {
  |                                         ++++

By following the hint, it becomes clearer that StrWrap does contain a reference, but that the compiler should just figure it out.

Lifetimes on Trait Objects

See Chapter 10: Footnote on Trait Lifetime Bounds for the gory details.

Lifetime Bounds

Lifetime bounds are not widely used, so we don't devote a large section of these exercises to them. You can probably skip this section unless you really want to know the details.

In short, they allow you to specify that one lifetime should outlive another. To specify one, use a where clause, such as where 'a: 'b.

To quote the Rust Reference:

Lifetime bounds can be applied to types or to other lifetimes. The bound 'a: 'b is usually read as 'a outlives 'b. 'a: 'b means that 'a lasts at least as long as 'b, so a reference &'a () is valid whenever &'b () is valid.

fn f<'a, 'b>(x: &'a i32, mut y: &'b i32) where 'a: 'b {
    y = x;                      // &'a i32 is a subtype of &'b i32 because 'a: 'b
    let r: &'b &'a i32 = &&0;   // &'b &'a i32 is well formed because 'a: 'b
}

T: 'a means that all lifetime parameters of T outlive 'a. For example, if 'a is an unconstrained lifetime parameter, then i32: 'static and &'static str: 'a are satisfied, but Vec<&'a ()>: 'static is not.

Exercise

You have been given code which contains many uses of the lifetimes 'a and 'b'. All of these lifetimes can be replaced with either '_ or 'static.

Your task is to replace every occurance of the lifetimes 'a and 'b with either '_ or 'static, to remove excessive lifetime declarations, and to ensure your code still compiles.

Footnote on Out of Date Information

The Rust Edition Guide previously contained a section about anonymous lifetimes. The most popular google result is now this article but I recommend disregarding it, as it is out of date information.

Finale

Congratulations on getting to the end of LifetimeKata. There's more footnotes and extra reading in the next two chapters, but this chapter is a "finale" of sorts.

In this exercise, we will be building a very simple clone of a glob system. This allows someone to ask whether a piece of text matches some description.

It's worth noting that implementing the whole thing may take up to an hour. If you just want to work on the lifetimes, you can copy code from the solution, but this is a fun and rewarding exercise to complete in its entirety.

For example, the glob ab(cd|ef|gh) matches any of the following strings: abcd, abef, abgh

You will create a Matcher struct, which has three fields:

  • A &str, representing the textual representation of our regex.
  • A Vec<MatcherTokens>, representing the different parts of the regex, in order.
  • An integer, keeping track of what the longest match for our regex was.

To create this, you'll take a string that looks like this: hello.(town|world|universe).its.me. There are three components to this:

  • regular text, like 'hello', 'its' or 'me', which should only match that exact text
  • wildcards (the . character), which matches any single character.
  • optional text, like (town|world|universe), which matches exactly one of a list of strings. So (town|world|universe) matches town, OR world, OR universe.

These can be mixed and matched in any order (but you will never have one inside the other). With this string, you should create a vector of MatcherTokens which refer to the relevant parts of that string.

You will then write a function which takes another string, and sees how much of the Matcher that particular string matches. You will return a vector of (MatcherToken, &str), where the MatcherToken is the token that matched some text, and the &str is the text that was matched.

An Example

Say you had the matcher (Black|Bridge)(rock|stone|water).company. This can be broken down into four parts:

  • OneOfText(["Black", "Bridge"])
  • OneOfText(["rock", "stone", "water"])
  • Wildcard
  • RawText("company")

Now, let's imagine we're given the following text: BlackBridge. Black matches the first token, but Bridge does not match the second token. So, we would return: vec![(OneOfText(["Black", "Bridge"]), "Black")"]. The most tokens we've matched is 1.

For a different example, take Bridgestone_Tyres. Bridge matches the first matcher, stone matches the second matcher, _ matches the third matcher, but Tyres doesn't match company. So the most tokens we've matched is 3. We'd return a vec containing:

  • (OneOfText(["Black", "Bridge"]), Bridge)
  • (OneOfText(["rock", "stone", "water"]), "stone")
  • (Wildcard, "_")

A Note On Unicode

Rust is able to deal with unicode characters (like emoji or Japanese Kanji) in its strings. Of course, this increases the amount of complexity that is required for simple operations like splitting a string into pieces, because it's possible to accidentally split a character in half.

The tests in the example do not use unicode, however if you want a "true" Rust experience, change the tests to include a unicode character (an example is in the comments).

Further Reading in the Rust Reference

The best sources of information about lifetimes are the Rust Reference and Rustonomicon. If you require a really advanced knowledge of lifetimes to complete a project, the reference will contain that knowledge. Most times, however, if you feel you need to understand something complex about lifetimes, you may find that there is a simpler alternative.

Other Useful Lifetimes Content

Variance and Subtyping

This guide does not cover the topic of "variance" at all, which is how lifetimes can be substituted for one-another. Variance, while theoretically important, is not useful in a day-to-day understanding of lifetimes, and so it was not included in the book.

You can read more about it in the Rustonomicon (subtyping).

Brain Teaser 1: Why doesn't this program work:

If you're interested in working through a really difficult exercise to test your understanding of lifetimes and generics, the following exercise may be interesting.

This should be another way of implementing the code in exercise 5. Unfortunately, it doesn't work. This took the author of this book 20 minutes to figure out (after having written five chapters), so he challenges you to do better!

use std::collections::HashSet;

struct Difference<'first, 'second> {
    first_only: Vec<&'first str>,
    second_only: Vec<&'second str>
}

fn find_difference<'fst, 'snd>(sentence1: &'fst str, sentence2: &'snd str) -> Difference<'fst, 'snd> {
    let sentence_1_words: HashSet<&str> = sentence1.split(" ").collect();
    let sentence_2_words: HashSet<&str> = sentence2.split(" ").collect();

    Difference {
        first_only: (&sentence_1_words - &sentence_2_words).into_iter().collect(),
        second_only: (&sentence_2_words - &sentence_1_words).into_iter().collect(),
    }

}

fn main() {
    let first_sentence = String::from("I love the surf and the sand.");
    let second_sentence = String::from("I hate the surf and the sand.");

    let first_only = {
        let third_sentence = String::from("I hate the snow and the sand.");
        let diff = find_difference(&first_sentence, &third_sentence);
        diff.first_only
    };

    assert_eq!(first_only, vec!["hate", "surf"]);

    let second_only = {
        let third_sentence = String::from("I hate the snow and the sand.");
        let diff = find_difference(&third_sentence, &second_sentence);
        diff.second_only
    };

    assert_eq!(second_only, vec!["snow"]);
}

For more information about this issue, read this Rust issue.

Footnote: Lifetimes on Trait Objects

In chapter 7, we discussed placeholder lifetimes ('_). We said that there were three ways you could use them:

  • To simplify impl blocks
  • When consuming/returning a type that needs a lifetime
  • To write trait objects that contain references.

In the first case, we saw that anonymous lifetimes just simplified what we needed to write.

In the second case, we saw that Rust recommends that we use it, but we don't need to -- lifetime elision will do what we want.

The one case where it looks like lifetime elision should do what we want, but it actually doesn't unless we use the '_ is the case of trait objects. This chapter walks through how trait objects and lifetimes work together.

Let's setup a simple example:

trait Bool {
    fn truthiness(&self) -> bool;
}

struct True();
impl Bool for True {
    fn truthiness(&self) -> bool {
        true
    }
}

struct False();
impl Bool for False {
    fn truthiness(&self) -> bool {
        false
    }
}

fn get_bool(b: bool) -> Box<dyn Bool> {
    if b == true {
        Box::new(True())
    } else {
        Box::new(False())
    }
}

fn main() {
    let my_bool = true;
    let my_dyn = get_bool(my_bool);

    println!("{}", my_dyn.truthiness());
}

To be clear, what we are doing here is creating two structs which represent true and false. They both implement the Bool trait, which has the truthiness function which returns true or false.

The get_bool function returns a Boxed Bool trait object, based on whether get_bool is passed true or false.

It's important to realise that since trait objects might or might not contain a reference (or any number of references), all trait objects have lifetimes. This is true, even if no implementors of the trait contain references.1

So, since we need to associate a lifetime with our trait object, we might think we could rely on lifetime elision. But how would lifetime elision work for our get_bool function? There are no input references, so what output lifetime should we give the trait object? Lifetime elision can't help us here.

So, in RFC 599 and in RFC 1156, the rules for trait object lifetimes were changed. The rules are complex, and best outlined in the reference, but in the case of get_bool, it means that the lifetime inferred for dyn Bool is 'static.

Let's change the example slightly now, such that the struct contains a reference to a bool:

trait Bool {
    fn truthiness(&self) -> bool;
}

// CHANGE 1: added &'a bool here
struct True<'a>(&'a bool);
impl<'a> Bool for True<'a> {
    fn truthiness(&self) -> bool {
        true
    }
}

// CHANGE 2: added &'a bool here
struct False<'a>(&'a bool);
impl<'a> Bool for False<'a> {
    fn truthiness(&self) -> bool {
        false
    }
}

fn get_bool(b: &bool) -> Box<dyn Bool> {
    if *b == true {
        Box::new(True(b))
    } else {
        Box::new(False(b))
    }
}

// CHANGE 3: Update the 
fn main() {
    let my_dyn = {
        let my_bool = true;
        get_bool(&my_bool)
        // my_bool is dropped here, so the trait object we're returning
        // has a dangling reference.
    };
    println!("{}", my_dyn.truthiness());
}

Now, we get an error:

error: lifetime may not live long enough
  --> src/main.rs:22:5
   |
21 |   fn get_bool(b: &bool) -> Box<dyn Bool> {
   |                  - let's call the lifetime of this reference `'1`
22 | /     if *b == true {
23 | |         Box::new(True(b))
24 | |     } else {
25 | |         Box::new(False(b))
26 | |     }
   | |_____^ returning this value requires that `'1` must outlive `'static`
   |
help: to declare that the trait object captures data from argument `b`, you can add an explicit `'_` lifetime bound
   |
21 | fn get_bool(b: &bool) -> Box<dyn Bool + '_> {
   |                                       ++++

error: could not compile __ due to previous error

Even though lifetime elision means that get_bool should end up with a signature like fn get_bool<'elided>(b: &'elided bool) -> Box<dyn Bool + 'elided>, it doesn't. The special rules for trait objects mean that the lifetime is: fn get_bool<'elided>(b: &'elided bool) -> Box<dyn Bool + 'static>. That 'static bound is incorrect.

Therefore, we need the '_ bound (as this error message tells us) to inform Rust that it should use the normal lifetime elision rules; rather than the special trait object rules.


1: https://doc.rust-lang.org/reference/types/trait-object.html#trait-object-lifetime-bounds