jeffa.io

Rust Guide: Generics Demystified Part 2



Rust Guide: Generics Demystified Part 2

Introduction

Welcome to part two!

Our code from part one left us with an almost-cool library that scratched the surface of the power of generics. This section will cover implementations and traits.

Near the end of part one, we used the Fn trait, but we used it with dyn inside of a Box. The dyn keyword means we’re using an under-the-hood runtime abstraction and Box indicates heap allocation. Rust is all about compile-time goodness, so we will do away with Box<dyn Fn<T>> as we dig deeper into the most powerful parts of the type system.

Before that, and with the simple techniques from part one out of the way, let’s make sure we’re on the same page about what we’re studying and why it’s important.

Why Do We Need More Complexity?

Looking at our program from part one, we’ve decided that a Point can have absolutely any Id, a Line must have exactly two Points, and a Connector can do whatever the library user wants at runtime as long as it does it with two Points and produces a single Line. Our main function used the debug formatter to print our Line instances to the console. But what if we want to guarantee that every Connector creates a line that can be displayed with the Debug formatter (i.e."{:?}")? How can we be selective enough to make sure that this always works? And what if we want to allow it to work without forcing the use of Debug types? This is what we mean by selective flexibility.

We could write struct Line<Id: Debug>. This trait bound would restrain the Id type to one that implementes the Debug trait. But there are compelling reasons to make our type definitions as simple as possible. Most importantly, unless all of the fields are public, we won’t be able to construct the type from outside of its module. Therefore we cannot do anything with it at all without using at least one associated function, hence the need for a new method. This is why trait bounds are almost always written in the impl block and not in the type definition.

The std::iter module has many types that are designed to hold a closure, but to see what kind of closure it is, we have to look at the relevant impl (such as this one for Map). When you’re reading the standard library code to find out how to do something, you know you’re doing it right. But be sure to work incrementally and to always understand what you’re writing. We’ll go through that process now.

Implementations

A type’s definition (struct or enum) describes what it is. Its implementations (impl) describe what it can do. When we parametize the type and restrict it in the impl, we describe what the type can do when that impl’s conditions are met.

Let’s start by doing what was discussed above: allowing Line to be Debug-able when Id is debuggable.

use std::fmt::{self, Debug, Formatter};

// Remember to remove `#[derive(Debug)]` from `Line`.
impl<Id: Debug> Debug for Line<Id> {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        f.debug_struct("Line").field("points", &self.points).finish()
    }
}

It’s important to know that the type is still flexible enough to hold non-Debug types. This implementation merely says, “Line can use the method in Debug only when the Id is able to use it as well.” Whereas struct Line<Id: Debug> would say that a Line cannot exist unless Id implements Debug. We just did what #[derive(Debug)] does behind the scenes. It’s not magic, it’s just creative coding!

Going forward, we should clarify what our goals are. The great thing about Rust’s combination of selective flexibility and explicitness is that when the tools are understood and the goals are clear, the code practically writes itself.

We will make Connector into an iterator so that it can create Line instances from collections of Point instances. Later, we will make it possible to create any kind of shape, not just lines, using of any number of points, not just two. But we will get there incrementally so that we fully understand our code.

By the end of part two, our library’s user will put some Points into any collection (not just a Vec) and call a method. They will let it know how to combine the points with a closure. This is similar to how the standard library turns iterators into collections using the collect method, or changes the types and/or values in the iterator using the map method.

We’ll start by making our iterator-to-be look more like the iterators found in the standard library. This means replacing Box<dyn Fn> with an unbounded generic type and specifying a type that implements the Fn trait in the impl block. As usual, the standard library provides the best examples of how to do things right.

struct Connector<Id, F> {
    points: Vec<Point<Id>>,
    op: F,
}

impl<Id, F> Connector<Id, F>
where
    F: Fn(Point<Id>, Point<Id>) -> Line<Id>,
{
    fn new(points: Vec<Point<Id>>, op: F) -> Self {
        Connector { points, op }
    }
}

Nothing has really changed except that our code is a little cleaner and is likely more performant. We reduced the runtime overhead, which is nice. The user still cannot create a Connector with the wrong types because they have to use new, which is in an impl block with the trait bound. If we later find that we only call Connector::new in similar impl blocks, we could remove this particular trait bound. Note that the standard library uses pub(in crate::iter) to make sure that Map::new is not misused somewhere else. Now let’s talk about the Vec used for the points field.

This is not the way it is done in the standard library. This iterator actually owns the Vec. Not only does this impose the unneccesary restriction of using the Vec type instead of some other collection, it takes ownership of the collection itself, not just its contents. This is a preliminary step to make sure that we understand what we’re doing. The standard library’s use of generic traits in the iter module is masterful, but easy to mimic. This step makes sure that we have the correct behavior before we move on to creating the exact level of flexibility we want.

impl<Id, F> Iterator for Connector<Id, F>
where
    F: Fn(Point<Id>, Point<Id>) -> Line<Id>,
{
    type Item = Line<Id>;

    fn next(&mut self) -> Option<Self::Item> {
        let point_a = self.points.pop()?;
        let point_b = self.points.pop()?;
        let line = (self.op)(point_a, point_b);

        Some(line)
    }
}

Traits

A trait is a type system abstraction in itself. Combining our own trait with trait-bounded generics is an essential step to mastery.

Let’s start by using a trait to implement a method on the Vec type. This will look similar to calling map or filter on an iterator. It is considered a “best practice” to name a trait with one method after that method. So our one-method trait will simply be called Connect. This trait will replace our connect function.

When we write impl<T, U> we are declaring which generic types are in scope for that impl block. We must pass those generic types into the trait as type parameters. Looking at the trait, we see why: these type paramaters tell the compiler what types to expect in the trait’s method. Now the compiler knows that our implementation uses the connect method correctly.

trait Connect<Id, F>
where
    F: Fn(Point<Id>, Point<Id>) -> Line<Id>,
{
    fn connect(self, op: F) -> Connector<Id, F>;
}

impl<Id, F> Connect<Id, F> for Vec<Point<Id>>
where
    F: Fn(Point<Id>, Point<Id>) -> Line<Id>,
{
    fn connect(self, op: F) -> Connector<Id, F> {
        Connector::new(self, op)
    }
}

fn main() {
    let points = vec![Point { id: 1001 }, Point { id: 1002 }];
    let mut connector = points.connect(|left, right| Line {
        points: (left, right),
    });

    print!(
        "This should be a line: {:?}.\nThis should be nothing: {:?}.",
        connector.next(),
        connector.next()
    )
}

Run what we have so far in the playground.

Now we will finish writing our Connector iterator. We will create it by calling connect on another iterator (not just a Vec). Let’s make a few observations about similar types from the standard library and apply those observations to our code.

  1. The Map type has an extremely flexible type definition, one with unbounded generics. The new method also has no bounds, but it is only available within std::iter.

    Let’s apply this to our Connector.

     struct Connector<I, F> {
         points_iter: I,
         op: F,
     }
    
     impl<I, F> Connector<I, F> {
         fn new(points_iter: I, op: F) -> Self {
             Connector { points_iter, op }
         }
     }
  2. In general, the std::iter types call next on their inner iterator and simply alter its behavior. Many of them, such as Take::next, do so with very simple code.

    The standard library also teaches us that this impl block is where we should impose trait bounds, since this is where they are strictly necessary. Note that the Iterator type has an associated type called Item. The syntax <Item = Something> differentiates it from a generic type.

    This replaces the old code where we simply grabbed items from a Vec. This will work with any type that can create an iterator where the Item is Point<Id>.

     impl<Id, I, F> Iterator for Connector<I, F>
     where
         I: Iterator<Item = Point<Id>>,
         F: Fn(Point<Id>, Point<Id>) -> Line<Id>,
     {
         type Item = Line<Id>;
    
         fn next(&mut self) -> Option<Self::Item> {
             let point_a = self.points_iter.next()?;
             let point_b = self.points_iter.next()?;
             let line = (self.op)(point_a, point_b);
    
             Some(line)
         }
     }
  3. Iterators have methods that yield other iterators. If one example is not enough, there are dozens of others to be found in the Iterator trait.

    First we need to change the Connect trait because its method returns a Connector, which has already been changed. We rewrote Connector to be more flexible, so let’s do the same with Connect. We will continue the pattern of moving trait bounds out of definitions and into implementations.

     trait Connect<I, F> {
         fn connect(self, op: F) -> Connector<I, F>;
     }

    We cannot add a method to Iterator, it’s not part of our code. But we can use an extremely useful trick. We will implement our Connect method for every type that implements Iterator.

    impl<I, Id, F> Connect<Id, F> for I
    where
        I: Iterator,
    {
        fn connect(self, op: F) -> Connector<I, F> {
            Connector::new(self, op)
        }
    }

    This is called an automatic implementation. The standard library uses them a lot. A well-known example is that implementing Display for a type gives it an automatic implementation of ToString.

    The Fn trait bound has also been removed because Connect and Connector no longer need it. Thanks to selective flexibility, we have elegantly made our code more simple and more powerful. Let’s look at our new code in action.

fn main() {
    let points = {
        let mut map = std::collections::HashMap::new();

        map.insert("A", Point { id: 1001 });
        map.insert("B", Point { id: 1002 });
        map.insert("C", Point { id: 1003 });
        map.insert("D", Point { id: 1004 });
        map.insert("E", Point { id: 1005 });

        map
    };

    let mut connector = points.into_values().connect(|left, right| Line {
        points: (left, right),
    });

    println!("This should be a line: {:?}.", connector.next());
    println!("This should be a line: {:?}.", connector.next());
    println!("This should be nothing: {:?}.", connector.next());
}

Run what we have so far in the playground.

Our code does much more than it did back when we manually implemented Connect for Vec, yet we used the exact same number of lines! Imagine the amount of code needed for manual implementations of Connect for every type in std::collections. Thanks to the sophistication of generics, we get the same thing with just one block of code and let the compiler fill in the blanks.

Conclusion

Part two was full of important observations that might take a little time and experimentation to set in. Thankfully, this was the hard part. I hope this made you feel condifent that you can use type parameters in your own code. But type paramaters are just one part of generics. Now that types are understood, it won’t be hard to branch out to the final to pieces of the topic.

In the part three we will work with const generics and parametized lifetimes.

If you have any questions or comments about this guide, contact information is on the homepage. I appreciate any and all feedback.

Part three is a work-in-progress. Check back soon!