jeffa.io

An Auto-Increment Crate for Rust

…now exists! Check it out.

The Problem

I was recently working on a Rust program and needed a way to give instances a unique identifier. These IDs did not need to be universally unique. I wanted the simplest way to recognize two instances as unique even if they were identical aside from the ID. A serial (or auto-increment) ID seemed like an appropriate choice.

As a rule, I try to write my own code to avoid over-reliance on dependencies. But this case seemed to warrant a dependency. It would be a simple thing to implement. What I wanted was essentially a counter, after all. I expected to find a reliable crate with lots of downloads.

Surprisingly, no such crate existed. I tried searching for “serial”, “increment”, “auto increment”, “serial id”, etc. and found nothing. One crate had a promising name but no documentation and, from what I could tell from the code, did not quite do what I was looking for.

I decided to write and publish such a crate myself. It seemed like a good opportunity to contribute a small but useful crate with relatively simple code to the greater Rust ecosystem.

The Solution

Are Universally Unique IDs a Universal Solution?

Serial IDs are used as identifiers because they are simple. Universally unique identifiers (or UUIDs), on the other hand, are usually used when a program needs to create identifiers without access to information about IDs created elsewhere. This is why a distributed database would use them. Rust developers have access to the rand crate and the uuid crate, which uses rand internally but creates IDs in formats that conform to formal standards. The basic strategy is to create a pseudo-random value or to capture a value (such as a UTC timestamp) and hash it. The details of UUID generation are too extensive to get into here. The uuid crate is both well-documented and cleanly coded, and there are countless other descriptions of the standards it implements.

The important difference between serial IDs and UUIDs is the performance cost. Hashing and random number generation are more expensive than increasing the value of an integer by one. Even if threads have to wait to get exclusive access to an ID generator, each thread will only hold it for the number of nanoseconds that it takes to calculate “x + 1”. Serial IDs are also free from the issue of collisions, allowing them to be smaller. A 32-bit serial ID can have 4,294,967,296 unique instances. There is no probability to consider.

UUIDs are always good enough, but their cost is not always necessary. Git, for example, creates an SHA-1 has of a commit’s content to create a commit ID. The IDs need to be unique when commits from different machines are pushed to a single remote. But many IDs do not need a high probability of being universally unique. CLI applications or other client-side programs that only work with in-memory values or store data locally are such examples.

How Has This Been Implemented Before?

Serial IDs are familiar to users of database systems. In SQL, auto-increment integers can be used by setting GENERATE BY DEFAULT AS IDENTITY as the default value for a column.

CREATE TABLE distributors (
     did    integer PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
     name   varchar(40) NOT NULL CHECK (name <> '')
);

Source

Postgres has its own SERIAL type that mimics this behavior.

CREATE TABLE tablename (
   colname SERIAL
);

Source

Postgres provides a reasonable standard to replicate: 32-bit integers by default with support for any other unsigned integer. By using a trait to define the types that can be generated, users can make their own types compatible with the API provided by the crate. My crate just needs to provide the generator, the trait and implementations for unsigned integers in the standard library.

Goals

Takeaway

This process proved something about Rust that wasn’t readily apparent to me before. Encouraging flexibility by defining behavior (through traits) makes it intuitive to implement new features. When you know what you want to do the code starts to seem obvious. If you need a single type that can output many types, define the common behavior of those types and make the single type operate based on that shared behavior.

Rust is often described as flexible, but many languages are flexible. With Rust, it feels like the API writes itself once you clearly define what it should do. The code is not just expressive, it’s obvious.

I also realized that there are still plenty of opportunities to contribute Rust code. Both widely-applicable and more niche libraries have provided the tools that developers need to adopt the language. But while Rust has become mature enough to fit many use cases, the ecosystem is far from bloated. There is still plenty of space to do new or different things and make an impact.

The serial_int crate is on crates.io and lib.rs. Contributions and feedback on the API are welcome!