Home

Modern C++: Build Custom Iterators

Written: July 27, 2024

Creating a custom iterator in C++ can transform the way you handle sequences in your code. In this post, I'll guide you through the process of writing an iterator that aligns with modern C++ practices, enhancing both performance and readability. Let's dive into the world of iterators and see how they can make your code more efficient and elegant!

Practical example

In my recent work on legacy projects, I've faced the challenge of parsing packets based on company-specific protocols. These protocols, reminiscent of familiar ones like TCP/IP, UDP, and Bluetooth, feature a fixed-size header and a variable-sized payload. However, the lack of dedicated parsing libraries in these projects have often made the code tricky to handle. In this article, I'll craft an iterator that leverages modern C++ practices to simplify the problem of parsing such packets.

Packets structure

Let's kick things off by defining the core problem. We have a collection of contiguous bytes in a container, and a protocol to separate these bytes into distinct packets. In C++ terms, these packets can be represented as a range of std::spans, or views of contiguous data, like this:

Packets goal

With this in mind, one elegant design could for example enable us to use the iterator in a range-based for loop like this:

for (std::span<std::byte> packet : packet_iterator{std::span{buffer}, protocol})
{
    // Process packets
}

This way, the iterator's sole responsibility is to provide views of the packets from the original buffer, separating the task of identifying packets from the actual processing. This design also avoids unnecessary copies of the original bytes and allows for potential concurrent processing of the packets, if you where to use f.ex. std::reduce(). But how do we achieve this interface? First and foremost, we need to write an iterator.

Writing a forward iterator

A great starting point for writing an iterator is to define the boilerplate for the type of iterator you need. In this case, a forward iterator is appropriate since the protocols usually don't allow parsing packets backwards. While it’s possible to create a bidirectional iterator by storing the sizes of previous packets, it would still rely on using a forward iterator under the hood. Therefore, we’ll define a class that meets the minimum requirements of an std::forward_iterator. We'll also make the underlying data types a template parameter since the specific types of the container are not important.

template<typename T>
class packet_iterator
{
public:
    using difference_type = std::ptrdiff_t;
    using value_type = std::span<T>;

    std::span<T> operator*() const;

    packet_iterator& operator++();

    packet_iterator operator++(int)
    {
        auto tmp{ *this };
        ++*this;
        return tmp;
    }

    bool operator==(const packet_iterator&) const;
}

Next, we need to define the essential states for our iterator. First, it needs a pointer to the start of the current packet. The iterator also needs to know the size of the current packet to handle increments and return the correct spans. This size is found in the header of the current packet and will be specific to a user-provided protocol. This protocol can be passed as a function object that takes a pointer to the start of the packet and the size of the remaining buffer (to handle incomplete packets) and returns the size of the next packet or potentially zero. Finally, we’ll store the current packet size to avoid repeatedly calling the function object. These states are illustrated in the following figure.

Packet implementation

Adding the states to the iterator, and implementing the constructors will give us this:

template<typename T, typename F>
    requires std::is_invocable_r_v<std::size_t, F, std::span<T>>
class packet_iterator
{
public:
    packet_iterator() = default;
    packet_iterator(std::span<T> buffer, F func)
        : buffer(buffer)
        , get_size_func(std::move(func))
        , size(get_size_func(buffer)) 
    {}

    ...
        [Member functions]
    ...

private:
    std::span<T> buffer;
    F get_size_func;
    std::size_t size;
}

Next, we implement the member functions. The indirect operator will return a subspan of the remaining buffer, like this:

std::span<T> operator*() const
{  
    return buffer.subspan(0, size);
}

The pre-increment operator will adjust the remaining buffer to skip over the previous packet and update the size for the next packet:

packet_iterator& operator++()
{  
    buffer = buffer.subspan(size);
    size = get_size_func(buffer);
    return *this;
}

The equal-to operator will simply compare iterators by checking if their remaining buffer points to the same address:

bool operator==(const packet_iterator&) const;
{
    return buffer.data() == rhs.buffer.data();
}

At this stage, we have a working forward iterator, but how do we determine if it has reached the end? Simply incrementing it until it returns an empty span isn’t ideal and doesn't work if we want to use STL functions or range-based for loops. What we need next is a sentinel.

Defining the sentinel

How do we create a sentinel that knows when its paired iterator is past the end? First, we should consider whether the iterator itself can determine if it’s reached the end. In this case, it can—since it keeps track of the remaining buffer. If the buffer is empty, the current size is zero, or the size exceeds the remaining buffer, it’s safe to consider it past the end. To handle this, we can use the STL’s std::default_sentinel_t, a convenient empty struct for bound checks. You could choose any type, but this type is as efficient as it gets, and its name is self-explanatory. To make this work, we need to add an additional equal-to operator that takes our chosen sentinel as its argument and returns whether the iterator has reached the end.

bool operator==(const std::default_sentinel_t&) const;
{
    return buffer.empty() || size == 0 || size > buffer.size();
}

Activate STL range abilities

Now that we have our iterator-sentinel pair, we can use it with STL iterator algorithms. However, to make it compatible with STL range functions and range-based for loops, we need to provide the iterator-sentinel pair from a begin() and end() function though our range object. In this case we will provide them directly though our iterator as member functions. With these additions, it will seamlessly fulfill the std::forward_range concept—and we have reached our initial goal.

packet_iterator begin() const
{
    return *this;
}

std::default_sentinel_t end() const
{
    return {};
}

However, there is one more enhancement we should add to our iterator. Since the iterator doesn't own the data it iterates over, we can enable the std::ranges::enable_borrowed_range variable template. This allows other functions to verify that they can take the range by value and return iterators obtained from it without the risk of dangling references.

template<typename T, typename F>
inline constexpr bool std::ranges::enable_borrowed_range<packet_iterator<T, F>> = true;

Use case example

Consider the following simple protocol, where the size of the packet is found in the first int of the packet:

Packet example

The iterator can simply be used as follows to loop the packets:

std::vector<int> buffer{ 2, 0, 4, 1, 0, 0, 3, 0, 8 };

auto protocol{[](std::span<int> buf) -> std::size_t
    {
        return buf.empty() ? 0 : buf.front();
    }};

for (std::span<int> packet : packet_iterator{std::span{buffer}, protocol})
{
    // Process packets
    // (1) [2, 0]
    // (2) [4, 1, 0, 0]
    // (3) [3, 0, 8]
}

Alternatively, you could wrap the protocol inside function and return the range like this:

auto my_packet_iterator(std::span<int> buffer)
{
    return packet_iterator{buffer, [](std::span<int> buf) -> std::size_t
        {
            return buf.empty() ? 0 : buf.front();
        }};
}

In that case, the code could look as slick as this:

std::vector<int> buffer{ 2, 0, 4, 1, 0, 0, 3, 0, 8 };

for (std::span<int> packet : my_packet_iterator(std::span{buffer}))
{
    // Process packets
    // (1) [2, 0]
    // (2) [4, 1, 0, 0]
    // (3) [3, 0, 8]
}

Summary

By crafting a custom iterator and sentinel, we've enhanced our code to handle sequences with ease and elegance. This approach not only aligns with modern C++ practices but also ensures efficient and clean code management. Whether you’re dealing with legacy projects or new challenges, a well-designed iterator can be a powerful tool in your toolkit. Enjoy writing elegant C++!

(you can find the source code for the final iterator in my [repo])

Home