Unzipping a Collection of Tuples with the “unzip” Smart Output Iterator

Published February 19, 2019

Smart output iterators are output iterators that do more than just sending a piece of data from an STL algorithm to a container. They can embed logic that relieves the algorithm of some of its responsibilities.

We have already seen examples of smart output iterators that apply a function or filter on a predicate.

Now let’s see an example of smart output iterator that breaks down pairs and tuples, so that all the first elements go to one direction, all the second elements to another direction, and so on.

Two motivating cases: separating key from values, and transposing a collection a tuples

Let’s see two motivating examples for breaking down collections of pairs and tuples into specific containers.

Pairs

A std::map is a sorted collection of std::pairs, whose firsts are keys and seconds are values. We want to send the keys and the values of the map to two distinct containers. And to leverage on the power of smart output iterators, let’s say that we also want to apply a function only on values.

To illustrate, let’s create a map that associates strings to numbers:

std::map<int, std::string> entries = { {1, "one"}, {2, "two"}, {3, "three"}, {4, "four"}, {5, "five"} };

We would like to:

send the keys to keys,
send the values in upper case to values

with keys and values starting as empty containers:

std::vector<int> keys;
std::vector<std::string> values;

For this we need to implement the unzip output iterator. We will also use the transform iterator (formerly called output_transformer) to apply a function to the output of the unzip iterator:

auto const toUpper = fluent::output::transform(toUpperString);

std::copy(begin(entries), end(entries),
          unzip(back_inserter(keys),
                toUpper(back_inserter(values))));

toUpperString is a function that takes a std::string and returns a std::string that is the former one in upper case. It can be implemented like this:

std::string toUpperString(std::string const& s)
{
    std::string upperString;
    std::transform(begin(s), end(s), std::back_inserter(upperString), [](char c){ return std::toupper(c); });
    return upperString;
}

And we would like keys to contain {1, 2, 3, 4, 5}, and values to contain {"ONE", "TWO", "THREE", "FOUR", "FIVE"}.

Tuples

A more generic use case would use tuples instead of pairs. Here is a collection of tuples:

std::vector<std::tuple<int, int, int>> lines = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12} };

In our example, this collection represents the lines of a table: the first line is 1 2 3, the second line is 4 5 6, and so on.

unzip

Let’s extract the columns of the table. To do this, we need to extract the first elements of each line and put them into a column1 container, then the second elements of each line and put them into a column2 container, and so on.

So our target code will be:

std::vector<int> column1, column2, column3;
    
std::copy(begin(lines), end(lines),
          unzip(back_inserter(column1),
                back_inserter(column2),
                back_inserter(column3)));

And we expect column1 to hold {1, 4, 7, 10}, column2 to hold {2, 5, 8, 11}, and column3 to hold {3, 6, 9, 12}.

Now that we have those two use cases for it, let’s implement the unzip output iterator.

The `unzip` output iterator

unzip will follow the typical implementation of smart output iterators:

the constructor keeps track of the underlying iterators to send data to,
operator* returns the object itself, so that…
…operator= is called by the user (e.g. STL algorithm) and can perform the action of sending data off to the underlying iterators,
operator++ forwards the increment to the underlying iterators.

So let’s start with the constructor:

template<typename... Iterators>
class output_unzip_iterator
{
public:   
    explicit output_unzip_iterator(Iterators... iterators) : iterators_(std::make_tuple(iterators...)) {}

private:
    std::tuple<Iterators...> iterators_;
};

We keep all the underlying iterators into a tuple. Indeed, there could be any number of underlying iterators.

The operator* does its job of allowing our smart output iterator to stay in the game when dereferenced:

output_unzip_iterator& operator*(){ return *this; }

The action then happens in operator=, when the STL algorithms assigns to what is returned by dereferencing the iterator (so here, the iterator itself). Let’s start with the simpler case of sending an std::pair to our iterator:

template<typename First, typename Second>
output_unzip_iterator& operator=(std::pair<First, Second> const& values)
{
    *std::get<0>(iterators_) = values.first;
    *std::get<1>(iterators_) = values.second;
    return *this;
}

We forward the first (resp. second) of the incoming pair to the first (resp. second) underlying iterator.

The overload of operator= that receives a std::tuple is less straightforward to implement. Its prototype looks like this:

template<typename... Ts>
output_unzip_iterator& operator=(std::tuple<Ts...> const& values)
{

And in this function, we need to send each element of the incoming tuple to its corresponding element in our tuple of underlying iterators.

One way of formulating this is to apply to each pair of respective elements of those tuples a function that takes a value and an iterator, and that sends that value to that iterator.

So the problem comes down to applying a function taking two parameters to respective elements coming from two tuples.

Applying a function to the elements of two tuples

Note: We’re going to delve into template metaprogramming and variadic templates here. I’m not an expert, and if you know how to improve what follows, I’m happy to hear your feedback!

To apply a function to the elements of one tuple, C++17 offers std::apply. But before C++17, there was a way to emulate std::apply. We are going to look into this implementation, and adapt it for elements coming from two tuples.

To apply a function to the elements of a tuple, we can 1) unwrap the tuple into a variadic pack and 2) pass the contents of the variadic pack as arguments to a function.

Unwrapping the tuple into a variadic pack

To do this, we use C++14 index_sequence:

template <class F, class Tuple1, class Tuple2>
constexpr decltype(auto) apply2(F&& f, Tuple1&& t1, Tuple2&& t2)
{
    return apply2_impl(std::forward<F>(f), std::forward<Tuple1>(t1), std::forward<Tuple2>(t2),
                       std::make_index_sequence<std::tuple_size<std::remove_reference_t<Tuple1>>::value>{});
}

Passing the contents of a variadic pack as arguments to a function

apply2_impl is a function that unwraps the contents of the tuples and pass them as parameters to f:

template <class F, class Tuple1, class Tuple2, std::size_t... I>
F apply2_impl(F&& f, Tuple1&& t1, Tuple2&& t2, std::index_sequence<I...>)
{
    return (void)std::initializer_list<int>{(std::forward<F>(f)(std::get<I>(std::forward<Tuple1>(t1)), std::get<I>(std::forward<Tuple2>(t2))),0)...}, f;
}

I reckon it is Sean Parent who came up with the technique of passing the contents of a variadic pack as arguments to a function without C++17. The above adapts that technique to a function that takes two parameters.

If you’re not familiar with variadic templates, I realize the above code must look not very different from this:

unzip

And it’s OK. You don’t need to understand those details to get the general meaning of the unzip iterator, and to use it. However, this manipulation of compile time collections is an interesting topic, and we’ll dive into it in a later post with more explanations.

Anyway, the body of operator= for our unzip iterator is now:

output_unzip_iterator& operator=(std::tuple<Ts...> const& values)
{
    apply2([](auto&& value, auto&& iterator){ *iterator = value; }, values, iterators_);
    return *this;
}

One last thing to implement is the increment operator: operator++. Here we make it forward the increment to its underlying iterators. So we need to apply a function that calls ++ on each element of the tuple. We could use std::apply in C++17, and in C++14 we can resort to an implementation with the technique we saw before:

template <class F, class Tuple, std::size_t... I>
F apply_impl(F&& f, Tuple&& t, std::index_sequence<I...>)
{
    return (void)std::initializer_list<int>{(std::forward<F>(f)(std::get<I>(std::forward<Tuple>(t))),0)...}, f;
}

template <class F, class Tuple>
constexpr decltype(auto) apply(F&& f, Tuple&& t)
{
    return apply_impl(std::forward<F>(f), std::forward<Tuple>(t),
                      std::make_index_sequence<std::tuple_size<std::remove_reference_t<Tuple>>::value>{});
}

And we use it this way:

output_unzip_iterator& operator++()
{
    detail::apply([](auto&& iterator){ ++iterator; }, iterators_);
    return *this;
}

output_unzip_iterator& operator++(int){ ++*this; return *this; }

Finally let’s not forget the aliases for iterators:

using iterator_category = std::output_iterator_tag;
using value_type = void;
using difference_type = void;
using pointer = void;
using reference = void;

And the actual unzip function that instantiates the iterator:

template<typename... Iterators>
output_unzip_iterator<Iterators...> unzip(Iterators... iterators)
{
    return output_unzip_iterator<Iterators...>(iterators...);
}

And we’re good to go.

Unzipping pairs and tuples

Let’s now test our new iterator!

Our first use case was breaking down a collection of pairs into a collection of keys and a collection of values, and apply a function on values:

std::map<int, std::string> entries = { {1, "one"}, {2, "two"}, {3, "three"}, {4, "four"}, {5, "five"} };

std::vector<int> keys;
std::vector<std::string> values;

auto const toUpper = fluent::output::transform(toUpperString);

std::copy(begin(entries), end(entries),
          unzip(back_inserter(keys),
                toUpper(back_inserter(values))));

When we output the contents of keys we now get:

1 2 3 4 5

And when we output the contents of values we get:

ONE TWO THREE FOUR FIVE

And our second case was using tuples, to breaks a collection of lines into a collection of columns:

std::vector<std::tuple<int, int, int>> lines = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12} };
std::vector<int> column1, column2, column3;

std::copy(begin(lines), end(lines),
          unzip(back_inserter(column1),
                back_inserter(column2),
                back_inserter(column3)));

When we output the contents of column1 we get:

1 4 7 10

The outputs of column2 give:

2 5 8 11

And those of column3 are:

3 6 9 12

If you want to have a closer look at the code, you can check out the smart output iterators library, the implementation of the unzip iterator, and the tests associated to it.