Partitioning Data with Output Iterators in C++
A couple of months (or years?) back, we saw that partitioning in the STL meant tidying up data according to a predicate: all that satisfy the predicate in one group, and all that don’t satisfy the predicate in another group:
This is what the STL algorithms std::partition
(or std::stable_partition
to keep the relative order of elements) do:
auto numbers = std::vector<int>{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}; std::stable_partition(begin(numbers), end(numbers), [](int n){ return n % 2 == 0; }); for (auto const& number : numbers) std::cout << number << ' ';
The above program outputs:
2 4 6 8 10 1 3 5 7 9
All the elements satisfying the predicate are first, the others after.
But there is another way to perform a partition with the STL: putting the values in separate collections. One collection for the elements that satisfy the predicate, and another one for the elements that don’t:
auto const numbers = std::vector<int>{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}; auto evenNumbers = std::vector<int>{}; auto oddNumbers = std::vector<int>{}; std::partition_copy(begin(numbers), end(numbers), back_inserter(evenNumbers), back_inserter(oddNumbers), [](int n){ return n % 2 == 0; }); std::cout << "Even numbers:\n"; for (auto const& number : evenNumbers) std::cout << number << ' '; std::cout << "\nOdd numbers:\n"; for (auto const& number : oddNumbers) std::cout << number << ' ';
Note that numbers
is now const
, since the operation is no longer in place. The outputs are in evenNumbers
and oddNumbers
and the above code outputs:
Even numbers: 2 4 6 8 10 Odd numbers: 1 3 5 7 9
Let’s now move that logic out of the algorithm and into the output iterator.
Why a smart output iterator
Before getting into the implementation of an output iterator that performs the equivalent of std::partition_copy
, why would we want to do such a thing in the first place?
For two reasons:
- breaking off the flow of operations on a collection into two branches,
- chaining additional operations in either or both of those two branches.
To my knowledge, we can’t do this with C++ standard components, including with ranges that are coming up in C++20.
Indeed, ranges allow to chain operations, as long as they follow a linear flow:
numbers | ranges::view::transform(f) | ranges::view::filter(p);
Or they can apply operations that make the data converge, that is to say if several sources of data contribute to one result:
ranges::view::set_difference(numbers, otherNumbers) | ranges::view::transform(f);
But ranges can’t make the data flow diverge, or break off into several directions. This is a key difference between ranges and smart output iterators. They can complete each other, like we’ll see in a future post.
We’ve already seen some smart output iterators, such as transform
and filter
:
auto const times2 = transform([](int i) { return i*2; }); std::copy(begin(numbers), end(numbers), times2(back_inserter(results));
Or, as we’ll see in a future post, we can have a nicer syntax:
ranges::copy(numbers, transform([](int n){return n*2;}) >>= back_inserter(results));
Or something even nicer by hiding the call to copy
.
If you hadn’t heard of smart output iterators before, you may want to check out this introductory post on smart output iterators, or check out the library on Github.
The partition
iterator
Now that we’ve seen the rationale for implementing a partition
output iterator, let’s decide what we’d like its usage to look like (proceeding this way makes code more expressive):
auto const isEvenPartition = partition([](int n){ return n % 2 == 0; }); std::copy(begin(input), end(input), isEvenPartition(back_inserter(evenNumbers), back_inserter(oddNumbers)));
To do this, we’ll follow our model for implementing smart output iterators, inspired from one of the most basic smart output iterators out there, the standard back_inserter
.
We start by implementing operator*
, that does nothing but returning itself, in order to keep control on the operator=
that the STL algorithm will typically call afterwards:
output_partition_iterator& operator*(){ return *this; }
Same thing for operator++
, not much to do:
output_partition_iterator& operator++(){ return *this; } output_partition_iterator& operator++(int){ ++*this; return *this; }
The logic happens in operator=
. operator=
receives a value, and needs to send it to either one of the underlying iterators, according to whether or not it satisfies the predicate.
What follows of the previous sentence is that the iterator has to have access to both its underlying iterators, and to the predicate. We can store them as members in the class, and initialize them in the constructor. The concerned part of the class definition would then look like this:
output_partition_iterator(IteratorTrue iteratorTrue, IteratorFalse iteratorFalse, Predicate predicate) : iteratorTrue_(iteratorTrue) , iteratorFalse_(iteratorFalse) , predicate_(predicate) {} private: IteratorTrue iteratorTrue_; IteratorFalse iteratorFalse_; Predicate predicate_;
Finally, we can implement the operator=
:
output_partition_iterator& operator=(T const& value) { if (predicate_(value)) { *iteratorTrue_ = value; ++iteratorTrue_; } else { *iteratorFalse_ = value; ++iteratorFalse_; } return *this; }
Matching the desired usage
Remember the desired usage: we wanted to construct the iterator in two phases. First, a function partition
, that constructed a intermediate object:
auto const isEvenPartition = partition([](int n){ return n % 2 == 0; });
Then we’d use this object to take the underlying iterators and create the smart iterator we designed above:
isEvenPartition(back_inserter(evenNumbers), back_inserter(oddNumbers))
We therefore need an intermediary type that takes the predicate in its constructor, and has an operator()
taking the two underlying iterators to send data to, and returning the output_parititon_iterator
that we designed.
Let’s call this type output_partitioner
:
template<typename Predicate> class output_partitioner { public: explicit output_partitioner(Predicate predicate) : predicate_(predicate) {} template<typename IteratorTrue, typename IteratorFalse> output_partition_iterator<IteratorTrue, IteratorFalse, Predicate> operator()(IteratorTrue iteratorTrue, IteratorFalse iteratorFalse) const { return output_partition_iterator<IteratorTrue, IteratorFalse, Predicate>(iteratorTrue, iteratorFalse, predicate_); } private: Predicate predicate_; };
The partition
function now just builds an output_partitioner
(in C++17 with template type deduction in constructors, partition
could have been the object we called output_partitioner
):
template<typename Predicate> output_partitioner<Predicate> partition(Predicate predicate) { return output_partitioner<Predicate>(predicate); }
Et voilà le travail!
The whole code is up on Github.
Now we can use partition
to route the output of an algorithm into two branches, and combine this with other output iterators:
auto const isEvenPartition = partition([](int n){ return n % 2 == 0; }); auto const times2 = transform([](int n) { return n*2; }); auto const moreThan3 = filter([](int n) { return n>3; }); ranges::set_difference(input1, input2, isEvenPartition(times2(back_inserter(output1)), moreThan3(back_inserter(output2)));
This code expresses a lot in a few lines, compared to what the version with STL algorithms or for loops would have looked like.
More than two outputs
Our partition
iterator can split data into two branches according to a predicate. But what if we’d like to split into more than two? What would the interface look like? And the implementation?
This is what we explore in a future post, with the demultiplexer output iterator. But before that we’ll need some pre-requisite, including being able to apply STL-like algorithms on std::tuple
.
Also, I don’t find the name “Smart output iterator” very catchy. Can you think of a better name for the library? Ouputors, maybe? Or another name? Please leave a comment with your suggestion!
You will also like
- Unzipping a Collection of Tuples with the “unzip” Smart Output Iterator
- How to Use the STL With Legacy Output Collections
- Smart Output Iterators: A Symmetrical Approach to Range Adaptors
- Partitioning with the STL
- The World Map of C++ STL Algorithms
Share this post!