Chaining Output Iterators Into a Pipeline
We’ve covered a variety of smart output iterators over the past few weeks. Today we explore how to combine them to create expressive code.
If you’re just joining our series on smart output iterators, you might want to check out this introductory post on smart output iterators.
So far, we’ve been combining smart output iterators by using operator():
auto const isEven = filter([](int n){ return n % 2 == 0; });
auto const times2 = transform([](int n){ return n * 2; });

std::vector<int> results;
std::copy(begin(input), end(input), isEven(times2(times2(back_inserter(results)))));
The output iterators generated by filter and transform have an operator() that accepts another iterator and sends results to it. That is to say, isEven sends to times2 only the elements of input that are even, and times2 sends every number it receives, multiplied by 2, to another times2, which doubles those results again and sends them to back_inserter, which in turn sends them to the push_back method of results.
After executing this code on an input such as the integers from 1 to 10, results contains {8, 16, 24, 32, 40}.
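To make the mechanics concrete, here is a minimal sketch of how chaining with operator() could work. Everything in the sketch namespace (the iterator types, the transformer and filterer helpers) is a simplified assumption made for illustration and not the library’s actual code, and evens_times_four is a hypothetical wrapper around the snippet above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

namespace sketch // hypothetical, simplified stand-ins (not the library's real code)
{
// Member types that std::iterator_traits expects from an output iterator.
struct output_iterator_base
{
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;
};

// Applies f to each value it receives, then forwards the result to the next iterator.
template<typename F, typename Out>
struct transform_iterator : output_iterator_base
{
    transform_iterator(F f, Out out) : f_(f), out_(out) {}
    transform_iterator& operator*() { return *this; }
    transform_iterator& operator++() { return *this; }
    transform_iterator& operator++(int) { return *this; }
    template<typename T>
    transform_iterator& operator=(T const& value) { *out_ = f_(value); ++out_; return *this; }
    F f_;
    Out out_;
};

// Forwards to the next iterator only the values that satisfy the predicate.
template<typename P, typename Out>
struct filter_iterator : output_iterator_base
{
    filter_iterator(P pred, Out out) : pred_(pred), out_(out) {}
    filter_iterator& operator*() { return *this; }
    filter_iterator& operator++() { return *this; }
    filter_iterator& operator++(int) { return *this; }
    template<typename T>
    filter_iterator& operator=(T const& value)
    {
        if (pred_(value)) { *out_ = value; ++out_; }
        return *this;
    }
    P pred_;
    Out out_;
};

// Helpers returned by transform() and filter(): their operator() takes the next iterator.
template<typename F>
struct transformer
{
    F f;
    template<typename Out>
    transform_iterator<F, Out> operator()(Out out) const { return transform_iterator<F, Out>(f, out); }
};

template<typename P>
struct filterer
{
    P pred;
    template<typename Out>
    filter_iterator<P, Out> operator()(Out out) const { return filter_iterator<P, Out>(pred, out); }
};

template<typename F> transformer<F> transform(F f) { return transformer<F>{f}; }
template<typename P> filterer<P> filter(P pred) { return filterer<P>{pred}; }
} // namespace sketch

// Keeps the even numbers of input and doubles them twice.
std::vector<int> evens_times_four(std::vector<int> const& input)
{
    auto const isEven = sketch::filter([](int n){ return n % 2 == 0; });
    auto const times2 = sketch::transform([](int n){ return n * 2; });

    std::vector<int> results;
    std::copy(begin(input), end(input), isEven(times2(times2(std::back_inserter(results)))));
    assert(results.size() <= input.size()); // a filter can only drop elements
    return results;
}
```

With this sketch, calling evens_times_four on the integers from 1 to 10 yields {8, 16, 24, 32, 40}.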
But combining output iterators this way by using operator() has several drawbacks:
- it doesn’t reflect the fact that each one passes data on to the next one
- the more iterators there are, the more parentheses build up (and this is C++, not LISP!)
- it forces us to define the iterators outside of the statement they’re used in.
To illustrate this last drawback, consider what it would look like to define the output iterators where they’re used:
std::copy(begin(input), end(input), filter([](int n){ return n % 2 == 0; })(transform([](int n){ return n * 2; })(transform([](int n){ return n * 2; })(back_inserter(results)))));
Not really clear. This gets worse if the iterators belong to a namespace, which they should do if we use them in existing code:
std::copy(begin(input), end(input), output::filter([](int n){ return n % 2 == 0; })(output::transform([](int n){ return n * 2; })(output::transform([](int n){ return n * 2; })(back_inserter(results)))));
Even if we pile them up across several lines of code, the transitions between iterators are still unclear:
std::copy(begin(input), end(input),
          output::filter([](int n){ return n % 2 == 0; })
         (output::transform([](int n){ return n * 2; })
         (output::transform([](int n){ return n * 2; })
         (back_inserter(results)))));
We could declare the lambdas separately, but the syntax remains confusing:
auto isEven = [](int n){ return n % 2 == 0; };
auto times2 = [](int n){ return n * 2; };

std::copy(begin(input), end(input), output::filter(isEven)(output::transform(times2)(output::transform(times2)(back_inserter(results)))));
Compare this with the equivalent code using range-v3:
inputs | ranges::view::filter(isEven) | ranges::view::transform(times2) | ranges::view::transform(times2);
This looks much nicer.
Let’s start by trying to use an operator to combine output iterators. In a future post, we’ll get rid of std::copy and combine range adaptors and smart output iterators in the same expression.
operator| and left-associativity
Could we just use operator| to combine smart output iterators, like we do for combining ranges?
It turns out that we can’t, because operator| is left-associative.
What does “left-associative” mean?
If we look back at the expression using ranges, it was (omitting namespaces for brevity):
inputs | filter(isEven) | transform(times2) | transform(times2)
Taken on its own, this expression is ambiguous. operator| takes two parameters, and the three operator|s need to be executed successively. There are multiple ways to do that:
- calling operator| on the first two operands on the left, then calling operator| on the result of this operation and the third one, and so on. This is left-associative, and is equivalent to this:
(((inputs | filter(isEven)) | transform(times2)) | transform(times2))
- calling operator| on the last two operands on the right, then calling operator| on the operand just before them and the result of this operation, and so on. This is right-associative, and is equivalent to this:
(inputs | (filter(isEven) | (transform(times2) | transform(times2))))
- calling the operator|s in yet a different order, such as:
(inputs | filter(isEven)) | (transform(times2) | transform(times2))
The last example is neither left-associative nor right-associative.
Now that we’re clear on what left-associative means, let’s go back to operator|: it is left-associative. That is part of the C++ standard, and overloading the operator doesn’t change it.
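One way to observe this grouping is to overload operator| on a type that does nothing but record how its operands were combined. The Operand type and the pipe_grouping function below are hypothetical helpers written for this illustration; they are not part of any library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type: it only records how operands get grouped.
struct Operand
{
    std::string text;
};

// Each call wraps its two operands in parentheses, leaving a trace of the call order.
Operand operator|(Operand const& lhs, Operand const& rhs)
{
    return Operand{"(" + lhs.text + " | " + rhs.text + ")"};
}

std::string pipe_grouping()
{
    Operand const inputs{"inputs"};
    Operand const filter{"filter"};
    Operand const transform{"transform"};

    // No parentheses are written here: the language decides the grouping.
    std::string const grouping = (inputs | filter | transform | transform).text;
    assert(grouping == "(((inputs | filter) | transform) | transform)"); // grouped from the left
    return grouping;
}
```

The recorded string shows that the leftmost pair is combined first, whatever the overload does with its operands.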
A right-associative operator
A left-associative operator makes sense for ranges, because ranges build up from left to right.
Indeed, inputs | filter(isEven) is a range of filtered elements. When we apply a transformation on those elements, we tack on a transform(times2) to this range of filtered elements. That’s why it makes sense to use a left-associative operator:
(((inputs | filter(isEven)) | transform(times2)) | transform(times2))
For output iterators, it is the opposite. If we use operator| to combine them, like this (namespaces again omitted for brevity):
filter(isEven) | transform(times2) | transform(times2) | back_inserter(results);
Then the left-associativity of operator| would dictate that the first operation to be executed in this expression would be:
filter(isEven) | transform(times2)
But contrary to inputs | filter(isEven), which represents a filtered range, filter(isEven) | transform(times2) here with output iterators doesn’t represent anything. It doesn’t stand on its own.
What does represent something and stand on its own is the combination of the last two output iterators:
transform(times2) | back_inserter(results)
It represents an output iterator that applies times2 and sends the result to the push_back method of results.
What we need, then, is a right-associative operator. Which right-associative operators are there in C++? Let’s look it up on cppreference.com, which provides a useful table of operator precedence and associativity.
As the last column of that table indicates, the right-associative operators are on lines 3 and 16.
The operators on line 3 are unary (they take only one operand), so we’re left with line 16. To me, the one that looks most natural for our purpose is operator>>=. If you think otherwise, please leave a comment to express your opinion.
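To check that operator>>= groups its operands from the right, we can overload it on a type that only records how the operands were combined. The Stage type and the shift_assign_grouping function below are hypothetical helpers written for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type: it only records how operands get grouped.
struct Stage
{
    std::string text;
};

// operator>>= can be overloaded as a free function; here it just traces the call order.
Stage operator>>=(Stage const& lhs, Stage const& rhs)
{
    return Stage{"(" + lhs.text + " >>= " + rhs.text + ")"};
}

std::string shift_assign_grouping()
{
    Stage const filter{"filter"};
    Stage const transform{"transform"};
    Stage const back_inserter{"back_inserter"};

    // No parentheses are written here: the language decides the grouping.
    std::string const grouping = (filter >>= transform >>= transform >>= back_inserter).text;
    assert(grouping == "(filter >>= (transform >>= (transform >>= back_inserter)))"); // grouped from the right
    return grouping;
}
```

The rightmost pair is combined first, which is the order in which a chain of output iterators needs to be assembled.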
By using operator>>=, our combination of output iterators becomes:
filter(isEven) >>= transform(times2) >>= transform(times2) >>= back_inserter(results)
This leads to clearer code:
std::copy(begin(input), end(input), output::filter(isEven) >>= output::transform(times2) >>= output::transform(times2) >>= back_inserter(results));
We can also pile it up on several lines and/or use inline lambdas:
std::copy(begin(input), end(input),
          output::filter([](int n){ return n % 2 == 0; })
      >>= output::transform([](int n){ return n * 2; })
      >>= output::transform([](int n){ return n * 2; })
      >>= back_inserter(results));
This reads much like the ranges style.
The actual implementation
All we’ve seen so far is just the interface. And I think this is what matters the most. Now that we’ve got this straightened out, we can work on the implementation.
In our case the implementation is quite straightforward. It consists of defining an operator>>= that takes a helper representing an output iterator (say output_transformer, which is what transform returns; see the introductory post on smart output iterators or the actual code of transform for more details) together with any other output iterator, and associates the two to create an output iterator:
template<typename TransformFunction, typename Iterator>
output_transform_iterator<std::tuple<TransformFunction>, Iterator> operator>>=(output_transformer<TransformFunction> const& outputTransformer, Iterator iterator)
{
    return outputTransformer(iterator);
}
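To see this operator at work end to end, here is a self-contained sketch. It reuses the names output_transformer and output_transform_iterator from the snippet above, but their definitions here are simplified assumptions placed in a hypothetical sketch namespace (for instance, the iterator stores a single function rather than a std::tuple), and times_four is a hypothetical wrapper added for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

namespace sketch // hypothetical, simplified stand-ins (not the library's real code)
{
template<typename TransformFunction, typename Iterator>
class output_transform_iterator
{
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    output_transform_iterator(TransformFunction f, Iterator iterator) : f_(f), iterator_(iterator) {}
    output_transform_iterator& operator*() { return *this; }
    output_transform_iterator& operator++() { return *this; }
    output_transform_iterator& operator++(int) { return *this; }

    // Assigning a value applies the function and forwards the result downstream.
    template<typename T>
    output_transform_iterator& operator=(T const& value)
    {
        *iterator_ = f_(value);
        ++iterator_;
        return *this;
    }

private:
    TransformFunction f_;
    Iterator iterator_;
};

// The helper returned by transform(): its operator() plugs in the next iterator.
template<typename TransformFunction>
class output_transformer
{
public:
    explicit output_transformer(TransformFunction f) : f_(f) {}

    template<typename Iterator>
    output_transform_iterator<TransformFunction, Iterator> operator()(Iterator iterator) const
    {
        return output_transform_iterator<TransformFunction, Iterator>(f_, iterator);
    }

private:
    TransformFunction f_;
};

template<typename TransformFunction>
output_transformer<TransformFunction> transform(TransformFunction f)
{
    return output_transformer<TransformFunction>(f);
}

// The operator itself: it simply delegates to the helper's operator().
template<typename TransformFunction, typename Iterator>
output_transform_iterator<TransformFunction, Iterator>
operator>>=(output_transformer<TransformFunction> const& outputTransformer, Iterator iterator)
{
    return outputTransformer(iterator);
}
} // namespace sketch

// Doubles every element of input twice, using operator>>= to chain the stages.
std::vector<int> times_four(std::vector<int> const& input)
{
    auto const times2 = [](int n){ return n * 2; };

    std::vector<int> results;
    std::copy(begin(input), end(input),
              sketch::transform(times2) >>= sketch::transform(times2) >>= std::back_inserter(results));
    assert(results.size() == input.size()); // transforms never drop elements
    return results;
}
```

Because operator>>= is right-associative, the rightmost transform is attached to back_inserter first, and the resulting output iterator is then handed to the transform on its left. A filter helper would presumably need an analogous overload.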
Towards more powerful features and a nicer syntax
What would be nicer is to get rid of the call to std::copy, and just write the operations in the form of a pipeline. And what would be even nicer is to combine ranges and smart output iterators in the same expression, to benefit from their respective advantages and get the best of both worlds.
This is what we explore in the next post.
And if you see how to use operator| to combine smart output iterators instead of operator>>=, that would be great. Please leave a comment if you have an idea about how to do it.