STL Algorithms on Sets: One Algorithm to Implement Them All
The STL algorithms on sets are one of the most convenient things the C++ standard library offers. We’re going to see how they can all be implemented with the same core algorithm.
This article is part of our series on algorithms on sets, which now includes:
- How to check if 2 sorted collections have a common element
set_match
: Matching up Elements Between Sorted Collections- Implementing
set_match
in One Line of Code - STL algorithms on sets: one algorithm to implement them all
- Algorithms on set returning a boolean: exploring the algorithms
- Algorithms on set returning a boolean: implementing a generic algorithm
- Algorithms on set returning a boolean: a strong template interface
- NWH: Adapting algoritms on sets
Algorithms that look like each other
The STL offers 4 algorithms on sets that look like each other. They all take two sorted collections, A and B, and:
std::set_difference
outputs the elements that are in A and not in B,std::set_intersection
outputs the elements that are both in A and in B,std::union
output the elements that in A or in B,std::set_symmetric_difference
outputs the elements that are in A and not in B or in B and not in A (or said differently, in A xor in B).
They all benefit from the fact that A and B are sorted to operate in linear complexity (size of A + size of B). For more details about the algorithms on sets, check out this refresher first.
Even if they all do different things, they’re overall pretty similar. Couldn’t we write a core algorithm that they could all be implemented with?
That question has been in the back of my mind for a while. At one occurrence of Meeting C++ I had the chance to meet Sean Parent and to discuss this with him. Sean suggested that this could be done by associating a logical predicate to each algorithm: set_insersection
is AND, set_union
is OR, and so on.
Let’s write code to do that.
set_logical_operation
Let’s call our common algorithm set_logical_operation
.
set_logical_operation
takes two input collections and an output iterator. On top of that, set_logical_operation
takes a logical predicate: a function that takes two bool
s and returns a bool
.
Let’s write the expected call site first, as this generally allows to write simple code:
// equivalent to std::set_intersection set_logical_operation(A, B, std::back_inserter(results), std::logical_and<int>{}); // equivalent to std::set_union set_logical_operation(A, B, std::back_inserter(results), std::logical_or<int>{}); // equivalent to std::set_symmetric_difference (predicate is XOR) set_logical_operation(A, B, std::back_inserter(results), [](bool inLeft, bool inRight){ return inLeft ^ inRight;}); // equivalent to std::set_difference set_logical_operation(A, B, std::back_inserter(results), [](bool inLeft, bool inRight){ return inLeft && !inRight;});
Now that we’re clear on what its interface should look like, let’s move on to implementing set_logical_operation
.
Implementing set_logical_operation
Here is the prototype of set_logical_operation
:
template<typename SetA, typename SetB, typename OutputIterator, typename LogicalOperation> OutputIterator set_logical_operation(SetA&& setA, SetB&& setB, OutputIterator&& out, LogicalOperation logicalOperation) {
With the predicate passed to set_logical_operation
, we can determine three things:
- should we keep the elements that are in A and not in B?
- should we keep the elements that are both in A and in B?
- should we keep the elements that are in B and not in A?
To do this, we can invoke the predicate with the following respective calls:
logicalOperation(true, false)
logicalOperation(true, true)
logicalOperation(false, true)
Depending on those values, we want various parts of the outputs of set_segregate
. set_segregate
is a non-standard algorithm on sets that takes two sorted collections A and B, and three output iterators to which it sends respectively:
- the elements that are in A and not in B,
- the elements that are both in A and in B,
- the elements that are in B and not in A.
Its prototype is:
template<class SetA, class SetB, class OutputOnlyA, class OutputBoth, class OutputOnlyB> void set_segregate(Set1&& setA, Set2&& setB, OutputItLeft&& onlyA, OutputItBoth&& both, OutputItRight&& onlyB);
We can implement set_logical_operation
by calling set_segregate
.
Discarding outputs
The challenging aspect of doing that is to ignore the outputs of set_segregate
that we’re not interested in.
To do that we can use the dev_null
.
The dev_null
is an non-standard output iterator available in the pipes library that ignores the value it receives. Its implementation is this:
struct dev_null { using iterator_category = std::output_iterator_tag; using value_type = void; using difference_type = void; using pointer = void; using reference = void; dev_null& operator*(){ return *this; } dev_null& operator++(){ return *this; } template<typename T> dev_null& operator=(T&&){ return *this; } };
So we need to pass out
to the outputs of set_segregate
that we want to keep, and dev_null
to those we want to discard.
A simple way to do this is to go over all the possibilities for the values of the logical operation:
template<typename SetA, typename SetB, typename OutputIterator, typename LogicalOperation> OutputIterator set_logical_operation(SetA&& setA, SetB&& setB, OutputIterator&& out, LogicalOperation logicalOperation) { auto const includeElementsInAOnly = logicalOperation(true, false); auto const includeElementsInBOnly = logicalOperation(false, true); auto const includeElementsInBoth = logicalOperation(true, true); if (includeElementsInAOnly && includeElementsInBoth && includeElementsInBOnly) { set_segregate(setA, setB, out, out, out); } else if (includeElementsInAOnly && includeElementsInBoth && !includeElementsInBOnly) { set_segregate(setA, setB, out, out, dev_null{}); } else if (includeElementsInAOnly && !includeElementsInBoth && includeElementsInBOnly) { set_segregate(setA, setB, out, dev_null{}, out); } else if (includeElementsInAOnly && !includeElementsInBoth && !includeElementsInBOnly) { set_segregate(setA, setB, out, dev_null{}, dev_null{}); } else if (!includeElementsInAOnly && includeElementsInBoth && includeElementsInBOnly) { set_segregate(setA, setB, dev_null{}, out, out); } else if (!includeElementsInAOnly && includeElementsInBoth && !includeElementsInBOnly) { set_segregate(setA, setB, dev_null{}, out, dev_null{}); } else if (!includeElementsInAOnly && !includeElementsInBoth && includeElementsInBOnly) { set_segregate(setA, setB, dev_null{}, dev_null{}, out); } else if (!includeElementsInAOnly && !includeElementsInBoth && !includeElementsInBOnly) { set_segregate(setA, setB, dev_null{}, dev_null{}, dev_null{}); } return out; }
This implementation does the job. However, it seems that we’re repeating a lot of code, and that we could refactor that into more straightforward code.
Simplifying the code with if constexpr
What makes the code challenging is that out
and dev_null
are of two different types. So we can’t write code like:
if (includeElementsInAOnly) { outputIterator = out; } else { outputIterator = dev_null{}; }
But by using C++17’s if constexpr
, we can write a function that returns the correct type to use. That function won’t always have the same type, but this is one of the things that if constexpr
allows:
template<bool shouldMakeOutputIterator, typename OutputIterator> decltype(auto) makeOutputIteratorOrDevnull(OutputIterator&& out) { if constexpr (shouldMakeOutputIterator) { return std::forward<OutputIterator>(out); } else { return dev_null{}; } }
Depending on the boolean template parameter, this function will either return the output iterator it takes as a parameter, or a dev_null
.
If you’re not familiar with if constexpr
and the other good things that C++17 provides, get up to speed with Bartek’s book C++17 in detail.
Note that FWD
is a non-standard macro to shorten the call to std::forward
(thanks Vittorio Romeo):
#define FWD(value) std::forward<decltype(value)>(value)
We can now use our function to implement set_logical_operation
:
template<typename SetA, typename SetB, typename OutputIterator, typename LogicalOperation> OutputIterator set_logical_operation(SetA&& setA, SetB&& setB, OutputIterator out, LogicalOperation logicalOperation) { auto constexpr includeElementsInAOnly = logicalOperation(true, false); auto constexpr includeElementsInBOnly = logicalOperation(false, true); auto constexpr includeElementsInBoth = logicalOperation(true, true); auto outputAOnly = makeOutputIteratorOrDevnull<includeElementsInAOnly>(FWD(out)); auto outputBOnly = makeOutputIteratorOrDevnull<includeElementsInBOnly>(FWD(out)); auto outputBoth = makeOutputIteratorOrDevnull<includeElementsInBoth>(FWD(out)); set_segregate(setA, setB, outputAOnly, outputBoth, outputBOnly); return out; }
However, this code end s up calling the constructor of the output iterator up to three times, to construct outputAOnly
, outputBoth
and outputBOnly
.
It will be a move constructor if there is one. But if the output iterator has no move constructor (and Effective Modern C++ recommends in item 29 that we don’t count on move operations in generic code), then they will make copies. If the iterators are begin
or back_inserter
that’s not too bad, but if they are pipes with large data as context, that may be not desirable.
We can avoid all this by passing the results of the function directly to set_seggregate
:
template<typename SetA, typename SetB, typename OutputIterator, typename LogicalOperation> OutputIterator set_logical_operation(SetA&& setA, SetB&& setB, OutputIterator&& out, LogicalOperation logicalOperation) { auto constexpr includeElementsInAOnly = logicalOperation(true, false); auto constexpr includeElementsInBOnly = logicalOperation(false, true); auto constexpr includeElementsInBoth = logicalOperation(true, true); set_segregate(setA, setB, makeOutputIteratorOrDevnull<includeElementsInAOnly>(std::forward<OutputIterator>(out)), makeOutputIteratorOrDevnull<includeElementsInBoth>(std::forward<OutputIterator>(out)), makeOutputIteratorOrDevnull<includeElementsInBOnly>(std::forward<OutputIterator>(out))); return out; }
One algorithm to rule them all?
With set_logical_operation
, we now have a core algorithm that allows to implement the following STL algorithms:
std::set_difference
,std::set_symmetric_difference
,std::set_intersection
,std::set_union
.
But there is another algorithm on sets that the STL offers: std::includes
. std::includes
takes two sets A and B and returns a boolean, indicating whether all the elements of B are also in A.
Our new set_logical_operation
doesn’t allow to implement std::includes
. std::includes
belongs to another family of algorithms on sets: the algorithms that compare two sets and return a boolean.
This family of algorithms is what we tackle next in our series on algorithms on sets. Stay tuned!
Don't want to miss out ? Follow:   Share this post!