Implementing set_match in One Line of Code
In the previous post we’ve implemented set_match
, an algorithm on sets inspired from the STL ones, that pairs up matching elements between two sorted collections.
Being an algorithm on sets, the code we wrote for it looks like a typical implementation of an algorithm on set:
template<typename Set1, typename Set2, typename OutputIterator, typename Comparator> OutputIterator set_match(Set1&& set1, Set2&& set2, OutputIterator out, Comparator comp) { auto it1 = begin(set1); auto it2 = begin(set2); while (it1 != end(set1) && it2 != end(set2)) { if (comp(*it1, *it2)) { ++it1; } else if (comp(*it2, *it1)) { ++it2; } else { *out = std::make_pair(*it1, *it2); ++out; ++it1; ++it2; } } return out; }
But since it’s a typical implementation can we rather reuse the code of existing algorithms on sets to implement set_match
? Is there a generic code that algorithms on sets can be implemented with?
By reusing other algorithms and libraries, we’re going to implement set_match
in one line of code.
This post is part of our growing series on sets:
- How to check if 2 sorted collections have a common element
set_match
: Matching up Elements Between Sorted Collections- Implementing
set_match
in One Line of Code - STL algorithms on sets: one algorithm to implement them all
- Algorithms on set returning a boolean: exploring the algorithms
- Algorithms on set returning a boolean: implementing a generic algorithm
- Algorithms on set returning a boolean: a strong template interface
- NWH: Adapting algoritms on sets
Refresher on set_match
Here is a brief recap on set_match
. If you feel already fresh with the algorithm you can skip to the next section.
The goal of set_match
is to identify and pair up equivalent elements between two “sets”, that are sorted collections. For example, with those two maps:
std::map<int, char> input1 = {{1,'1'}, {2,'2'}, {3,'3'}, {5,'5'}, {7,'7'}, {8, '8'}}; std::map<int, std::string> input2 = {{2,"two"}, {3,"three"}, {4,"four"}, {5,"five"}, {7,"seven"}, {11,"eleven"}};
We can call set_match
this way:
auto results = std::vector<std::pair<std::pair<int, char>, std::pair<int, std::string>>>{}; set_match(input1, input2, back_inserter(results), NumberCharStringCompare{});
NumberCharStringCompare
is a function object that compares maps keys:
struct NumberCharStringCompare { bool operator()(std::pair<int const, char> const& numberWithChar, std::pair<int const, std::string> const& numberWithString) { return numberWithChar.first < numberWithString.first; } bool operator()(std::pair<int const, std::string> const& numberWithString, std::pair<int const, char> const& numberWithChar) { return numberWithString.first < numberWithChar.first; } };
Then the result of calling set_match
fills results
as if it was initialised this way:
std::vector<std::pair<std::pair<int, char>, std::pair<int, std::string>>> results = { { {2,'2'}, {2,"two"} }, { {3,'3'}, {3,"three"} }, { {5,'5'}, {5,"five"} }, { {7,'7'}, {7,"seven"} } };
For more details about set_match
and the logic behind its implementation, you can check out the detailed article on set_match
.
set_segregate
: a general algorithm on sets
A while back we built set_segregate
, a generalisation of the STL algorithms on sets.
The STL lets you compare sets by determining what elements they have in common and what elements they don’t. For example, std::set_difference
takes two sets A and B and produces the elements that are in A but not in B.
set_segregate
goes further, by giving you everything at the same time:
- the elements that are in A but not in B,
- the elements that in both in A and in B,
- and the elements that in B but not in A.
It has three output iterators:
template<class Set1, class Set2, class OutputOnly1, class OutputBoth, class OutputOnly2> void set_segregate(Set1&& set1, Set2&& set2, OutputOnly1 only1, OutputBoth both, OutputOnly2 only2);
For set_match
, we would be interested in the second output set, the elements that are both in A and in B.
We need them in the form of a pair, and set_segregate
is able to do that. set_segregate
detects the underlying type of the output iterator and, if this underlying type happens to be a pair containing the underlying type of set A and the underlying type of set B, it produces pairs as outputs. That’s what we need here.
If you’d like to read more about set_segregate
, you can check out the whole story of set_segregate
.
To be able to use set_segregate
to implement set_match
, we only need to discard the first and third outputs of set_segregate
.
One naive way to do this would be by filling containers that we don’t use:
template<typename Set1, typename Set2, typename OutputIterator, typename Comparator> OutputIterator set_match(Set1&& set1, Set2&& set2, OutputIterator out, Comparator comparator) { auto unused1 = std::vector<typename std::remove_reference_t<Set1>::value_type>{}; auto unused2 = std::vector<typename std::remove_reference_t<Set2>::value_type>{}; set_segregate(std::forward<Set1>(set1), std::forward<Set2>(set2), back_inserter(unused1), out, back_inserter(unused2), comparator); return out; }
But this is waste of execution time because it makes copies, a waste of memory to hold those copies, and a burden for code readability.
How can we write code that goes to the point, by just discarding the data we don’t need?
Breaking in the output iterator
set_segregate
, like STL algorithms, produce its results to output iterators. The STL provide various output iterators, such as back_inserter
that push_back
elements to a std::vector
, or begin
that overrides the contents of already filled collection.
But nothing prevents us from writing our own output iterators, and that’s what the pipes library does.
Here we’re going to use the dumbest of the smart output iterators: dev_null
, that ignores the value it receives.
The implementation of dev_null
is the following:
struct dev_null { using iterator_category = std::output_iterator_tag; using value_type = void; using difference_type = void; using pointer = void; using reference = void; dev_null& operator*(){ return *this; } dev_null& operator++(){ return *this; } template<typename T> dev_null& operator=(T&&){ return *this; } };
The 5 first aliases are necessary to define an iterator, and they are used by STL algorithms.
The algorithms of the STL, as well as set_segregate
, send data to their output iterator like this:
*out = value; ++out;
Or sometimes it’s shortened to this:
*out++ = value;
Although I find the first version more readable.
Either way, we can understand this syntax by imagining that out
is the begin
of a std::vector
. In that case:
*out
is a reference to the first element of the vector,*out = value
writes over this first element,++out
moves the iterator on to the next element.
dev_null
offers operators that are compatible with that syntax, but that do nothing. And to make operator=
also do nothing, operator*
returns a reference to dev_null
itself, so that *out = value
calls the operator=
of dev_null
, which does nothing.
Muting set_segregate
with dev_null
Now we can use dev_null
to discard the outputs of set_segregate
that we’re not interested in:
template<typename Set1, typename Set2, typename OutputIterator, typename Comparator> OutputIterator set_match(Set1&& set1, Set2&& set2, OutputIterator out, Comparator comparator) { set_segregate(std::forward<Set1>(set1), std::forward<Set2>(set2), dev_null{}, out, dev_null{}, comparator); return out; }
Even if the algorithm is passing data to dev_null
, there is no copy involved since dev_null
takes data by reference.
Now the implementation of set_match
is down to one meaningful line of code (not counting the line with return out
).
An algorithm to rule them all?
When you think about it, there is another algorithm that looks a lot like set_match
: it’s the standard algorithm std::set_intersection
. It does everything like set_match
except that, instead of returning pairs of matching elements, it returns the value coming from the first set.
The implementation of set_intersection
must be very close to the one of set_match
. Can we share some code between set_match
and set_intersection
? What about the other STL algorithms on sets?
It turns out we can implement a bunch of STL algorithms on sets with a common algorithm. This is what we see in the next post of our series on sets. Stay tuned!
You will also like
- STL algorithms on sets
- How std::set_difference is implemented (video)
set_aggregate
,set_segregate
: higher-level algorithms on setsset_match
: Matching up Elements Between Sorted Collections
Share this post!