Custom comparison, equality and equivalence with the STL
Let’s start with the following code excerpt:
std::vector< std::pair<int, std::string> > v1 = ... // v1 is filled with data std::vector< std::pair<int, std::string> > v2 = ... // v2 is filled with data std::vector< std::pair<int, std::string> > results; std::sort(v1.begin(), v1.end()); std::sort(v2.begin(), v2.end()); std::set_difference(v1.begin(), v1.end(), v2.begin(), v2.end(), std::back_inserter(result), compareFirst);
There are 2 sets of data represented by 2 sorted vectors v1 and v2, on which we apply a std::set_difference
(see Algorithms on sets). This std::set_difference
writes its output in results
, with std::back_inserter
ensuring that all outputs are push_back’d into results.
One particularity though: a custom comparison operator is provided to std::set_difference
: compareFirst
.
By default, std::set_difference
compares the elements with the default comparison on std::pair
(which compares both the first and second element of the pair), and here with compareFirst
we want to compare pairs on their first element only. compareFirst
is not in the STL, so we will try to implement it ourselves.
Before jumping on to the implementation, we already have an interesting take away here. Even if std::set_difference
expect its input to be sorted, it is possible to use it (or any algorithm on sorted elements) based on a comparator (let’s call it C) different from the comparator used for sort, provided that the elements are also sorted by this comparator C. In our case for instance, we use a std::set_difference
that compares pairs by their first elements, although these pairs have been sorted by both their first and second elements. But since this implies that they are a fortiori sorted by first, it is completely OK to do this.
Now let’s implement compareFirst
. A natural, naïve, first-try code would look like:
bool compareFirst(const std::pair<int, std::string>& p1, const std::pair<int, std::string>& p2) { return p1.first == p2.first; // not final code, bug lurking here! }
Actually, this implementation won’t give the expected results at all. But why?? After all, set_difference should check whether a given element is equal to another one in the other collection, right?
The least we can say is that this seems completely unnatural, and the remainder of this post will consist in understanding how we came to this, and why this is in fact completely normal.
To understand this, we must view the STL as roughly divided into 2 parts: the part that operates on SORTED elements, and the part that operates on elements that are NOT SORTED.
The SORTED part of the STL
In this part are associative containters (std::map
, std::multimap
, std::set
, std::multiset
), because their elements are sorted.
Some algorithms also fall in this category, because they assume the elements they operate on are sorted: std::set_difference
, std::includes
or std::binary_search
for example.
The UNSORTED part of the STL
In this part there are sequence containers (std::vector
, std::list
, std::deque
and std::string
), because their elements are not necessarily sorted.
And the algorithms that fall into this category are those that don’t need their elements to be sorted, like std::equal
, std::count
or std::find
for example.
Comparing elements
There are two ways to express “a is the same as b” in C++:
- the natural way:
a == b
. This is called equality. Equality is based on operator==.
- the other way: a is not smaller than b and b is not smaller than a, so
!(a<b) && !(b<a)
. This is called equivalence. Equivalence is based on operator<.
Then two questions naturally arise about equivalence.
How is it different from equality?
For simple types like int
, and actually for most types in practice, equivalence is indeed the same thing as equality. But as pointed by Scott Meyers in Effective STL Item 19, there are some not-too-exotic types where the two are not the same, like case-insensitive strings for example.
Why such a far-fetched way to express a simple thing?
When an algorithm compares elements in a collection, it is easy to understand that there must only be one way of comparing them (having several comparators is cumbersome and creates a risk of inconsistency). So a choice needs to be made between comparing based on operator==
or on operator<
.
In the SORTED part of the STL, the choice is already made: by definition of sorting, the elements must be comparable with operator< (or a customized (operator<)-like function). The UNSORTED part on the other side does not have this constraint and can use the natural operator==.
Implementing the comparator
The UNSORTED part of the STL uses operator== to perform comparisons, while the SORTED part uses operator<. And custom comparison operators must follow this logic.
Now we understand how to implement our custom operator compareFirst
for std::set_difference
, which operates on sorted elements:
bool compareFirst(const std::pair<int, std::string>& p1, const std::pair<int, std::string>& p2) { return p1.first < p2.first; // correct, STL-compatible code. }
All of this is crucial to understand in order to use the STL efficiently.
Don't want to miss out ? Follow:   Share this post!