Algorithms on Sets That Return a Boolean: Exploring the algorithms
In a previous article on sets we’ve designed share_element
, an algorithm on sets (sorted collections) that returns a boolean indicating whether they have a element in common, and that operates in linear time.
On the other hand, the STL also offers an algorithm on sets that return a boolean: std::includes
. std::includes
takes two sets and returns a boolean indicating whether the first one contains the elements of the second one. It also operates in linear time.
By looking at what share_element
and std::includes
have in common, we will uncover other interesting algorithms that compare sets together and return a boolean.
This post is part of the series on algorithms on sets:
- How to check if 2 sorted collections have a common element
set_match
: Matching up Elements Between Sorted Collections- Implementing
set_match
in One Line of Code - STL algorithms on sets: one algorithm to implement them all
- Algorithms on set returning a boolean: exploring the algorithms
- Algorithms on set returning a boolean: implementing a generic algorithm
- Algorithms on set returning a boolean: a strong template interface
- NWH: Adapting algoritms on sets
share_element
and std::includes
: a starting point
Let’s look at our implementation of share_element
:
template<class SetA, class SetB, typename Compare> bool share_element(SetA&& setA, SetB&& setB, Compare comp) { auto xA = setA.begin(); auto xB = setB.begin(); while (xA != setA.end() && xB != setB.end()) { if (comp(*xA, *xB)) { ++xA; } else if (comp(*xB, *xA)) { ++xB; } else { return true; } } return false; }
Now let’s look at an implementation of the std::includes
STL algorithm:
template <typename SetA, typename SetB, typename Compare> bool includes(SetA&& setA, SetB&& setB, Compare comp) { auto xA = setA.begin(); auto xB = setB.begin(); while (xA != setA.end() && xB != setB.end()) { if (comp(*xA, *xB)) { ++xA; } else if (comp(*xB, *xA)) { return false; } else { ++xA; ++xB; } } return xB == setB.end(); }
We can see that they have the same structure. They only differ at a few places, where they return different booleans.
If we generalize this structure, an algorithm on sets that returns a boolean has 4 customisation points:
template <typename SetA, typename SetB, typename Compare> bool includes(SetA&& setA, SetB&& setB, Compare comp) { auto xA = setA.begin(); auto xB = setB.begin(); while (xA != setA.end() && xB != setB.end()) { if (comp(*xA, *xB)) { 1st customisation point } else if (comp(*xB, *xA)) { 2nd customisation point } else { 3rd customisation point } } 4th customisation point }
On the first 3 customisation points, the algorithm can either return a boolean or move on by incrementing the iterators. On the 4th one, it has to return a boolean.
A combination of possibilities
Put another way, here is the list of possibilities for each customisation point:
- 1st customisation point:
return true
return false
- move on (
++xA
)
- 2nd customisation point:
return true
return false
- move on (
++xB
)
- 3rd customisation point:
return true
return false
- move on (
++xA; ++xB;
)
- 4th customisation point:
return true
return false
- the end of
setA
is reached (xA == setA.end()
) - the end of
setB
is reached (xB == setB.end()
) - the end of both is reached (
xA == setA.end() && xB == setB.end()
)
This makes a total of 3×3×3×5 = 135 possible algorithms!
std::includes
and share_element
are just two of them.
share_element
corresponds to this combination:
- 1st customisation point: move on
- 2nd customisation point: move on
- 3rd customisation point:
return true
- 4th customisation point:
return false
And std::includes
corresponds to this combination:
- 1st customisation point: move on
- 2nd customisation point:
return false
- 3rd customisation point: move on
- 4th customisation point: reached the end of
setB
All this brings an obvious question: What are the 133 other algorithms?
Exploring the combinations
133 is a large number of algorithms. But it turns out that we can prune off some of them because they mean something that is not useful or because they don’t mean anything at all.
What’s left after pruning off the combinations are a handful of algorithm nuggets!
Before getting to the nuggets, let’s see how some combinations are not worth retaining.
Combinations that mean something not interesting
Let’s see an example of an algorithm that means something, but that is not useful.
Take the following combination:
- 1st customisation point: move on,
- 2nd customisation point: move on,
- 3rd customisation point: move on
- 4th customisation point: reached the end of
setA
Its code looks like that:
template <typename SetA, typename SetB, typename Compare> bool myAlgorithm(SetA&& setA, SetB&& setB, Compare comp) { auto xA = setA.begin(); auto xB = setB.begin(); while (xA != setA.end() && xB != setB.end()) { if (comp(*xA, *xB)) { ++xA; } else if (comp(*xB, *xA)) { ++xB; } else { ++xA; ++xB; } } return xA == setA.end(); }
This algorithms traverses the two sets until it reaches the end of one of them. When it does, it returns a boolean indicating if it reached the end of setA
.
This means that this algorithm indicates whether the size of setA
is less or equal than the size of setB
. In general, this is something we can get in less than linear time. For example, if we’re using std::set
s, we can just call their .size()
methods and compare them.
So there is little point to the algorithm coming out of this particular combination.
Combinations that don’t mean anything
Now the we’ve seen an algorithm that means something useless, let’s see an example of a combination that results in an algorithm that doesn’t mean anything.
Or I should rather say, an algorithm where I didn’t see any meaning.
Consider the following combination:
- 1st customisation point: move on,
- 2nd customisation point:
return false
, - 3rd customisation point:
return true
, - 4th customisation point: reached the end of
setA
.
Let’s see the corresponding code:
template <typename SetA, typename SetB, typename Compare> bool myAlgorithm(SetA&& setA, SetB&& setB, Compare comp) { auto xA = setA.begin(); auto xB = setB.begin(); while (xA != setA.end() && xB != setB.end()) { if (comp(*xA, *xB)) { ++xA; } else if (comp(*xB, *xA)) { return false; } else { return true; } } return xA == setA.end(); }
This algorithms does something, and I don’t know about you but I can’t see any meaning in it.
Basically every algorithm that has a return true
and a return false
in the first three customisation points doesn’t have any meaning in my opinion. Indeed, you don’t know from the call site whether the algorithm has reached the end of any of the sets before returning.
That said, I examined each of the 135 combinations, and I could well have overlooked the meaning of some algorithms and discard them too quickly. If you see an algorithm with useful meaning that isn’t listed in the nuggets that follow, please share your discovery in a comment!
The nuggets
Here are 6 combinations that have meaning and are useful.
Determining whether the first set is a prefix of the second one
The useful combination:
- 1st customisation point:
return false
, - 2nd customisation point:
return false
, - 3rd customisation point: move on,
- 4th customisation point: reached the end of
setA
.
Explanation: The algorithm traverses the two sets in lockstep, until it reaches one element which is not in common between the two (it then returns false
), or the end of setA
(it returns true
).
We can call this algorithm is_prefix_of
.
Determining whether either set is a prefix of the other
The useful combination:
- 1st customisation point:
return false
, - 2nd customisation point:
return false
, - 3rd customisation point: move on,
- 4th customisation point:
return true
.
Explanation: The algorithm traverses the two sets in lockstep, until it reaches one element which is not in common between the two (it then returns false
), or the end of any of the two sets (it returns true
).
Note that we could achieve the same result by calling is_prefix_of
twice and swapping the arguments, but this would result in traversing the set twice.
We can call this new algorithm is_one_prefix_of_other
.
Determining whether two sets have the same elements
The useful combination:
- 1st customisation point:
return false
, - 2nd customisation point:
return false
, - 3rd customisation point: move on,
- 4th customisation point: reached the end of both.
Explanation: The algorithm traverses the two sets in lockstep, until it reaches one element which is not in common between the two (it then returns false
), or the end of both sets (it returns true
).
It is in the same spirit as std::equal
, but note that strictly speaking we can’t use std::equal
with sets, because std::equal
uses operator==
and sorted collections are only required to have operator<
. Read more about equality and equivalence here.
We can call this algorithm equivalent
.
Determining whether two sets have no element in common
The useful combination:
- 1st customisation point: move on,
- 2nd customisation point: move on,
- 3rd customisation point:
return false
, - 4th customisation point:
return true
.
Explanation: The algorithm traverses the two sets in lockstep, until it reaches one element which is in common between the two (it then returns false
), or the end of any set (it returns true
). Since the sets are sorted, the remaining part of the other set has elements that are greater than those examined, so not in common.
We can call this algorithm disjoint
.
Note that disjoint
is also the negation of share_element
.
Determining whether all the elements of the first set are smaller than the smallest of the second one
The useful combination:
- 1st customisation point: move on,
- 2nd customisation point:
return false
, - 3rd customisation point:
return false
, - 4th customisation point:
return true
.
Explanation: The algorithm traverses the two sets in lockstep, until it reaches one element which is in common between the two (it then returns false
), or an element of the second set that would be smaller than one of the first set (it also returns false
). If it reaches the end of any set and that didn’t happen, it returns true
.
We can call this algorithm is_before
.
Determining whether all the elements of the second set are smaller than the smallest of the first one
The useful combination:
- 1st customisation point:
return false
, - 2nd customisation point: move on,
- 3rd customisation point:
return false
, - 4th customisation point:
return true
.
Explanation: The algorithm traverses the two sets in lockstep, until it reaches one element which is in common between the two (it then returns false
), or an element of the first set that would be smaller than one of the second set (it also returns false
). If it reaches the end of any set and that didn’t happen, it returns true
.
We can call this algorithm is_after
.
Note that is_after
is not the negation of is_before
, because two sets with intertwined elements would return false
for both algorithms.
But is_after
is equivalent to swapping the elements of is_before
. However, it’s useful to offer the possibility to write both, the same way that we have operator<
and operator>
in C++, so that we can choose for each given call site which one is the most expressive.
In fact, is_after
is almost equivalent to swapping the elements of is_before
. But as we’ll see in a future post, there is a subtlety that prevents us from implementing it this way anyway.
A common algorithm to implement all that
In summary, we have 8 interesting algorithms on sets that return a boolean:
std::includes
share_element
is_prefix_of
is_one_prefix_of_other
equivalent
disjoint
is_before
is_after
Would it be possible to write a common algorithm that takes the combination of the 4 customisation points and returns a boolean?
This is what we see in a next blog post. Stay tuned!
You will also like
- The sets library
- STL algorithms on sets
- How std::set_difference is implemented (video)
set_aggregate
,set_segregate
: higher-level algorithms on sets- How to check if 2 sorted collections have a common element
set_match
: Matching up Elements Between Sorted Collections- Implementing
set_match
in One Line of Code - STL algorithms on sets: one algorithm to implement them all
Share this post!