Indexing Data Structures with C++ Scoped Enums
Today’s guest post is written by Fernando J. Iglesias Garcia. Fernando is half software engineer, half junior researcher, interested in modern C++ and graph signal processing. Fernando can be reached online at @feriglegarc on Twitter and GitHub.
Interested in writing for Fluent C++ too? Submit your guest post!
Recently, a couple of colleagues and I participated in the Benelux Algorithm Programming Contest (BAPC). It was a great day in the beautiful town-university Louvain-la-Neuve.
One of the problems (H), boiled down to Dijkstra’s algorithm with a twist: each graph node is associated with one of two states. This state controls some aspects of the search such as the objective computation, together with which and when new nodes are included in the ongoing exploration.
For some time I have been hearing about the benefits of using enum classes rather than good-old plain enums and I was itching to try them out. In an enum class, the enumeration values are scoped, whereas with good-old plain enums there’s no direct scoping, and name clashes can quickly become an issue. So, employing an enum class to represent the state of the nodes sounded like fun!
Indexing an array with a scoped enum
Unfortunately and against my excitement, I quickly noticed that it was not possible to use values of an enum class directly as indices:
enum class binary : bool { first = 0, second = 1 }; std::array<int, 2> arr; // arr[binary::first] = 1; // Compilation error: no match for 'operator[]' // (operand types are 'std::array<int, 2>' and 'binary')
After a quick poke that ending up at (wait for it…) Stack Overflow, I got used to the idea that enum class values are not meant to be used directly as indices. Static-casting is an option, so one could quickly make an utility such as:
enum class binary : bool { first = 0, second = 1 }; template<size_t size> constexpr int at(std::array<int, size> const& arr, binary idx) { return arr[static_cast<size_t>(idx)]; }
The point of encapsulating the cast in the function is to constrain the users of this interface to pass in the scoped enum binary
. If they were to call the static_cast
directly, they could inadvertently pass an int
, killing the interest of the scoped enum.
Still, I am not 100% happy resorting to the cast as I find it does not reflect quite a proper design.
Indexing a hash map with a scoped enum
So then I thought, well, what about just using a hash table (aka unordered_map) whose key type is the enum class. That should definitely work, but what intrigued me the most in this respect was, what would be the overhead of going from array direct-access to hashing in a unordered map?
A quick benchmark focusing exclusively on the access to the data structures shows that, as expected, the lighter direct-access to arrays gives about 2x as fast results:
Benchmark source code run in quick-bench.
But what about a more realistic application, where obviously in addition to accessing the data we want to do something with it? To this end, I found the actual contest test cases to make good test vectors. You can grab the data from here. For problem H, there are close to 50 input vectors, ranging from small graphs covering corner cases to large graphs with hundreds of thousands of vertices and edges.
I compared two versions of my implementation of the algorithm, one using arrays and casting as shown first, and another based on hash-tables. I aggregated the time taken by each implementation to solve all the test cases (to reduce random timing variations).
Repeating this procedure a few times, I found both versions are essentially equivalent in terms of performance. They both take on average 46 seconds to solve all test cases (on a i5-6300U CPU @ 2.40GHz in a T470 Lenovo laptop). Note that, as shown in the benchmark results above, this does not mean that both indexing methods have equivalent runtime.
As expected, direct array access is lighter and thus faster than relying on hash-tables. The point is that in an actual real-world application (like this Dijkstra’s algorithm puzzle) the cost of doing “real work”™ may overshadow the cost of simpler operations such as indexing. In this case, the bottleneck is in the operator< of the binary search tree node, which is called multiple times each time the tree is modified or queried via find.
Problem solutions: arrays and casting, unordered_map.
What are your thoughts on this topic?
Do you have a good argument to why enum classes cannot be directly used as indices?
Feel free to comment and share the discussion.
You will also like
- Overview of std::map’s Insertion / Emplacement Methods in C++17
- How to Flatten Out a Nested Switch Statement
- When to Use Enums and When to Use Tag Dispatching in C++
Share this post!