std::exchange Patterns: Fast, Safe, Expressive, and Probably Underused
This is a guest post from Ben Deane. Ben is a lifelong learner and algorithms fan who works in the finance industry and loves to write expressive, well-performing C++. You can find him on twitter @ben_deane.
This blog post has been a long time in the making. I gave a lightning talk on std::exchange
at CppCon 2017; Jonathan first asked me to write something about std::exchange
in January 2019; now here we are in the strange days of the second half of 2020. But although much has changed in the outside world, I would guess that not much has changed in most C++ codebases and in the minds of many C++ programmers with respect to using std::exchange
. It could still do with more publicity and more recognition of potential use cases.
You’re already using something very like std::exchange
I’ll start by making a perhaps surprising assertion: you’re almost certainly already using a construct that is semantically equivalent to std::exchange
.
You don’t have to be using modern C++, because this construct has been in C++ since day one. You don’t even have to be using C++, because this construct was in C, and is therefore in many C-influenced languages. In fact it’s been with us for maybe 50 years or more, ever since Ken Thompson wrote the B programming language.
It’s the humble increment operator. To be precise, the postfix increment operator.
When we write i++
, it’s exactly the same as writing std::exchange(i, i+1)
. We can see this by considering two possible implementations of strcpy
:
char *idiomatic_strcpy(char* dest, const char* src) {
    while ((*dest++ = *src++));
    return dest;
}

char *exchange_strcpy(char* dest, const char* src) {
    for (;;) {
        auto s = std::exchange(src, src+1);
        auto d = std::exchange(dest, dest+1);
        *d = *s;
        if (*d == 0) break;
    }
    return dest;
}
(code on godbolt here)
And they optimise to exactly the same assembly output [1].
There is even a vestigial clue in C++ that postfix increment is the same as std::exchange
: postfix operator++
takes a dummy int
argument. This differentiates it from its prefix counterpart, but is it just a coincidence?
struct S {
    constexpr auto& operator++() {
        ++i;
        return *this;
    }
    constexpr auto operator++(int) {
        auto ret = *this;
        ++i;
        return ret;
    }
    int i{};
};

int main() {
    S s{};
    ++s;
    s++;
    return s.i;
}
We could actually take this further by using the “dummy” argument, and end up with something that’s almost exactly like std::exchange
.
struct S {
    constexpr auto operator++(int incr) {
        auto ret = *this;
        i = incr;
        return ret;
    }
    int i{};
};

int main() {
    S s{};
    s.operator++(17);
    return s.i;
}
I don’t particularly recommend abusing the conventional usage of operator++
like this, but it serves to illustrate the point [2].
Although postfix increment may not be nearly as widespread in a typical codebase as prefix increment, we usually have no problems using it or reasoning about its use where it leads to concise, readable code [3]. And so it should be with std::exchange
.
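For reference, here is a simplified sketch of my own, in the spirit of the "possible implementation" shown on cppreference (the real std::exchange is also constexpr in newer standards). It makes the kinship with postfix increment explicit:

#include <utility>

// Simplified sketch, not the real std::exchange: remember the old value,
// install the new one, and hand the old value back to the caller.
template <typename T, typename U = T>
T exchange_sketch(T& obj, U&& new_value) {
    T old_value = std::move(obj);
    obj = std::forward<U>(new_value);
    return old_value;
}

// int i = 41;
// int old = exchange_sketch(i, i + 1); // old == 41, i == 42, just like i++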
The “swap-and-iterate” pattern
I’ve found extensive use for std::exchange
wherever I would previously use the “swap-and-iterate” pattern. This pattern occurs a lot in event-driven architectures; one might typically have a vector of events to dispatch or, equivalently, callbacks to invoke. But we want event handlers to be able to produce events of their own for deferred dispatch.
class Dispatcher {
    // We hold some vector of callables that represents
    // events to dispatch or actions to take
    using Callback = /* some callable */;
    std::vector<Callback> callbacks_;

    // Anyone can register an event to be dispatched later
    void defer_event(const Callback& cb) {
        callbacks_.push_back(cb);
    }

    // All events are dispatched when we call process
    void process() {
        std::vector<Callback> tmp{};
        using std::swap; // the "std::swap" two-step
        swap(tmp, callbacks_);
        for (const auto& callback : tmp) {
            std::invoke(callback);
        }
    }
};
This is the “swap-and-iterate” pattern. It is safe for the callbacks to call defer_event
and therefore produce events of their own: we use tmp
so that a call to defer_event
does not invalidate the iterator in our loop.
But we’re doing a bit more work here than necessary, and we’re also guilty of incurring the “ITM antipattern” [4]. First, we construct an empty vector (tmp
), then — with swap
— we have 3 move assignments before we get to the business of iterating.
Refactoring with std::exchange
solves these problems:
class Dispatcher {
    // ...

    // All events are dispatched when we call process
    void process() {
        for (const auto& callback : std::exchange(callbacks_, {})) {
            std::invoke(callback);
        }
    }
};
Now we don’t have to declare a temporary. Inside std::exchange
we have one move construction and one move assignment, saving one move compared to swap
. We don’t need to understand the ADL dance involved in the “std::swap
two-step” [5]. We didn’t need tmp
— just a way to express the empty vector, which here is {}
. And the compiler is really good at optimising the call to std::exchange
, so of course we get the copy elision we would normally expect. As a result, the code overall is more concise, faster, and provides the same safety as before.
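If you want to see those move counts for yourself, a toy instrumented type (my own illustration, not part of the dispatcher above) makes the difference visible: the generic std::swap performs three move operations, while std::exchange performs two, assuming the usual elision of the returned value:

#include <cstdio>
#include <utility>

// Hypothetical type that counts its move operations.
struct Counted {
    static inline int moves = 0;
    Counted() = default;
    Counted(Counted&&) noexcept { ++moves; }
    Counted& operator=(Counted&&) noexcept { ++moves; return *this; }
};

int main() {
    Counted a, b;

    Counted::moves = 0;
    std::swap(a, b);                        // generic swap: 3 move operations
    std::printf("swap:     %d\n", Counted::moves);

    Counted::moves = 0;
    auto old = std::exchange(a, Counted{}); // typically 2 with the named return
                                            // value optimisation; at worst 3
    std::printf("exchange: %d\n", Counted::moves);
    (void)old;
}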
Posting to another thread
A similar pattern occurs in any multithreaded setting where we want to capture an object in a lambda expression and post it to another thread. std::exchange
allows us to efficiently transfer ownership of an object’s “guts.”
class Dispatcher {
    // ...
    void post_event(Callback& cb) {
        Callback tmp{};
        using std::swap;
        swap(cb, tmp);
        PostToMainThread([this, cb_ = std::move(tmp)] {
            callbacks_.push_back(cb_);
        });
    }
};
Here we are taking ownership of the passed-in callback by swapping it into a temporary, then capturing that temporary in a lambda closure. We are capturing by move in an attempt to improve performance, but ultimately we’re still doing a lot more than we need to.
class Dispatcher {
    // ...
    void post_event(Callback& cb) {
        PostToMainThread([this, cb_ = std::exchange(cb, {})] {
            callbacks_.push_back(cb_);
        });
    }
};
This gets us exactly what we want — again with more expressive code — and we’re asking the processor to do less. Once more, std::exchange
uses one move fewer than std::swap
, and copy elision, a.k.a. the return value optimization, constructs the return value directly into the lambda expression’s closure.
Why not just move?
But, I hear you ask, why do more than one move at all? Why not something like this?
class Dispatcher {
    // ...
    void post_event(Callback& cb) {
        PostToMainThread([this, cb_ = std::move(cb)] {
            callbacks_.push_back(cb_);
        });
    }
};
The answer is to ensure future maintainability and flexibility. It may well be true that a moved-from Callback
is considered just as empty as if we had explicitly emptied it with std::exchange
, but is that obvious? Is it always going to be true? Will we ever need to update that assumption — or this code — if we change the type of Callback
later on?
In the major STL implementations, it is currently the case that a moved-from container is empty. More specifically, sequenced containers like std::vector
; associative containers like std::unordered_map
; and other “containers” like std::string
or std::function
are empty after move, even when they are small-buffer-optimised [6].
But this is not necessarily true of every single container type we might use. There is no particular reason why a homegrown small-buffer-optimised vector should be empty after we move from it. We find a notable standard counterexample of the “normal” behaviour in std::optional
, which is still engaged after being moved from. So yes, using std::move
— obviously — only incurs one move, whereas std::exchange
incurs two, but at the cost of abstraction leakage. Using only std::move
, we need to know and be able to reason about the move-related properties of the container we use; future maintainers (usually ourselves, in 6 months’ time) also need to know about that “empty after move” constraint on the code, which is not explicitly expressed anywhere and not obvious from inspection.
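To make the std::optional counterexample concrete, here is a small illustration (my addition, not from the original post): the source optional remains engaged after a move, and std::exchange lets us say explicitly that we want it emptied.

#include <cassert>
#include <optional>
#include <string>
#include <utility>

int main() {
    std::optional<std::string> src{"hello"};

    auto dst = std::move(src);
    assert(dst.has_value());  // the value arrived at the destination
    assert(src.has_value());  // but src is still engaged, holding a moved-from string

    auto old = std::exchange(src, std::nullopt); // explicitly disengage src
    assert(!src.has_value());
    (void)old;
}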
For this reason, I recommend being explicit about clearing objects that are supposed to be empty, and std::exchange
can do just that. In fact cppreference.com notes a primary use case for std::exchange
in writing the move special member functions to leave the moved-from object cleared.
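As a sketch of that use case (the type here is hypothetical, just to show the shape), the move operations can use std::exchange to steal the other object's state and leave it deliberately cleared in a single expression:

#include <cstddef>
#include <utility>

// Hypothetical resource-owning type whose move members are written with
// std::exchange so the moved-from object is left deliberately empty.
class Buffer {
    int* data_ = nullptr;
    std::size_t size_ = 0;

public:
    Buffer() = default;
    explicit Buffer(std::size_t n) : data_{new int[n]{}}, size_{n} {}

    Buffer(Buffer&& other) noexcept
        : data_{std::exchange(other.data_, nullptr)},
          size_{std::exchange(other.size_, 0)} {}

    Buffer& operator=(Buffer&& other) noexcept {
        if (this != &other) {
            delete[] data_;
            data_ = std::exchange(other.data_, nullptr);
            size_ = std::exchange(other.size_, 0);
        }
        return *this;
    }

    ~Buffer() { delete[] data_; }
};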
Can we use std::exchange
with locks?
I want to go back to thinking about multithreaded code, because it may seem at first that std::exchange
isn’t a great option when we need to access something under mutex protection:
class Dispatcher {
    // ...

    // All events are dispatched when we call process
    void process() {
        std::vector<Callback> tmp{};
        {
            using std::swap;
            std::scoped_lock lock{mutex_};
            swap(tmp, callbacks_);
        }
        for (const auto& callback : tmp) {
            std::invoke(callback);
        }
    }
};
Here, the vector of callbacks is protected by a mutex
. We cannot afford to hold this lock while iterating, because any event handler that wants to generate an event will try to lock the mutex
in order to queue its event [7].
So we can’t use our std::exchange
pattern naively:
class Dispatcher {
    // ...

    // All events are dispatched when we call process
    void process() {
        std::scoped_lock lock{mutex_};
        for (const auto& callback : std::exchange(callbacks_, {})) {
            std::invoke(callback);
        }
    }
};
because that would break our ability to queue events from callbacks. The solution, as is so often the case, is to use a function. In this instance, an immediately-invoked lambda expression fits the bill nicely.
class Dispatcher {
    // ...

    // All events are dispatched when we call process
    void process() {
        const auto tmp = [&] {
            std::scoped_lock lock{mutex_};
            return std::exchange(callbacks_, {});
        }();
        for (const auto& callback : tmp) {
            std::invoke(callback);
        }
    }
};
We reap the benefits of holding the lock for as short a time as possible; taking advantage of return value optimization; saving a move; and concision of expression.
Were I being deliberately provocative — such as in a lightning talk — I might also suggest the following:
class Dispatcher {
    // ...

    // All events are dispatched when we call process
    void process() {
        const auto tmp = (std::scoped_lock{mutex_}, std::exchange(callbacks_, {}));
        for (const auto& callback : tmp) {
            std::invoke(callback);
        }
    }
};
Here, the scoped_lock
lives until the semicolon, and the result of the comma operator is the result of std::exchange
, used to construct tmp
. I concede that many people would recoil in horror at this use of the comma operator, but that’s a topic for another article [8].
Consider std::exchange
over std::swap
To sum everything up, I believe that std::exchange
is still underused, and situations where it can be usefully applied are probably under-recognised. Whenever you find yourself writing swap
, consider: do you really need that temporary?
Footnotes
[1]: Yes, I know in real life, strcpy
unfortunately returns a copy of the dest
passed in. It would be more useful — as I wrote here — to return where dest
ends up. I also know that strcpy
is unsafe, but I’m using it as an example.
[2]: I do, however, recommend marking the postfix increment operator [[nodiscard]]
. To my knowledge there is no way to get a warning on any compiler for throwing away the result of a builtin operator++
.
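For a user-defined type, that suggestion looks something like this (a sketch of my own; compilers will then warn when the postfix result is discarded):

// Sketch: marking the postfix increment [[nodiscard]] on a user-defined type.
struct Counter {
    int i{};

    constexpr Counter& operator++() { ++i; return *this; }

    [[nodiscard]] constexpr Counter operator++(int) {
        auto old = *this;
        ++i;
        return old;
    }
};

// Counter c{};
// ++c;   // fine
// c++;   // warning: ignoring the return value of a [[nodiscard]] function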
[3]: Most modern style advice prefers prefix increment, using postfix increment only where necessary — which is to say, exactly where we need its “return value,” as we sometimes do.
[4]: Conor Hoekstra expounds on the “ITM” (initialize-then-modify) antipattern in his recent MUC++ talk.
[5]: The “std::swap
two-step” is explained by Arthur O’Dwyer here.
[6]: There are well-thought-out reasons for this. It’s not as simple as “not clearing a small-buffer-optimised std::string
must be cheaper than clearing it”. Ask your local standard library implementer for details.
[7]: We could use a recursive_mutex
to handle locking re-entrancy, but I try to avoid such lazy-thinking solutions. They usually lead to an erosion of code reason-ability.
[8]: This construction may also fall foul of the [[nodiscard]]
attribute which can usefully be applied to lock objects, precisely to prevent immediate unlocking of accidentally-unnamed locks.