How Compact Code Can Become Buggy Code: Getting Caught By The Order of Evaluations
Code that sprawls over many lines and drowns in low-level details typically hinders expressiveness. But cramming everything into one single statement is not always the right thing to do either.
As an example, here is a piece of buggy code that was spotted and fixed by my colleague Benoît (the context has been obfuscated). And thanks Benoît for bringing up such an important subject.
```cpp
void f(Data const& firstData, int someNumber, std::auto_ptr<Data> secondData);

std::auto_ptr<Data> data = ... // initialization of data
f(*data, 42, data);
```
Regardless of the questionable design, and even though this code uses `std::auto_ptr`, which has been deprecated, the same problem could have been reproduced with an `std::unique_ptr`, though perhaps a little more explicitly:
```cpp
void f(Data const& firstData, int someNumber, std::unique_ptr<Data> secondData);

std::unique_ptr<Data> data = ... // initialization of data
f(*data, 42, std::move(data));
```
Can you see what can go wrong in these two pieces of code?
In fact, the behaviour was correct for a while, until it broke. And when it broke, it did so only on certain platforms, while continuing to work on others. Needless to say, pinning down the source of the issue wasn't easy.
Some leeway for optimization
The problem lies in how the arguments are passed to the function `f`. In C++, the order of evaluation of a function's arguments is unspecified. Some compilers could decide to evaluate from left to right, others from right to left, and others in a completely different order. This varies from compiler to compiler, and a given compiler can even have different orders of evaluation for different call sites.
In the above case, if the arguments are evaluated from right to left, then `*data` is evaluated after the smart pointer has been moved. And moving the smart pointer (or copying it, for `auto_ptr`) empties it out, leaving a null pointer inside. Accessing `*data` then causes undefined behaviour. (By the way, if you want to read more about smart pointers, there is a whole series of posts dedicated to them on Fluent C++.)

On the other hand, if the arguments are evaluated from left to right, then `*data` is evaluated before the smart pointer has been moved from, so it is still valid at the moment it is accessed.
The reason why the language gives compilers this liberty (and many others) is to let them make optimizations. Indeed, it could be that rearranging the instructions in a specific order would lead to more efficient assembly code. (While I don’t doubt it is true, I couldn’t find any specific example to illustrate this. Does anyone have one?)
EDIT: As pointed out by Patrice Roy, the unspecified order of evaluation presents another advantage. Fixing an order would open the possibility of relying on interrelated side effects in the evaluation of the parameters. This would force us to check, inside the functions, what those side effects are in order to understand what the code is doing, which would add complexity to the code.
Calls and subcalls
In fact, the order of evaluation of arguments can be even more mixed up than in the above example.
Consider the following example taken from Item 17 of Scott Meyers’ Effective C++:
```cpp
int priority();
void processWidget(std::shared_ptr<Widget> pw, int priority);

processWidget(std::shared_ptr<Widget>(new Widget), priority());
```
(I have taken the liberty of using `std::shared_ptr` here instead of the book's pre-C++11 tr1 component, but the meaning stays unchanged.)
The order of evaluation of all the parameters is unspecified, and so is the order of the subcalls within the function call. For example, the compiler could generate code that follows this order:
- call `new Widget`,
- call `priority`,
- call the constructor of `std::shared_ptr`!
And if the call to `priority` throws an exception, the `Widget` will leak, because it hasn't been stored into the shared pointer yet. For this reason, Scott Meyers advises storing `new`ed objects in smart pointers in standalone statements. But even this wouldn't fix the code at the beginning of this post.
Striking a balance
Leaving some room for the compiler to make optimizations is certainly a good thing, but too much liberty creates a risk of programs not behaving the way a programmer would expect. For this reason, some rules are necessary to strike a balance between optimization and ease of use for the developer.
Some rules have always been there in C++, and even in C. For example, the built-in `&&`, `||` and `,` operators always evaluate their left-hand side first, and (if necessary) the right-hand side afterwards.
Some code actually relies on this, for example:
```cpp
void f(const int* pointer)
{
    if (pointer && *pointer != 0)
    {
        ...
```
In this code, the pointer may be null, so it is checked before being dereferenced (whether this is good practice or not is another debate). This code relies on the fact that the evaluation of `pointer` will always occur before that of `*pointer != 0`. Otherwise the purpose of performing the check at all would be defeated.
By the way, for this reason Scott Meyers advises against overloading `operator&&`, `operator||` and `operator,` on custom types, so that they keep a behaviour consistent with native types (see Item 7 of More Effective C++).
Also, in the expression `a ? b : c`, `a` is, quite naturally, required to be evaluated before `b` and `c`.
More rules with Modern C++
C++11, C++14 and C++17 have added more rules fixing the order of evaluation of various subparts of an expression. However, the order of evaluation of a function's parameters still remains unspecified. Fixing it was considered, but the proposal was eventually rejected.
You may wonder what has been added then. In fact there are a lot of cases where the relative order of evaluation could matter. Take the simple example of calling a function with only one argument. The function itself may be the result of an evaluation. For example:
```cpp
struct FunctionObject
{
    FunctionObject() { /* Code #1 */ }
    void operator()(int value) {}
};

int argument() { /* Code #2 */ }

// Main call
FunctionObject()(argument());
```
Before C++17, the relative order between Code #1 and Code #2 was unspecified. C++17 changes this by ensuring that the determination of the function to call occurs before the evaluation of its arguments. In fact, modern C++ adds quite a few new rules, which can be found here.
Keep an eye out
As a closing note, I think one has to be wary of compressed code that uses interdependent arguments, and avoid it when possible. Indeed, some innocuous-looking code can turn out to be the source of a hard-to-diagnose bug. For instance, in the following line of code:
```cpp
a[i] = i++;
```
the behaviour is undefined before C++17. Not even unspecified: undefined. This means that the outcomes are not limited to the various possible orders of evaluation. The outcome can be anything, including an immediate (or later) crash of the application. Indeed, it is only since C++17 that the evaluation of the right-hand side of an assignment is required to occur before that of the left-hand side.
With the increased rhythm of the evolution of the language, we are likely to upgrade compilers much more often than before, each time risking a change in the way code is generated and optimized. Let's be wary of this sort of cleverness in code.