How to choose good names in your code
Naming is SO important. If your code is going to be read at least one time — if only by yourself — then names will play a major part in your capacity to work with it. Variable names, function names, class names, names in an interface, all are priceless ways to let your code tell more about what it’s doing. During code review at work I’m quite picky with my team members concerning good naming — sorry about that, lads! — but I believe this can make or break the quality of our code.
Even if there are other means to know what a piece of code is doing, like documentation for instance, good names are an extremely efficient channel to convey information about your code for at least two reasons:
- Very good names instantly tell you what the surrounding code is about, as opposed to looking up the documentation or finding your way around the code by tracing through it,
- Naming can be improved quickly. You can just make a quick fix that updates some names in the code, manually or by using a tool (such as the popular clang-tidy for example), and if your code builds you’re nearly certain it will pass the tests.
This post aims at providing guidelines about how to choose good names. Some of these guidelines come from Steve McConnell’s reference book Code Complete (if you haven’t read it yet I suggest you stop reading this post, or anything else you’re doing for that matter, and start reading the book 🙂 ). Some others I’ve learned from discussions, suggestions and code reviews with my peers at work. And a couple of them I’ve worked out on my own by trying out different things while reading and writing code over the years.
We’ll start by telling how to avoid bad names, and then focus on how to pick good ones.
Don’t do anything illegal
Let’s get this out of the way, there are names that you are just not allowed to use in C++.
Besides keywords reserved by the standard (like “int”), whose use as a name will halt compilation, some combinations of underscores (_) in a name will compile while not being legal, because they are reserved for the compiler or standard library implementer. Using them may conflict with objects or routines declared by the implementation, leading to subtle bugs and unexpected behaviour.
Here are the names that are reserved for the compiler and standard library implementers:
- any name with two consecutive underscores in it (__),
- any name starting with one underscore immediately followed by a capital letter (_isOk and isOk_too are fine, _IsNotOk is not),
- any name starting with one underscore in the global namespace.
So don’t consider using such names, as they could get you into trouble.
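To make the three rules above concrete, here is a small sketch. The payroll namespace and the Employee class are made up for the illustration, and the reserved names are left commented out on purpose so that the snippet stays legal:

```cpp
#include <string>
#include <utility>

namespace payroll {  // hypothetical namespace, for illustration only

class Employee {
public:
    explicit Employee(std::string name) : name_(std::move(name)) {}

    // Legal: no leading underscore at all.
    std::string const& name() const { return name_; }

    // Legal: underscore followed by a lowercase letter, outside the global namespace.
    bool _isOk() const { return !name_.empty(); }

private:
    // Legal: a single trailing underscore is not reserved.
    std::string name_;
};

} // namespace payroll

// Reserved, so left commented out on purpose:
// int max__size;       // two consecutive underscores
// int _MaxSize;        // underscore immediately followed by a capital letter
// int _globalCounter;  // leading underscore in the global namespace
```

Legal does not mean advisable: a name like _isOk sits one keystroke away from a reserved one, so many teams simply avoid leading underscores altogether.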
Don’t waste information
When you think of it, your code perfectly knows what it is doing. In fact it is the one that knows best: it executes what’s in it as faithfully as can possibly be!
Giving good names is really retaining as much of this information as you can. Said differently, it is about not wasting information by obfuscating the code. It’s interesting to note that usually information hiding is encouraged, via encapsulation. But in this context it is rather information disclosing that you want to aim for.
For this reason, limit the use of abbreviations. Abbreviations and acronyms are convenient to write but difficult to read. And as the saying goes, code is written once but read many times. Now you don’t have to systematically spell out all acronyms to make code clearer, and repeating a long unabbreviated name can even harm readability. For instance it seems reasonable to use “VAT” in your code instead of writing valueAddedTax every time you use it, because everyone knows what VAT is.
How to choose whether or not to use an acronym in code? A good rule of thumb is that if the end-user of your application would understand a particular abbreviation or acronym then it is OK to use it in code, because it shows that everyone in your domain area knows what it means.
Don’t try to optimize for the minimum number of characters. On forums you can see guys that argue that their method is superior because it involves less typing. But what is more hassle, a couple of keystrokes, or a couple of minutes staring at code trying to figure it out?
This is particularly true for function and method names, which you can make as long as necessary. Research suggests (Rees 1982) that function and method names can reasonably go up to 35 characters, which really sounds like a lot.
However the length of a function name can also become bloated for bad reasons:
- if a function’s name is too long because the function is doing too many things, the fix is not at the name level but rather at the level of the function itself, by breaking it down into several logical parts.
- function names get artificially bloated when they include superfluous information that is already expressed by their parameter types. For instance:
void saveEmployee(Employee const& employee);
can be renamed:
void save(Employee const& employee);
This leads to more natural code at call site:
save(manager);
as opposed to:
saveEmployee(manager);
This goes in the same direction as the Interface Principle and ADL (that concerns removing superfluous namespaces at call site) that will be the subject of a dedicated post.
- Another reason for a name to contain undesirable information is when it contains a negation. The following code:
if (isNotValid(id)) {
can be improved by using an affirmative name:
if (!isValid(id)) {
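The last two points can be sketched together. Everything below (the Employee and Invoice structs, the in-memory Database, the extra database parameter and the validity rule) is an assumption made up for the example, not code from a real project:

```cpp
#include <string>
#include <vector>

// Hypothetical domain types, for illustration only.
struct Employee { int id; std::string name; };
struct Invoice  { int number; };

// Hypothetical in-memory storage standing in for a real persistence layer.
struct Database {
    std::vector<std::string> employeeNames;
    std::vector<int> invoiceNumbers;
};

// The parameter type already says what is being saved,
// so one short name works for both overloads.
void save(Employee const& employee, Database& database)
{
    database.employeeNames.push_back(employee.name);
}

void save(Invoice const& invoice, Database& database)
{
    database.invoiceNumbers.push_back(invoice.number);
}

// Affirmative name: the negation lives at the call site, where it is visible.
// The rule itself (ids are strictly positive) is an assumption.
bool isValid(int id)
{
    return id > 0;
}

// Saves the employee only if its id is valid; returns whether it was saved.
bool saveIfValid(Employee const& employee, Database& database)
{
    if (!isValid(employee.id)) {
        return false;
    }
    save(employee, database);
    return true;
}
```

At call site this reads as save(manager, database) or save(invoice, database), and overload resolution picks the right function from the argument type.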
Now that we’ve ruled out a certain amount of bad naming practices, let’s focus on how to pick good names.
Pick names consistent with abstraction levels
As described in a previous post, respecting levels of abstraction is at the root of many good practices. And one of these practices is good naming.
A good name is a name that is consistent with the level of abstraction of surrounding code. As explained in the post on levels of abstraction this can be said differently: a good name expresses what code is doing, not how it is doing it.
To illustrate this, let’s take the example of a function computing the salaries of all the employees in a company. The function returns a collection of results associating keys (employees) to values (salaries). The imaginary implementer of this code has watched Chandler Carruth’s talk about performance with data structures and decided to forgo the map to take a vector of pairs instead.
A bad function name, one that focuses on how the function is implemented, would be:
std::vector< std::pair<EmployeeId, double> > computeSalariesPairVector();
The problem with such a function name is that it expresses that the function computes its results in the form of a vector of pairs, instead of focusing on what it does, that is computing the salaries of the employees. A quick fix for this would be to replace the name with the following:
std::vector< std::pair<EmployeeId, double> > computeEmployeeSalaries();
This relieves the call site from some implementation details, letting you, as a reader of the code, focus on what the code is intending to do.
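Here is a minimal sketch of what this gives at call site. The EmployeeId alias, the hard-coded salaries and the totalPayroll function are assumptions made up for the example:

```cpp
#include <string>
#include <utility>
#include <vector>

// Assumption: the original does not say what EmployeeId is; a string stands in here.
using EmployeeId = std::string;

// Named after what it computes, not after the container it returns.
std::vector< std::pair<EmployeeId, double> > computeEmployeeSalaries()
{
    // Hard-coded values, for illustration only.
    return { {"alice", 5200.0}, {"bob", 4800.0} };
}

// Hypothetical caller: nothing in the names it reads mentions pairs or vectors.
double totalPayroll()
{
    double total = 0.0;
    for (auto const& employeeAndSalary : computeEmployeeSalaries()) {
        total += employeeAndSalary.second;
    }
    return total;
}
```

If the implementer later goes back to a std::map, the name computeEmployeeSalaries stays accurate, and this loop even keeps compiling since map elements are pairs too.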
Respecting levels of abstraction has an interesting consequence on variable and object names. In many cases in code, variables and objects represent something more abstract than what their type implies.
For example an int often represents more than just an int: it can represent the age of a person or the number of elements in a collection. Or a particular object of type Employee can represent the manager of a team. Or an std::vector<double> can represent the daily average temperatures observed in New York over the last month. (Of course this doesn’t hold in very low-level code like adding two ints, or in places where you use strong types.)
In such cases you want to name the variable after what it represents rather than after its type. You’d name your int variable “age”, rather than “i”. You’d name the above Employee “manager” and not just “employee”. You’d name the vector “temperatures” rather than “doubles”.
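As a quick illustration with the temperatures example, here is a sketch of a function whose names say what the values represent. The function itself and its handling of an empty collection are assumptions made up for the example:

```cpp
#include <numeric>
#include <vector>

// The type only says "a vector of doubles"; the names say what the values mean.
double averageTemperature(std::vector<double> const& dailyTemperatures)
{
    if (dailyTemperatures.empty()) {
        return 0.0;  // assumption: an empty month averages to zero
    }
    double const sumOfTemperatures =
        std::accumulate(dailyTemperatures.begin(), dailyTemperatures.end(), 0.0);
    return sumOfTemperatures / static_cast<double>(dailyTemperatures.size());
}
```

Had the parameter been called “doubles” and the function “average”, the code would compile just the same, but the reader would have to look elsewhere to learn what is being averaged.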
This seems quite obvious yet there are at least two cases where we generally neglect to apply this guideline: iterators and templated types.
Although iterators will tend to disappear with the progress of algorithms and range libraries, some will still be needed and many are still around today in code anyway. For instance, let’s take a collection of cash flows paid or received from a financial product. Some of these cash flows are positive, some are negative. We want to retrieve the first cash flow that went towards us, so the first positive one. Here is a first attempt at writing this code:
std::vector<CashFlow> flows = ...
auto it = std::find_if(flows.begin(), flows.end(), isPositive);
std::cout << "Made " << it->getValue() << "$, at last!" << std::endl;
This code uses the name “it”, reflecting how it is implemented (with an iterator), rather than what the variable means. How does this compare to the following code?
std::vector<CashFlow> flows = ...
auto firstPositiveFlow = std::find_if(flows.begin(), flows.end(), isPositive);
std::cout << "Made " << firstPositiveFlow->getValue() << "$, at last!" << std::endl;
Which code saved you the most effort understanding it? Can you imagine the difference when you don’t have to read two lines of code but 10 or 50? Note that this ties in with the idea of not wasting the precious information code knows about itself, which we described in the previous section.
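For a version of this example that can be compiled and run, here is a sketch. The CashFlow class, the isPositive predicate, the describeFirstGain function and its fallback message are all assumptions filled in for the illustration. It also checks the iterator against end() before dereferencing it, which the snippets above leave out for brevity:

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal CashFlow, exposing only what the example needs.
class CashFlow {
public:
    explicit CashFlow(double value) : value_(value) {}
    double getValue() const { return value_; }
private:
    double value_;
};

bool isPositive(CashFlow const& flow)
{
    return flow.getValue() > 0;
}

// Builds the message for the first positive cash flow, if there is one.
std::string describeFirstGain(std::vector<CashFlow> const& flows)
{
    auto firstPositiveFlow = std::find_if(flows.begin(), flows.end(), isPositive);
    if (firstPositiveFlow == flows.end()) {
        return "No positive cash flow yet.";
    }
    std::ostringstream message;
    message << "Made " << firstPositiveFlow->getValue() << "$, at last!";
    return message.str();
}
```

Note how the comparison firstPositiveFlow == flows.end() still reads naturally: the name says what we were looking for, and end() says we didn’t find it.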
The same logic applies to template parameters. Especially when starting out using templates, where most examples we saw came out of academic sources, we have a tendency to write the following line of code for all our template classes and functions:
template <typename T>
even though we may know more about T than the mere fact that it is a type.
Using T as a type name is fine in very generic code where you don’t know anything about the type, like in std::is_const:
template<typename T> struct is_const;
But if you know anything about what T represents, that is documentation you can work into your code. We will see more examples of this when we talk about concepts in a dedicated post on Fluent C++, but let’s take here the simple example of a function parsing a serialization input:
template <typename T>
T parse(SerializedInput& input)
{
    T result;
    // ... perform the parsing ...
    return result;
}
And by showing more explicitly what T represents:
template <typename ParsedType>
ParsedType parse(SerializedInput& input)
{
    ParsedType result;
    // ... perform the parsing ...
    return result;
}
Compare the two pieces of code. Which one do you think is easier to work with?
You may think this makes a big difference or you may think it doesn’t. But what is certain is that the second piece of code includes more documentation in it, and for free.
And this is true for good naming in general: for once there is a free lunch out there, let’s make a grab for it.
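For those who want to try the parsing example out, here is one way it could be fleshed out. The SerializedInput class below is purely an assumption (a thin wrapper over a whitespace-separated string), as is its extract member function; only the shape of parse comes from the snippets above:

```cpp
#include <sstream>
#include <string>

// Hypothetical SerializedInput: wraps whitespace-separated text.
class SerializedInput {
public:
    explicit SerializedInput(std::string const& text) : stream_(text) {}

    // Reads the next field into 'field'; returns false if extraction failed.
    template <typename Field>
    bool extract(Field& field)
    {
        return static_cast<bool>(stream_ >> field);
    }

private:
    std::istringstream stream_;
};

// The template parameter name documents the role of the type:
// it is the type of the value that comes out of the parsing.
template <typename ParsedType>
ParsedType parse(SerializedInput& input)
{
    ParsedType result{};    // value-initialized in case extraction fails
    input.extract(result);  // assumption: failures are silently ignored in this sketch
    return result;
}
```

With an input built from the text "42 3.5", calling parse<int>(input) and then parse<double>(input) reads the two fields in order.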