The Common Vocabulary of Software Developers
Writing expressive code means putting together code that conveys our intent, so that other people can understand it. And the thing with code is that it tends to last.
So what you write today in your codebase is like a letter that you address to people living in the future. Those people include all the developers that will come and go on this project: your teammates, your future self, and even some young sprouts who are now at university and whom your company will recruit tomorrow. So you need to pick your words very, very carefully.
When you hold that virtual quill, what words can you use to make yourself understood?
Those words define the common vocabulary of the software developers that will read your code.
Some words obviously fall into that common vocabulary: for instance, the keywords of the language, such as if, const or int, which you can use with the assurance of being understood. Similarly, words that the rest of the world would understand, such as “remove” or “exit”, are in it too.
But there are a lot of words in between, and I think it is important to agree on the common vocabulary that we are allowed to use to express our intentions in code.
Let’s start with the motivating example that initially drew me into this consideration: abbreviations. But pulling the thread from there unrolls a topic much deeper than it seems initially.
Abrvtns are not OK. Or are they?
Abbreviations in code are most of the time seen as a bad thing, and most of the time for good reason.
Shortening code by taking away the words’ letters is a false economy, making the codeline look like a giant newspaper ad that makes everyone’s eyes water when they read it.
For sure, it goes faster to write fewer letters. But even if every line of code is written once, the number of times it is read is much, much higher. And abbreviations take more energy for a reader to decipher, so in the long run they end up wasting everyone’s time.
But are all abbreviations forbidden?
The answer to this question is No (you saw it coming, didn’t you?). Some abbreviations are OK, and some even clarify the code. There is a rule of thumb I like, to define which abbreviations are good to use: those that an end-user of the application would understand.
Here is an example: for the users of a market finance application, the abbreviation “FX”, which stands for Foreign Exchange (the place where you trade currencies), is pretty ubiquitous. So much so, that it would be weird to read “foreignExchange” instead of FX. So this abbreviation is OK in the codeline of a market finance application.
If you think about it a minute, you’ll probably come up with a few abbreviations that go without saying for your application too. Those are part of the common vocabulary of the software developers of your codebase.
Let’s go further: does the common vocabulary contain other abbreviations that end-users wouldn’t understand? Does the common vocabulary contain other terms (not abbreviations) that an end-user understands but that the rest of the world doesn’t?
The answer to both those questions is Yes, and this leads us to the next two components of the common vocabulary: algorithms and data structures, and DDD’s ubiquitous language.
The ubiquitous language, a dialect of the common vocabulary
The ubiquitous language is a notion that originated (I think) in Eric Evans’s book Domain-Driven Design (reading it was in my summer projects).
In short: to design software well, DDD advocates an intense collaboration between developers and domain experts, where they build together a model of the domain. The model is implemented in code by developers. Together, the terms used in the model form a ubiquitous language.
It is called ubiquitous because domain experts use it to talk among themselves, developers use it to talk among themselves, domain experts and developers use it to talk together, and it is present in the code too.
Using the ubiquitous language in the codeline makes the code benefit from the clear definitions of the terms that everyone agreed upon, including the developers of the project. So it is clearly part of the common vocabulary that developers are allowed (and encouraged) to use in code, even if someone outside of the project wouldn’t understand those terms.
Algorithms and data structures, the language of grown-ups
Are there abbreviations that are OK to use in code even if an end-user wouldn’t understand them?
Consider BFS, standing for breadth-first search. BFS is a way to traverse a graph starting from a node inside the graph. It goes like this: visit the first node. Then successively visit all the nodes directly connected to it. Then successively visit all the nodes connected to those nodes. And so on.
BFS makes concentric traversals of a graph, as opposed to DFS (depth-first search), which follows paths along the graph and is also a ubiquitous abbreviation in the world of software developers.
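As a rough sketch of the traversal described above, here is one way BFS could look in C++. It assumes the graph is stored as an adjacency list indexed by node number and that every neighbour index is valid; the Graph alias and the bfs function name are choices made for this illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Assumed representation: adjacency[i] lists the nodes directly connected to node i.
using Graph = std::vector<std::vector<std::size_t>>;

// Returns the nodes reachable from `start`, in the order BFS visits them.
std::vector<std::size_t> bfs(const Graph& adjacency, std::size_t start)
{
    assert(start < adjacency.size()); // precondition of this sketch

    std::vector<std::size_t> visitOrder;
    std::vector<bool> seen(adjacency.size(), false);
    std::queue<std::size_t> toVisit;

    seen[start] = true;
    toVisit.push(start);

    while (!toVisit.empty())
    {
        const std::size_t node = toVisit.front();
        toVisit.pop();
        visitOrder.push_back(node);

        // Queue the direct neighbours: they all get visited before any node further away.
        for (const std::size_t neighbour : adjacency[node])
        {
            if (!seen[neighbour])
            {
                seen[neighbour] = true;
                toVisit.push(neighbour);
            }
        }
    }
    return visitOrder;
}
```

The queue is what produces the concentric order. Swapping it for a stack would make the traversal follow paths in depth instead, which is the DFS flavour.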
Similarly, all other classical algorithms are part of the common vocabulary. And the classical data structures too: map, set, array, heap, rope, tree, trie, graph, and so on.
But what if someone doesn’t know them? I occasionally meet developers, especially younger ones, that are not yet fluent in the jargon of algorithms and data structures. Does this mean that trie, BFS and DFS are not part of the common vocabulary?
They are. Algorithms and data structures require work to learn, but that’s a necessary investment. I argue that we should all level up to them, rather than refrain from using them for fear of people not understanding them. Algorithms and data structures are packaged to simplify the code and raise the level of abstraction, after all.
Now not everyone can interrupt their life for several days and devote it to learning algorithms and data structures. Rather, a more realistic (and more fun!) approach is to learn them the first time you come across them in code.
It’s like learning the vocabulary of a human language, really. For instance, one day you come across the word “whites” while reading a recipe. You think it’s a typo, then realize it’s not, look it up on the Internet, spend a minute reading its definition and seeing pictures, and move on enriched by this knowledge.
Similarly, one day you encounter the term “trie” in code. You think it’s a typo, then realize it’s not, look it up on the Internet, spend half an hour reading the definition and seeing schemas, and move on, enriched by this knowledge.
At some point in life we become able to separate the yolk from the whites. And to master our data structures. It’s the process of growing up.
The standard library is part of the common vocabulary
It is well known that we should know our STL algorithms. Even the less mainstream bits of the STL, such as std::is_heap_until or std::transform_exclusive_scan, are part of the common vocabulary. We can use them without fear of people not knowing them yet.
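To give an idea of what one of these less mainstream algorithms says when it appears in code, here is a small sketch around std::is_heap_until. The heapPrefixLength helper and the usageExample function are hypothetical names introduced for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: how many leading elements of `values` form a max-heap.
std::size_t heapPrefixLength(const std::vector<int>& values)
{
    // std::is_heap_until returns an iterator to the first element that breaks the heap property.
    const auto firstOutOfHeap = std::is_heap_until(values.begin(), values.end());
    return static_cast<std::size_t>(std::distance(values.begin(), firstOutOfHeap));
}

void usageExample()
{
    const std::vector<int> values = {9, 5, 4, 1, 1, 3, 2, 6};
    // The final 6 is larger than its parent (the 1 at index 3), so the heap stops after 7 elements.
    assert(heapPrefixLength(values) == 7);
}
```

A reader who knows the algorithm gets the intent from its name alone, and a reader who doesn't is one quick lookup away from it.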
But the C++ standard library contains a lot of things outside of the STL too. And as with data structures, a practical approach is to study them along with the code that we encounter in our everyday life.
Miscellaneous components of the common vocabulary
Last time I was at the Software Crafters meetup, I asked software developers working in a wide variety of languages what they thought was in the common vocabulary (the whole meetup revolves around discussions among software developers; if you’re around Paris you should really come).
Here are some of the proposals we were able to collect.
Units
Instead of writing “seconds” you’re free to write “s” in code (as std::chrono does since C++14). And so on for the common units.
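For illustration, here is roughly what the unit suffixes of std::chrono look like in use. The pollsBeforeTimeout function and its parameter names are made up for this example; only the literals and the duration types come from the standard library.

```cpp
#include <chrono>

using namespace std::chrono_literals; // brings in the s, ms, min... suffixes (C++14)

// Hypothetical function: how many polling intervals fit into a timeout.
constexpr long long pollsBeforeTimeout(std::chrono::milliseconds timeout,
                                       std::chrono::milliseconds pollingInterval)
{
    // Dividing a duration by a duration gives a plain number.
    return timeout / pollingInterval;
}

// "2s" and "250ms" read the way the units are written on paper.
static_assert(pollsBeforeTimeout(2s, 250ms) == 8, "2 s holds eight 250 ms intervals");
```

The call site carries the units itself, so there is no need for names like timeoutInSeconds to say what the numbers mean.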
The question was raised about the orders of magnitude for units: “m” means milli, as in “ms” for milliseconds (10^-3), but “M” means mega (10^6). Should we write them out explicitly, or is it clear in code that “mJ” is millijoule and “MJ” is megajoule?
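One data point on this question: the C++ standard library spells these prefixes out in full, as std::milli and std::mega in the <ratio> header. The sketch below only illustrates that; the MegaOverMilli alias and the megaJoulesToMilliJoules function are hypothetical names, not a recommendation for modelling energy.

```cpp
#include <cstdint>
#include <ratio>

// std::milli is std::ratio<1, 1000> and std::mega is std::ratio<1000000, 1>.
using MegaOverMilli = std::ratio_divide<std::mega, std::milli>;

static_assert(MegaOverMilli::num == 1000000000 && MegaOverMilli::den == 1,
              "1 MJ is 10^9 mJ");

// Hypothetical conversion helper, for illustration only.
constexpr std::intmax_t megaJoulesToMilliJoules(std::intmax_t megaJoules)
{
    return megaJoules * MegaOverMilli::num / MegaOverMilli::den;
}
```

With a factor of a billion hanging on the case of a single letter, writing the prefix out is at least a defensible choice.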
ISO codes
ISO codes are part of international standards, which makes them fairly standard for everyone. So country codes such as FR or GB are probably understandable by everyone.
Technical acronyms
Even if you’re not a web developer, you will instantly recognize technical acronyms such as HTTP and HTML, and pretty much all developers are familiar with “stdin” and “stdout”.
More generally, some words are known to everyone in the tech industry. For instance OS is unambiguous and there is no need to write out “operatingSystem” to make yourself understood in code.
Some technical acronyms are specific to a type of language. For instance, for those of us who get the blessing of working with pointers, the abbreviation “ptr” is instantly recognizable (for better or worse).
Common abbreviations
Abbreviations that the rest of the world knows, such as VAT, don’t need further explanation, provided the codebase is in English (this point raised the question of what the language of a given codebase should be, which is another topic).
Maths names
In code that implements mathematical formulae, some symbols have implicit meaning. x can mean “value”, or in a graph “abscissa”, n denotes an integer, and cos, sqrt and atan are part of the common vocabulary too.
Alice and Bob
Alice and Bob are common names to represent personas. They originated in cryptography, as in “Alice wants to send a message to Bob”, but today they are used pretty widely as personas.
In test code, for example, if you need to instantiate objects representing users, you can name them Alice and Bob and everyone will know that they stand for arbitrary people.
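Here is what that can look like in a test. The User type and the transfer function are invented for this sketch; only the naming of the two users matters.

```cpp
#include <cassert>
#include <string>

// Hypothetical domain type, for illustration only.
struct User
{
    std::string name;
    int balance;
};

// Hypothetical operation under test.
void transfer(User& from, User& to, int amount)
{
    from.balance -= amount;
    to.balance += amount;
}

void testTransferMovesMoneyBetweenUsers()
{
    // Alice and Bob: two arbitrary users, with no special meaning attached.
    User alice{"Alice", 100};
    User bob{"Bob", 20};

    transfer(alice, bob, 30);

    assert(alice.balance == 70);
    assert(bob.balance == 50);
}
```

Names like user1 and user2 would work too, but Alice and Bob tell the reader right away that the identities don't matter.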
What else do you include in the common vocabulary?
Agreeing on a common vocabulary needs to be a discussion, by its very nature.
Are the above part of your common vocabulary? What other terms do you use to make yourself understood in code?