Inserting several elements into an STL container efficiently
A couple of weeks ago, I had the chance to hear some feedback from Stephan T. Lavavej about the STL learning resource on Fluent C++. This was done during an episode of CppCast (the podcast for C++ developers by C++ developers) where he was a guest.
Stephan said that he found it overall pretty good (yay!) but he had a couple of quibbles about how some of the aspects of the STL were presented. And when you’re lucky enough to have a quibble about the STL coming directly from Stephan T. Lavavej, you want to make the most of this piece of feedback.
Here I want to flesh out one these pieces of advice that he gave. It concerns how to insert several elements into a container.
Inserting several elements into an STL container
In the posts concerning the STL, I use output iterators such as std::back_inserter
quite intensely. While such iterators are very handy, it is important to realize that in some cases you don’t want to use them.
And these cases come down to inserting several consecutive elements into an STL container.
The – suboptimal – way to inserting several elements by using std::back_inserter
is to couple it with std::copy
:
std::vector<int> v; std::vector<int> newElements = {1, 3, 4, 2, -7, 8}; std::copy(begin(newElements), end(newElements), std::back_inserter(v));
Here std::copy
successively passes each of the elements in newElements
to the output iterator, that adds them to v
by calling its push_back
method. And this does the job: after the execution of std::copy
, all the elements from newElements
have been effectively copied into v
.
The problem here is that even though before calling std::copy
the entire collection of elements is already known (and in particular we know how many of them there are) this piece of information was discarded. Rather, we repeatedly push back into the vector v
, just as if we were discovering each time that there was yet another element to append. This potentially causes multiple reallocations of the vector.
Knowing in advance how many elements are going to be added can be exploited by the vector. This lets it minimize the number of reallocations during the operation: it would reallocate once and for all before starting the operation, instead of reallocating several times along the multiple unitary calls to push_back
.
So, how can we benefit from this information while inserting into a vector? Just by using the range insertion methods.
At initialization of the vector, use the range constructor:
std::vector<int> v{begin(newElements), end(newElements)};
For appending several new elements to an existing vector:
v.insert(end(v), begin(newElements), end(newElements));
Note that these methods also exist for the other STL containers, in particular std::set
and std::map
.
Finally, to replace the entire contents of a vector with newElements:
v.assign(begin(newElements), end(newElements));
After the execution of assign
, all the previous elements have been replaced by the new ones, regardless of the respective numbers of new and old elements. But for a reason that I didn’t quite understand, the assign
method does not exist for associative containers such as std::set
and std::map
.
Is std::copy
useful at all then?
By all means, yes.
In the above case, std::copy
was not appropriate because it blindly extended the size of the container. But sometimes, we don’t extend the size of the container, or we can’t know in advance how many elements are to be added.
For instance, if the container already has values, and we want to overwrite them starting at the beginning, we would use std::copy
:
std::vector<int> v = {5, 5, 5, 5, 5, 5, 5, 5, 5, 5}; std::vector<int> newElements = {1, 2, 3}; std::copy(begin(newElements), end(newElements), begin(v)); // v now contains {1, 2, 3, 5, 5, 5, 5, 5, 5, 5};
Of course, v
has to be bigger than newElements
for this to work.
Another example is writing into a C array:
int a[10] = {}; std::vector<int> newElements = {1, 2, 3}; std::copy(begin(newElements), end(newElements), std::begin(a)); // a now contains {1, 2, 3, 0, 0, 0, 0, 0, 0, 0};
And we will see an example of a case where we can’t know in advance how many elements are to be added, when we address stream iterators, in a dedicated post.
Is std::back_inserter
useful at all then?
Yes again!
It is typically useful for adding into a container the result of any algorithm that does more than std::copy
. For example std::copy_if
:
std::vector<int> v; std::vector<int> newElements = {1, 3, 2, 4, 3, 2, 2}; std::copy_if(begin(newElements), end(newElements), std::back_inserter(v), [](int i){return i % 2 == 0;});
Here we don’t directly have a range to insert into the destination container, therefore we can’t use the range insertion methods.
However, if we know how many elements are going to be inserted, we can do a reserve before the insertion, so that the vector doesn’t have a reallocate multiple times during insertions. In this case we would need to perform a count_if beforehand though. This may or may not be be overkill depending on whether this code is proven to be a performance bottleneck.
In conclusion, to insert several elements into a container, use the container methods whenever you can. This really ties in with a similar guideline we saw when searching into a container with the STL.
Don't want to miss out ? Follow:   Share this post!