A Case Study for the Quickest Way to Find the Source of a Bug
In the previous post, I went through the most efficient method I know to find the source of a bug. Here it is, summed up:
0) Don’t start by looking at the code
1) Reproduce the bug
2) Do differential testing
    2a) Start with a tiny difference
    2b) Continue with larger differences
3) Formulate hypotheses and check them in the code
Since the method can look a little abstract when laid out like this, we’re going to walk through a case study in which we identify the cause of a bug in a concrete example.
It’s inspired by a bug that I once saw in an application. I’ve simplified the domain and the environment to present it more concisely, but the mechanics of the bug are all there.
So here’s the story: you’re a software developer working for the International School of Harmonica. It is a thriving establishment that delivers harmonica lessons to people around the world who want to master the subtleties of this musical instrument.
As a hat tip to some of my C++ blogging buddies, we’ll say that the International School of Harmonica has sites in
- Hamburg, Germany
- Aachen, Germany
- Rottenburg, Germany
- Edinburgh, Scotland
- Cracow, Poland
- and Paris, France
Seems that a lot of folks in Germany like to play the harmonica then.
As a software developer for the International School of Harmonica, you maintain a large system that tracks what’s going on in the school. One day, you get a bug report. It’s in the module that deals with lesson subscriptions.
Let’s see how that feature works (it’s simple) and then look at the bug report. Then we’ll apply the above method to find the source of the bug as quickly as possible.
Lesson subscriptions
When a student subscribes to harmonica lessons, the school enters a subscription into the system via a form. The form looks like this:
It contains the name of the student, the name of the school (which we’ll identify by its city for simplicity), and a “Lesson dates…” button that leads to the calendar of lessons this student is subscribed to. Let’s click on that “Lesson dates…” button. We see the following screen open:
The left hand side of this screen is taken up by the lesson schedule. These are the dates on which the student is supposed to show up and learn how to express a myriad of emotions with their harmonica for an incredibly intense hour. For simplicity, we leave out the time of day of the lessons here.
The user can fill out the schedule manually, or they can use the right hand side of the screen to generate dates automatically:
For simplicity’s sake, we assume that lessons are always weekly. Note that the 7th of April 2034 is Good Friday, a public holiday in Germany. The configuration on the right hand side of the screen says that a lesson falling on a public holiday should be held the “day before” instead, so the second date is the 6th of April.
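To make the generation rule concrete, here is a minimal C++ sketch of weekly generation with the “day before” rule. It works under a few assumptions that are not in the post. The first lesson is on Friday 31 March 2034, which is what weekly lessons with a second date around the 7th of April imply. Dates are plain day numbers. The holiday set is a hypothetical stand-in that only contains Good Friday. The names `toDayNumber` and `generateWeeklyLessons` are invented for illustration and do not come from the real application.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Days since 1970-01-01 in the proleptic Gregorian calendar
// (Howard Hinnant's days_from_civil algorithm).
long toDayNumber(int y, int m, int d) {
    y -= m <= 2;
    const long era = (y >= 0 ? y : y - 399) / 400;
    const long yoe = y - era * 400;                                    // year of era [0, 399]
    const long doy = (153L * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; // day of year [0, 365]
    const long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // day of era [0, 146096]
    return era * 146097 + doe - 719468;
}

// Hypothetical generator: `count` weekly lessons starting on firstLesson.
// A lesson that falls on a public holiday is moved to the day before
// (and further back if that day is a holiday too, an assumption of this sketch).
std::vector<long> generateWeeklyLessons(long firstLesson, int count,
                                        const std::set<long>& publicHolidays) {
    assert(count >= 0);
    std::vector<long> lessons;
    for (int i = 0; i < count; ++i) {
        long day = firstLesson + 7L * i;
        while (publicHolidays.count(day) != 0) {
            --day; // "day before" rule
        }
        lessons.push_back(day);
    }
    return lessons;
}
```

Called with `toDayNumber(2034, 3, 31)`, three lessons and a German holiday set containing `toDayNumber(2034, 4, 7)`, it yields the 31st of March, the 6th of April and the 14th of April, matching the screen described above.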
That’s it for the feature. Let’s now have a look at that bug report.
The bug report
The bug report goes like this:
When we duplicate an existing lesson subscription and select another school for the newly created subscription, we observe that the lesson dates disappear.
But we expect the duplicate to make a carbon copy of the subscription, which means also copying the dates.
Note that if we only duplicate the subscription without changing the country, then the lesson dates remain.
Let’s now apply the above method to find the source of that bug without painfully trudging through the code.
Let’s find the source of that bug, quickly
As a reminder, the method we’ll follow is this:
0) Don’t start by looking at the code
1) Reproduce the bug
2) Do differential testing
    2a) Start with a tiny difference
    2b) Continue with larger differences
3) Formulate hypotheses and check them in the code
Step #0: Don’t start by looking at the code
Let’s go ahead and do 0) Don’t start by looking at the code.
That’s the easiest one: we don’t do anything. OK, done.
Now let’s do 1) Reproduce the bug.
Step #1: Reproduce the bug
The test case contains a lesson subscription, let’s see what’s in it:
And the lesson dates look like this:
Those are entered manually and don’t use the automatic generation.
Now we duplicate the subscription (say there is a UI action to perform that), give the duplicate a new name and change its country:
Let’s now open the dates:
The dates are gone! Excellent, we’ve reproduced the issue. At this point we can rejoice, since the rest is only a matter of time.
This is a great position to be in, because things get much harder when you can’t reproduce the issue. So many kinds of things could have gone wrong in that case: the configuration of your dev environment, the wrong version in the version control system, a misunderstanding of the test case, or a test case that can only be reproduced once in a given environment so that you need to find a DB backup to restore… lovely stuff.
Now that the issue is confined to a reproducible test case, let’s hunt it down with 2) Do differential testing.
Step #2: Perform differential testing
The bug report says that the issue happened when duplicating the lesson subscription. Is it specific to duplicates, or can it happen by simply inserting a subscription from scratch? The only way to know is to test it.
So let’s insert a new subscription:
Let’s fill out some dates:
Now let’s go back and change the country:
And reopen the dates:
Gone.
So the issue in fact has nothing to do with duplication. This is important because it rules out a whole chunk of code. We’re now pretty sure that code doesn’t contain the source of the issue, so we won’t have to look at it. Had we started with the code, we might have debugged the duplication, which would have been a complete waste of time. Hence the value of Step #0 above.
To be even more certain, we can change the country of an existing subscription. We won’t go through the mockups for that test here, but it turns out that the bug is reproduced in this case too (that was part of the original story).
So the bug happens when we change the country and open the dates screen.
But are those two steps really necessary to reproduce the bug?
To check, we’re going to do each of them separately and see if we reproduce the bug in each case. Let’s start by changing the country without opening the dates.
To do this, we pick up the subscription from the test case:
We change its country:
And we save it. Note that we didn’t open the dates screen.
Now let’s reopen the subscription and click to open the dates screen:
The dates are there, so the bug is not reproduced. It was therefore necessary to open the dates screen right after changing the country: in that situation, opening the dates screen is what flushes the dates.
But then, do we really need to change the country? Yes. When we open a subscription and directly open the dates, the dates are there, so the bug is not reproduced. We saw that in the initial presentation of the feature.
We can deduce that opening the dates screen flushes the dates, but only if we’ve changed the country beforehand.
Now the question is: why? What’s going on when we perform those two actions in a row? It’s time to 3) Formulate hypotheses and check them in the code.
Step #3: Formulate hypotheses and check them in the code
Let’s think: what is the link between a country and some dates? The first answer that comes to mind is public holidays. Indeed, each country has its public holidays.
To validate this hypothesis, we won’t even have to look at the code. Looking at the code is typically slower than looking at the application, so let’s save it for when there is nothing else we can do.
Different countries have different public holidays, but different cities in the same country have the same public holidays. Let’s try to change the city without changing the country and see if we reproduce the issue.
We start again with the subscription of the test case:
Note that we use the minimal test case that we obtained with differential testing. In particular, there’s no need to go through duplication. So we select another city in Germany:
And open the dates screen:
The dates are still there! The bug is not reproduced when we change city, only when we change country. This raises the probability that the bug is somehow related to public holidays.
The other feature that is related to public holidays is the automatic generation of dates. Let’s see if we reproduce the issue with the generation parameters filled out.
So we start again from the lesson subscription of the test case:
But this time we fill out the generation parameters:
Now let’s go back and change the country:
And reopen the dates screen:
The dates are there, but not exactly the same. Unlike Germany, Poland doesn’t have the 7th of April 2034 as a public holiday, so the second lesson is no longer moved to the day before.
We can deduce that opening the dates screen works out the dates again, based on the country and on the generation parameters.
We can now formulate a hypothesis about the source of the bug: when we open the dates screen, the system tries to work out the generated dates if the country has changed, and something goes wrong when there are no generation parameters.
Now we can check this hypothesis in the code, and there are just a couple of lines that can confirm or refute it. We check that targeted portion of code, and it takes only a few minutes to realize what happens. The system tries to generate the dates with an empty generator, which gives an empty set of dates, and it then uses this empty set regardless, overwriting the dates that were entered manually.
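The post doesn’t show the real code. Here is a hypothetical C++ reconstruction of the faulty code path, consistent with the observations from the differential tests. Every name in it is invented for illustration: `Subscription`, `GenerationParameters`, `openDatesScreen`, the `datesComputedForCountry` field and the per-country `HolidayCalendar`. Dates are plain day numbers to keep it short.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

using Day = long; // days since an arbitrary epoch, to keep the sketch short

// Parameters from the right hand side of the dates screen (hypothetical).
struct GenerationParameters {
    Day firstLesson;
    int numberOfLessons;
};

struct Subscription {
    std::string country;
    std::string datesComputedForCountry;            // country the dates were last worked out for
    std::optional<GenerationParameters> generation; // empty when dates were entered by hand
    std::vector<Day> lessonDates;
};

using HolidayCalendar = std::map<std::string, std::set<Day>>;

// Weekly generation with the "day before" rule. An empty generator yields no dates.
std::vector<Day> generate(const std::optional<GenerationParameters>& params,
                          const std::set<Day>& holidays) {
    std::vector<Day> dates;
    if (!params) return dates;
    assert(params->numberOfLessons >= 0);
    for (int i = 0; i < params->numberOfLessons; ++i) {
        Day day = params->firstLesson + 7L * i;
        while (holidays.count(day) != 0) --day;
        dates.push_back(day);
    }
    return dates;
}

// The faulty path as hypothesised: when the country has changed, the dates are
// regenerated and the result is used unconditionally, even without generation
// parameters, so manually entered dates are replaced by an empty vector.
void openDatesScreen(Subscription& s, const HolidayCalendar& calendar) {
    if (s.country != s.datesComputedForCountry) {
        const std::set<Day> noHolidays;
        const auto it = calendar.find(s.country);
        const std::set<Day>& holidays = it != calendar.end() ? it->second : noHolidays;
        s.lessonDates = generate(s.generation, holidays); // bug: overwrites manual dates
        s.datesComputedForCountry = s.country;
    }
}
```

Consider a subscription with manual dates and no generation parameters whose country goes from Germany to Poland. It comes out of `openDatesScreen` with an empty `lessonDates`, just as in the bug report. Changing only the city leaves the country, and therefore the dates, untouched.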
The source of the bug has been identified.
Now we need to think of a fix, but that’s another story, which calls for another method.
The more time you spend in the application, the less time you spend in total
Even though I’ve never worked for a school of harmonica, the bug I saw in real life looked essentially like this one, and the above reasoning is very close to how the analysis went. With this method, we could diagnose the bug in a matter of minutes.
When you do maintenance, don’t start by looking at the code. Rather, play around with the application and reason about what test can help you narrow down the source of the issue. You’ll save a lot of time and frustration in your life as a software developer.
You may also like
Software maintenance can be fun – The quickest way to find the source of a bug