Programming involves more than transforming requirements into logical steps. It demands a few elusive skills such as unconventional thinking, patience, discipline, attention to detail and intuition. So, when troubles show up in your next project — and they will sooner or later — you need to be more than a software engineer. You’ve got to be a software detective.
This is not about everyday bugs. I’m talking about very complicated problems here, and for lack of a better name, I’ll call them the “tricky ones.” Experienced software engineers will recognize the tricky ones as the issues nobody wants to touch (“I’ll take care of this one later, someday”): the tickets they try to keep at the bottom of the list while clients move them to the top, or yes, the “forever unassigned tasks.”
But eventually somebody has to take care of the tricky ones and that, my friend, could be you.
If you are the Sherlock type (and not everybody is), once you start working on a tricky problem you won’t be able to stop. I can’t sleep well or enjoy my food when I’m in the midst of solving a case.
The Call: “Inspector, You Have to Come See This”
A bright day has just started when they contact you: a new problem has arisen. “Would you take a look? We’re sure it’s not a big deal,” they lie. On closer examination you see the fingerprints of a tricky one all over the place. It is time to start work on the case, so you approach your most important witness: the user.
The user is the person who spends their day with your software. She could be an editor working for your client, let’s say, a major digital publisher, or he could be good old John Doe, the average visitor, just browsing around your client’s site.
Understanding everything about the problem is vital and the user is the best person to help you with that. You may need to resort to a phone conversation, video conference (ideally sharing screens) or even a face-to-face meeting if email or issue trackers aren’t cutting it. The goal is to make it easier for the user to explain. Remember: great software engineers are excellent communicators and are able to help others to express and clarify their ideas.
In many cases, like Mr. Doe’s above, the user is difficult or impossible to contact, and you will have to do your best to see the problem happen for yourself, which I’ll discuss next.
You need to understand not only how and when a problem happens but also what the goal of the feature is and why it needs to be implemented in a certain way. Sometimes users and programmers get stuck on one approach for no valid reason, and the solution involves finding a different way of doing things. Be pragmatic.
Once you have a clear picture of the situation, take some time to ponder it and then share your thoughts with the user. This is critical to avoid wasting precious time on the wrong path. Don’t get too technical. Simplify, and succinctly explain why the problem happens and how you think you may fix it. Point out that these are just initial assumptions and that things may change as the investigation moves on. Keep the conversation going as you make progress; this will help you and the user come up with new ideas. Many users remember a key detail only at the last moment.
Going Sherlock involves three stages: reproducing the problem, thinking about the problem and solving the problem. Rinse and repeat. I’ll discuss reproducing here and leave thinking and solving for parts two and three.
The Scene: Reproducing the Problem
Reproducing is programmer lingo for “seeing the problem happen.” Obvious tip: don’t tell the user “I can’t reproduce”; just say “I can’t see the problem happen.”
It’s practically impossible to fix a problem you can’t see, so start by creating a list of steps that trigger it. Your conversations with the user will pay off at this stage. Some problems are intermittent, and you will need to pay closer attention to specific details such as the environment, the way the system is being used, and even the time of day.
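For intermittent problems, it can help to write down every attempt along with its context and then look at how often the problem shows up per environment or per hour. The following is only a minimal sketch of that idea in Python: the `Attempt` record, the `failure_rate_by` helper, and the sample data are all made up for illustration and are not part of any real tool.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Attempt:
    """One try at triggering the problem, with the context it ran in."""
    when: datetime
    environment: str  # hypothetical labels such as "staging" or "production"
    browser: str
    reproduced: bool


def failure_rate_by(attempts, key):
    """Return {value: fraction of attempts that reproduced the problem}."""
    totals, hits = Counter(), Counter()
    for attempt in attempts:
        value = key(attempt)
        totals[value] += 1
        if attempt.reproduced:
            hits[value] += 1
    return {value: hits[value] / totals[value] for value in totals}


# Made-up notes from an imaginary investigation.
attempts = [
    Attempt(datetime(2024, 1, 8, 9, 15), "production", "Firefox", False),
    Attempt(datetime(2024, 1, 8, 23, 40), "production", "Firefox", True),
    Attempt(datetime(2024, 1, 9, 10, 5), "staging", "Chrome", False),
    Attempt(datetime(2024, 1, 9, 23, 55), "production", "Chrome", True),
]

by_hour = failure_rate_by(attempts, key=lambda a: a.when.hour)
by_env = failure_rate_by(attempts, key=lambda a: a.environment)
print(by_hour)
print(by_env)
```

In this invented data set the problem only appears late at night and only in production, which is exactly the kind of pattern worth jotting down as a clue.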
The first step to recreating the scene in which a problem occurs is to ensure that your development setup is as close as possible to production, the environment users actually interact with. These are some of the aspects of the production environment worth mimicking in development:
- The latest code and data.
- The same software, down to the versions: operating system, web servers, database systems, caching, messaging.
- Load balancing. Varnish and HAProxy are relatively easy to set up and work with just a couple of servers or even virtual machines.
- Traffic. You can simulate users hitting a web site with curl and Apache’s ab or, for more complex interactions, JMeter.
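If you don’t have ab or JMeter at hand, the traffic item in the list above can be roughly approximated with a few lines of Python. This is a sketch that swaps in the standard library for those tools, not a replacement for them; `fetch_status` and `simulate_traffic` are hypothetical helper names, and the URL in the usage note is an assumed local development address.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url, timeout=5.0):
    """Fetch one URL and return its HTTP status code.

    urllib raises an exception for 4xx/5xx responses and network errors.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def simulate_traffic(url, total=50, concurrency=5, fetch=fetch_status):
    """Send `total` requests to `url` using `concurrency` worker threads.

    Returns a list of (outcome, seconds) tuples, where outcome is the status
    code or the name of the exception that was raised.
    """
    def one_request(_):
        started = time.perf_counter()
        try:
            outcome = fetch(url)
        except Exception as exc:  # record failures instead of aborting the run
            outcome = type(exc).__name__
        return outcome, time.perf_counter() - started

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(one_request, range(total)))
```

Calling `simulate_traffic("http://localhost:8080/", total=20, concurrency=4)` against your own development server returns one entry per request, so you can count the outcomes and look at the slowest timings. The `fetch` parameter exists so the request logic can be replaced, for example with a function that logs in first.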
Next, follow the steps to recreate the problem under different scenarios. You can do it manually for simple interactions but to be more efficient and facilitate more complex cases, automate with a tool like Selenium. Run your automated tests, even if you haven’t written any for the case related to this problem yet, and pay attention to error messages on screen and in log files.
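Paying attention to log files gets easier when repeated errors are grouped together. Here is a small Python sketch of one way to do that. It assumes a plain-text log in which error lines contain a level word such as ERROR or CRITICAL; the pattern, the `summarize_errors` name, and the sample lines are illustrative assumptions, so adjust them to whatever your logs actually look like.

```python
import re
from collections import Counter

# Assumed markers of a problem line; real logs may use different words.
ERROR_PATTERN = re.compile(r"\b(ERROR|CRITICAL|FATAL|Traceback)\b")
# Numbers (timestamps, ids) vary between occurrences, so mask them for grouping.
DIGITS = re.compile(r"\d+")


def summarize_errors(lines):
    """Group error lines that differ only in their numbers, most frequent first."""
    counts = Counter()
    for line in lines:
        if ERROR_PATTERN.search(line):
            counts[DIGITS.sub("N", line.strip())] += 1
    return counts.most_common()


# Made-up log lines for illustration.
sample = [
    "2024-01-08 23:40:01 ERROR cache miss for article 1187",
    "2024-01-08 23:40:02 INFO request served in 120 ms",
    "2024-01-08 23:41:17 ERROR cache miss for article 2203",
    "2024-01-08 23:55:09 CRITICAL database connection lost",
]

summary = summarize_errors(sample)
for message, count in summary:
    print(count, message)
```

To run it on a real file, pass an open file object instead of the sample list, for example `summarize_errors(open("app.log"))`, where `app.log` stands in for your own log path.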
Finally, try to think and behave like a user would, while keeping your detective senses alert. Look for patterns and jot down as many details as you can. They will be an invaluable source of clues.
Congratulations, Sherlock, you’ve swept the scene and are ready for the next phase: thinking, which I’ll cover in part two.