By Brian Jemes
networkmade.com
Kevin was a dinosaur and didn't know it. He was an expert, but the system he knew was on the way out. It might be mission critical today, but it would be a legacy system in three years. Brian was new and inexperienced, but he was working on the new system. He still had a lot to learn, but he was working on the right technology at the right time.
One day, Brian got a trouble report that he hadn't seen before and didn't know where to begin. His team leader was out of the office, so he asked Kevin for help. Immediately, Kevin asked Brian five questions that would guide him to the solution. In fact, these questions would guide him to solutions for the rest of his career.
Kevin, the dinosaur, knew something important that Brian didn't. Brian knew a lot of details about the new system, but he didn't know some fundamentals of troubleshooting. And when it comes to fixing problems, it is more important to know the fundamentals than the specifics.
Looking back, I can't remember the five questions Kevin asked me that day. I do remember that the questions were general, and that I hadn't asked any of them. Once I did, I didn't need to go back to Kevin, except to say "thank you." The answers to those questions gave me the contextual "traction" I needed to resolve the problem. If asked for my top five troubleshooting questions today, I'd suggest these:
1. Did it ever work before?
2. When did it break?
3. Were any changes made recently?
4. Has this happened before?
5. Does this need to be fixed right away?
These diagnostic questions help classify the problem. Classifying the problem reduces the potential problem space. A smaller problem space shortens the time to resolution.
Let's examine how each of these questions helps classify the problem and focus the search for a resolution.
Did it ever work before?
Customers often report a problem but neglect to mention that they are trying something new. The answer to this question gives the support engineer vital context. In general, if something was working, it is likely that a single change in state caused the problem. If something new is being attempted, there are likely many problems, so it is advisable to start with a review of all prerequisites for the new capability. In some cases, the answer will be that what the customer is attempting isn't technically possible with the current equipment, service, or licensing.
When did it break?
This answer will help focus the search for clues in error logs, change logs, and monitoring tools. Even if the logs and reports don't reveal a direct root cause, they often record symptoms that will help further narrow the search criteria.
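As a rough illustration of how the "when" answer narrows a log search, here is a minimal Python sketch. It assumes each log line begins with an ISO 8601 timestamp followed by a space. The function name `entries_near`, the window sizes, and the sample log lines are all hypothetical and chosen only for this example; real logs will need their own timestamp parsing.

```python
from datetime import datetime, timedelta


def entries_near(lines, reported_at,
                 before=timedelta(minutes=30),
                 after=timedelta(minutes=5)):
    """Return the log lines whose timestamp falls in a window around reported_at.

    Assumption: every useful line starts with an ISO 8601 timestamp
    (e.g. 2020-04-01T14:01:37) followed by a space.
    """
    start, end = reported_at - before, reported_at + after
    matches = []
    for line in lines:
        stamp, _, _message = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip lines without a parseable leading timestamp
        if start <= when <= end:
            matches.append(line)
    return matches


# Hypothetical log lines, invented for illustration only.
sample_log = [
    "2020-04-01T09:12:00 INFO link up on port 3",
    "2020-04-01T13:48:10 WARN config reloaded",
    "2020-04-01T14:01:37 ERROR authentication failed for user alice",
    "2020-04-01T16:30:00 INFO backup completed",
]

# The customer says it broke "around 2:05 in the afternoon".
reported = datetime(2020, 4, 1, 14, 5)
nearby = entries_near(sample_log, reported)
for line in nearby:
    print(line)
```

The window reaches further back than forward on the assumption that the cause usually precedes the moment the customer noticed the symptom. Widening `before` is a reasonable next step if nothing relevant turns up.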
Were any changes made recently?
Did the user environment change? It may be that a change in the customer environment caused this problem, e.g. a password change or a software upgrade.
Caution: Often customer environment changes have nothing to do with the problem.
Has this happened before?
If nothing has changed with an existing service, new problems are rare. This means that most problems have been seen before. If the customer has seen the problem before, they may even know how it was fixed the last time. Whatever they can tell you will help narrow the search criteria for identifying a solution.
Caution: From the customer perspective, two problems with two very different causes might have the same symptoms.
Does this need to be fixed right away?
Give the customer a chance to set the priority. It may be difficult for the support engineer to determine the business impact of a problem. If escalation is necessary, it is best to initiate it as soon as possible. If problem resolution can wait, it may be more efficient to save investigation for a slow period or the availability of an expert. In some cases, the problem may be a "corner case" that never needs to be resolved.
To stay relevant in the IT industry, an engineer must continually learn new technologies, but things of lasting relevance are often learned from the technology "dinosaurs" in our midst. Listen to them. They may know something important that will help improve your performance and advance your career.
April 2020 A Dinosaur’s Guide To Network Troubleshooting networkmade.com