Traditional Debugging

Characteristic features of traditional debuggers

Most IDEs for object-oriented programming provide the same style of debugging facilities:

Controlled execution

The programmer can place breakpoints in the code at specific locations to cause the execution of the software to be suspended when the location is reached (but before it is executed). It is usually possible to specify triggering conditions for a breakpoint, like a hit count, or conditions on the value of certain variables.

Once the execution is suspended, one can perform step-by-step execution. The most common kinds of steps are:

Step into: if the instruction at the suspended location is a method call, then execution continues up to the first instruction of the called method.
Step over: in the same situation as above, execution continues until the called method returns.
Step out: execution continues until the current method returns.

Some IDEs also let the programmer manipulate the call stack: the execution of the current method, or of any method on the call stack, can be restarted, which provides a way to go “back in time” (in Eclipse, for instance, this feature is called “drop to frame”). Unfortunately, this only rewinds the call stack; it does not undo any side effects of the code that has already been executed.
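The limitation can be illustrated without a debugger. The following is a minimal sketch, not a description of how any IDE implements the feature: the class and method names are hypothetical, and the second call merely simulates what happens when a frame is restarted. Local variables are recreated on re-entry, but state on the heap keeps whatever the first run did to it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: a frame restart is simulated by calling the method again.
class FrameRestartDemo {
    // Heap state shared across calls; restarting a frame does not roll it back.
    static final List<String> log = new ArrayList<>();

    static int process(int input) {
        int local = input * 2;          // local state: recreated on every entry
        log.add("processed " + input);  // side effect: survives a frame restart
        return local;
    }

    public static void main(String[] args) {
        process(21);                     // first execution
        process(21);                     // simulated restart of the same frame
        System.out.println(log.size());  // prints 2: the side effect happened twice
    }
}
```

The return value is the same on both runs, yet the log now holds two entries. Any external effect (a file write, a database update) would be duplicated in the same way.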

Memory inspection

Whenever the execution is suspended, the programmer can inspect the values of the variables of the current scope, as well as variables that aren't in the current scope but that were registered for inspection during a previous suspension.

[Screenshot: Memory inspection in Eclipse 3.1]

The previous screenshot shows how Eclipse presents in-scope variables when execution is suspended within a method. The left pane shows the code of the current method. The location at which execution is suspended is highlighted in green. The values of the variables are displayed in the right pane. Variables are presented in a tree in which scalar values are leaves, and objects are nodes below which there is one child per field of the object. Other traditional debuggers present the information in a very similar way.

The issues of traditional debugging

Fixing a bug is a two-step process:

Identification of the instruction(s) that cause the bug. This is where the debugger can be used. There are other techniques such as strategic print statements in the code.
Modification of the code so that the bug disappears. The debugger is of no help here.

This section illustrates the fact that both controlled execution and hierarchical memory inspection are simply inadequate for debugging even simple programs.

Debugging using controlled execution is not efficient

With a traditional debugger, locating faulty instructions can be very tedious. The usual scheme is to put a breakpoint at a location that is executed before the bug occurs, and then to step through the code until the misbehaviour is found. The issue is that even in simple systems, the faulty instructions can be buried deep in the call hierarchy. Consider the following Java example:

public class ObjectX{
    private Info info = new Info("name");
 
    public String toString(){
        updateInfo();
        return "ObjectX: " + info.getName(); // NPE thrown here
    }
 
    private void updateInfo(){
        update1();
        update2();
        update3();
    }
 
    // The definitions of update1, update2 and update3 go here
}

At runtime, the toString() method gets called and throws a NullPointerException (line shown above). This means that the value of the info field was null when invoking the getName() method. So the programmer places a breakpoint on the first line of toString() and runs the program. When execution stops, he inspects the current object and finds that at this point the info field has a non-null value. The programmer thus infers that the updateInfo() method sets the field to null, so he steps into the call. The updateInfo() method calls three more methods, all of which are quite large. The programmer doesn't know which of those methods sets the field to null. Here he has two possibilities:

Step into all three methods, and the methods they call, and the methods called by the methods they call, etc. until he finds the instruction that sets the info field to null. There are several problems with this approach:
- The number of steps to perform can be huge
- Many of these steps are useless: for instance, if the faulty instruction is in update2() (or in a method it calls), all the steps performed during the execution of update1() could have been avoided. But the programmer can't know this beforehand.
Step over the three methods, inspecting the value of info at each step. When the field goes null, he knows which method was the culprit so he can stop execution, put a breakpoint in the culprit method, restart execution and repeat the process. There are also several problems with this approach:
- The programmer must restart the execution of his program many times. If the bug only occurs after a long time, or after a lengthy user interaction, this can be very time-consuming.
- The programmer must repeatedly set and remove breakpoints.

A programmer commonly uses a combination of both approaches, each of which is tedious, time-consuming and error-prone. Moreover, running the program repeatedly under the debugger can have undesired side effects, for instance if the program writes to the filesystem, accesses a database or performs low-level I/O.
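The second strategy (step over, then inspect) can be written down as code. The sketch below is a hypothetical completion of the example: the Info class, the bodies of update1() to update3() and the choice of update2() as the culprit are all assumptions made for illustration, and the class is renamed ObjectXProbe to make clear that it is not the original ObjectX.

```java
// Hypothetical completion of the example. The Info class and the bodies of
// update1() to update3() are assumptions, not part of the original listing.
class Info {
    private final String name;
    Info(String name) { this.name = name; }
    String getName() { return name; }
}

class ObjectXProbe {
    private Info info = new Info("name");

    private void update1() { /* assumed harmless */ }
    private void update2() { info = null; /* assumed culprit */ }
    private void update3() { /* assumed harmless */ }

    // Mirrors "step over, then inspect": runs each update in order and
    // returns the name of the first one after which info is null.
    String findCulprit() {
        Runnable[] steps = { this::update1, this::update2, this::update3 };
        String[] names = { "update1", "update2", "update3" };
        for (int i = 0; i < steps.length; i++) {
            steps[i].run();
            if (info == null) {
                return names[i];
            }
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(new ObjectXProbe().findCulprit()); // prints update2
    }
}
```

Note that this narrows the search by only one level of the call hierarchy. If update2() itself calls other methods, the same search has to be repeated inside it, which corresponds to the restarts and breakpoint changes listed above.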

Additionally, several factors can make the problem even more complex:

The presence of loops and recursion renders unconditional breakpoints useless. The programmer can define triggering conditions on his breakpoints, but this is not always practical, or even possible.
If the bug is in user interaction or windowing code, suspending execution and switching between the IDE's window and the program's window can prevent the faulty instructions from being reached.
If the bug is concurrency-related, suspending execution, or the mere fact of running the program in debug mode, can change the way threads are scheduled for execution and make the bug artificially disappear.
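To make the point about loops concrete, the following sketch counts how often a breakpoint placed on a loop body would suspend execution, with and without a triggering condition. It only emulates a breakpoint in plain code; the class name, the data and the condition (value < 0) are hypothetical.

```java
// Hypothetical sketch: counts how often a breakpoint inside a loop would
// suspend execution, with and without a triggering condition.
class BreakpointHits {
    // With conditional == false every iteration counts as a hit; otherwise
    // only iterations where the assumed condition (value < 0) holds.
    static int countHits(int[] values, boolean conditional) {
        int hits = 0;
        for (int value : values) {
            if (!conditional || value < 0) {
                hits++; // a debugger would suspend execution here
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] values = new int[10_000];
        values[7_321] = -1; // the single faulty element (assumed)
        System.out.println(countHits(values, false)); // prints 10000
        System.out.println(countHits(values, true));  // prints 1
    }
}
```

The condition helps only when the programmer already knows what characterises the faulty state, which is often exactly what is being investigated.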

Hierarchical memory inspection is not efficient

Orthogonal to the difficulty of driving the code flow to the faulty instruction, described above, is the problem of understanding a program's data structures. A simple illustration of the inadequacy of a hierarchical view of objects is the linked list, which a traditional debugger would show in the following way:

list = LinkedList
- first = ListElement
  - value = String “xyz”
  - next = ListElement
    - value = String “abcd”
    - next = ListElement
      - value = String “123”
      - next = null

And this is only for a three-element list! Fortunately, modern debuggers of this kind have a set of built-in presentations for common data structures such as lists and maps. For instance, Eclipse can, if the user so decides, represent the previous list in this way:

list = LinkedList
- [0]: String “xyz”
- [1]: String “abcd”
- [2]: String “123”
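What such a presentation does can be sketched as a walk along the next references that emits one indexed line per element. This is an illustration only, not Eclipse's implementation: the ListElement class and the flatten method below are hypothetical and simply reuse the field names from the listing above.

```java
// Hypothetical minimal list type reusing the field names shown above.
class ListElement {
    final String value;
    ListElement next;
    ListElement(String value) { this.value = value; }
}

class ListPresentation {
    // Follows the next references and emits one indexed line per element,
    // similar in spirit to the flat presentation shown above.
    static String flatten(ListElement first) {
        StringBuilder out = new StringBuilder("list = LinkedList\n");
        int index = 0;
        for (ListElement e = first; e != null; e = e.next) {
            out.append("- [").append(index++).append("]: String \"")
               .append(e.value).append("\"\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        ListElement first = new ListElement("xyz");
        first.next = new ListElement("abcd");
        first.next.next = new ListElement("123");
        System.out.print(flatten(first));
    }
}
```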

Eclipse also lets the user define his own presentation for custom types. For instance a tree structure that would be represented as follows…

tree = Tree
- root = TreeNode
  - value = String “Animals”
  - children = LinkedList
    - [0]: TreeNode
      - value = “Mammals”
      - children = LinkedList
        - [0]: TreeNode
          - value = “Dogs”
          - children = null
        - [1]: TreeNode
          - value = “Cats”
          - children = null
        - [2]: TreeNode
          - value = “Horses”
          - children = null
    - [1]: TreeNode
      - value = “Reptiles”
      - children = LinkedList
        - [0]: TreeNode
          - value = “Crocodiles”
          - children = null
        - [1]: TreeNode
          - value = “Lizards”
          - children = null

… can be represented in a more natural way:

tree = Tree
- “Animals”
  - “Mammals”
    - “Dogs”
    - “Cats”
    - “Horses”
  - “Reptiles”
    - “Crocodiles”
    - “Lizards”
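A user-defined presentation of this kind boils down to a recursive rendering function. The sketch below is one hypothetical way to produce the compact view and does not use any Eclipse API. The TreeNode class is an assumption modelled on the listing above, except that leaves hold an empty list of children rather than null, and plain ASCII quotes are printed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node modelled on the listing above. Unlike the listing,
// a leaf holds an empty children list instead of null.
class TreeNode {
    final String value;
    final List<TreeNode> children = new ArrayList<>();
    TreeNode(String value) { this.value = value; }
    TreeNode add(TreeNode child) { children.add(child); return this; }
}

class TreePresentation {
    // Renders one line per node, indented by two spaces per level.
    static String render(TreeNode root) {
        StringBuilder out = new StringBuilder("tree = Tree\n");
        render(root, 0, out);
        return out.toString();
    }

    private static void render(TreeNode node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append("- \"").append(node.value).append("\"\n");
        for (TreeNode child : node.children) {
            render(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode("Animals")
            .add(new TreeNode("Mammals")
                .add(new TreeNode("Dogs"))
                .add(new TreeNode("Cats"))
                .add(new TreeNode("Horses")))
            .add(new TreeNode("Reptiles")
                .add(new TreeNode("Crocodiles"))
                .add(new TreeNode("Lizards")));
        System.out.print(render(root));
    }
}
```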

However, the programmer cannot define presentations that do not map to a tree. A graph, for instance, can't be represented in a useful way, and the programmer must resolve references by hand; the only way to obtain a meaningful representation of his data is to draw it himself on a sheet of paper, which is extremely time-consuming.
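The difficulty shows up as soon as a graph contains a cycle: expanded naively as a tree, the same nodes reappear at ever deeper levels. The sketch below is hypothetical (the GraphNode class and the dump format are assumptions) and shows the best a tree-shaped dump can do, which is to print each node once and mark repeated references.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical graph node; names and output format are illustrative only.
class GraphNode {
    final String name;
    final List<GraphNode> neighbours = new ArrayList<>();
    GraphNode(String name) { this.name = name; }
}

class GraphDump {
    // Depth-first dump that expands each node once and marks repeated
    // references, so that cycles terminate instead of expanding forever.
    static String dump(GraphNode start) {
        StringBuilder out = new StringBuilder();
        dump(start, 0, new IdentityHashMap<>(), out);
        return out.toString();
    }

    private static void dump(GraphNode node, int depth,
                             Map<GraphNode, Boolean> seen, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        if (seen.put(node, Boolean.TRUE) != null) {
            out.append("- ").append(node.name).append(" (already shown)\n");
            return;
        }
        out.append("- ").append(node.name).append('\n');
        for (GraphNode neighbour : node.neighbours) {
            dump(neighbour, depth + 1, seen, out);
        }
    }

    public static void main(String[] args) {
        GraphNode a = new GraphNode("a");
        GraphNode b = new GraphNode("b");
        GraphNode c = new GraphNode("c");
        a.neighbours.add(b);
        b.neighbours.add(c);
        c.neighbours.add(a); // closes the cycle a -> b -> c -> a
        System.out.print(dump(a));
    }
}
```

Even with the back reference marked, the reader still has to match "a (already shown)" against the earlier line to reconstruct the cycle, which is the by-hand reference resolution described above.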