Performance Overhead

This page takes a closer look at the performance overhead of TOD. Logging every event of a program's execution naturally incurs overhead. How much of it is acceptable for TOD to remain useful is quite subjective, as it depends on the particular application being debugged. Debugging an algorithm that runs in 0.5 s without TOD can probably tolerate a x10 overhead, but such an overhead would be unbearable for most applications. Moreover, in addition to the runtime overhead, we must take into account the volume of trace data generated by TOD.

Roughly speaking, a 1 GHz machine could generate about 0.1 × 10^9 events per second. With an estimated average of 20 bytes per event, that's 2 GB of data per second! That is orders of magnitude beyond the capacity of a current workstation. It is therefore necessary to let the programmer specify trusted code regions, which will not be instrumented and will not generate events. With such a mechanism we can greatly reduce the volume of trace data as well as the runtime overhead. For instance, in most cases it is not necessary to instrument JDK classes.
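The estimate above is simple arithmetic and can be checked with a few lines of Java. This is only a back-of-the-envelope sketch: the class and method names are hypothetical and not part of TOD, and the 20 bytes per event is the estimated average quoted above, not a measured value.

```java
// Back-of-the-envelope estimate of trace volume.
// Hypothetical helper for illustration; not part of TOD.
public class TraceVolumeEstimate {

    /** Bytes of trace data produced per second for a given event rate and event size. */
    static double bytesPerSecond(double eventsPerSecond, double bytesPerEvent) {
        return eventsPerSecond * bytesPerEvent;
    }

    public static void main(String[] args) {
        double eventsPerSecond = 0.1e9; // rough figure for a 1 GHz machine
        double bytesPerEvent = 20;      // estimated average event size (assumption)

        double gbPerSecond = bytesPerSecond(eventsPerSecond, bytesPerEvent) / 1e9;
        System.out.println(gbPerSecond + " GB of trace data per second");
    }
}
```

Changing eventsPerSecond shows how quickly the volume drops once trusted regions stop generating events: every factor of 10 fewer events is a factor of 10 less data.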

Benchmarks

To get an idea of the runtime overhead, we ran a set of benchmarks. We created three very simple programs with different profiles:

  • Few method calls, mostly arithmetic operations and access to local variables
  • Lots of calls to instrumented methods
  • Lots of calls to uninstrumented methods

The benchmarking code can be found in the class tod.test.Benchmark. The benchmarks were run on SVN revision 945.
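To make the three profiles concrete, here is a minimal sketch of what such workloads might look like. It is not the code of tod.test.Benchmark; every name in it is hypothetical. It also assumes a configuration in which this class is instrumented while JDK classes such as java.lang.Math are treated as trusted code.

```java
// Hypothetical sketch of the three workload profiles.
// NOT the code of tod.test.Benchmark; all names are made up for illustration.
public class WorkloadSketch {

    // Profile 1: arithmetic on local variables, no method calls inside the loop.
    static long computations(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc = acc * 31 + i;
        }
        return acc;
    }

    // A small method of our own; assumed to be instrumented.
    static int increment(int x) {
        return x + 1;
    }

    // Profile 2: many calls to a method assumed to be instrumented.
    static int instrumentedCalls(int n) {
        int v = 0;
        for (int i = 0; i < n; i++) {
            v = increment(v);
        }
        return v;
    }

    // Profile 3: many calls to a JDK method, assumed to be excluded from instrumentation.
    static int uninstrumentedCalls(int n) {
        int v = 0;
        for (int i = 0; i < n; i++) {
            v = Math.max(v, i);
        }
        return v;
    }

    public static void main(String[] args) {
        int n = 10000000;

        long t0 = System.nanoTime();
        long a = computations(n);
        long t1 = System.nanoTime();
        int b = instrumentedCalls(n);
        long t2 = System.nanoTime();
        int c = uninstrumentedCalls(n);
        long t3 = System.nanoTime();

        // Results are printed so the loops cannot be optimized away as dead code.
        System.out.println("computations:          " + (t1 - t0) / 1e9 + " s (result " + a + ")");
        System.out.println("instrumented calls:    " + (t2 - t1) / 1e9 + " s (result " + b + ")");
        System.out.println("uninstrumented calls:  " + (t3 - t2) / 1e9 + " s (result " + c + ")");
    }
}
```

Running such a program once on its own and once under the debugger, then dividing the two times for each profile, gives the kind of ratio reported in the results. A plain JVM may inline these tiny methods, so absolute timings from this sketch should be read with caution.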

  • Machine: Dell Latitude D810, 2 GHz Pentium M with 1 GB of 533 MHz DDR RAM
  • OS: Kubuntu Linux 5.10, without X11, minimal services running
  • Java: Sun 1.5.0_04

Results:

Execution times in seconds

                               Without TOD   With TOD   Ratio
  1  Computations                     2.91       2.98    1.02
  2  Instrumented methods             1.45      15.59   10.77
  3  Uninstrumented methods           3.71       3.93    1.06

As the results show, the worst case is code that calls only instrumented methods. Computations and calls to uninstrumented methods incur virtually no overhead. In real applications, which are usually a mix of these three situations, the runtime overhead will thus lie between x1 (no overhead) and about x10.
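For a rough feel of where a given application might fall in that range, the measured ratios can be combined with a guess at how the application's time is split between the three profiles. The sketch below is only an illustration: the class is hypothetical, the mix of 50% / 20% / 30% is made up, and the model assumes that the costs of the three profiles simply add up, which real programs need not obey.

```java
// Hypothetical helper for reasoning about overhead; not part of TOD.
public class OverheadEstimate {

    /** Slowdown factor: time with TOD divided by time without TOD. */
    static double ratio(double withoutTod, double withTod) {
        return withTod / withoutTod;
    }

    /**
     * Blended slowdown, assuming fractions[i] of the running time without TOD
     * is spent in profile i and that the costs of the profiles are additive.
     */
    static double blended(double[] fractions, double[] ratios) {
        double total = 0;
        for (int i = 0; i < fractions.length; i++) {
            total += fractions[i] * ratios[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Execution times in seconds, taken from the results table.
        double[] ratios = {
            ratio(2.91, 2.98),  // computations
            ratio(1.45, 15.59), // instrumented methods
            ratio(3.71, 3.93)   // uninstrumented methods
        };

        // Made-up mix: 50% computations, 20% instrumented calls, 30% uninstrumented calls.
        double[] mix = {0.5, 0.2, 0.3};

        System.out.printf("estimated slowdown: x%.2f%n", blended(mix, ratios));
    }
}
```

With this particular mix the estimate comes out at about x3, which illustrates how strongly the share of time spent in instrumented methods dominates the result.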