Introduction to Profiling

For users interested in tuning the performance of their applications, the STLinux distribution includes two profilers - gprof and OProfile.

The standard GNU profiler, gprof, has two key limitations for embedded Linux development:

  • It will only profile a single user mode application.
  • It will only profile an entire run of that application, and requires it to exit.

Many embedded applications are never intended to exit!

OProfile addresses both of these issues. It is built into the kernel, and uses timer interrupts to profile the kernel and all user mode processes. The resulting profile can show the time spent in each function, process, thread, or binary. It can be started and stopped at any time and so does not require applications to complete.

Using OProfile

Before OProfile can be used, it must be configured into the kernel:

Profiling support  --->
  Profiling support (EXPERIMENTAL) --->
    OProfile system profiling (EXPERIMENTAL)
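
Whether the support actually made it into the running kernel can be checked from the target, assuming the kernel also exposes its configuration through /proc/config.gz (CONFIG_IKCONFIG_PROC, which not every kernel enables):

target% zcat /proc/config.gz | grep OPROFILE

CONFIG_OPROFILE=y indicates built-in support and =m a module build. If /proc/config.gz is not present, check the .config used for the kernel build on the host instead.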

Of course, to get a symbol level profile of the kernel, it must be built with debug info:

Kernel hacking  --->
  Kernel debugging --->
    Compile the kernel with debug info
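
To confirm that the resulting vmlinux really contains debug information, the section headers can be inspected on the host (a sketch; the readelf from your cross toolchain may carry a target prefix such as sh4-linux-readelf):

host% readelf -S vmlinux | grep debug

If sections such as .debug_info and .debug_line are listed, symbol and source line information will be available to the profiling tools.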

The OProfile support can either be built as a kernel module, or statically linked into the kernel. The rest of this document assumes that it has been statically linked.

To profile the time spent in individual kernel functions, a copy of the unstripped vmlinux image must be placed in the target filesystem. This can be done with scp.

host% scp vmlinux root@targetboard:/root

OProfile is configured and driven using command-line utilities on the target system. The two most important of these are:

  • opcontrol - to configure, start and stop profiling.
  • opreport - to output the current profile in a variety of forms.

To start profiling the system, use the following commands. They could be put in a script for convenience.

target% rm -rf /var/lib/oprofile/
target% opcontrol --init
target% opcontrol --setup --separate=all --vmlinux=/root/vmlinux
target% opcontrol --start

The first command deletes any samples from previous runs. If the run was stopped unexpectedly (for example by resetting the board), then these samples will prevent any new ones being collected.

opcontrol --init creates the directory structure under /var and loads the OProfile kernel module where appropriate.

In the opcontrol --setup line, the kernel image is specified (this can be omitted if you do not want to profile the kernel at symbol level), and samples are set to be separated out by process, thread, binary image and cpu. By default samples are only separated by binary image.
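
If full separation is not needed, --separate also accepts a comma-separated list of categories (none, lib, kernel, thread, cpu, all - see the opcontrol manpage). For example, to separate samples by shared library and thread only:

target% opcontrol --setup --separate=lib,thread --vmlinux=/root/vmlinux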

opcontrol --start begins collecting samples.
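
As suggested above, the whole sequence can be wrapped in a small shell script on the target; the name start-oprofile.sh is purely illustrative:

 #!/bin/sh
 # start-oprofile.sh - clear old samples and start system-wide profiling.
 # Assumes vmlinux has been copied to /root (see above); adjust the path if not.
 rm -rf /var/lib/oprofile/
 opcontrol --init
 opcontrol --setup --separate=all --vmlinux=/root/vmlinux
 opcontrol --start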

When you have run the application(s) to be profiled, two commands are available to write out the samples to /var:

  • opcontrol --dump
  • opcontrol --stop

The only difference is that after a dump, OProfile will continue to collect samples; opcontrol --stop stops profiling altogether.
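
A typical interactive session might therefore dump and inspect an intermediate profile, clear it with opcontrol --reset (which discards the samples collected so far without stopping the profiler), and only stop at the very end:

target% opcontrol --dump
target% opreport --merge=all
target% opcontrol --reset
target% opcontrol --stop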

Displaying the results

Once the samples have been collected, they can be viewed with opreport:

target% opreport --merge=all
 CPU: CPU with timer interrupt, speed 0 MHz (estimated)
 Profiling through timer interrupt
           TIMER:0|
   samples|      %|
 ------------------
       966 94.7988 vmlinux
        23  2.2571 bash
        18  1.7664 libc-2.3.3.so
         6  0.5888 ld-2.3.3.so
         2  0.1963 ls
         2  0.1963 sshd
         1  0.0981 gawk
         1  0.0981 grep

This shows that over the profiled period, 94.8% of the time was spent in the kernel, 2.3% of the time in bash, 1.8% of the time in glibc...

The option --merge is used to undo the separation of samples by process and thread. To break the profile down by symbol, use the option -l:

target% opreport --merge=all -l
 CPU: CPU with timer interrupt, speed 0 MHz (estimated)
 Profiling through timer interrupt
 samples  %        app name                 symbol name
 15660    50.3764  vmlinux                  cpu_idle
 14954    48.1053  vmlinux                  default_idle
 21        0.0676  vmlinux                  __raw_readsl
 11        0.0354  libstdc++.so.6.0.3       anonymous symbol from section .plt
 9         0.0290  ld-2.3.3.so              do_lookup_x
 8         0.0257  sshd                     rijndael_encrypt
<snip>

This shows that just over 50% of all samples taken were in the kernel function cpu_idle.

The output can be restricted to a single binary image like this:

target% opreport --merge=all -l image:`which sshd`
 CPU: CPU with timer interrupt, speed 0 MHz (estimated)
 Profiling through timer interrupt
 samples  %        app name                 symbol name
 13        7.2626  sshd                     rijndael_encrypt
 7         3.9106  sshd                     anonymous symbol from section .plt
 5         2.7933  libcrypto.so.0.9.6       md5_block_host_order
 5         2.7933  sshd                     rijndael_decrypt
 4         2.2346  libc-2.3.3.so            memcpy
<snip>

This shows how long sshd spent in each function it executed, including shared library and kernel functions.
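
If the application itself was compiled with -g, opannotate can take this one step further and annotate its source with sample counts. The binary /root/my_app below is purely illustrative, and this assumes the corresponding source files are present on the target at the paths recorded in the debug information:

target% opannotate --source image:/root/my_app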

If the --merge option is removed, then the output is broken down by process and thread:

target% opreport -l image:/root/thread_test
 CPU: CPU with timer interrupt, speed 0 MHz (estimated)
 Profiling through timer interrupt
 Processes with a thread ID of 1112
 Processes with a thread ID of 1139
 Processes with a thread ID of 1154
 samples  %        samples  %        samples  %        symbol name
 1        100.000  0        0        0        0        _IO_new_file_overflow
 0        0        1        100.000  0        0        __udivsi3_i4
 0        0        0        0        1        100.000  clone
<snip>

In this display, each thread has two columns - the number of samples, and the percentage this represents of that thread's samples. In the example above, only one sample was taken in each thread - thread 1112 was sampled once in _IO_new_file_overflow, thread 1139 was sampled once in __udivsi3_i4, and so on.

Many more permutations of options are possible. As the last example shows, once the profile is broken down by thread the output can quickly become very complex and difficult to read! Care must be taken to specify the most appropriate report format for the information you are trying to extract.
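
One way to keep per-thread output manageable is to restrict the report to a single thread with the tid: profile specification (the thread ID 1139 is taken from the example above):

target% opreport -l image:/root/thread_test tid:1139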

One final option that may be of interest:

target% opgprof `which sshd`

This writes out the profile of a single application in gprof format. It can then be viewed with gprof itself, or, more usefully, with a GUI that recognises that format, such as kprof.
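
By default opgprof writes its output to a file named gmon.out in the current directory (the -o option selects a different file). Assuming gprof itself is installed on the target, the result can then be inspected in place:

target% gprof -b `which sshd` gmon.out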

Much more information on OProfile can be found on its project homepage, and in the manpages for the various command-line tools.

Other useful tools

OProfile is sample-based. To get a totally accurate breakdown of the time spent in each process, thread, or interrupt handler, consider using KPTrace. KPTrace also provides detailed figures on the number of times interrupts fired, the number of context switches, and so on.

Both gprof and OProfile can be used from within STWorkbench, which provides a sophisticated graphical interface on the profile results. KPTrace also provides an excellent graphical trace viewer inside STWorkbench.
