For users interested in tuning the performance of their applications, the STLinux distribution includes two profilers - gprof and OProfile.
The standard GNU profiler, gprof, has two key limitations for embedded Linux development:

- it can only profile a single, instrumented application - it cannot profile the kernel or the system as a whole;
- it only writes out its results when the application exits - and many embedded applications are never intended to exit!
OProfile addresses both of these issues. It is built into the kernel, and uses timer interrupts to profile the kernel and all user mode processes. The resulting profile can show the time spent in each function, process, thread, or binary. It can be started and stopped at any time and so does not require applications to complete.
Before OProfile can be used, it must be configured into the kernel:
Profiling support ---> Profiling support (EXPERIMENTAL) ---> OProfile system profiling (EXPERIMENTAL)
Of course, to get a symbol level profile of the kernel, it must be built with debug info:
Kernel hacking ---> Kernel debugging ---> Compile the kernel with debug info
The OProfile support can either be built as a kernel module, or statically linked into the kernel. The rest of this document assumes that it has been statically linked.
To profile the time spent in individual kernel functions, a copy of the kernel must be placed in the target filesystem. This can be done with scp.
host% scp vmlinux root@targetboard:/root
OProfile is configured and driven using command-line utilities on the target system. The two most important of these are:

- opcontrol, which configures OProfile and starts and stops the collection of samples;
- opreport, which formats the collected samples into profile reports.
To start profiling the system, use the following commands. They could be put in a script for convenience.
target% rm -rf /var/lib/oprofile/
target% opcontrol --init
target% opcontrol --setup --separate=all --vmlinux=/root/vmlinux
target% opcontrol --start
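As suggested above, the opcontrol sequence can be wrapped in a script. The following is a minimal sketch; the function names and the default vmlinux path are illustrative assumptions, not part of OProfile itself.

```shell
#!/bin/sh
# Hypothetical convenience wrappers around the opcontrol sequence shown
# above. Adjust the vmlinux path to wherever the kernel image was copied
# on the target.

start_profiling() {
    vmlinux=${1:-/root/vmlinux}
    rm -rf /var/lib/oprofile/          # discard samples from previous runs
    opcontrol --init                   # create /var tree, load module if needed
    opcontrol --setup --separate=all --vmlinux="$vmlinux"
    opcontrol --start                  # begin collecting samples
}

stop_profiling() {
    opcontrol --stop                   # stop collecting samples
}
```

Placing these in, say, /root/profile.sh on the target avoids retyping the full command sequence for each run.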
The first command deletes any samples from previous runs. If the previous run was stopped unexpectedly (for example, by resetting the board), these stale samples will prevent any new ones from being collected.
opcontrol --init creates the directory structure under /var and loads the OProfile kernel module where appropriate.
In the opcontrol --setup line, the kernel image is specified (this can be omitted if you do not want to profile the kernel at symbol level), and we specify that samples should be separated out by process, thread, binary image and CPU. By default, samples are only separated by binary image.
opcontrol --start begins collecting samples.
When you have run the application(s) to be profiled, two commands are available to write out the samples to /var:

- opcontrol --dump
- opcontrol --stop
The only difference is that after a dump, OProfile will continue to collect samples. opcontrol --stop stops profiling altogether.
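A typical session interleaves dumps with interim reports. The sketch below assumes profiling was started with opcontrol --start as shown earlier; the function names are illustrative.

```shell
#!/bin/sh
# Sketch of a dump-and-inspect cycle on the target. Assumes a profiling
# run is already in progress (opcontrol --start).

dump_and_report() {
    opcontrol --dump        # flush samples to /var; collection continues
    opreport --merge=all    # inspect the interim profile
}

finish_profiling() {
    opcontrol --stop        # stop collecting samples altogether
    opcontrol --shutdown    # kill the OProfile daemon (optional cleanup)
}
```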
Once the samples have been collected, they can be viewed with opreport:
target% opreport --merge=all
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
TIMER:0|
  samples|      %|
------------------
      966 94.7988 vmlinux
       23  2.2571 bash
       18  1.7664 libc-2.3.3.so
        6  0.5888 ld-2.3.3.so
        2  0.1963 ls
        2  0.1963 sshd
        1  0.0981 gawk
        1  0.0981 grep
This shows that over the profiled period, 94.8% of the time was spent in the kernel, 2.3% of the time in bash, 1.8% of the time in glibc...
The option --merge is used to undo the separation of samples by process and thread. To break the profile down by symbol, use the option -l:
target% opreport --merge=all -l
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        app name            symbol name
15660    50.3764  vmlinux             cpu_idle
14954    48.1053  vmlinux             default_idle
21        0.0676  vmlinux             __raw_readsl
11        0.0354  libstdc++.so.6.0.3  anonymous symbol from section .plt
9         0.0290  ld-2.3.3.so         do_lookup_x
8         0.0257  sshd                rijndael_encrypt
<snip>
This shows that just over 50% of all samples taken were in the kernel function cpu_idle.
The output can be restricted to a single binary image like this:
target% opreport --merge=all -l image:`which sshd`
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        app name            symbol name
13        7.2626  sshd                rijndael_encrypt
7         3.9106  sshd                anonymous symbol from section .plt
5         2.7933  libcrypto.so.0.9.6  md5_block_host_order
5         2.7933  sshd                rijndael_decrypt
4         2.2346  libc-2.3.3.so       memcpy
<snip>
This shows how long sshd spent in each function it executed, including shared library and kernel functions.
If the --merge option is removed, then the output is broken down by process and thread:
target% opreport -l image:/root/thread_test
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
Processes with a thread ID of 1112
Processes with a thread ID of 1139
Processes with a thread ID of 1154
samples  %        samples  %        samples  %        symbol name
1        100.000  0        0        0        0        _IO_new_file_overflow
0        0        1        100.000  0        0        __udivsi3_i4
0        0        0        0        1        100.000  clone
<snip>
In this display, each thread has two columns - the number of samples, and the percentage this represents of total samples. In the example above, only one sample was taken in each thread - thread 1112 was sampled once in _IO_new_file_overflow, thread 1139 was sampled once in __udivsi3_i4, and so on.
Many more permutations of options are possible. As the last example shows, once the profile is broken down by thread the output can quickly become very complex and difficult to read! Care must be taken to specify the most appropriate report format for the information you are trying to extract.
One final option that may be of interest:
target% opgprof `which sshd`
This writes out the profile of a single application in gprof format. It can then be viewed with gprof or, more usefully, with a GUI that recognises that format, such as kprof.
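For example, a post-run conversion and report might look like the sketch below (assuming a completed profiling run for sshd). By default, opgprof writes its output to a file named gmon.out in the current directory.

```shell
#!/bin/sh
# Sketch: convert sshd's OProfile samples to gprof format, then produce
# a text report with the standard gprof tool. Assumes a profiling run
# has already been collected on the target.

gprof_report() {
    image=`which sshd`
    opgprof "$image"            # writes ./gmon.out in gprof format
    gprof "$image" gmon.out     # flat profile and call-graph report
}
```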
Much more information on OProfile can be found on its project homepage, and in the manpages for the various command-line tools.
OProfile is sample-based, so its figures are statistical estimates. For a totally accurate breakdown of the time spent in each process, thread, or interrupt handler, consider using KPTrace. KPTrace also provides detailed figures on the number of times interrupts fired, the number of context switches, and so on.
Both gprof and OProfile can be used from within STWorkbench, which provides a sophisticated graphical interface on the profile results. KPTrace also provides an excellent graphical trace viewer inside STWorkbench.