Using a KGDB-enabled kernel, you can use GDB as a kernel debugger to cross-debug the kernel from the host as if it were a normal user application.
For example, you can place breakpoints, look at the stack and display information about kernel threads.
This section shows some of the most useful GDB commands supported by the kernel debugger. It also explains how to debug system faults.
Use the GDB break command to stop the kernel execution at a function or a source code line.
The break command takes as its argument a function name or a source code file name and a line number, separated by a colon (:), as shown:
(gdb) break drivers/char/tty_io.c:2887
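Both forms of the break argument can be sketched as follows. The function name sys_ioctl is borrowed from the call trace shown later in this section, and the breakpoint number passed to delete is hypothetical; use the number reported by info breakpoints.

```gdb
# Break on a function name
break sys_ioctl
# Break on a source file and line number
break drivers/char/tty_io.c:2887
# List the breakpoints that are currently set, with their numbers
info breakpoints
# Remove a breakpoint by number (1 is only an example)
delete 1
```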
If a specified breakpoint location cannot be found, it may be due to the
fact that the location is in a kernel module that is yet to be loaded. In this
case you can use pending breakpoints to still set a breakpoint in the module.
When the module containing the pending breakpoint location is loaded (insmod) the
pending breakpoint gets resolved: a regular breakpoint is created and the original
pending breakpoint is removed.
GDB provides some additional commands for controlling pending breakpoint
support:
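As a sketch, the standard GDB settings for pending breakpoints can be exercised as shown here. The symbol bad_pointer comes from the ktest1 example module used later in this section; substitute a function from your own module.

```gdb
# Display the current pending breakpoint setting
show breakpoint pending
# auto: ask for confirmation before creating a pending breakpoint
set breakpoint pending auto
# off: never create pending breakpoints; an unknown location is an error
set breakpoint pending off
# on: create a pending breakpoint without asking
set breakpoint pending on
# The module is not loaded yet, so this breakpoint is created as pending
break bad_pointer
# The breakpoint stays listed as pending until the module is loaded
info breakpoints
```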
The GDB step command steps one code statement. This command runs the kernel for one statement (including entering function calls) and returns control to GDB.
To skip stepping into function calls use the GDB next command.
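A short stepping sequence might look like the sketch below. finish and continue are standard GDB commands that are not described above; they are included because they are commonly used together with step and next.

```gdb
# Execute one statement, entering any function that is called
step
# Execute one statement, running called functions to completion
next
# Repeat next three times
next 3
# Run until the current function returns to its caller
finish
# Resume normal kernel execution
continue
```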
GDB provides a listing of the kernel threads, so you can select a particular thread for analysis and use GDB commands such as bt or info reg to get information in the context of that thread.
Useful GDB commands for debugging threads are:
The thread summary includes (in this order):
The current thread under analysis is indicated by an asterisk `*' on the left of the GDB thread number.
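The following sketch uses standard GDB thread commands to inspect kernel threads. The thread number 2 is hypothetical; take the real number from the first column of the info threads listing.

```gdb
# List all threads known to the debugger; the current one is marked with *
info threads
# Switch the debugger context to GDB thread number 2 (example value)
thread 2
# Show the stack and registers in the context of the selected thread
bt
info registers
# Print a backtrace for every thread
thread apply all bt
```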
The stack trace can be displayed using the GDB command backtrace:
bt, info stack and info s are aliases for backtrace.
Each line in the backtrace shows the frame number and the function name. The backtrace also shows the source file name and line number, and the arguments to the function.
Useful GDB commands for examining stacks are:
Note: GDB commands such as info reg, examination of local variables, etc., can be used to get information in the context of the specified frame.
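A typical stack inspection could be sketched with the standard GDB frame commands below. The frame number 1 is only an example; use a number shown in the backtrace.

```gdb
# Show the whole call stack
backtrace
# Show only the three innermost frames
backtrace 3
# Select frame 1 (the caller of the innermost function)
frame 1
# Describe the selected frame and show its local variables
info frame
info locals
# Move one frame towards the caller, then back towards the callee
up
down
```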
An Oops is what happens when the kernel gets an exception from kernel
code. In other words, Oopses are the equivalent of segmentation faults
in user space. For example, most Oops messages are the result of
dereferencing a NULL pointer.
When an Oops happens, it is important to be able to collect as much
information as possible to understand and solve the bug. Unfortunately,
it is not easy to do!
An Oops displays, on the system console, the processor status at the
time of the fault and the CPU registers; see the log below:
# insmod ktest1.ko
# ./utest1 -t oops
Unable to handle kernel paging request at virtual address deadbeef
pc = c0160042
*pde = 00000000
Oops: 0000 [#1]
Pid : 286, Comm: utest1
PC is at bad_pointer+0x2/0x20 [ktest1]
PC : c0160042 SP : 875f1f5c SR : 40008001 TEA : deadbeef Not tainted
R0 : c0160040 R1 : deadbeef R2 : 87bbe1c0 R3 : c0160280
R4 : 00000001 R5 : 87bbe1c0 R6 : 8004fb02 R7 : 00000000
R8 : 00000000 R9 : 87bbe1c0 R10 : 00000000 R11 : 00000000
R12 : fffffff7 R13 : 00000000 R14 : 7bfffd4c
MACH: 00000108 MACL: 007057e3 GBR : 2969e440 PR : c0160068
Call trace:
[<c01602c8>] kgdb_module_ioctl+0x48/0x80 [ktest1]
[<8406b64c>] do_ioctl+0x4c/0x80
[<8406b6ee>] vfs_ioctl+0x6e/0x3a0
[<8406ba4e>] sys_ioctl+0x2e/0x80
[<84005168>] syscall_call+0xc/0xe
[<8406ba20>] sys_ioctl+0x0/0x80
Segmentation fault
This message was generated by writing to a device owned by the ktest1 module. The implementation of this module is trivial and is not discussed in this documentation. The utest1 application is used simply to invoke the module's ioctl method.
Note: for more details about how to debug a dynamic kernel module, see the documentation on debugging kernel modules.
kgdb_module_ioctl(...)    ioctl method
|   ...
|__ int bad_pointer(void)
    {
        int *p;
        p = (int *) 0xdeadbeef;
        return (*p);
    }
Note: Oops messages are strictly dependent on the architecture,
so before starting your analysis you should be familiar with the CPU.
For example, when using the ST40 CPU core you should know some basic
information defined by the architecture and the ST40 ABI, such as:
If you cannot tell where the Oops occurs, add more debugging printk() calls or use the kernel debugger.
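The "PC is at bad_pointer+0x2/0x20 [ktest1]" line of the Oops can often be mapped back to source code. It means that the faulting instruction lies 0x2 bytes into bad_pointer, a function 0x20 bytes long that belongs to the ktest1 module. As a sketch, assuming the module was built with debug information and its symbols are loaded in GDB, the location can be resolved like this:

```gdb
# Report the source file and line that contain the faulting address
info line *(bad_pointer+0x2)
# List the source code around that address
list *(bad_pointer+0x2)
# Show the machine instructions of the function for comparison with the PC
disassemble bad_pointer
```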
When an Oops occurs the KGDB stub is invoked and the kernel execution is stopped.
In this case, by invoking GDB, you will be able to see the same information
shown by the Oops fault handler, but the debugger will make it easier to:
Note: to make the debug session easier, compile the kernel and the modules with debug information (the kernel configuration option is DEBUG_INFO).
On the host: connect KGDB (e.g. via network) and load your module symbols:
host% sh4-linux-gdb
GNU gdb STMicroelectronics/Linux Base 6.3-9
...
breakpoint () at kernel/kgdb.c:1659
1659        atomic_set(&kgdb_setting_breakpoint, 0);
(gdb) set breakpoint pending on
(gdb) set solib-search-path /user/my_module_2.6/
(gdb) continue
Continuing.
On the target: generate the Oops
# insmod ktest1.ko
# ./utest1 -t oops
Unable to handle kernel paging request at virtual address deadbeef
pc = c0175044
*pde = 00000000
GDB console:
Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 289]
bad_pointer () at /user/my_module_2.6/ktest1.c:57
57        return (*p);
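Once GDB has stopped at the faulting statement, a few standard commands are usually enough to confirm the cause. This is a sketch only: the variable p is taken from the bad_pointer() listing above, and it may not be readable if the module was compiled with optimisation.

```gdb
# Show how the kernel reached the faulting function
backtrace
# Show the source code around the faulting line
list
# Inspect the local variables, in particular the pointer being dereferenced
info locals
print p
# Disassemble four instructions starting at the faulting program counter
x/4i $pc
# Dump the CPU registers for comparison with the Oops output
info registers
```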
If a kernel panic situation occurs, instead of declaring a panic, the kernel will first give control to GDB so that the situation can be analyzed.
The kernel debugger can be used to perform post-mortem analysis of the kernel.
As described above, the kernel debugger allows you to perform a number of valuable operations, for example code disassembly or memory dumps, to help analyze the cause of the panic. Of course, the developer must be able to analyze and interpret the data displayed.
Although most bugs in kernel code end up as oops messages, sometimes they can completely hang the system.
For example, if a driver enters into a tight loop then the system may not be able to respond to any command at all.
By entering the kernel debugger (typing Ctrl+C on the GDB console) you can interrupt the kernel and analyse which process has generated the loop (see the examples below).
On the target, our driver enters a loop (in this example, in a function called generate_loop()).
kgdb_module_ioctl(...)    ioctl method
|   ...
|__ void generate_loop(void)
    {
        while(1){}
    }
The system no longer responds to any command, so we type Ctrl+C on the GDB console.
Program received signal SIGTRAP, Trace/breakpoint trap.
generate_loop () at wait.h:81
81    {
(gdb) regs
PC=c01750a4 SR=40008001 PR=c0175328 MACH=00000108 MACHL=007057e3
GBR=2969e440 VBR=deadbeef SSR=00000000 SPC=00000000 FPUL=00000000 FPSCR=00000000
R0-R7    00000002 c01750a0 874e2580 c0175300 00000002 874e2580 00000003 00000000
R8-R15   00000000 874e2580 00000000 00000000 fffffff7 00000000 877e5f4c 877e5f4c
FP0-FP7  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
FP8-FP15 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(gdb) bt
#0 generate_loop () at wait.h:81
#1 0x84071bcc in do_ioctl (filp=0x874e2580, cmd=0, arg=694362400) at fs/ioctl.c:35
The log above shows information about the currently stopped process (register status, stack, etc.).
In this example after calling the ioctl method, our driver creates a
new thread (kgdb_thread). Then the generate_loop() function is invoked inside the kgdb_thread
thread.
Obviously the system hangs!
The following log shows how easy it is to find where the loop was generated.
Program received signal SIGTRAP, Trace/breakpoint trap.
generate_loop () at wait.h:81
81    {
(gdb) bt
#0 generate_loop () at wait.h:81
#1 0x84003004 in kernel_thread_helper () at thread_info.h:67
(gdb) regs
PC=c01750a4 SR=40008000 PR=c01751a8 MACH=00000000 MACHL=9e370001
GBR=00000000 VBR=deadbeef SSR=00000000 SPC=00000000 FPUL=00000000 FPSCR=00000000
R0-R7    00000000 c01750a0 40008000 00000019 84290dcc 84290dcc 877f2000 000000f0
R8-R15   c017606c 8424d000 840160a0 00000000 00000000 00000000 877f3fe8 877f3fe8
FP0-FP7  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
FP8-FP15 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(gdb) disassemble 0xc01751a8
Dump of assembler code for function kgdb_thread:
0xc0175140 <kgdb_thread+0>: mov.l r8,@-r15
0xc0175142 <kgdb_thread+2>: mov.l r9,@-r15
0xc0175144 <kgdb_thread+4>: mov.l r10,@-r15
0xc0175146 <kgdb_thread+6>: mov.l r14,@-r15
0xc0175148 <kgdb_thread+8>: sts.l pr,@-r15
...
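In this case, looking at the stack backtrace alone is not sufficient to understand the cause, because it shows only kernel_thread_helper() as the caller. However, the PR (procedure register), which holds the return address of the current call, helps you follow the flow of your code: disassembling the address held in PR (0xc01751a8) shows that generate_loop() was called from kgdb_thread.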
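Besides disassemble, other standard GDB commands can resolve the return address to a symbol or a source line. The sketch below reuses the PR value 0xc01751a8 from the log above. The register name $pr is an assumption about how the SH4 GDB names this register, and the source-level commands need the module's debug information to be loaded.

```gdb
# Print the return address held in PR (register name assumed to be pr)
print/x $pr
# Report which symbol, and which offset within it, contains the address
info symbol 0xc01751a8
# Map the address to a source file and line
info line *0xc01751a8
# List the source code around the call site
list *0xc01751a8
```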
Obviously, the examples above are not exhaustive. Developers will encounter many other situations and will need to resolve them based on their own experience and expertise.