With a KGDB-enabled kernel, you can use GDB on the host as a kernel debugger and cross-debug the kernel as if it were a normal user application.
For example, you can place breakpoints, look at the stack and display information about kernel threads.

This section shows some of the most useful GDB commands supported by the kernel debugger. It also explains how to debug system faults.

Breakpoints

Use the GDB break command to stop the kernel execution at a function or a source code line.

The break command takes as its argument a function name or a source code file name and a line number, separated by a colon (:), as shown:

(gdb) break drivers/char/tty_io.c:2887

Pending Breakpoints

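If GDB cannot find the specified breakpoint location, the location may be in a kernel module that has not been loaded yet. In this case you can use a pending breakpoint to set the breakpoint in the module anyway.
When the module containing the location is loaded (insmod), the pending breakpoint is resolved: a regular breakpoint is created and the original pending breakpoint is removed.
GDB provides some additional commands for controlling pending breakpoint support:

  • set breakpoint pending auto, the default, which makes GDB ask whether to create a pending breakpoint when it cannot find the location,

  • set breakpoint pending on, which makes GDB create a pending breakpoint automatically when it cannot find the location,

  • set breakpoint pending off, which disables pending breakpoints, so that an unrecognized location results in an error,

  • show breakpoint pending, which displays the current setting.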

Stepping through code

The GDB step command executes one source line: it runs the kernel for one line, entering any function that is called, and then returns control to GDB.

To step over function calls instead of entering them, use the GDB next command.

Thread analysis

GDB can list the kernel threads, so you can select a particular thread and use GDB commands such as bt or info registers to get information in the context of that thread.

Useful GDB commands for debugging threads are:

  • info threads, which displays a summary of all threads in the kernel, and

  • thread <threadno>, which switches the current thread under analysis to the specified thread.

The thread summary includes (in this order):

  • the thread number assigned by GDB,

  • the target system thread identifier,

  • the current stack frame summary for that thread.

The current thread under analysis is indicated by an asterisk `*' on the left of the GDB thread number.

Examining the stack

The stack trace can be displayed using the GDB command backtrace:

  • backtrace prints a backtrace of the entire stack of the current thread,

  • backtrace <n> prints only the innermost <n> frames,

  • backtrace -<n> prints only the outermost <n> frames.

bt, info stack and info s are aliases for backtrace.

Each line in the backtrace shows the frame number and the function name. The backtrace also shows the source file name and line number, and the arguments to the function.

Useful GDB commands for examining stacks are:

  • frame <frameno>, which switches the current frame under analysis to the specified frame.

Note: GDB commands such as info registers, or the examination of local variables, can be used to get information in the context of the specified frame.


Debugging System Faults

Oops

An Oops occurs when kernel code raises an exception. In other words, an Oops is the kernel equivalent of a segmentation fault in user space. For example, most Oops messages are the result of dereferencing a NULL pointer.
When an Oops happens, it is important to collect as much information as possible in order to understand and fix the bug. Unfortunately, this is not easy to do!
An Oops prints the processor status at the time of the fault, including the CPU registers, on the system console; see the log below:

# insmod ktest1.ko
# ./utest1 -t oops
Unable to handle kernel paging request at virtual address deadbeef
pc = c0160042
*pde = 00000000
Oops: 0000 [#1]
 
Pid : 286, Comm:               utest1
PC is at bad_pointer+0x2/0x20 [ktest1]
PC  : c0160042 SP  : 875f1f5c SR  : 40008001 TEA : deadbeef    Not tainted
R0  : c0160040 R1  : deadbeef R2  : 87bbe1c0 R3  : c0160280
R4  : 00000001 R5  : 87bbe1c0 R6  : 8004fb02 R7  : 00000000
R8  : 00000000 R9  : 87bbe1c0 R10 : 00000000 R11 : 00000000
R12 : fffffff7 R13 : 00000000 R14 : 7bfffd4c
MACH: 00000108 MACL: 007057e3 GBR : 2969e440 PR  : c0160068
 
Call trace:
[<c01602c8>] kgdb_module_ioctl+0x48/0x80 [ktest1]
[<8406b64c>] do_ioctl+0x4c/0x80
[<8406b6ee>] vfs_ioctl+0x6e/0x3a0
[<8406ba4e>] sys_ioctl+0x2e/0x80
[<84005168>] syscall_call+0xc/0xe
[<8406ba20>] sys_ioctl+0x0/0x80
Segmentation fault

This message was generated by writing to a device owned by the ktest1 module. The implementation of this module is trivial and is not discussed in this documentation. The utest1 application is used simply to invoke the module's ioctl method.

Note: more details about how to debug a dynamically loaded kernel module are available separately.

kgdb_module_ioctl(...)     ioctl method
   |
  ...
   |__ int bad_pointer(void)
       {
                int *p;
 
                p = (int *) 0xdeadbeef;
                return (*p);
       }

Note: Oops messages are strictly dependent on the architecture, so before starting your analysis you should be familiar with the CPU.
For example, when using the ST40 CPU core you should know some basic information defined by the architecture and the ST40 ABI, such as:

  • PC (Program Counter) holds the address of the instruction being executed.
    For example, it could be:
    • a logical kernel address (typically in the P1 memory segment); example: 0x8400d9c0,
    • a virtual kernel module address (see the example shown above),
    • an invalid value; example: 0x00000000.
  • SPC (Saved Program Counter): the address of the instruction at which an interrupt or exception occurs is saved in this register.
  • SR is the Status Register.
  • TEA: after an MMU exception or address error exception occurs, the hardware sets the TEA register to the virtual address at which the exception occurred. In the log shown above, TEA holds deadbeef, the invalid address that bad_pointer() dereferences.
  • The R15 register contains the stack pointer.
  • The R14 register contains the frame pointer.

If you cannot tell where the Oops occurs, add some more printk() debugging code or use the kernel debugger.

When an Oops occurs, the KGDB stub is invoked and kernel execution is stopped.
In this case, by invoking GDB, you will be able to see the same information shown by the Oops fault handler, but the debugger will make it easier to:

  • understand where the Oops was generated,
  • get information from memory,
  • disassemble the whole responsible function.

Note: to make the debug session easier, compile the kernel and the modules with debug information (the kernel configuration option is DEBUG_INFO, that is, CONFIG_DEBUG_INFO=y in the .config file).

    On the host: connect KGDB (e.g. via the network) and load your module symbols:

    host% sh4-linux-gdb
    GNU gdb STMicroelectronics/Linux Base 6.3-9
    ...
    breakpoint () at kernel/kgdb.c:1659
    1659            atomic_set(&kgdb_setting_breakpoint, 0);
     
    (gdb) set breakpoint pending on
    (gdb) set solib-search-path /user/my_module_2.6/
    (gdb) continue
    Continuing.

    On the target: generate the Oops

    # insmod ktest1.ko
    # ./utest1 -t oops
    Unable to handle kernel paging request at virtual address deadbeef
    pc = c0175044
    *pde = 00000000

    GDB console:

    Program received signal SIGTRAP, Trace/breakpoint trap.
    [Switching to Thread 289]
    bad_pointer ()
        at /user/my_module_2.6/ktest1.c:57
    57         return (*p);

Panic

If a kernel panic situation occurs, instead of declaring a panic, the kernel will first give control to GDB so that the situation can be analyzed.
The kernel debugger can be used to perform post-mortem analysis of the kernel.
As described above, the kernel debugger allows you to perform a number of valuable operations, for example disassembling code or dumping memory, to help analyze the cause of the panic. Of course, the developer must be able to analyze and interpret the data displayed.

System Hangs

Although most bugs in kernel code end up as Oops messages, sometimes they can completely hang the system.
For example, if a driver enters a tight loop, the system may not be able to respond to any command at all.
By entering the kernel debugger (type Ctrl+C on the GDB console), you can interrupt the kernel and analyse which process has generated the loop (see the examples below).

Example 1

On the target, our driver enters a loop (e.g. in a function called generate_loop()).

kgdb_module_ioctl(...)      ioctl method
   |
  ...
   |__ void generate_loop(void)
           {
                while(1){}
           }

The system no longer responds to any command, so we type Ctrl+C on the GDB console.

Program received signal SIGTRAP, Trace/breakpoint trap.
generate_loop () at wait.h:81
81      {
(gdb) regs
PC=c01750a4 SR=40008001 PR=c0175328 MACH=00000108 MACHL=007057e3
GBR=2969e440 VBR=deadbeef SSR=00000000 SPC=00000000 FPUL=00000000 FPSCR=00000000
R0-R7  00000002 c01750a0 874e2580 c0175300 00000002 874e2580 00000003 00000000
R8-R15 00000000 874e2580 00000000 00000000 fffffff7 00000000 877e5f4c 877e5f4c
FP0-FP7  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
FP8-FP15 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(gdb) bt
#0  generate_loop () at wait.h:81
#1  0x84071bcc in do_ioctl (filp=0x874e2580, cmd=0, arg=694362400)
    at fs/ioctl.c:35

The log above shows information about the process that was stopped (register status, stack, etc.).

Example 2

In this example, after the ioctl method is called, our driver creates a new thread (kgdb_thread). The generate_loop() function is then invoked inside the kgdb_thread thread.
Obviously the system hangs!
The following log shows how easy it is to understand where the loop has been generated.

Program received signal SIGTRAP, Trace/breakpoint trap.
generate_loop () at wait.h:81
81      {
(gdb) bt
#0  generate_loop () at wait.h:81
#1  0x84003004 in kernel_thread_helper () at thread_info.h:67
(gdb) regs
PC=c01750a4 SR=40008000 PR=c01751a8 MACH=00000000 MACHL=9e370001
GBR=00000000 VBR=deadbeef SSR=00000000 SPC=00000000 FPUL=00000000 FPSCR=00000000
R0-R7  00000000 c01750a0 40008000 00000019 84290dcc 84290dcc 877f2000 000000f0
R8-R15 c017606c 8424d000 840160a0 00000000 00000000 00000000 877f3fe8 877f3fe8
FP0-FP7  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
FP8-FP15 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(gdb) disassemble 0xc01751a8
Dump of assembler code for function kgdb_thread:
0xc0175140 <kgdb_thread+0>:     mov.l   r8,@-r15
0xc0175142 <kgdb_thread+2>:     mov.l   r9,@-r15
0xc0175144 <kgdb_thread+4>:     mov.l   r10,@-r15
0xc0175146 <kgdb_thread+6>:     mov.l   r14,@-r15
0xc0175148 <kgdb_thread+8>:     sts.l   pr,@-r15
...

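In this case, the stack backtrace alone is not sufficient to understand the cause, because it shows only generate_loop() and kernel_thread_helper(). However, the procedure register (PR), which holds the return address of the current subroutine call, helps you to follow the flow of your code: here PR points into kgdb_thread(), as the disassemble output shows.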

Obviously, the examples above are not exhaustive. Developers will encounter many other situations, which they will need to resolve based on their own experience and expertise.