Deadlock Identification with GDB Python Interface

Identifying a deadlock in a program can be tedious, especially when many threads are involved. To help identify deadlocked threads quickly, we wrote a small script using the Python interface of gdb.

The script was tested on 64-bit Linux (amd64) with gdb version 7.7. It is available at the end of the article.

Deadlock Identification

The deadlocked process is examined either live in a gdb session or post-mortem through a core file, generated for example by sending it an abort signal.

If a program (or part of it) is deadlocked, several threads are waiting for a mutex to be unlocked.

For a regular mutex, a blocked thread is waiting for the __lll_lock_wait() function call to return.

A mutex is represented by a data structure named pthread_mutex_t, which contains (see /usr/include/bits/pthreadtypes.h) a struct of type __pthread_mutex_s named __data. This structure in turn contains, among other fields, a member called __owner. The field is set to the thread ID (the kernel TID, not the process PID) of the thread that successfully calls pthread_mutex_lock (see the glibc source files).

Therefore, if the calling thread is waiting for the mutex to be unlocked, the thread holding the lock is recorded in the __owner field of the mutex.

Likewise, for read/write locks the respective functions are pthread_rwlock_wrlock and pthread_rwlock_rdlock.

These locks are represented by a data structure named pthread_rwlock_t. It contains a field named __writer which records the thread ID of the writer thread holding the lock. Threads holding the lock as readers are not recorded.

If there is a deadlock, it means that one or more threads are waiting for one another. Using the owner fields described above, we can build a graph of which thread is waiting for which one. A deadlock is a cycle in this graph.

We can write a Python script that automates the cycle detection.

Implementation

The steps are:

  • go through all threads
  • check if the thread is waiting for a mutex: its innermost stack frame is in one of the above functions
  • build the graph of dependencies
  • look for a cycle in this graph

Preliminary consideration: we rely on the System V AMD64 calling convention (the one gcc uses on 64-bit Linux), which puts the first argument of a function call in $rdi.

Beware! Since we are using undocumented data types and internal implementations, this script may need to be adapted to your specific Linux environment.

Threads

The gdb Python interface calls the programs being debugged inferiors. Therefore, to access the threads of the process, we first get the list of inferiors from gdb and then ask the first one for its threads:

processes = gdb.inferiors()
if len(processes) == 0:
    return
process = processes[0]
threads = process.threads()

Stack Frame Check

Once we have the threads, we switch to each one in turn and check whether its innermost stack frame is in one of the mutex functions:

for t in threads:
    t.switch()
    frame = gdb.selected_frame()

    # fun is the name of the current function; frame.name() returns
    # None when the name is unknown, so fall back to an empty string
    fun = frame.name() or ''

pthread_mutex_lock

The blocking function is named __lll_lock_wait and the address of the mutex is passed in $rdi. Its type is pthread_mutex_t. Therefore, to get the thread ID of the thread holding the lock, we perform the following steps:

if fun.startswith('__lll_lock_wait'):
    # $rdi holds the first argument: the address of the mutex
    ptr = gdb.parse_and_eval("$rdi")

    # build the type "pointer to pthread_mutex_t"
    mutex_type = gdb.lookup_type("pthread_mutex_t").pointer()

    # reinterpret the address as a mutex pointer
    mutex = ptr.cast(mutex_type)

    # read the thread ID of the lock holder; convert the gdb.Value to a
    # Python int so it can be used as a dictionary key later
    owner = int(mutex.dereference()['__data']['__owner'])

pthread_rwlock_wrlock

Below is an excerpt of the assembly code for pthread_rwlock_wrlock at the point where a thread waits for the lock:

0x0000000000404a82 <+50>:    add    $0xc,%rdi
0x0000000000404a86 <+54>:    mov    $0xca,%eax
0x0000000000404a8b <+59>:    syscall

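The add instruction advances $rdi by 0xc (12) bytes before the system call (0xca, i.e. 202, is the futex system call number on x86-64), so at that point $rdi no longer points to the start of the lock. We therefore subtract 12 from $rdi before accessing the fields of the lock.

The lock type is pthread_rwlock_t. To get the thread ID of the writer holding the lock, we perform the following steps: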

elif fun.startswith('pthread_rwlock_wrlock'):
    # undo the "add $0xc,%rdi" to get back to the start of the lock
    ptr = gdb.parse_and_eval("$rdi - 12")

    # build the type "pointer to pthread_rwlock_t"
    rwlock_type = gdb.lookup_type("pthread_rwlock_t").pointer()

    # reinterpret the address as a rwlock pointer
    mutex = ptr.cast(rwlock_type)

    # read the thread ID of the writer holding the lock, as a Python int
    owner = int(mutex.dereference()['__data']['__writer'])
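
The pointer adjustment can be tried outside gdb with ctypes. The sketch below is only an illustration: FakeRwlock and lock_from_wait_address are hypothetical names, and the field layout is an assumption modelled on an older 64-bit glibc, chosen so that the waited-on field sits 12 bytes into the structure as in the excerpt above. Check the headers of your own system before relying on any offset.

```python
import ctypes


class FakeRwlock(ctypes.Structure):
    """Hypothetical stand-in for the __data member of pthread_rwlock_t.

    The layout is an assumption, not taken from the article's script.
    """
    _fields_ = [
        ("lock", ctypes.c_int),               # offset 0
        ("nr_readers", ctypes.c_uint),        # offset 4
        ("readers_wakeup", ctypes.c_uint),    # offset 8
        ("writer_wakeup", ctypes.c_uint),     # offset 12
        ("nr_readers_queued", ctypes.c_uint),
        ("nr_writers_queued", ctypes.c_uint),
        ("writer", ctypes.c_int),             # thread ID of the writer
    ]


def lock_from_wait_address(wait_address, offset):
    """Recover the lock from the address of the field a thread waits on."""
    return FakeRwlock.from_address(wait_address - offset)


rwlock = FakeRwlock(writer=4242)
base = ctypes.addressof(rwlock)

# A blocked writer would see rdi == base + 12 at the syscall instruction.
rdi = base + FakeRwlock.writer_wakeup.offset
recovered = lock_from_wait_address(rdi, 12)
print(recovered.writer)  # prints 4242
```

Subtracting the same constant that the add instruction applied is all the gdb expression "$rdi - 12" does; the cast to a pthread_rwlock_t pointer then plays the role of from_address here.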

pthread_rwlock_rdlock

Below is an excerpt of the assembly code for pthread_rwlock_rdlock at the point where a thread waits for the lock:

0x00007ffff7bca7e8 <+56>:    add    $0x8,%rdi
0x00007ffff7bca7ec <+60>:    mov    $0xca,%eax
0x00007ffff7bca7f1 <+65>:    syscall

Here the add instruction advances $rdi by 8 bytes before the system call, so we subtract 8 from $rdi before accessing the fields of the lock.

The lock type is pthread_rwlock_t. To get the thread ID of the writer holding the lock, we perform the following steps:

elif fun.startswith('pthread_rwlock_rdlock'):
    # undo the "add $0x8,%rdi" to get back to the start of the lock
    ptr = gdb.parse_and_eval("$rdi - 8")

    # build the type "pointer to pthread_rwlock_t"
    rwlock_type = gdb.lookup_type("pthread_rwlock_t").pointer()

    # reinterpret the address as a rwlock pointer
    mutex = ptr.cast(rwlock_type)

    # read the thread ID of the writer holding the lock, as a Python int
    owner = int(mutex.dereference()['__data']['__writer'])
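
The three branches differ only in the function name, the lock type, the amount to subtract from $rdi and the field holding the owner. As a possible simplification, they could be driven from a small table. This is a sketch only: WAIT_SITES and describe_wait are hypothetical names that do not come from the script, and the offsets are the ones read from the assembly excerpts above, so they may differ on another glibc build.

```python
# (function-name prefix, lock type, bytes added to rdi, owner field)
WAIT_SITES = [
    ("__lll_lock_wait", "pthread_mutex_t", 0, "__owner"),
    ("pthread_rwlock_wrlock", "pthread_rwlock_t", 12, "__writer"),
    ("pthread_rwlock_rdlock", "pthread_rwlock_t", 8, "__writer"),
]


def describe_wait(fun):
    """Return (type_name, rdi_offset, owner_field) for a blocking frame.

    Returns None when the function name is unknown or is not one of the
    lock functions listed in WAIT_SITES.
    """
    if not fun:
        return None  # the frame name may be unavailable
    for prefix, type_name, rdi_offset, owner_field in WAIT_SITES:
        if fun.startswith(prefix):
            return type_name, rdi_offset, owner_field
    return None


print(describe_wait("pthread_rwlock_wrlock"))  # ('pthread_rwlock_t', 12, '__writer')
print(describe_wait("main"))                   # None
```

Inside gdb, the returned tuple would supply the argument of gdb.lookup_type, the constant in the "$rdi - N" expression and the field name read from __data, so supporting another lock function becomes a one-line addition to the table.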

Cycle Detection

Once we have these basic steps, we build a map keyed by thread id that records which thread is waiting for which other thread.

Then we go through this map and try to detect cycles. If we find one, there is a deadlock.
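
One way to look for cycles is sketched below. It assumes the map (called locked_by here) sends each waiting thread to the single thread it waits for, which holds because a thread blocks on at most one lock at a time. The function name find_cycles is hypothetical and the actual script may proceed differently.

```python
def find_cycles(locked_by):
    """Return every cycle in a 'waits for' map.

    locked_by maps a thread to the one thread it is waiting for.
    Each cycle is returned as a list of threads in waiting order.
    """
    cycles = []
    done = set()  # threads whose chain has already been followed
    for start in locked_by:
        path = []
        position = {}  # thread -> index in path, for this walk only
        current = start
        while (current in locked_by
               and current not in done
               and current not in position):
            position[current] = len(path)
            path.append(current)
            current = locked_by[current]
        if current in position:
            # We came back to a thread seen during this walk: a cycle.
            cycles.append(path[position[current]:])
        done.update(path)
    return cycles


# Threads 1, 2 and 3 wait for each other; 4 waits for 1; 5 waits for 6.
print(find_cycles({1: 2, 2: 3, 3: 1, 4: 1, 5: 6}))  # [[1, 2, 3]]
```

Thread 4 is blocked behind the deadlock but is not part of it, so it is not reported. A thread waiting for a lock it already holds shows up as a cycle of length one.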

Wrapping everything up

We need a mapping from the system thread id (the second element of ptid, the LWP id) to the gdb thread number:

# build mapping: system thread id -> gdb thread number
tids = {}
for t in threads:
    tids[t.ptid[1]] = t.num

We record the 'wait for' relation in a map named locked_by:

if owner in tids:
    locked_by[t.num] = tids[owner]
else:
    print("Owner thread not found " + str(owner))
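
The two snippets above can be exercised outside gdb with stand-in thread objects. In this sketch FakeThread and build_wait_map are hypothetical names, and the owners argument stands for the values that would be read from __owner or __writer inside gdb; only the ptid and num attributes mirror what gdb thread objects provide.

```python
from collections import namedtuple

# Hypothetical stand-in for a gdb thread object: num is the gdb thread
# number and ptid is the (pid, lwp, tid) triple.
FakeThread = namedtuple("FakeThread", ["num", "ptid"])


def build_wait_map(threads, owners):
    """Build the 'wait for' relation between gdb thread numbers.

    owners maps a gdb thread number to the system thread id read from
    the lock that thread is blocked on.
    """
    tids = {t.ptid[1]: t.num for t in threads}
    locked_by = {}
    unknown = []
    for t in threads:
        owner = owners.get(t.num)
        if owner is None:
            continue  # this thread is not blocked on a lock
        if owner in tids:
            locked_by[t.num] = tids[owner]
        else:
            unknown.append(owner)  # no thread has this id
    return locked_by, unknown


threads = [
    FakeThread(num=1, ptid=(1000, 1000, 0)),
    FakeThread(num=2, ptid=(1000, 1001, 0)),
    FakeThread(num=3, ptid=(1000, 1002, 0)),
]
# Thread 1 waits on a lock owned by LWP 1001, thread 2 on one owned by
# LWP 1000, and thread 3 on a lock whose owner field matches no thread.
locked_by, unknown = build_wait_map(threads, {1: 1001, 2: 1000, 3: 0})
print(locked_by)  # {1: 2, 2: 1}
print(unknown)    # [0]
```

Collecting unmatched owners instead of printing them right away is a design choice of this sketch; it keeps the function free of side effects so that it can be tested without a debugger.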

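To launch the script, enter the following line at the gdb prompt:

python execfile ('/path/to/script.py')

Note that execfile only exists in Python 2. gdb's own source command also runs a file ending in .py as Python and avoids the dependency on execfile:

source /path/to/script.py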

If there is a deadlock, it will be printed along with the involved threads.

The complete script is available here.
