Chapter 17. Tracing Tools

Contents

17.1. Tracing System Calls with strace
17.2. Tracing Library Calls with ltrace
17.3. Debugging and Profiling with Valgrind
17.4. For More Information

openSUSE comes with a number of tools that help you obtain useful information about your system. You can use the information for various purposes, for example, to debug and find problems in your program, to discover places causing performance drops, or to trace a running process to find out what system resources it uses. The tools are mostly part of the installation media, otherwise you can install them from the downloadable SUSE Software Development Kit.

[Note]Tracing and Impact on Performance

While a running process is being monitored for system or library calls, the performance of the process is heavily reduced. You are advised to use tracing tools only for the time you need to collect the data.

17.1. Tracing System Calls with strace

The strace command traces system calls of a process and signals received by the process. strace can either run a new command and trace its system calls, or you can attach strace to an already running command. Each line of the command's output contains the system call name, followed by its arguments in parenthesis and its return value.

To run a new command and start tracing its system calls, enter the command to be monitored as you normally do, and add strace at the beginning of the command line:

tux@mercury:~> strace ls
execve("/bin/ls", ["ls"], [/* 52 vars */]) = 0
brk(0)                                  = 0x618000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) \
	= 0x7f9848667000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) \
	= 0x7f9848666000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT \
(No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=200411, ...}) = 0
mmap(NULL, 200411, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9848635000
close(3)                                = 0
open("/lib64/librt.so.1", O_RDONLY)     = 3
[...]
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) \
= 0x7fd780f79000
write(1, "Desktop\nDocuments\nbin\ninst-sys\n", 31Desktop
Documents
bin
inst-sys
) = 31
close(1)                                = 0
munmap(0x7fd780f79000, 4096)            = 0
close(2)                                = 0
exit_group(0)                           = ?

To attach strace to an already running process, you need to specify the -p with the process ID (PID) of the process that you want to monitor:

tux@mercury:~> strace -p `pidof mysqld`
Process 2868 attached - interrupt to quit
select(15, [13 14], NULL, NULL, NULL)   = 1 (in [14])
fcntl(14, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
accept(14, {sa_family=AF_FILE, NULL}, [2]) = 31
fcntl(14, F_SETFL, O_RDWR)              = 0
getsockname(31, {sa_family=AF_FILE, path="/var/run/mysql"}, [28]) = 0
fcntl(31, F_SETFL, O_RDONLY)            = 0
fcntl(31, F_GETFL)                      = 0x2 (flags O_RDWR)
fcntl(31, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
[...]
setsockopt(31, SOL_IP, IP_TOS, [8], 4)  = -1 EOPNOTSUPP (Operation \
not supported)
clone(child_stack=0x7fd1864801f0, flags=CLONE_VM|CLONE_FS|CLONE_ \
FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_ \
PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fd1864809e0, \
tls=0x7fd186480910, child_tidptr=0x7fd1864809e0) = 21993
select(15, [13 14], NULL, NULL, NULL

The -e option understands several sub-options and arguments. For example, to trace all attempts to open or write to a particular file, use the following:

tux@mercury:~> strace -e trace=open,write ls ~
open("/etc/ld.so.cache", O_RDONLY)       = 3
open("/lib64/librt.so.1", O_RDONLY)      = 3
open("/lib64/libselinux.so.1", O_RDONLY) = 3
open("/lib64/libacl.so.1", O_RDONLY)     = 3
open("/lib64/libc.so.6", O_RDONLY)       = 3
open("/lib64/libpthread.so.0", O_RDONLY) = 3
[...]
open("/usr/lib/locale/cs_CZ.utf8/LC_CTYPE", O_RDONLY) = 3
open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
write(1, "addressbook.db.bak\nbin\ncxoffice\n"..., 311) = 311

To trace only network related system calls, use -e trace=network:

tux@mercury:~> strace -e trace=network -p 26520
Process 26520 attached - interrupt to quit
socket(PF_NETLINK, SOCK_RAW, 0)         = 50
bind(50, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
getsockname(50, {sa_family=AF_NETLINK, pid=26520, groups=00000000}, \
[12]) = 0
sendto(50, "\24\0\0\0\26\0\1\3~p\315K\0\0\0\0\0\0\0\0", 20, 0,
{sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
[...]

The -c calculates the time the kernel spent on each system call:

tux@mercury:~> strace -c find /etc -name xorg.conf
/etc/X11/xorg.conf
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 32.38    0.000181         181         1           execve
 22.00    0.000123           0       576           getdents64
 19.50    0.000109           0       917        31 open
 19.14    0.000107           0       888           close
  4.11    0.000023           2        10           mprotect
  0.00    0.000000           0         1           write
[...]
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         3         1 futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         4           fadvise64
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.000559                  3633        33 total

To trace all child processes of a process, use -f:

tux@mercury:~> strace -f rcapache2 status
execve("/usr/sbin/rcapache2", ["rcapache2", "status"], [/* 81 vars */]) = 0
brk(0)                                  = 0x69e000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) \
= 0x7f3bb553b000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) \
= 0x7f3bb553a000
[...]
[pid  4823] rt_sigprocmask(SIG_SETMASK, [],  <unfinished ...>
[pid  4822] close(4 <unfinished ...>
[pid  4823] <... rt_sigprocmask resumed> NULL, 8) = 0
[pid  4822] <... close resumed> )       = 0
[...]
[pid  4825] mprotect(0x7fc42cbbd000, 16384, PROT_READ) = 0
[pid  4825] mprotect(0x60a000, 4096, PROT_READ) = 0
[pid  4825] mprotect(0x7fc42cde4000, 4096, PROT_READ) = 0
[pid  4825] munmap(0x7fc42cda2000, 261953) = 0
[...]
[pid  4830] munmap(0x7fb1fff10000, 261953) = 0
[pid  4830] rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
[pid  4830] open("/dev/tty", O_RDWR|O_NONBLOCK) = 3
[pid  4830] close(3)
[...]
read(255, "\n\n# Inform the caller not only v"..., 8192) = 73
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
exit_group(0)

If you need to analyze the output of strace and the output messages are too long to be inspected directly in the console window, use -o. In that case, unnecessary messages, such as information about attaching and detaching processes, are suppressed. You can also suppress these messages (normally printed on the standard output) with -q. To optionally prepend timestamps to each line with a system call, use -t:

tux@mercury:~> strace -t -o strace_sleep.txt sleep 1; more strace_sleep.txt
08:44:06 execve("/bin/sleep", ["sleep", "1"], [/* 81 vars */]) = 0
08:44:06 brk(0)                         = 0x606000
08:44:06 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, \
-1, 0) = 0x7f8e78cc5000
[...]
08:44:06 close(3)                       = 0
08:44:06 nanosleep({1, 0}, NULL)        = 0
08:44:07 close(1)                       = 0
08:44:07 close(2)                       = 0
08:44:07 exit_group(0)                  = ?

The behavior and output format of strace can be largely controlled. For more information, see the relevant manual page (man 1 strace).

17.2. Tracing Library Calls with ltrace

ltrace traces dynamic library calls of a process. It is used in a similar way to strace, and most of their parameters have a very similar or identical meaning. By default, ltrace uses /etc/ltrace.conf or ~/.ltrace.conf configuration files. You can, however, specify an alternative one with the -F config_file option.

In addition to library calls, ltrace with the -S option can trace system calls as well:

tux@mercury:~> ltrace -S -o ltrace_find.txt find /etc -name \
xorg.conf; more ltrace_find.txt
SYS_brk(NULL)                                              = 0x00628000
SYS_mmap(0, 4096, 3, 34, 0xffffffff)                       = 0x7f1327ea1000
SYS_mmap(0, 4096, 3, 34, 0xffffffff)                       = 0x7f1327ea0000
[...]
fnmatch("xorg.conf", "xorg.conf", 0)                       = 0
free(0x0062db80)                                           = <void>
__errno_location()                                         = 0x7f1327e5d698
__ctype_get_mb_cur_max(0x7fff25227af0, 8192, 0x62e020, -1, 0) = 6
__ctype_get_mb_cur_max(0x7fff25227af0, 18, 0x7f1327e5d6f0, 0x7fff25227af0,
0x62e031) = 6
__fprintf_chk(0x7f1327821780, 1, 0x420cf7, 0x7fff25227af0, 0x62e031
<unfinished ...>
SYS_fstat(1, 0x7fff25227230)                               = 0
SYS_mmap(0, 4096, 3, 34, 0xffffffff)                       = 0x7f1327e72000
SYS_write(1, "/etc/X11/xorg.conf\n", 19)                   = 19
[...]

You can change the type of traced events with the -e option. The following example prints library calls related to fnmatch and strlen functions:

tux@mercury:~> ltrace -e fnmatch,strlen find /etc -name xorg.conf
[...]
fnmatch("xorg.conf", "xorg.conf", 0)             = 0
strlen("Xresources")                             = 10
strlen("Xresources")                             = 10
strlen("Xresources")                             = 10
fnmatch("xorg.conf", "Xresources", 0)            = 1
strlen("xorg.conf.install")                      = 17
[...]

To display only the symbols included in a specific library, use -l /path/to/library:

tux@mercury:~> ltrace -l /lib64/librt.so.1 sleep 1
clock_gettime(1, 0x7fff4b5c34d0, 0, 0, 0)                  = 0
clock_gettime(1, 0x7fff4b5c34c0, 0xffffffffff600180, -1, 0) = 0
+++ exited (status 0) +++

You can make the output more readable by indenting each nested call by the specified number of space with the -n num_of_spaces.

17.3. Debugging and Profiling with Valgrind

Valgrind is a set of tools to debug and profile your programs so that they can run faster and with less errors. Valgrind can detect problems related to memory management and threading, or can also serve as a framework for building new debugging tools.

17.3.1. Installation

Valgrind is not shipped with standard openSUSE distribution. To install it on your system, you need to obtain SUSE Software Development Kit, and either install it as an Add-On product and run

zypper install valgrind

or browse through the SUSE Software Development Kit directory tree, locate the Valgrind package and install it with

rpm -i valgrind-version_architecture.rpm

17.3.2. Supported Architectures

Valgrind runs on the following architectures:

  • i386

  • x86_64 (AMD-64)

  • ppc

  • ppc64

17.3.3. General Information

The main advantage of Valgrind is that it works with existing compiled executables. You do not have to recompile or modify your programs to make use of it. Run Valgrind like this:

valgrind valgrind_options your-prog your-program-options

Valgrind consists of several tools, and each provides specific functionality. Information in this section is general and valid regardless of the used tool. The most important configuration option is --tool . This option tells Valgrind which tool to run. If you omit this option, memcheck is selected by default. For example, if you want to run find ~ -name .bashrc with Valgrind's memcheck tools, enter the following in the command line:

valgrind --tool=memcheck find ~ -name .bashrc

A list of standard Valgrind tools with a brief description follows:

memcheck

Detects memory errors. It helps you tune your programs to behave correctly.

cachegrind

Profiles cache prediction. It helps you tune your programs to run faster.

callgrind

Works in a similar way to cachegrind but also gathers additional cache-profiling information.

exp-drd

Detects thread errors. It helps you tune your multi-threaded programs to behave correctly.

helgrind

Another thread error detector. Similar to exp-drd but uses different techniques for problem analysis.

massif

A heap profiler. Heap is an area of memory used for dynamic memory allocation. This tool helps you tune your program to use less memory.

lackey

An example tool showing instrumentation basics.

17.3.4. Default Options

Valgrind can read options at start-up. There are three places which Valgrind checks:

  1. The file .valgrindrc in the home directory of the user who runs Valgrind.

  2. The environment variable $VALGRIND_OPTS

  3. The file .valgrindrc in the current directory where Valgrind is runned from.

These resources are parsed exactly in this order, while later given options take precedence over earlier processed options. Options specific to a particular Valgrind tool must be prefixed with the tool name and a colon. For example, if you want cachegrind to always write profile data to the /tmp/cachegrind_PID.log, add the following line to the .valgrindrc file in your home directory:

--cachegrind:cachegrind-out-file=/tmp/cachegrind_%p.log

17.3.5. How Valgrind Works

Valgrind takes control of your executable before it starts. It reads debugging information from the executable and related shared libraries. The executable's code is redirected to the selected Valgrind tool, and the tool adds its own code to handle its debugging. Then the code is handed back to the Valgrind core and the execution continues.

For example, memcheck adds its code, which checks every memory access. As a consequence, the program runs much slower than in the native execution environment.

Valgrind simulates every instruction of your program. Therefore, it not only checks the code of your program, but also all related libraries (including the C library), libraries used for graphical environment, and so on. If you try to detect errors with Valgrind, it also detects errors in associated libraries (like C, X11, or Gtk libraries). Because you probably do not need these errors, Valgrind can selectively, suppress these error messages to suppression files. The --gen-suppressions=yes tells Valgrind to report these suppressions which you can copy to a file.

Note that you should supply a real executable (machine code) as an Valgrind argument. Therefore, if your application is run, for example, from a shell or a Perl script you will by mistake get error reports related to /bin/sh (or /usr/bin/perl). In such case, you can use --trace-children=yes or, which is better, supply a real executable to avoid any processing confusion.

17.3.6. Messages

During its runtime, Valgrind reports messages with detailed errors and important events. The following example explains the messages:

tux@mercury:~> valgrind --tool=memcheck find ~ -name .bashrc
[...]
==6558== Conditional jump or move depends on uninitialised value(s)
==6558==    at 0x400AE79: _dl_relocate_object (in /lib64/ld-2.11.1.so)
==6558==    by 0x4003868: dl_main (in /lib64/ld-2.11.1.so)
[...]
==6558== Conditional jump or move depends on uninitialised value(s)
==6558==    at 0x400AE82: _dl_relocate_object (in /lib64/ld-2.11.1.so)
==6558==    by 0x4003868: dl_main (in /lib64/ld-2.11.1.so)
[...]
==6558== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
==6558== malloc/free: in use at exit: 2,228 bytes in 8 blocks.
==6558== malloc/free: 235 allocs, 227 frees, 489,675 bytes allocated.
==6558== For counts of detected errors, rerun with: -v
==6558== searching for pointers to 8 not-freed blocks.
==6558== checked 122,584 bytes.
==6558== 
==6558== LEAK SUMMARY:
==6558==    definitely lost: 0 bytes in 0 blocks.
==6558==      possibly lost: 0 bytes in 0 blocks.
==6558==    still reachable: 2,228 bytes in 8 blocks.
==6558==         suppressed: 0 bytes in 0 blocks.
==6558== Rerun with --leak-check=full to see details of leaked memory.

The ==6558== introduces Valgrind's messages and contains the process ID number (PID). You can easily distinguish Valgrind's messages from the output of the program itself, and decide which messages belong to a particular process.

To make Valgrind's messages more detailed, use -v or even -v -v.

Basically, you can make Valgrind send its messages to three different places:

  1. By default, Valgrind sends its messages to the file descriptor 2, which is the standard error output. You can tell Valgrind to send its messages to any other file descriptor with the --log-fd=file_descriptor_number option.

  2. The second and probably more useful way is to send Valgrind's messages to a file with --log-file=filename. This option accepts several variables, for example, %p gets replaced with the PID of the currently profiled process. This way you can send messages to different files based on their PID. %q{env_var} is replaced with the value of the related env_var environment variable.

    The following example checks for possible memory errors during the Apache Web server restart, while following children processes and writing detailed Valgrind's messages to separate files distinguished by the current process PID:

    tux@mercury:~> valgrind -v --tool=memcheck --trace-children=yes \
    --log-file=valgrind_pid_%p.log rcapache2 restart

    This process created 52 log files in the testing system, and took 75 seconds instead of the usual 7 seconds needed to run rcapache2 restart without Valgrind, which is approximately 10 times more.

    tux@mercury:~> ls -1 valgrind_pid_*log
    valgrind_pid_11780.log
    valgrind_pid_11782.log
    valgrind_pid_11783.log
    [...]
    valgrind_pid_11860.log
    valgrind_pid_11862.log
    valgrind_pid_11863.log
  3. You may also prefer to send the Valgrind's messages over the network. You need to specify the aa.bb.cc.dd IP address and port_num port number of the network socket with the --log-socket=aa.bb.cc.dd:port_num option. If you omit the port number, 1500 will be used.

    It is useless to send Valgrind's messages to a network socket if no application is capable of receiving them on the remote machine. That is why valgrind-listener, a simple listener, is shipped together with Valgrind. It accepts connections on the specified port and copies everything it receives to the standard output.

17.3.7. Error Messages

Valgrind remembers all error messages, and if it detects a new error, the error is compared against old error messages. This way Valgrind checks for duplicate error messages. In case of a duplicate error, it is recorded but no message is shown. This mechanism prevents you from being overwhelmed by millions of duplicate errors.

The -v option will add a summary of all reports (sorted by their total count) to the end of the Valgrind's execution output. Moreover, Valgrind stops collecting errors if it detects either 1000 different errors, or 10 000 000 errors in total. If you want to suppress this limit and wish to see all error messages, use --error-limit=no.

Some errors usually cause other ones. Therefore, fix errors in the same order as they appear and re-check the program continuously.

17.4. For More Information

  • For a complete list of options related to the described tracing tools, see the corresponding man page (man 1 strace, man 1 ltrace, and man 1 valgrind).

  • To describe advanced usage of Valgrind is beyond the scope of this document. It is very well documented, see Valgrind User Manual. These pages are indispensable if you need more advanced information on Valgrind or the usage and purpose of its standard tools.