0. Tracing.

Linux provides strace(1) and ltrace(1) to trace system calls and library function calls, respectively. strace is based on the ptrace(2) family of system calls. Read the manual page ("man 2 ptrace") for their description. See http://lxr.linux.no/#linux+v2.6.11/kernel/ptrace.c to get an idea of the ptrace building blocks, esp. http://lxr.linux.no/linux+*/kernel/ptrace.c#L120 (ptrace_attach) and ptrace_readdata, ptrace_writedata.

Suggestion: strace several standard Linux commands such as ps and ls and interpret the results. Do the same with ltrace.

ltrace(1) uses a non-kernel -- but no less fundamental -- mechanism, which is the basis of *dynamic linking*. We will discuss it in detail at a later time.

  Linux     | OpenSolaris
  ----------+--------------
  strace    | truss
  ltrace    | apptrace

1. Tracing ps

ps on modern operating systems does little beyond reading /proc and interpreting and pretty-printing its contents. The kernel exposes its process control info in /proc's pseudo-files, and it is the kernel's functions that walk the process control blocks in response to your ps's "open" and "read" system calls.

The design which re-dispatches these general file-related system calls to the appropriate worker functions is called VFS. Linux uses a similar design (except that Linux's inodes play the role of Solaris' vnodes). See Figure 1.5 on p. 31 and Table 1.1 on p. 32 for an overview of VFS.

2. A /proc-based "ps" is a glorified "ls" + "cat"

See what system calls ps makes with "truss ps". The getdents64 function is what lists the contents of a directory (this is what ls uses, too). The directory read by ps is, of course, /proc. See "man getdents".

See procdents.d for the script that exposes the kernel functions called in response to a ps command's getdents call. Look at their code in the Solaris source browser. We will read the revealed functions' code to see how they walk the process list next time.

----
However, OpenSolaris provides a tracing framework that reaches much deeper into the kernel: DTrace.
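
Before moving on to DTrace, the "ls" + "cat" description of ps from section 2 can be made concrete. The following is only a minimal sketch, not how any real ps is implemented. It assumes a Linux-style /proc in which each process directory holds a text file named "comm" with the command name (on Solaris the per-process files such as psinfo are binary structs, so this would not work there as written). The function name list_processes and its proc_root parameter are made up for illustration.

```python
import os


def list_processes(proc_root="/proc"):
    """Return sorted (pid, command name) pairs found under proc_root."""
    processes = []
    # The "ls" part: listing the directory is what turns into getdents calls.
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        comm_path = os.path.join(proc_root, entry, "comm")
        try:
            # The "cat" part: plain open + read on a pseudo-file.
            with open(comm_path) as f:
                name = f.read().strip()
        except OSError:
            continue  # the process exited, or the file is missing/unreadable
        processes.append((int(entry), name))
    return sorted(processes)


if __name__ == "__main__":
    # Only attempt the real listing where a /proc directory exists.
    if os.path.isdir("/proc"):
        for pid, name in list_processes():
            print(f"{pid:>7} {name}")
```

Running this script under strace should show the same pattern the notes describe for ps: a directory listing of /proc followed by open and read calls on the per-process pseudo-files.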
----

3. Exploring DTrace.

Unlike truss, which traces only the system calls made by the process it creates for the given command, DTrace probes fire for events caused by all processes on the system, unless limited by predicates.

Hint: as a rule, there is some event that happens a lot more often than others, and needs to be filtered out before the output becomes readable.

DTrace probes are ultimately function addresses within the kernel with argument

Providers:
  syscall  -- entry & return of all syscalls (argument knowledge)
  proc     -- process creation and lifecycle; signal-related events
  fbt      -- all kernel functions' boundaries
  sdt      -- statically defined tracing: programmer-placed probes
  sched    -- scheduler events that happen to threads
  io       -- I/O subsystem
  pid      -- user-level functions in processes
  vminfo   -- VM events (based on kstat)
  sysinfo  -- sys kstat kernel statistics
  profile  -- profiler, runs periodic "tick" actions (e.g., profile:::tick-5s)

D has its own built-in string type. Convert to it with stringof( char* ) for char* kernel strings. Argument strings located in userspace (such as syscall arguments) must be copied into kernel space before they can be used, since all probe actions are executed within kernel space. Use "copyinstr" (see examples).

  #pragma D option quiet       suppresses DTrace's own default print actions
  #pragma D option flowindent  supplies indentation that follows entry and exit from functions

4. Examples.

Lots of examples are found here: http://www.brendangregg.com/DTrace/dtrace_oneliners.txt

Suggestion: work through those examples, using the DTrace Guide (http://docs.sun.com/app/docs/doc/817-6223).

http://www.sun.com/bigadmin/content/dtrace/ provides several DTrace tutorials, DTrace developer blogs and use cases.

NOTE: DTrace tries to provide its action blocks with variables and structs that are most convenient to work with.
Which variables will be made accessible in the action block depends on the probes that match the provider:module:probefunc:probename expression. But:
 - if a variable is not defined for some matching probe, the block won't compile;
 - always check the DTrace Guide for the tables of variables available for specific providers and probes: this can save a lot of time.

Observe the curpsinfo variable pointing to a special "struct psinfo_t" filled with info about the "current" process (i.e., the process that caused the probe to fire), as described in "Table 25–1 proc Probes" at http://docs.sun.com/app/docs/doc/817-6223/chp-proc?a=view

Observe that this struct's pr_psargs member contains the string of arguments to ls after the Bash shell expanded them.

---

5. Using the Modular Debugger (MDB) to examine kernel state

"mdb -k" launches the debugger and "attaches" it to the running kernel (just as "mdb -p" followed by a process ID would attach it to a running user process). The debugger has a somewhat different command syntax from GDB. Its ::help command is a useful entry point. Here is a (larger) tutorial: http://learningsolaris.com/docs/chpt_mdb_os.pdf

Tip: You can pipe debugger commands' output through grep when there is too much of it, rather than dealing with the internal pager. E.g.:

  ::dcmds !grep module

to catch all commands with names or descriptions that contain "module".

To see the address (in hex) of a symbol (function, variable, or any other thing that the debugger knows the address of), append =X to its name. E.g.:

  getpid=X     -- the address of the getpid function as a 32bit hex number

To see the contents (in hex) of memory at that address, use / rather than =

  getpid/X     -- the opcodes at the start of the getpid function as one 32bit hex number (little endian)
  getpid/4B    -- the same, as 4 separate bytes, in order
  getpid/4i    -- the opcodes disassembled into instructions (stops after the first 4)

More about formats: ::formats (e.g., "::formats !grep hex")

  getpid::dis  -- disassembles the whole function

====

Useful commands to look up:

  ::objects
  ::ps
  ::print -t "struct proc"       -- print the definition of a data type
  e1ddb380::print proc_t         -- print the contents of memory at e1ddb380, interpreting it as a proc_t struct type
                                    (I got the above address from the ::ps "walker" command that knows how to find and walk the process control blocks)
  e1ddb380::print proc_t p_ppid  -- print only the selected part of the data structure at that address

(remember to use !grep liberally if you don't like scrolling)

More info on this is in Chapter 2.4 ("Process Structures"). This is what ps and other process reporting utilities extract; we are going to see how.

6. DTrace internals

mdb can reveal some details of the DTrace implementation (see dtrace-internals-x86.pdf for more).

Install a probe into the running kernel:

  # dtrace -n fbt::getpid:entry

Without quitting the trace, examine the function in memory:

  # mdb -k
  > getpid::dis

Quit the trace, then repeat the same "mdb -k" examination of the same function. Comparing the two disassemblies, you will notice the "int 3" (opcode 0xCC) in the preamble while the probe is installed.

DTrace installs its own INT 3 handler in the IDT, just like a typical debugger, except that debuggers work in "ring 3", whereas DTrace's fbt provider works fully within the kernel (except for passing the output to the user-level dtrace(1) utility to print).

--TBC--