0. Why study linking and loading in detail in an OS course?

Linking and loading play an unglamorous but central role in major OS
functions.  

As overall OS (kernel + libraries + service processes) complexity keeps
growing (which is often unavoidable, for good reasons), it becomes
necessary for the running image of a process to be assembled from many
parts. Some of these parts are functionally different and are produced
by different tools, or different parts of the same toolchain, even within
a single executable or shared object file. Others are written and
maintained by different organizations altogether.

The trick, then, is to engineer the *binary format*, the *Application
Binary Interface* (ABI), and the corresponding parts of both the
development toolchain and the OS ("binutils" in GNU-speak, the binary
loader that backs the exec*() family of system calls, and the kernel's
own module loader) in such a way that integrating all these parts
remains both _tractable_ and _extensible_. 

I suggest calling the engineering patterns that go into well-proven
linking-and-loading designs, such as the ELF format and the GNU/Linux ABI,
*"integration patterns"*, by analogy with the programming idioms known
as "design patterns" that help keep code tractable and maintainable
(e.g., http://en.wikipedia.org/wiki/Design_pattern_(computer_science) ).

CAVEAT: This view is not very common. There is only one affordable, 
	dedicated book that covers the subject: 
	John Levine's "Linkers and Loaders". 

	Luckily, its almost-complete draft is freely available online:
	http://www.iecc.com/linker/
 
For a collection of links on ELF hacking see:

                http://www.hackercurriculum.org/elf
	
We will work through the ELF symbol table and dynamic linking 
structure next week. 

1. DTrace examples

# dtrace -n 'proc:::exec-success { printf("%s", curpsinfo->pr_psargs); }'

dtrace: description 'proc:::exec-success ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  16132         exec_common:exec-success ls callout.d intr1.d proc1.d 
  0  16132         exec_common:exec-success hostname

(output produced by "ls *.d" in another terminal window)

Observe the  curpsinfo  variable, which points to a psinfo_t structure
filled with information about the "current" process (i.e., the process
that caused the probe to fire),
as described in "Table 25–1 proc Probes" at 
http://docs.sun.com/app/docs/doc/817-6223/chp-proc?a=view

Observe that this struct's  pr_psargs  member contains the string
of arguments to  ls  after the Bash shell expanded them.
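To show what "after the shell expanded them" means for the stored string, here is a minimal C sketch (not the kernel's code). It builds a psargs-like string by joining the already-expanded argv[] with single spaces and truncating the result to a fixed-size buffer. The size 80 assumes a match with PRARGSZ in Solaris <sys/procfs.h>. The name build_psargs is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: same value as PRARGSZ in Solaris <sys/procfs.h>. */
#define SKETCH_PRARGSZ 80

/*
 * Hypothetical helper: join the (already shell-expanded) argv[] with
 * single spaces into buf, truncating so the result, NUL included,
 * always fits in SKETCH_PRARGSZ bytes.
 */
static void build_psargs(char buf[SKETCH_PRARGSZ], int argc, char *const argv[])
{
    size_t len = 0;

    buf[0] = '\0';
    for (int i = 0; i < argc && len < SKETCH_PRARGSZ - 1; i++) {
        if (i > 0)
            buf[len++] = ' ';              /* separator between arguments */
        for (const char *p = argv[i]; *p != '\0' && len < SKETCH_PRARGSZ - 1; p++)
            buf[len++] = *p;               /* copy until the buffer is full */
    }
    buf[len] = '\0';
}
```

With argv = {"ls", "callout.d", "intr1.d", "proc1.d"}, the function yields "ls callout.d intr1.d proc1.d", the same string DTrace printed above. A very long command line is cut off at 79 characters.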

---

# Files opened by process,
dtrace -n 'syscall::open*:entry { printf("%s %s",execname,copyinstr(arg0)); }'

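Observe the  copyinstr()  subroutine. arg0 is a pointer into the traced
process's user address space, so the string it points to must first be
copied into kernel space (DTrace's scratch memory) before it can be printed.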
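The idea behind copying a user string into the kernel can be sketched in C. This is a hypothetical stand-in loosely modeled on the kernel's copyinstr(9F): it copies a bounded number of bytes and returns ENAMETOOLONG if no NUL fits within the limit. It is not DTrace's implementation. Here "user memory" is just an ordinary array, so the sketch leaves out the address validation and fault handling (EFAULT) that make the real routine necessary.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical copyinstr()-style routine: copy at most maxlen bytes,
 * NUL included, from ubuf ("user" memory) into kbuf ("kernel" memory).
 * On ENAMETOOLONG, kbuf holds maxlen bytes and is NOT NUL-terminated.
 */
static int sketch_copyinstr(const char *ubuf, char *kbuf, size_t maxlen,
                            size_t *lencopied)
{
    for (size_t i = 0; i < maxlen; i++) {
        kbuf[i] = ubuf[i];
        if (ubuf[i] == '\0') {
            if (lencopied != NULL)
                *lencopied = i + 1;    /* count includes the NUL */
            return 0;
        }
    }
    if (lencopied != NULL)
        *lencopied = maxlen;
    return ENAMETOOLONG;               /* no NUL within maxlen bytes */
}
```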

---
 
# Syscall count by program,
dtrace -n 'syscall:::entry { @num[execname] = count(); }'
 
Observe the special aggregation syntax:  @num[execname] = count();
creates a counting table that increments the count for each 
individual execname whenever the probe fires. On exit (^C),
DTrace prints the table, nicely formatted and sorted by value.

"num" is the name of the table. If you only have one, you may omit it
and just have  @[execname] = count();

Note that execname (or any expression in @[...]) is evaluated first,
and then the count() action is applied to the associated value in the table
(i.e., execname is used as a key into the table, and the value for that key
is extracted and incremented). 

count() merely increments, whereas sum(<some numeric value>) will add
that value. So   @[execname] = sum(1);  has the same effect as above.
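The aggregation semantics above can be sketched as a toy counting table in C. All names (struct agg, agg_count, agg_sum, agg_print) are hypothetical. A fixed-size array with linear search keeps the sketch short. DTrace itself keeps aggregation data per CPU and combines it in user space, which this sketch does not model.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define AGG_MAX_KEYS 64
#define AGG_KEYLEN   32

/* One row of the table: key (e.g., an execname) -> accumulated value. */
struct agg_entry {
    char key[AGG_KEYLEN];
    long value;
};

/* Toy stand-in for a D aggregation such as @num[execname]. */
struct agg {
    struct agg_entry e[AGG_MAX_KEYS];
    int n;
};

/* Find the row for key, creating it with value 0 if it is absent. */
static struct agg_entry *agg_lookup(struct agg *a, const char *key)
{
    assert(strlen(key) < AGG_KEYLEN);
    for (int i = 0; i < a->n; i++)
        if (strcmp(a->e[i].key, key) == 0)
            return &a->e[i];
    assert(a->n < AGG_MAX_KEYS);
    struct agg_entry *row = &a->e[a->n++];
    strcpy(row->key, key);
    row->value = 0;
    return row;
}

/* @a[key] = sum(v): the key is evaluated, its row fetched, v added. */
static void agg_sum(struct agg *a, const char *key, long v)
{
    agg_lookup(a, key)->value += v;
}

/* @a[key] = count(): identical in effect to sum(1). */
static void agg_count(struct agg *a, const char *key)
{
    agg_sum(a, key, 1);
}

static int by_value(const void *x, const void *y)
{
    long vx = ((const struct agg_entry *)x)->value;
    long vy = ((const struct agg_entry *)y)->value;
    return (vx > vy) - (vx < vy);
}

/* Sort rows (in place) by ascending value and print them, as on ^C. */
static void agg_print(struct agg *a)
{
    qsort(a->e, (size_t)a->n, sizeof a->e[0], by_value);
    for (int i = 0; i < a->n; i++)
        printf("  %-20s %8ld\n", a->e[i].key, a->e[i].value);
}
```

Calling agg_count(&a, execname) once per probe firing corresponds to @num[execname] = count(). Replacing that call with agg_sum(&a, execname, 1) produces the same table, which mirrors the sum(1) remark above.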

---

Suggestion: work through the rest of the examples in 
http://www.brendangregg.com/DTrace/dtrace_oneliners.txt , using 
the DTrace Guide (http://docs.sun.com/app/docs/doc/817-6223) as a reference.

More examples:  

http://developers.sun.com/solaris/articles/dtrace_example.html

http://blogs.sun.com/uejio/entry/dtrace_tutorial_for_x_window

---

Extra: OpenSolaris has demo scripts in /usr/demo/dtrace/ . Study them.
 
---

2. DTrace aggregation functionality

DTrace probes can be used for kernel and application
code profiling, such as counting the number of invocations
of certain functions and the time spent in them. 
DTrace provides a built-in datatype for a "counting table".

Aggregations are described briefly in the DTrace Quick Reference:
http://developers.sun.com/solaris/articles/dtrace_quickref/dtrace_quickref.html

(While you are at it, have a look at the built-in variables:
http://developers.sun.com/solaris/articles/dtrace_quickref/dtrace_built_in_vars.html. Starting from these, you can unravel the kernel's data structures.
For example, you can follow curthread to the process' struct proc through
curthread->t_procp, and so on to other elements of the proc structure
as depicted in Figure 2.3 on page 57)

and in the DTrace Guide, Chapter 9:
http://docs.sun.com/app/docs/doc/817-6223
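Following curthread->t_procp is ordinary C pointer chasing. The sketch below uses toy mock-ups of the structures. As far as we know, only the field names t_procp, p_user and u_comm follow Solaris' kthread_t, proc_t and user_t. The sizes and everything else are simplified assumptions. In D, the same chain would be written curthread->t_procp->p_user.u_comm.

```c
#include <assert.h>
#include <string.h>

/* Toy mock-ups: only the field names follow the Solaris structures. */
struct user_sketch {
    char u_comm[17];      /* executable name (what D's execname shows) */
    char u_psargs[80];    /* initial part of the argument list         */
};

struct proc_sketch {
    struct user_sketch p_user;    /* embedded in the proc, not a pointer */
};

struct kthread_sketch {
    struct proc_sketch *t_procp;  /* process this thread belongs to */
};

/* Same chain as the D expression curthread->t_procp->p_user.u_comm */
static const char *sketch_execname(const struct kthread_sketch *curthread)
{
    assert(curthread != NULL && curthread->t_procp != NULL);
    return curthread->t_procp->p_user.u_comm;
}
```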

http://www.sun.com/bigadmin/content/dtrace/ 
provides several DTrace tutorials, DTrace developer blogs 
and use cases.

Suggestion: go through the "DTrace 1-liners" from the previous lecture.  

3. SDT provider and other non-function-boundary probes.

The probes of DTrace's syscall: and fbt: providers align naturally
with kernel function boundaries. A simple "symbol table" (a mapping
from probe names to function start addresses) is enough to place
the probes.
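Such a symbol table can be sketched in C as an array of name/address pairs. The addresses below are made up for illustration. A real provider would take them from the loaded kernel modules' symbol tables rather than from a hard-coded array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One symbol: a function name and its start address. */
struct sym {
    const char *name;
    uintptr_t   addr;   /* an entry probe would be planted here */
};

/* Hypothetical kernel symbol table with made-up addresses. */
static const struct sym symtab[] = {
    { "exec_common", 0x1000 },
    { "exec_args",   0x1400 },
    { "open",        0x1600 },
};

/* Probe (function) name -> start address; 0 means "no such function". */
static uintptr_t sym_addr(const char *name)
{
    for (size_t i = 0; i < sizeof symtab / sizeof symtab[0]; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return symtab[i].addr;
    return 0;
}
```

In this model, sym_addr("exec_common") returns the address where an entry probe for exec_common would be placed.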

However, the system events targeted by other providers, such as proc:,
sched:, and io:, involve more complex logic that does not align neatly
with function boundaries. In other words, the corresponding probes often
must be placed inside if-then-else branch blocks rather than at function
boundaries. For example, the logic behind the event may sit in the
middle of a function, or almost at its end but not quite.
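The following C sketch shows *where* such a probe has to sit. It is a hypothetical illustration using an ordinary function-pointer hook, which is NOT how SDT works (SDT patches NOPs in place). The names toy_exec, SKETCH_PROBE and exec_success_hook are made up. The probe fires only on the success path, in the middle of the function, rather than at its entry or return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook; NULL means the probe is "not activated". */
static void (*exec_success_hook)(const char *path);

#define SKETCH_PROBE(hook, arg) \
    do { if ((hook) != NULL) (hook)(arg); } while (0)

/* Toy "exec" with one failure path and one success path. */
static int toy_exec(const char *path)
{
    if (path == NULL || path[0] == '\0')
        return -1;                          /* failure: no probe here */
    /* ... imagine the new image being set up here ... */
    SKETCH_PROBE(exec_success_hook, path);  /* mid-function, success only */
    /* ... and more work after the event, before returning ... */
    return 0;
}

/* Demo hook that just counts how often the probe fired. */
static int fired_count;
static void count_hook(const char *path) { (void)path; fired_count++; }
```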

Look at the definitions of the DTRACE_PROBE -derived macros in
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/sdt.h#77 and provider-specific macros starting at 
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/sdt.h#142

and then search for their uses with "Full Search" at http://src.opensolaris.org/source/ (macros are not "symbols", so a symbol search will not find them).
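When searching, keep in mind that a '-' cannot appear in a C identifier. Probe sites therefore spell it as a double underscore. In the OpenSolaris sources, exec.c reads DTRACE_PROC(exec__success), and DTrace reports the probe as exec-success. The translation can be sketched in C as follows. The function sdt_name is our own illustration, not the provider's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Turn a probe name as spelled in C source into the name DTrace shows:
 * each "__" becomes "-". out must have room for strlen(in) + 1 bytes.
 */
static void sdt_name(const char *in, char *out)
{
    size_t j = 0;

    for (size_t i = 0; in[i] != '\0'; ) {
        if (in[i] == '_' && in[i + 1] == '_') {
            out[j++] = '-';
            i += 2;
        } else {
            out[j++] = in[i++];
        }
    }
    out[j] = '\0';
}
```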

For example:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/exec.c#495

defines the proc:::exec-success probe.

When not activated, it is present in the kernel as a run of 5
NOPs, which you can see by starting "mdb -k" and scrolling down
several screens of  exec_common::dis  disassembly. When activated
(e.g., with dtrace -n 'proc:::'), it will include the invalid
"LOCK NOP" instruction, which will cause the #UD (invalid opcode) trap
to fire. Naturally, DTrace hangs its own clauses off of the #UD handler:
a trap raised at a known probe site runs the enabled clauses, after
which execution continues.
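The patching can be modeled on a plain byte array. The model rests on the following assumptions, based on our reading of the OpenSolaris x86 sources:

* When the module is loaded, the 5-byte "call rel32" to the probe function is overwritten with five 0x90 NOPs.
* Enabling the probe writes 0xF0 (the LOCK prefix) over the first NOP, which yields the invalid LOCK NOP.
* Disabling the probe writes 0x90 back.

All function names below are hypothetical. The sketch only shows the byte-level idea and does not touch real code pages.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define X86_NOP      0x90   /* one-byte NOP                            */
#define X86_LOCK     0xF0   /* LOCK prefix; LOCK NOP is invalid -> #UD */
#define X86_CALL_REL 0xE8   /* call rel32: opcode + 4-byte offset      */
#define SITE_LEN     5

/* Load time: replace the 5-byte "call rel32" with 5 NOPs. */
static void site_nop_out(uint8_t site[SITE_LEN])
{
    assert(site[0] == X86_CALL_REL);
    memset(site, X86_NOP, SITE_LEN);
}

/* Enable: turn the first NOP into LOCK, so the CPU sees LOCK NOP. */
static void site_enable(uint8_t site[SITE_LEN]) { site[0] = X86_LOCK; }

/* Disable: restore the plain NOP. */
static void site_disable(uint8_t site[SITE_LEN]) { site[0] = X86_NOP; }

/* In this model, would executing the site raise #UD? */
static int site_traps(const uint8_t site[SITE_LEN])
{
    return site[0] == X86_LOCK && site[1] == X86_NOP;
}
```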

Further details on the invalid instruction and the related "F00F bug":
http://www.cs.dartmouth.edu/~sergey/cs108/2009/f00f-bug.txt
(NOTE: the www.x86.org link seems broken, local copy:
       http://www.cs.dartmouth.edu/~sergey/cs108/2009/F00FBug.html)

A talk on exploiting processor bugs of the same order as "F00F":
http://conference.hitb.org/hitbsecconf2008kl/?page_id=214

author: http://i.zdnet.com/blogs/kris_kaspersky.jpg?tag=col1%3bpost-1492
slides: google "Remote Code Execution Through Intel CPU Bugs", 
	local "D2T1 - Kris Kaspersky - Remote Code Execution Through Intel CPU Bugs.pdf"