Elements of system call implementation and OpenSolaris/Illumos process model (Chapter 2).
Reading (also referred to throughout this text): 

Pace yourself, don't plan to do this in one sitting!

In Chapter 1: Skim through:

   1.4 (to understand "lwp" in proc_t member names)
   1.7 (keep in mind that we look at x86, not SPARC!
        for x86 details read the excellent blog posts by Gustavo Duarte:
	http://duartes.org/gustavo/blog/post/memory-translation-and-segmentation
        http://duartes.org/gustavo/blog/post/cpu-rings-privilege-and-protection
        http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-memory)
	
  Chapter 2:  As you read, keep looking at the proc_t definition
              and try the commands from 2.13 in "mdb -k" (must be root)
	   
   Read carefully:
   2.1 - 2.5 (especially 2.4 and Fig. 2.3)
               
   The figures are the best part of the book;
   spend time understanding them and following
   the links/pointers in code or actual memory
   (with mdb -k).

   2.8 (again, keep in mind we are on x86)

   You can optionally start reading the following:
   2.10 - 2.10.1 (the /proc filesystem, 
          	  keep looking at the code in 
		  http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/proc/prvnops.c and Fig. 2.10)

1. System calls as the centerpiece of a Unix kernel. 
 
All privileged operations in Unix are performed on behalf of user
processes by "system call" code located in the kernel.  The data that
this code operates on is also located in the kernel and can only be
directly accessed when the CPU is in "kernel mode". This ensures that
user processes get to use this code only as a "package deal", with the
up-front permission and sanity checks being a part of the
package. This mechanism is the basis of the OS stability and security.

2. Some Linux details:

User-level code reaches syscall code through a gate mechanism (a software
interrupt or a dedicated fast-entry instruction) rather than an ordinary
function call: it puts the number of the desired call in a register
(EAX on Linux/x86), puts arguments or pointers to arguments
in other registers (EBX, ECX, EDX, ... on 32-bit Linux) and executes
the "int 0x80" instruction (older 32-bit systems), or the "sysenter" or
"syscall" instructions (newer and 64-bit systems). Note that the system call
function is selected only by its number, not by its address, which user-level
code cannot "jump" or "call" to (if it tries, a segfault occurs).

The "int 0x80" instruction simultaneously puts the CPU into the kernel
mode ("ring 0") and transfers control to the address stored in the 0x80-th slot
of the x86 CPU's Interrupt Descriptor Table (which is pointed to
by the CPU's special IDTR register). That address is *the single
entry point* for all system calls. 
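
To see the "by number, not by address" point from user space, here is a
minimal sketch that asks for getpid by its number through the libc syscall()
helper instead of the named wrapper. It assumes a Linux (or similar) libc that
provides syscall() and SYS_getpid; raw_getpid is a name made up for this
example. Which trap instruction libc actually uses underneath is its choice.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>	/* SYS_getpid: the number of the getpid call */
#include <sys/types.h>
#include <unistd.h>		/* syscall(), getpid() */

/*
 * Hypothetical helper: request getpid by number. libc places the number
 * and arguments in registers and executes the trap instruction for us.
 */
static pid_t
raw_getpid(void)
{
	long ret = syscall(SYS_getpid);

	assert(ret > 0);	/* getpid cannot fail */
	return ((pid_t)ret);
}
```

Calling raw_getpid() and getpid() in the same process should give the same
value, since both end up in the same kernel function.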

Look at the nice Fig. 1 in this IBM developer article on syscalls:
http://www.ibm.com/developerworks/linux/library/l-system-calls/

Look at ENTRY(system_call) in an older Linux kernel: 
http://lxr.linux.no/linux+v2.6.24/arch/x86/kernel/entry_32.S

Observe the sys_call_table on an older Linux kernel: 
http://lxr.linux.no/linux+v2.6.24/arch/x86/kernel/syscall_table_32.S

3. Some OpenSolaris details:

Syscall numbers exposed in Solaris in:  /etc/name_to_sysnum

Syscall numbers defined in: /usr/src/uts/common/sys/syscall.h 
(http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/sys/syscall.h)

Syscalls dispatched in: /usr/src/uts/intel/ia32/os/syscall.c
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/intel/ia32/os/syscall.c

Observe: dosyscall() gets the address of the requested syscall
         function by "code" in syscall_entry() then executes
	 it by function pointer (lines 896--898).


System call table: usr/src/uts/common/os/sysent.c
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/os/sysent.c

Observe:    Line 439 and below,  struct sysent sysent[NSYSCALL] = ...
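
The table-plus-function-pointer dispatch can be modelled in a few lines of
user-space C. This is only a toy sketch: struct toy_sysent, its handlers and
toy_dispatch() are invented for illustration, and the real struct sysent has
more members and a different handler signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a syscall table entry (not the kernel's layout). */
struct toy_sysent {
	int	sy_narg;			/* number of arguments used */
	int64_t	(*sy_call)(long, long);		/* handler function */
};

static int64_t toy_nosys(long a0, long a1) { (void)a0; (void)a1; return (-1); }
static int64_t toy_getpid(long a0, long a1) { (void)a0; (void)a1; return (1234); }
static int64_t toy_add(long a0, long a1) { return (a0 + a1); }

#define	TOY_NSYSCALL	4

/* The index into this array is the "system call number". */
static const struct toy_sysent toy_table[TOY_NSYSCALL] = {
	{ 0, toy_nosys },
	{ 0, toy_getpid },
	{ 2, toy_add },
	{ 0, toy_nosys },
};

/* Validate the number, look up the entry, call through the pointer. */
static int64_t
toy_dispatch(unsigned int code, long a0, long a1)
{
	const struct toy_sysent *callp;

	if (code >= TOY_NSYSCALL)
		return (-1);		/* unknown call number */
	callp = &toy_table[code];
	assert(callp->sy_call != NULL);
	return (callp->sy_call(a0, a1));
}
```

The caller never sees a handler's address; it can only name a slot, and the
range check happens before anything is called.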


4. A simple syscall: getpid()

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/syscall/getpid.c#42

Looks up the PID via the pointer to the current thread descriptor curthread
(follows the pointer to the process structure of type proc_t, then
locates the integer PID value through that -- see my MDB session in 
proc_t-kernel-view.txt)

Kernel struct that keeps process data (along with some others,
explained briefly on pp. 44--48 of the textbook, details in Section 2.4):
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/sys/proc.h#130

Suggestion: explore proc_t for the linking between process structs. How
many other proc_t's are linked to it and why? (Many...)
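
As a warm-up for that exploration, here is a toy model of the parent, child
and sibling links. The member names p_parent, p_child and p_sibling follow
proc_t, but struct toy_proc and both helpers are made up for this sketch, and
the real structure carries many more links than these three.

```c
#include <assert.h>
#include <stddef.h>

/* Toy process descriptor with only the family links. */
struct toy_proc {
	int		p_pid;
	struct toy_proc	*p_parent;	/* who created us */
	struct toy_proc	*p_child;	/* our most recently added child */
	struct toy_proc	*p_sibling;	/* next child of the same parent */
};

/* Push child onto the front of parent's child list. */
static void
toy_add_child(struct toy_proc *parent, struct toy_proc *child)
{
	assert(parent != NULL && child != NULL);
	child->p_parent = parent;
	child->p_sibling = parent->p_child;
	parent->p_child = child;
}

/* Walk the sibling chain starting at p's first child. */
static int
toy_count_children(const struct toy_proc *p)
{
	const struct toy_proc *c;
	int n = 0;

	for (c = p->p_child; c != NULL; c = c->p_sibling)
		n++;
	return (n);
}
```

One parent pointer plus one sibling chain is enough to represent a whole
process tree without any variable-sized arrays.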

[Similar structure for Linux is task_struct:
 http://lxr.linux.no/linux+v2.6.24/include/linux/sched.h#L917 ]

5. Tracing a program's system calls.

"truss" on OpenSolaris, "strace" on Linux trace all the syscalls made
by an application.

Try:
truss echo Hello

The tracing starts from the execve call that loads the binary for
the echo command in the shell's path (/usr/gnu/bin/echo) and the
subsequent mmap calls that load the segments of code and data
from that binary. This binary is in the ELF format (cf. the output
of "file /usr/gnu/bin/echo").

Before the code can run, all the necessary components of that
dynamically linked file must be loaded (mmap-ed), such as the dynamic
linker-loader itself (/usr/lib/ld.so.1), and the libc library
(cf. "ldd /usr/gnu/bin/echo"). These dependencies are described
in the ELF file format, and are parsed out of it by the kernel's
binary format handler.  

Observe that all but the last five of these syscalls are for "setting
up" so that the write() syscall can finally do the echo's job.
Understanding how processes are set up requires some knowledge
about the ELF format. 
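
A first, very small piece of that knowledge: every ELF file starts with the
four bytes 0x7f 'E' 'L' 'F'. The sketch below checks for them, which is
roughly the first thing "file" or a binary format handler has to do. The
function names are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if buf starts with the ELF magic bytes, 0 otherwise. */
static int
has_elf_magic(const unsigned char *buf, size_t len)
{
	static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

	assert(buf != NULL);
	return (len >= sizeof (magic) &&
	    memcmp(buf, magic, sizeof (magic)) == 0);
}

/* Return 1 if the file is ELF, 0 if not, -1 if it cannot be opened. */
static int
file_is_elf(const char *path)
{
	unsigned char head[4];
	size_t n;
	FILE *fp = fopen(path, "rb");

	if (fp == NULL)
		return (-1);
	n = fread(head, 1, sizeof (head), fp);
	(void) fclose(fp);
	return (has_elf_magic(head, n));
}
```

On the course machine, file_is_elf("/usr/gnu/bin/echo") would be the natural
thing to try.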

6. DTrace

The textbook makes use of the DTrace tracing tool (truss, by contrast, is built on the /proc interface):

   http://wiki.illumos.org/display/illumos/DTrace

DTrace lets you observe an unprecedented range of events happening
in the kernel, by placing "probes" throughout the kernel code
and printing out and aggregating the information they produce
when execution reaches them.

See dtrace-notes.txt or DTrace-User-Guide.pdf if you have questions
about examples in Chapter 2.

7. Using the Modular Debugger (MDB) to examine kernel state 

"mdb -k" launches the debugger and "attaches" it to the running 
kernel (just as "mdb -p <process id>" would attach it to a running
user process).

The debugger has a somewhat different command syntax from GDB.
Its  ::help  command is a useful entry point.

Here is a (larger) tutorial:
http://learningsolaris.com/docs/chpt_mdb_os.pdf (local copy: mdb-reference-chapter.pdf)
In a hurry, skip the historical intro, go straight to examples of commands.

Tip: You can pipe debugger commands' output through grep when
there is too much of it, rather than dealing with the internal pager.
E.g.:  
      ::dcmds !grep module 
to catch all commands with names or descriptions that contain "module".

To see the address (in hex) of a symbol (function, variable, or any other
thing that the debugger knows the address of):

<symbol>=K  (or =X on 32bit machines) 

E.g.: 

getpid=K   -- the address of the getpid function as a 64bit hex number
getpid=X   -- the address of the getpid function as a 32bit hex number 
	       (on 64bit machine, this will just give you the lower part of
	        the address, and no warning that the address is incomplete)

To see the contents (in hex) of memory at that address, use / rather than =

getpid/X   -- the opcodes at the start of the getpid function as one 32bit
	      hex number (little endian)

getpid/4B  -- the same, as 4 separate bytes, in order

getpid/4i  -- the opcodes disassembled into instructions (stops at first 4)

More about formats: ::formats  (e.g., "::formats !grep hex")
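
The difference between /X and /4B is only byte order. The sketch below
combines four bytes, given in memory order, into the 32-bit number that a
little-endian x86 load would produce. The function name is made up, and the
bytes used in the usage note are just an illustrative function prologue, not
necessarily what getpid starts with on your system.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Assemble a 32-bit value from 4 bytes in memory order, little-endian:
 * the byte at the lowest address becomes the least significant byte.
 */
static uint32_t
le32_from_bytes(const unsigned char *b)
{
	assert(b != NULL);
	return ((uint32_t)b[0] |
	    ((uint32_t)b[1] << 8) |
	    ((uint32_t)b[2] << 16) |
	    ((uint32_t)b[3] << 24));
}
```

For the bytes 55 48 89 e5 (as /4B would list them) this gives e5894855,
i.e., /X appears to show the same bytes "backwards".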

getpid::dis -- disassembles the whole function

====

Useful commands to look up:

::ps

::pstree

::objects

::print -t "struct proc"  -- print the definition of a data type 

::print -a "struct proc"  -- print the definition of a struct type 
	   	   	     with hex offsets of each member 

see also my MDB session looking at proc_t's (proc_t-kernel-view.txt)

More info on this is in Chapter 2.4 ("Process Structures").
This is what ps and other process reporting utilities extract;
we are going to see how.

8. Tracing ps 

ps on modern operating systems does little beyond reading /proc
and interpreting and pretty-printing its contents. The kernel
exposes its process control info in /proc's pseudo-files, and
it is the kernel's functions that walk the process control blocks
in response to your ps's "open" and "read" system calls.

The design that re-dispatches these general file-related system
calls to the appropriate filesystem-specific worker functions is called
VFS (Virtual File System). Linux uses a similar design, in which its
in-kernel inodes play roughly the role of Solaris' vnodes.

See Figure 1.5 on p. 31 and table 1.1 on p.32 for an overview
of VFS.

Try this: run "strace ps" on Linux and "truss ps" on Illumos.

9. Doing the work of /proc 

See what system calls ps makes with "truss ps".

The getdents64 function is what lists the contents of
a directory (this is what ls uses, too). The directory
read by ps is, of course, /proc. See "man getdents".
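
From C, the usual way to get at directory entries is opendir()/readdir(),
which libc implements on top of the getdents family of calls. Here is a
minimal sketch of the first step of a ps-like tool: counting the entries in
a directory whose names are all digits (in /proc, those are PIDs). Both
function names are made up for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stddef.h>

/* Return 1 if name is a non-empty string of decimal digits. */
static int
is_pid_name(const char *name)
{
	assert(name != NULL);
	if (*name == '\0')
		return (0);
	for (; *name != '\0'; name++) {
		if (!isdigit((unsigned char)*name))
			return (0);
	}
	return (1);
}

/* Count all-digit entries in dirpath; return -1 if it cannot be opened. */
static int
count_pid_entries(const char *dirpath)
{
	struct dirent *de;
	int n = 0;
	DIR *d = opendir(dirpath);

	if (d == NULL)
		return (-1);
	while ((de = readdir(d)) != NULL) {	/* libc issues getdents here */
		if (is_pid_name(de->d_name))
			n++;
	}
	(void) closedir(d);
	return (n);
}
```

Running a program that calls count_pid_entries("/proc") under truss or
strace should show the same getdents calls that ps makes.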

See d/procdents.d for a DTrace script that exposes the kernel functions
called in response to a ps command's getdents call.

Look at their code in the Solaris source browser.

We will read the revealed functions' code to see how they
walk the process list next time.

10. Reading kernel code

The OpenGrok code browsing system is at  http://src.illumos.org/source/ 

Kernel code "lives" under project "illumos-gate", 
under the path  /illumos-gate/usr/src/uts/   (note UTS, which stands
for UNIX Time-Sharing System, a legacy name)

Before we start reading kernel code in earnest, here are 
some idioms.

==== Functions defined in assembly ====

The ENTRY_* macros create function symbols that the linker will
treat as normal C functions (when C functions are compiled into
assembly, similar assembly is actually generated for them, too):

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/intel/ia32/sys/asm_linkage.h#210

#define  ENTRY_NP(x) \
	.text; \			<--- place in .text (code) segment
	.align	ASM_ENTRY_ALIGN; \	<--- align at an ASM_ENTRY_ALIGN-byte boundary (defined in the same header)
	.globl	x; \			<--- make macro's arg a global symbol..
	.type	x, @function; \		<---  of type "function"
x:					<--- here it starts...

==== Getting pointer to current thread ====

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/intel/asm/thread.h -- thread pointer context:

extern __inline__ struct _kthread *threadp(void)
{
	void *__value;

#if defined(__amd64)
	__asm__ __volatile__(
	    "movq %%gs:0x18,%0"		/* CPU_THREAD */
	    : "=r" (__value));
#elif defined(__i386)
	__asm__ __volatile__(
	    "movl %%gs:0x10,%0"		/* CPU_THREAD */
	    : "=r" (__value));
#else
#error	"port me"
#endif
	return (__value);
}

For explanations of the __asm__ embedding of Assembly into gcc 
C code, see http://www.ibm.com/developerworks/library/l-ia.html, or
http://www.cs.virginia.edu/~clc5q/gcc-inline-asm.pdf  (local copy: gcc-inline-asm.pdf)
for more details.
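
If the extended-asm syntax is new to you, here is about the smallest example
that has both an output and an input operand. It is a user-space sketch, not
kernel code: copy_via_asm is a made-up name, and on anything other than
gcc-compatible compilers for x86 it falls back to plain C.

```c
#include <assert.h>

/*
 * Copy a value through a register-to-register mov. %0 is the output
 * operand ("=r": any register, write-only), %1 is the input operand.
 */
static long
copy_via_asm(long in)
{
	long out;

#if defined(__GNUC__) && defined(__x86_64__)
	__asm__ __volatile__(
	    "movq %1,%0"
	    : "=r" (out)
	    : "r" (in));
#elif defined(__GNUC__) && defined(__i386__)
	__asm__ __volatile__(
	    "movl %1,%0"
	    : "=r" (out)
	    : "r" (in));
#else
	out = in;		/* portable fallback, no assembly */
#endif
	assert(out == in);
	return (out);
}
```

threadp() has the same shape, except that its source is a fixed memory
location off %gs rather than an input operand, so it has no input list.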

    (Note: for functions that include assembly, the kernel
           contains a "__lint" version of the code that does
           not actually get built but keeps the "lint" code
           checker happy. For more info see the manpage of "lint").

    (Many macros in /illumos-gate/usr/src/uts/common/sys/thread.h
           are nice and readable; the key point is threadp(),
	   which is CPU-dependent.)

For explanations of "extern __inline__" see (**).

Here is how it is used:

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/sys/thread.h#528

extern	kthread_t	*threadp(void);	      /* inline, returns thread pointer */
#define	curthread	(threadp())	      /* current thread pointer */
#define	curproc	        (ttoproc(curthread))  /* current process pointer */
#define	curproj		(ttoproj(curthread))  /* current project pointer */
#define	curzone		(curproc->p_zone)     /* current zone pointer */

cf: in getpid() code:

int64_t
getpid(void)
{
	rval_t	r;
	proc_t	*p;

	p = ttoproc(curthread);    <--- curthread reads the current thread pointer
					from the per-CPU data addressed via %gs.
					The system call entry code makes sure %gs
					is right for the kernel, so this is the
					thread on whose behalf the system call is
					made, and ttoproc() leads to its proc_t.
	r.r_val1 = p->p_pid;
	if (p->p_flag & SZONETOP)
		r.r_val2 = curproc->p_zone->zone_zsched->p_pid;
	else
		r.r_val2 = p->p_ppid;
	return (r.r_vals);
}
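
Note how getpid() hands back two values (PID and parent PID) in a single
int64_t. The trick is a union. The sketch below models it under the
assumption that rval_t overlays two 32-bit members with one 64-bit member;
toy_rval_t and both helpers are made up here, and the real rval_t has more
members.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: two 32-bit results overlaid on one 64-bit return value. */
typedef union toy_rval {
	struct {
		int32_t	r_v1;
		int32_t	r_v2;
	} r_v;
	int64_t	r_vals;
} toy_rval_t;

/* Store both halves, then read them back as one 64-bit value. */
static int64_t
toy_pack(int32_t v1, int32_t v2)
{
	toy_rval_t r;

	assert(sizeof (r.r_v) == sizeof (r.r_vals));
	r.r_v.r_v1 = v1;
	r.r_v.r_v2 = v2;
	return (r.r_vals);
}

/* The reverse: split a 64-bit value into the two 32-bit results. */
static void
toy_unpack(int64_t vals, int32_t *v1, int32_t *v2)
{
	toy_rval_t r;

	r.r_vals = vals;
	*v1 = r.r_v.r_v1;
	*v2 = r.r_v.r_v2;
}
```

Which half of the 64-bit value each member lands in depends on the machine's
byte order, so the sketch only relies on the round trip.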

================================================================

(**) extern __inline__ explained:
http://publib.boulder.ibm.com/infocenter/compbgpl/v9v111/index.jsp?topic=/com.ibm.xlcpp9.bg.doc/language_ref/cplr243.htm --

"If you specify the __inline__ keyword, with the trailing underscores,
the compiler uses the GNU C semantics for inline functions. In
contrast to the C99 semantics, a function defined as __inline__
provides an external definition only; a function defined as static
__inline__ provides an inline definition with internal linkage (as in
C99); and a function defined as extern __inline__, when compiled with
optimization enabled, allows the co-existence of an inline and
external definition of the same function. For more information on the
GNU C implementation of inline functions, see the GCC documentation,
available at http://gcc.gnu.org/onlinedocs/." 

Why all this? See
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/intel/ia32/ml/i86_subr.s#2402
(a different definition in another file, and yet no linking problem).