Welcome to CS 108. 

Goal: 
To explore and use new features of operating systems that appeared
during the last decade. To explore OS mechanisms that everyone uses
every day, but few people know in detail, such as dynamic linking,
executable and linkable format, and OS debugging support. To write
working, non-trivial kernel code.

Software:
OpenSolaris and some Linux.

Schedule: 
We meet in 214 Tue, Thu 10:00-11:50am,
our X-hour is Wed 3-4pm. Please feel free to
e-mail me (sergey@cs) to schedule appointments.

Grading: 
My plan is to have 2 midterms constituting
30% of the grade each and a course project 
for 40%. These fractions may change; an impressive
(and working) project may count for more. 
   
Course project: 
Implement, improve, or take advantage of a new 
interesting OS feature. Please start thinking
about your project right away.

-----------------------------------------------------------------------

Points from Jan 06 lecture.

1. System calls as the centerpiece of a UNIX kernel. 
 
All privileged operations in UNIX are performed on behalf of user
processes by "system call" code located in the kernel.  The data that
this code operates on is also located in the kernel and can only be
directly accessed when the CPU is in "kernel mode". This ensures that
user processes get to use this code only as a "package deal", with the
up-front permission and sanity checks being a part of the
package. This mechanism is the basis of the OS stability and security.
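
The "package deal" can be seen from user level with a small sketch: ask the
kernel to write to a file descriptor that cannot be valid, and the up-front
sanity check rejects the request with an error code instead of doing any
work. The helper name try_write is made up for this illustration.

```python
import errno
import os


def try_write(fd, data):
    """Attempt write(2) on fd; return (bytes_written, errno_name)."""
    try:
        return os.write(fd, data), None
    except OSError as exc:
        # The kernel's up-front check failed, so nothing was written.
        return None, errno.errorcode.get(exc.errno, str(exc.errno))


if __name__ == "__main__":
    print(try_write(-1, b"hello"))   # -1 is never a valid descriptor
```

The same call with a valid descriptor (for example 1, standard output)
passes the check and returns the number of bytes written.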

2. Some Linux details:

User-level code reaches syscall code through a software interrupt
(often loosely called a "call gate", although on x86 that term properly
names a different mechanism): it puts the number of the desired call in
a register (EAX on Linux/x86), puts arguments or pointers to arguments
in other registers (EBX, ECX, EDX, ... on Linux), and executes
the "int 0x80" instruction. Note that the system call function
is accessed only by its number, not by its address, which user-level
code cannot "jump" or "call" to (if it tries, a segfault will occur). 

The "int 0x80" instruction simultaneously puts the CPU into the kernel
mode and transfers control to the address stored in the 0x80-th slot
of the x86 CPU's Interrupt Descriptor Table (which is pointed to
by the CPU's special IDTR register). That address is *the single
entry point* for all system calls. 
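
A toy model of this dispatch, in user-level Python rather than kernel code,
may help fix the idea: one entry point receives a number (the value that
would sit in EAX) and uses it to index a table of handlers. The names
SYS_CALL_TABLE and system_call are stand-ins chosen for this sketch; the
numbers 4 and 20 are the 32-bit Linux/x86 numbers for write and getpid.

```python
import os

ENOSYS = 38  # Linux errno value for "Function not implemented"


def sys_getpid():
    return os.getpid()


def sys_write(fd, buf):
    return os.write(fd, buf)


# Toy stand-in for sys_call_table, keyed by Linux/x86 (32-bit) call numbers.
SYS_CALL_TABLE = {
    4: sys_write,    # __NR_write
    20: sys_getpid,  # __NR_getpid
}


def system_call(eax, *args):
    """Toy single entry point: look up the handler by number, then call it."""
    handler = SYS_CALL_TABLE.get(eax)
    if handler is None:
        return -ENOSYS          # unknown number: report an error
    return handler(*args)


if __name__ == "__main__":
    print(system_call(20))      # same value as os.getpid()
    print(system_call(9999))    # -38, no such call in the toy table
```

The caller never sees a handler's address, only its number, which is the
point of the single entry point.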

Look at the nice Fig. 1 in this IBM developer article on syscalls:
http://www.ibm.com/developerworks/linux/library/l-system-calls/

Look at ENTRY(system_call) in:
http://lxr.linux.no/linux+v2.6.24/arch/x86/kernel/entry_32.S

Observe the sys_call_table: 
http://lxr.linux.no/linux+v2.6.24/arch/x86/kernel/syscall_table_32.S

3. Some OpenSolaris details:

Syscall numbers exposed in Solaris in:  /etc/name_to_sysnum
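
A minimal sketch of reading such a table, assuming each line holds a name
and a number separated by whitespace (the handling of "#" comments is a
defensive assumption). The SAMPLE text is illustrative only; compare it
with the real file on your own system.

```python
# Illustrative sample in the assumed "name number" layout.
SAMPLE = """\
exit    1
read    3
write   4
getpid  20
"""


def parse_name_to_sysnum(text):
    """Map syscall names to numbers from name_to_sysnum-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        name, number = line.split()
        table[name] = int(number)
    return table


if __name__ == "__main__":
    print(parse_name_to_sysnum(SAMPLE))
```

On an OpenSolaris machine the same function can be fed the output of
open("/etc/name_to_sysnum").read().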

Syscall numbers defined in: /usr/src/uts/common/sys/syscall.h 
(http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/syscall.h)

Syscalls dispatched in: /usr/src/uts/intel/ia32/os/syscall.c
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/intel/ia32/os/syscall.c

Observe: dosyscall() gets the address of the requested syscall
         function by its "code" in syscall_entry(), then calls
         it through a function pointer (lines 920--925).


System call table: /usr/src/uts/common/os/sysent.c
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/sysent.c

Observe:    Line 430,  struct sysent sysent[NSYSCALL] = ...


4. A simple syscall: getpid()

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/syscall/getpid.c#44

Looks up the PID via the pointer to the current thread descriptor curthread
(follows the pointer to the process structure of type proc_t, then
locates the integer PID value through that).
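
The pointer chase can be modelled in a few lines. This is a simplified
sketch, not the kernel's code: the field names t_procp, p_pidp and pid_id
are given as I recall them from the OpenSolaris headers, so check them
against proc.h and the thread and pid headers before relying on them.

```python
from dataclasses import dataclass


@dataclass
class Pid:
    pid_id: int            # the integer PID value


@dataclass
class Proc:
    p_pidp: Pid            # process structure points to its pid structure


@dataclass
class KThread:
    t_procp: Proc          # thread descriptor points to its process


def getpid(curthread):
    """Follow curthread -> process structure -> integer PID."""
    return curthread.t_procp.p_pidp.pid_id


if __name__ == "__main__":
    print(getpid(KThread(Proc(Pid(1234)))))
```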

Kernel struct that keeps process data (along with some others,
explained briefly on pp. 44--48 of the textbook, details in Section 2.4):
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/proc.h#127

Suggestion: explore proc_t for the linking between process structs. How
many other proc_t's are linked to it and why? (Many...)

Similar structure for Linux is task_struct:
http://lxr.linux.no/linux+v2.6.24/include/linux/sched.h#L917

5. Intermission: making a syscall straight through assembly code.

Hacker exploits ("shellcode", http://shellcode.org/Shellcode/) 
must deal with raw assembly. Forget nice library wrappers :-)

Example of Linux shellcode: 
http://shellcode.org/Shellcode/linux/simple/

Suggestion: Study the Linux example. Refresh your memory about x86 Asm
	    with http://www.cs.virginia.edu/~apb/OLD.CS308/x86_newnew.pdf .
	    You can view the disassembled shellcode with 
	    "objdump -D a.out | less +/shellcode "
	    where a.out is the compiled tiny C program from the example.
	    See "man 2 execve" to recall execve's required arguments.

	    See if you can rewrite this shellcode for Solaris so that it runs
	    and launches a shell!
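
Short of writing assembly, one can still bypass the named library wrappers
and request a call purely by number through libc's generic syscall()
function. The sketch below does this for getpid on Linux only. The numbers
in GETPID_NR (39 on x86-64, 20 on 32-bit x86, 172 on 64-bit ARM) are the
ones I know for those ABIs; on any other platform the function gives up and
returns None rather than guess.

```python
import ctypes
import os
import platform
import sys

# getpid numbers per (machine, pointer width); anything else is unsupported.
GETPID_NR = {
    ("x86_64", 64): 39,
    ("aarch64", 64): 172,
    ("i686", 32): 20,
    ("i386", 32): 20,
}


def raw_getpid():
    """Call getpid by number via libc's syscall(); None if unsupported."""
    bits = ctypes.sizeof(ctypes.c_void_p) * 8
    nr = GETPID_NR.get((platform.machine(), bits))
    if not sys.platform.startswith("linux") or nr is None:
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    return libc.syscall(ctypes.c_long(nr))


if __name__ == "__main__":
    print(raw_getpid(), os.getpid())
```

Shellcode does the same thing one level lower: it loads the number into
the register itself and executes the trap instruction directly.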

6. Tracing a program's system calls.

truss on OpenSolaris and strace on Linux trace all the syscalls made
by an application. For example:

truss echo Hello

The tracing starts from the execve call that loads the binary for
the echo command in the shell's path (/usr/gnu/bin/echo) and the
subsequent mmap calls that load the segments of code and data
from that binary. This binary is in the ELF format (cf. the output
of "file /usr/gnu/bin/echo").

Before the code can run, all the necessary components of that
dynamically linked file must be loaded (mmap-ed), such as the dynamic
linker-loader itself (/usr/lib/ld.so.1), and the libc library
(cf. "ldd /usr/gnu/bin/echo"). These dependencies are described
in the ELF file format: the kernel's binary format handler parses out
the path of the dynamic linker, which then reads the list of needed
libraries and loads them.
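
One of these dependencies, the path of the dynamic linker, is recorded in
a program header entry of type PT_INTERP. The sketch below finds it in a
little-endian 64-bit ELF image. To stay self-contained it runs on a tiny
synthetic image built by make_demo_image (a helper invented for this
example, with the linker path taken from the text above), not on a real
binary.

```python
import struct

EHDR64 = struct.Struct("<16sHHIQQQIHHHHHH")   # Elf64_Ehdr, little-endian
PHDR64 = struct.Struct("<IIQQQQQQ")           # Elf64_Phdr, little-endian
PT_INTERP = 3


def find_interp(image):
    """Return the PT_INTERP path of a little-endian ELF64 image, or None."""
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    fields = EHDR64.unpack_from(image, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    for i in range(e_phnum):
        entry = PHDR64.unpack_from(image, e_phoff + i * e_phentsize)
        p_type, p_offset, p_filesz = entry[0], entry[2], entry[5]
        if p_type == PT_INTERP:
            raw = image[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None


def make_demo_image(interp=b"/usr/lib/ld.so.1\0"):
    """Build a synthetic image: header, one PT_INTERP entry, the path."""
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)   # 64-bit, LSB, v1
    phoff = EHDR64.size
    data_off = phoff + PHDR64.size
    ehdr = EHDR64.pack(ident, 2, 62, 1, 0, phoff, 0, 0,
                       EHDR64.size, PHDR64.size, 1, 0, 0, 0)
    phdr = PHDR64.pack(PT_INTERP, 4, data_off, 0, 0,
                       len(interp), len(interp), 1)
    return ehdr + phdr + interp


if __name__ == "__main__":
    print(find_interp(make_demo_image()))
```

On a real 64-bit little-endian binary, find_interp(open(path, "rb").read())
should report the same path that "readelf -l" shows as the requested
program interpreter.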

Observe that all but the last five of these syscalls are for "setting
up" so that the write() syscall can finally do the echo's job.
Understanding how processes are set up requires some knowledge
about the ELF format. 
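
Stripped of all the setup, echo's actual job is a single write() on file
descriptor 1. A minimal sketch of that last step (the function name and
the fd parameter are choices made for this example):

```python
import os


def echo(words, fd=1):
    """Do echo's real work: one write(2) of the joined words plus newline."""
    data = (" ".join(words) + "\n").encode()
    return os.write(fd, data)      # returns the number of bytes written


if __name__ == "__main__":
    echo(["Hello"])
```

Running this script under truss or strace shows the same pattern as for
the real echo: a long run of setup calls, then one short write.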

7. A tour of the ELF format

readelf -a /usr/gnu/bin/echo | less 

This file consists of 27 sections that contain different kinds
of data and code (see also Section 2.3).
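
The section count reported by readelf comes from the e_shnum field of the
ELF header at the start of the file. Here is a hedged sketch of decoding
that header for both 32-bit and 64-bit files. The demo_header helper builds
a synthetic 32-bit header with e_shnum set to 27 so that the example runs
without any particular binary present; its other field values are
placeholders.

```python
import struct


def parse_elf_header(data):
    """Decode a few ELF header fields from the first bytes of a file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    ei_class, ei_data = data[4], data[5]
    order = {1: "<", 2: ">"}[ei_data]         # ELFDATA2LSB / ELFDATA2MSB
    words = {1: "III", 2: "QQQ"}[ei_class]    # e_entry, e_phoff, e_shoff
    fields = struct.unpack_from(order + "16sHHI" + words + "IHHHHHH", data)
    return {
        "bits": 32 * ei_class,                # ELFCLASS32=1, ELFCLASS64=2
        "e_type": fields[1],
        "e_machine": fields[2],
        "e_shoff": fields[6],
        "e_shnum": fields[12],                # number of section headers
    }


def demo_header():
    """Synthetic 32-bit little-endian header; values are placeholders."""
    ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
    return struct.pack("<16sHHIIIIIHHHHHH", ident, 2, 3, 1,
                       0, 52, 0, 0, 52, 32, 0, 40, 27, 26)


if __name__ == "__main__":
    print(parse_elf_header(demo_header()))
```

For a real file, pass in open(path, "rb").read(64); 64 bytes cover the
header in both classes.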

ELF is beautifully flexible, but also quite complex (although
not gratuitously so). 

Some ELF links:
http://www.linuxjournal.com/node/1060/print
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format -- not
very helpful beyond the "ELF file layout" and "Further reading",
but have a look.

My links on ELF hacking:
http://althing.cs.dartmouth.edu/secref/resources/elf-hackery.shtml

8. DTrace

http://opensolaris.org/os/community/dtrace/

DTrace guide: http://opensolaris.org/os/community/dtrace/

DTrace lets you observe an unprecedented range of events happening
in the kernel, by placing "probes" throughout the kernel code
and printing out and aggregating the information they produce
when execution reaches them.

DTrace probes all have fixed integer IDs, but are normally
referred to via human-readable names:

provider:module:function:name

To list syscall-related probes:

pfexec dtrace -l -n syscall:::

(a field that should match everything can be * or simply left empty)

To list only probes that fire on entry to a syscall:

pfexec dtrace -l -n syscall:::entry

To list probes matching a pattern:

pfexec dtrace -l -n syscall:*:o*:entry
   ID   PROVIDER            MODULE                          FUNCTION NAME
70776    syscall                                                open entry
71172    syscall                                              open64 entry
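
The matching rule can be imitated with shell-style globbing. This is only
a model of the four-field form: it treats an empty field as "match all"
and does not try to reproduce how dtrace interprets shorter descriptions.
The first two rows of PROBES come from the listing above; the other rows
and their IDs are made up for the demonstration.

```python
from fnmatch import fnmatchcase

# (id, provider, module, function, name)
PROBES = [
    (70776, "syscall", "", "open", "entry"),
    (71172, "syscall", "", "open64", "entry"),
    (90001, "syscall", "", "open", "return"),   # hypothetical ID
    (90002, "syscall", "", "read", "entry"),    # hypothetical ID
]


def match_probes(description, probes=PROBES):
    """Return IDs of probes matching provider:module:function:name."""
    patterns = description.split(":")
    if len(patterns) != 4:
        raise ValueError("expected four colon-separated fields")
    patterns = [p or "*" for p in patterns]     # empty field matches all
    return [probe[0] for probe in probes
            if all(fnmatchcase(value, pat)
                   for value, pat in zip(probe[1:], patterns))]


if __name__ == "__main__":
    print(match_probes("syscall:*:o*:entry"))
    print(match_probes("syscall:::entry"))
```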
 
<TBC>

9. Installing your own copy of OpenSolaris in a virtual machine:

I installed OpenSolaris in a VirtualBox environment (free, recently
bought by Sun and adjusted for OpenSolaris) on Mac OS X 10.5.

These nice step-by-step instructions with screenshots 

http://www.javapassion.com/handsonlabs/opensolarisvirtual/

will lead you to the point where you need to double-click
"Install OpenSolaris", which will install it onto your 
virtual disk. Then follow OpenSolaris installer instructions
(in particular, accept the default partitioning scheme).

Once everything has been installed, reboot the virtual machine
and choose "Boot from Hard Drive" in the GRUB menu (otherwise
you will boot from the installation CD ISO image again).
Alternatively, "detach" this ISO image before booting the virtual machine.