Welcome to CS 108.

Goal:
  To explore and use new features of operating systems that appeared
  during the last decade.

  To explore OS mechanisms that everyone uses every day, but few people
  know in detail, such as dynamic linking, the Executable and Linkable
  Format (ELF), and OS debugging support.

  To write working, non-trivial kernel code.

Software: OpenSolaris and some Linux.

Schedule: We meet in 214 Tue, Thu 10:00-11:50am, our X-hour is Wed 3-4pm.
  Please feel free to e-mail me (sergey@cs) to schedule appointments.

Grading: My plan is to have 2 midterms constituting 30% of the grade each
  and a course project for 40%. These fractions may change; an impressive
  (and working) project may count for more.

Course project: Implement, improve, or take advantage of a new
  interesting OS feature. Please start thinking about your project right
  away.

-----------------------------------------------------------------------

Points from Jan 06 lecture.

1. System calls as the centerpiece of a UNIX kernel.

All privileged operations in UNIX are performed on behalf of user
processes by "system call" code located in the kernel. The data that
this code operates on is also located in the kernel and can only be
directly accessed when the CPU is in "kernel mode". This ensures that
user processes get to use this code only as a "package deal", with the
up-front permission and sanity checks being a part of the package. This
mechanism is the basis of OS stability and security.

2. Some Linux details:

User-level code accesses syscall code through the so-called "call gate"
mechanism: it sets the number of the desired call in a register (EAX on
Linux/x86), sets arguments or pointers to arguments in other registers
(EBX, ECX, EDX, ... on Linux) and executes the "int 0x80" instruction.
Note that the system call function is accessed only by its number, not
by its address, which user-level code cannot "jump" or "call" to (if it
tries, a segfault will occur).
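
To see the "by number, not by address" point concretely, here is a
minimal Python sketch that invokes getpid by its syscall number and
compares the result with the usual library wrapper. It is an
illustration under assumptions, not how the kernel sources do it: it
goes through libc's generic syscall() function instead of a hand-written
"int 0x80", the names raw_getpid and GETPID_NUMBERS are made up for this
example, and the table covers only a few Linux ABIs. Note that the
number is per-architecture: getpid is 20 on 32-bit x86 but 39 on x86-64.

```python
import ctypes
import os
import platform

# getpid syscall numbers for a few Linux ABIs, keyed by
# (machine name, pointer width in bits). Assumption: only these
# ABIs are covered; anything else is reported as unsupported.
GETPID_NUMBERS = {
    ("i386", 32): 20,
    ("i686", 32): 20,
    ("x86_64", 64): 39,
    ("aarch64", 64): 172,
}


def raw_getpid():
    """Call getpid by number via libc's syscall(); None if unsupported."""
    if platform.system() != "Linux":
        return None
    key = (platform.machine(), 8 * ctypes.sizeof(ctypes.c_void_p))
    number = GETPID_NUMBERS.get(key)
    if number is None:
        return None
    # CDLL(None) gives access to the symbols already loaded into this
    # process, which includes libc's syscall() entry point.
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    return libc.syscall(ctypes.c_long(number))


if __name__ == "__main__":
    print("by number: ", raw_getpid())
    print("by wrapper:", os.getpid())
```

On a supported platform both lines should print the same PID. The only
thing user code hands over is the number; which kernel function runs is
decided entirely on the kernel side.
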

The "int 0x80" instruction simultaneously puts the CPU into kernel mode
and transfers control to the address stored in the 0x80-th slot of the
x86 CPU's Interrupt Descriptor Table (which is pointed to by the CPU's
special IDTR register). That address is *the single entry point* for all
system calls.

Look at the nice Fig. 1 in this IBM developer article on syscalls:
  http://www.ibm.com/developerworks/linux/library/l-system-calls/

Look at ENTRY(system_call) in:
  http://lxr.linux.no/linux+v2.6.24/arch/x86/kernel/entry_32.S

Observe the sys_call_table:
  http://lxr.linux.no/linux+v2.6.24/arch/x86/kernel/syscall_table_32.S

Details on Linux system calls:
  http://www.ibm.com/developerworks/linux/library/l-system-calls/

3. Some OpenSolaris details:

Syscall numbers exposed in Solaris in:
  /etc/name_to_sysnum

Syscall numbers defined in:
  /usr/src/uts/common/sys/syscall.h
  (http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/syscall.h)

Syscalls dispatched in:
  /usr/src/uts/intel/ia32/os/syscall.c
  http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/intel/ia32/os/syscall.c
  Observe: dosyscall() gets the address of the requested syscall
  function by "code" in syscall_entry(), then executes it by function
  pointer (lines 920--925).

System call table:
  /usr/src/uts/common/os/sysent.c
  http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/sysent.c
  Observe: Line 430, struct sysent sysent[NSYSCALL] = ...

4. A simple syscall: getpid()

  http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/syscall/getpid.c#44

Looks up the PID via the pointer to the current thread descriptor
curthread (follows the pointer to the process structure of type proc_t,
then locates the integer PID value through that).

Kernel struct that keeps process data (along with some others, explained briefly on pp.
44--48 of the textbook, details in Section 2.4):
  http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/proc.h#127

Suggestion: explore proc_t for the linking between process structs. How
many other proc_t's are linked to it and why? (Many...)

The similar structure for Linux is task_struct:
  http://lxr.linux.no/linux+v2.6.24/include/linux/sched.h#L917

5. Intermission: making a syscall straight through assembly code.

Hacker exploits ("shellcode", http://shellcode.org/Shellcode/) must deal
with raw assembly. Forget nice library wrappers :-)

Example of Linux shellcode:
  http://shellcode.org/Shellcode/linux/simple/

Suggestion: Study the Linux example. Refresh your memory about x86 Asm
with http://www.cs.virginia.edu/~apb/OLD.CS308/x86_newnew.pdf . You can
view the disassembled shellcode with

  objdump -D a.out | less +/shellcode

where a.out is the compiled tiny C program from the example. See
"man 2 execve" to recall execve's required arguments. See if you can
rewrite this shellcode for Solaris so that it runs and launches a shell!

6. Tracing a program's system calls.

truss on OpenSolaris and strace on Linux trace all the syscalls made by
an application.

  truss echo Hello

The tracing starts from the execve call that loads the binary for the
echo command in the shell's path (/usr/gnu/bin/echo) and the subsequent
mmap calls that load the segments of code and data from that binary.
This binary is in the ELF format (cf. the output of
"file /usr/gnu/bin/echo"). Before the code can run, all the necessary
components of that dynamically linked file must be loaded (mmap-ed),
such as the dynamic linker-loader itself (/usr/lib/ld.so.1) and the libc
library (cf. "ldd /usr/gnu/bin/echo"). These dependencies are described
in the ELF file format, and are parsed out of it by the kernel's binary
format handler.

Observe that all but the last five of these syscalls are for "setting
up" so that the write() syscall can finally do echo's job.
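
Recognizing a binary as ELF starts with the first few bytes of the file.
The Python sketch below decodes the identification bytes and the two
16-bit fields that follow them (e_type and e_machine), which is roughly
the information a tool like "file" summarizes. The function name
parse_elf_ident and the hand-built sample header are made up for this
illustration; the field offsets follow the standard ELF header layout
(16 bytes of e_ident, then e_type and e_machine). It does not look at
program headers or dependencies.

```python
import struct

ELF_MAGIC = b"\x7fELF"


def parse_elf_ident(data):
    """Decode the first 20 bytes of an ELF file into a small dict."""
    if len(data) < 20 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    # e_ident[4] is the class (32/64-bit), e_ident[5] the byte order.
    ei_class = {1: "ELF32", 2: "ELF64"}.get(data[4], "unknown")
    ei_data = {1: "little-endian", 2: "big-endian"}.get(data[5], "unknown")
    byte_order = ">" if data[5] == 2 else "<"
    # e_type and e_machine are 16-bit fields right after e_ident.
    e_type, e_machine = struct.unpack_from(byte_order + "HH", data, 16)
    type_names = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
    return {
        "class": ei_class,
        "data": ei_data,
        "type": type_names.get(e_type, hex(e_type)),
        "machine": e_machine,
    }


if __name__ == "__main__":
    # Hand-built header for illustration: 32-bit, little-endian,
    # executable, machine 3 (EM_386). Not taken from a real binary.
    sample = ELF_MAGIC + bytes([1, 1, 1]) + bytes(9) + struct.pack("<HH", 2, 3)
    print(parse_elf_ident(sample))
```

To try it on a real file, read the first 20 bytes of the binary in "rb"
mode and pass them to parse_elf_ident(), then compare with what "file"
and "readelf -h" say about the same binary.
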

Understanding how processes are set up requires some knowledge of the
ELF format.

7. A tour of the ELF format

  readelf -a /usr/gnu/bin/echo | less

This file consists of 27 sections that contain different kinds of data
and code (see also Section 2.3). ELF is beautifully flexible, but also
quite complex (although not gratuitously so).

Some ELF links:
  http://www.linuxjournal.com/node/1060/print
  http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
    -- not very helpful beyond the "ELF file layout" and "Further
       reading", but have a look.

My links on ELF hacking:
  http://althing.cs.dartmouth.edu/secref/resources/elf-hackery.shtml

8. DTrace

  http://opensolaris.org/os/community/dtrace/

DTrace guide:
  http://opensolaris.org/os/community/dtrace/

DTrace lets you observe an unprecedented number of events happening in
the kernel, by placing "probes" throughout the kernel code and printing
out and aggregating the information they produce when execution reaches
them.

DTrace probes all have fixed integer IDs, but are normally referred to
via human-readable names:

  provider:module:func:probe

To list syscall-related probes:
  pfexec dtrace -l -n syscall:::
(parts that match all can be * or simply omitted)

To list only probes that fire on entry to a syscall:
  pfexec dtrace -l -n syscall:::entry

To list probes matching a pattern:
  pfexec dtrace -l -n syscall:*:o*:entry

     ID   PROVIDER   MODULE   FUNCTION   NAME
  70776    syscall                open   entry
  71172    syscall              open64   entry

9. Installing your own copy of OpenSolaris in a virtual machine:

I installed OpenSolaris in a VirtualBox environment (free, recently
bought by Sun and adjusted for OpenSolaris) on MacOS 10.5 .

These nice step-by-step instructions with screenshots
  http://www.javapassion.com/handsonlabs/opensolarisvirtual/
will lead you to the point where you need to double-click "Install
OpenSolaris", which will install it onto your virtual disk. Then follow
the OpenSolaris installer instructions (in particular, accept the
default partitioning scheme).
Once everything has been installed, reboot the virtual machine and
choose "Boot from Hard Drive" in the GRUB menu. Otherwise you will boot
from the installation CD ISO image again; alternatively, "detach" this
ISO image before "booting" the virtual machine.