--- x86_64

See the history of the x86 64-bit extension architecture in solaris-on-x86.pdf in the course directory, Chapter 2. You may skip the details in Section 2.2.

--- Calling conventions

To see the great variety of calling conventions used by vendors, skim
http://en.wikipedia.org/wiki/X86_calling_conventions

Calling conventions are typically specified in the ABI specification, the fundamental piece of documentation for any processor (for x86_64: www.x86-64.org/documentation/abi.pdf). Compiler writers and systems programmers are expected to follow its recommendations, so that code compiled in one place and time passes its data to code compiled in another in the form and layout that the receiving code expects.

--- Syscalls

System call tables and system call dispatcher code are typically implemented in assembly. For example, in recent Linux x86_64 kernels:

Table: http://lxr.linux.no/#linux+v3.2.1/arch/x86/include/asm/unistd_64.h
  (included from line 28 of the sys_call_table definition in
   http://lxr.linux.no/#linux+v3.2.1/arch/x86/kernel/syscall_64.c)

Dispatcher: http://lxr.linux.no/#linux+v3.2.1/arch/x86/kernel/entry_64.S#L425

The gist of dispatching a syscall (and the first of several tables it goes through before it does any actual work) is:

    cmpq $__NR_syscall_max,%rax
    ja badsys
    movq %r10,%rcx
    call *sys_call_table(,%rax,8)  # XXX: rip relative
    movq %rax,RAX-ARGOFFSET(%rsp)

Convince yourself that "ja" is exactly what's needed to guard against invalid syscall numbers -- and then see the off-by-one bug below (under "Miscellany").

Notice the subtle difference between the calling conventions for userland and kernel: the fourth argument travels in rcx for ordinary function calls but in r10 across the syscall boundary, which the third line above compensates for.
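
The guard in the first two lines of the dispatcher snippet above can be modeled in C. This is only a sketch, not kernel code: NR_SYSCALL_MAX, the three-entry table, the handler names, and ENOSYS_RESULT are all made up for illustration. It suggests why an unsigned comparison ("ja", jump if above) fits here: a negative number in rax, read as unsigned, becomes a huge value and is rejected by the same single test, whereas a signed comparison ("jg") would accept it and index backwards out of the table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values for illustration only. */
#define NR_SYSCALL_MAX 2        /* highest valid syscall number in this model */
#define ENOSYS_RESULT  (-38L)   /* stand-in result for the "badsys" path */

typedef long (*syscall_fn)(long);

static long sys_zero(long a) { (void)a; return 0; }
static long sys_inc(long a)  { return a + 1; }
static long sys_dbl(long a)  { return 2 * a; }

/* The table has NR_SYSCALL_MAX + 1 entries: indices 0..NR_SYSCALL_MAX. */
static const syscall_fn sys_call_table[NR_SYSCALL_MAX + 1] = {
    sys_zero, sys_inc, sys_dbl
};

/* Models: cmpq $__NR_syscall_max,%rax ; ja badsys ;
 *         call *sys_call_table(,%rax,8)                */
static long dispatch(uint64_t nr, long arg)
{
    if (nr > NR_SYSCALL_MAX)    /* unsigned "above", which is what ja tests */
        return ENOSYS_RESULT;
    /* Invariant established by the guard: nr indexes inside the table. */
    assert(nr < sizeof sys_call_table / sizeof sys_call_table[0]);
    return sys_call_table[nr](arg);
}

/* What a signed guard (jg) would decide for the same register contents.
 * The cast assumes the usual two's-complement conversion. */
static int signed_guard_accepts(uint64_t nr)
{
    return (int64_t)nr <= NR_SYSCALL_MAX;
}
```

In this model, dispatch(1, 41) returns 42, while dispatch(3, 0) and dispatch((uint64_t)-1, 0) both take the "badsys" path. signed_guard_accepts((uint64_t)-1) returns 1, i.e. a signed check would have let -1 through.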

More info at:
http://stackoverflow.com/questions/2535989/what-are-the-calling-conventions-for-unix-linux-system-calls-on-x86-64
which quotes the above abi.pdf:

=================================================================================
"A.2 AMD64 Linux Kernel Conventions" of System V Application Binary Interface AMD64 Architecture Processor Supplement

Here is the snippet from this section:

  User-level applications use as integer registers for passing the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9. The kernel interface uses %rdi, %rsi, %rdx, %r10, %r8 and %r9.
  A system-call is done via the syscall instruction. The kernel destroys registers %rcx and %r11.
  The number of the syscall has to be passed in register %rax.
  System-calls are limited to six arguments, no argument is passed directly on the stack.
  Returning from the syscall, register %rax contains the result of the system-call. A value in the range between -4095 and -1 indicates an error, it is -errno.
  Only values of class INTEGER or class MEMORY are passed to the kernel.
=================================================================================

(The last sentence means: no floating point, just integers and pointers.)

-----------
In class, we stopped here. We'll continue on Friday.
------------

HW: Subtle questions to understand. I did not discuss these in class, but they are really important:

1. How is the value in RSP switched from the "userland" stack pointer (pointing to the process' userland stack, in the runtime virtual address range that the process uses for its function calls when running at ring3 privilege, calling its own and library functions) to the "kernel" stack pointer (pointing to the 4K stack page in the kernel space, as used by kernel functions called on behalf of the process, and hosting the thread_info struct at the bottom of that page)?

2. A syscall from a userland program should return to the next instruction (unless it causes the program to terminate because of an uncatchable error).
How is the address of that next instruction stored and passed when the syscall/sysenter callgate mechanism is in use? Note that this question is much simpler for the 32-bit Linux syscall mechanism: "int 0x80" pushes the address on the kernel stack, and the IRET instruction at the end of a syscall implementation restores both that address to EIP and the CPL (ring3) to the processor state.

--- New Linux system call vs old:

http://stackoverflow.com/questions/8510333/x86-64-assembly-linux-system-call-confusion

--- Older Linux system call mechanism explained:

http://www.linuxjournal.com/article/3326
http://www.linuxjournal.com/article/3326?page=0,1
http://www.linuxjournal.com/article/3326?page=0,2

Pretty Linux syscall table, linking to a nicer cross-ref engine by http://free-electrons.com/ [check out their embedded Linux training materials]:
http://syscalls.kernelgrok.com/

--- New Linux syscall mechanism:

http://articles.manugarg.com/systemcallinlinux2_6.html
http://justanothergeek.chdir.org/2010/02/how-system-calls-work-on-recent-linux.html

Linked and worth reading:
http://www.trilithium.com/johan/2005/08/linux-gate/
http://www.win.tue.nl/~aeb/linux/lk/lk-4.html
http://articles.manugarg.com/aboutelfauxiliaryvectors.html

Linus Torvalds on the "absolutely wonderfully _disgusting_" new syscall mechanism:
https://lkml.org/lkml/2002/12/18/218

On how to add a new syscall:
http://blog.zhangsen.org/2008/12/how-to-add-syscall-on-x8664.html

SYSENTER/SYSEXIT vs. SYSCALL/SYSRET:
http://semipublic.comp-arch.net/wiki/SYSENTER/SYSEXIT_vs._SYSCALL/SYSRET

Library versions may differ. Cf.
http://stackoverflow.com/questions/2747187/how-to-find-which-type-of-system-call-is-used-by-a-program :

"For example, on one old machine, getpid itself did a int 0x80 while in a newer machine, getpid does a call *gs:0x10 which brings it to __kernel_vsyscall which does a sysenter."

"In amd64 it doesn't do call *gs:0x10 it just does callq. It does have a vdso page mapped on the process space. But __kernel_vsyscall is not called.
-- different builds of libc."

--- Miscellany

Funny off-by-one bug (potential vulnerability?) in a syscall dispatcher implementation, by Dan Rosenberg:
https://lkml.org/lkml/2012/1/6/130

[Suggestion: check whether it would be a vulnerability if it existed on your Linux machine.]
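
To get a feel for why the exact bound in such a guard matters, here is a generic sketch of an off-by-one in a table-dispatch check. It is not a reconstruction of the bug in the LKML post above; TABLE_ENTRIES, the slot values, and both guard functions are hypothetical. The table and the word that happens to follow it are modeled as one array so that the out-of-range read stays well-defined in C.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_ENTRIES 4   /* hypothetical: valid indices are 0..TABLE_ENTRIES-1 */

/* Slots 0..3 model the dispatch table; slot 4 models whatever data
 * happens to sit right after the table in memory. */
static const uint64_t memory[TABLE_ENTRIES + 1] = {
    0x1000, 0x1010, 0x1020, 0x1030,   /* made-up "handler addresses" */
    0xdeadbeef                        /* adjacent, unrelated data */
};

/* Correct guard: accept only indices up to the highest valid one. */
static int guard_ok(uint64_t nr)
{
    return nr <= TABLE_ENTRIES - 1;
}

/* Off-by-one guard: compares against the entry count instead of the
 * highest index, so nr == TABLE_ENTRIES slips through. */
static int guard_off_by_one(uint64_t nr)
{
    return nr <= TABLE_ENTRIES;
}

/* Returns the "address" the dispatcher would jump to, or 0 if rejected. */
static uint64_t lookup(uint64_t nr, int (*guard)(uint64_t))
{
    if (!guard(nr))
        return 0;
    /* Keeps this model itself in bounds, whichever guard is used. */
    assert(nr < sizeof memory / sizeof memory[0]);
    return memory[nr];
}
```

With guard_off_by_one, index 4 is accepted and lookup returns 0xdeadbeef, a value that was never meant to be a handler address. Whether something like that is exploitable depends on what actually follows the table in a given kernel build, which is what the suggestion above asks you to check.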