Here's one way to examine how syscalls are done on a 32-bit Linux platform.
There is a fair amount of x86 assembly in what follows, since it's the actual
"ground truth" of how things are implemented. While I don't expect you to know
the details of x86 assembly programming just yet, see
http://www.cs.dartmouth.edu/~sergey/cs258/tiny-guide-to-x86-assembly.pdf
for a short refresher on x86 assembly.

First, the platform info:

sergey@toy32:~$ cat /etc/issue
Ubuntu 11.10
sergey@toy32:~$ uname -a
Linux toy32 3.0.0-16-generic #28 SMP Thu Feb 16 09:36:52 EST 2012 i686 i686 i386 GNU/Linux

Now we know the kernel version and the distribution.

A simple program that makes an exec(2) system call:

--------------------------------------------------------
#include <stdio.h>   /* printf */
#include <unistd.h>  /* getpid, sleep, execv */

int main()
{
  char * args[] = {"/bin/ls", NULL};

  /* so that I can attach a debugger from another shell */
  /* If in doubt w.r.t. what these functions do or their expected
     argument formats: 'man getpid', 'man 3 sleep', 'man exec' */
  printf("pid: %d\n", getpid());
  sleep(60);

  execv("/bin/ls", args);
}
--------------------------------------------------------

Compile and link with:

sergey@toy32:~$ gcc -g -Wall -static -o exec exec.c

-static pulls the binary code of the library functions used into the
resulting executable. -g adds debugging information (source line numbers,
types, variables) for exec.c. The addresses of functions such as execv, which
GDB uses to map names to locations in the executable and in the process
address space created from it, come from the executable's symbol table, which
is there even without -g unless the binary is stripped.

For other useful gcc options, see
http://www.antoarts.com/the-most-useful-gcc-options-and-extensions/
http://stackoverflow.com/questions/3375697/useful-gcc-flags-for-c

Now we run ./exec:

sergey@toy32:~$ ./exec
pid: 2150

In another shell, attach to the process (by its PID) so that we can examine
its memory space:

root@toy32:/home/sergey# gdb -p 2150
GNU gdb (Ubuntu/Linaro 7.3-0ubuntu2) 7.3-2011.08
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see: .
Attaching to process 2150
Reading symbols from /home/sergey/exec...done.
0x00918416 in __kernel_vsyscall ()
(gdb)

Now the process is suspended; the debugger is in control of it. It won't wake
up until the debugger relinquishes control.

(gdb) print main
$1 = {int ()} 0x8048cb0
GDB knows where main() is! :)

BTW, GDB commands may be abbreviated to any unambiguous prefix, and the most
common ones have short abbreviations even where the prefix alone would be
ambiguous. For example, "print" shortens to "p":

(gdb) p main
$2 = {int ()} 0x8048cb0
We can disassemble main(). This works because GDB knows the "symbol" for
main, so we can conveniently use its name instead of its address 0x8048cb0.
The address would also work, of course (try it).

For now, just look at the addresses of the functions called. The intervening
code prepares the arguments for these functions as per the _calling
convention_ for functions on 32-bit Linux.

(gdb) disas main
Dump of assembler code for function main:
   0x08048cb0 <+0>:     push   %ebp
   0x08048cb1 <+1>:     mov    %esp,%ebp
   0x08048cb3 <+3>:     and    $0xfffffff0,%esp
   0x08048cb6 <+6>:     sub    $0x20,%esp
   0x08048cb9 <+9>:     movl   $0x80af468,0x18(%esp)
   0x08048cc1 <+17>:    movl   $0x0,0x1c(%esp)
   0x08048cc9 <+25>:    call   0x80536d0              <<------
   0x08048cce <+30>:    mov    $0x80af470,%edx
   0x08048cd3 <+35>:    mov    %eax,0x4(%esp)
   0x08048cd7 <+39>:    mov    %edx,(%esp)
   0x08048cda <+42>:    call   0x80497e0              <<------
   0x08048cdf <+47>:    movl   $0x3c,(%esp)
   0x08048ce6 <+54>:    call   0x80533c0              <<------
   0x08048ceb <+59>:    lea    0x18(%esp),%eax
   0x08048cef <+63>:    mov    %eax,0x4(%esp)
   0x08048cf3 <+67>:    movl   $0x80af468,(%esp)
   0x08048cfa <+74>:    call   0x80536a0              <<------
   0x08048cff <+79>:    leave
   0x08048d00 <+80>:    ret
End of assembler dump.

Following the source, the four marked calls are, in order, getpid(),
printf(), sleep() (note its argument $0x3c, i.e. 60) and execv(); the last
one goes to 0x80536a0, which is where execv starts in the dump below.

Now let's look at execv. It calls execve (see 'man exec' for the different
forms of exec):

(gdb) disas execv
Dump of assembler code for function execv:
   0x080536a0 <+0>:     sub    $0xc,%esp
   0x080536a3 <+3>:     mov    0x80d7570,%eax
   0x080536a8 <+8>:     mov    %eax,0x8(%esp)
   0x080536ac <+12>:    mov    0x14(%esp),%eax
   0x080536b0 <+16>:    mov    %eax,0x4(%esp)
   0x080536b4 <+20>:    mov    0x10(%esp),%eax
   0x080536b8 <+24>:    mov    %eax,(%esp)
   0x080536bb <+27>:    call   0x8077430              <<-------
   0x080536c0 <+32>:    add    $0xc,%esp
   0x080536c3 <+35>:    ret
End of assembler dump.
OK, now execve:

(gdb) disas execve
Dump of assembler code for function execve:
   0x08077430 <+0>:     push   %ebx
   0x08077431 <+1>:     mov    0x10(%esp),%edx    <<---- 3rd arg of execve
   0x08077435 <+5>:     mov    0xc(%esp),%ecx     <<---- 2nd arg of execve
   0x08077439 <+9>:     mov    0x8(%esp),%ebx     <<---- 1st arg of execve
   0x0807743d <+13>:    mov    $0xb,%eax          <<---- 0xb == 11 is execve's syscall number
   0x08077442 <+18>:    call   *0x80d60bc         <<---- actual system call here
   0x08077448 <+24>:    cmp    $0xfffff000,%eax   <<---- eax now holds the syscall's return code
   0x0807744d <+29>:    ja     0x8077451
   0x0807744f <+31>:    pop    %ebx
   0x08077450 <+32>:    ret
   0x08077451 <+33>:    mov    $0xffffffe8,%edx
   0x08077457 <+39>:    neg    %eax
   0x08077459 <+41>:    mov    %gs:0x0,%ecx
   0x08077460 <+48>:    mov    %eax,(%ecx,%edx,1)
   0x08077463 <+51>:    or     $0xffffffff,%eax
   0x08077466 <+54>:    pop    %ebx
   0x08077467 <+55>:    ret
End of assembler dump.

According to the 32-bit Linux _userland_ function calling convention,
arguments are passed on the stack. However, the 32-bit Linux system call
convention is different: the syscall number goes in %eax and the arguments
are passed in registers (%ebx, %ecx, %edx, and so on). Therefore the movs
before the call copy the arguments (pointers to the various strings/arrays of
strings required by execve) from the stack into the registers where the
kernel expects to find them. The stack offsets start at 0x8(%esp) rather than
0x4(%esp) because the push %ebx (which saves a register that the calling
convention requires execve to preserve) has moved the stack pointer down by
4 bytes.

After the call, a return value between 0xfffff001 and 0xffffffff (that is,
-4095 to -1) signals an error: the code from <+33> on negates it, stores the
result in errno (a thread-local variable, hence the %gs-relative address),
and makes execve return -1.

The address of whatever makes the system call is stored at 0x80d60bc. Let's
see it. [ Read up on the 'x' command of GDB: it's very useful, and there is a
reason it is only one letter long; it's expected to be used very often:
(gdb) help x ]

(gdb) x/2x 0x80d60bc
0x80d60bc <_dl_sysinfo>:    0x00918414    0x08080d90

The first word, 0x00918414, is the pointer that "call *0x80d60bc" goes
through (x/2x simply printed one more word after it). Now we know where the
call will lead us:

(gdb) disas 0x00918414
Dump of assembler code for function __kernel_vsyscall:
   0x00918414 <+0>:     int    $0x80      <<----- jump to kernel
=> 0x00918416 <+2>:     ret
End of assembler dump.

The "=>" marks where the process is currently stopped: inside sleep(), whose
own system call entered the kernel through this same 'int 0x80' (hence the
"0x00918416 in __kernel_vsyscall ()" line printed when GDB attached).

The address in kernel ("ring 0") space that the 'int 0x80' instruction will
take us to is stored in the Interrupt Descriptor Table (IDT), which is
pointed to by the special register IDTR, set at boot time as part of the
system's runtime configuration.
The IDT is supposed to be invisible to userspace (*) and cannot be changed by
userspace programs. Why? Because the whole point of the 'call gate'
implementation of kernel/userland separation is that the jump into kernel
code goes to a predetermined address that userland cannot affect. Otherwise,
a devious userland program could jump past the kernel's checks of the
process's privileges, or otherwise execute kernel code from an arbitrary
point, violating the integrity of kernel data structures/bookkeeping.

BTW, we can also look at how these instructions are encoded in memory:

(gdb) x/3b 0x00918414
0x918414 <__kernel_vsyscall>:   0xcd    0x80    0xc3
                                ^^^^^^^^^^^^    ^^^^
                                 "int 0x80"     "ret"

So now we know how the system call happens: the execv() standard C library
(a.k.a. libc) function, by way of execve(), wraps the invocation of the raw
software interrupt 0x80. As part of this wrapping, the arguments for the
syscall are put into registers as per the syscall calling convention.

----
(*) But in fact it was visible on earlier processors (the SIDT instruction,
which copies the contents of IDTR to memory, could be executed from user
mode), and this enabled the so-called "Red Pill" method of detecting whether
a program was running on 'bare metal' or in a virtual machine (see
http://www.ouah.org/Red_%20Pill.html for the article that coined the term).