Kprobes slides: kprobes.pdf

Re-orientation: Solaris/Illumos' adaptive locks

OpenSolaris optimizes its mutexes based on two assumptions:

1. Critical sections are short, and once entered by a CPU will be over
   very soon (faster than the pair of context switches involved in
   blocking a waiting thread on a turnstile so that some other thread
   can run, and then waking the waiter up again). Hence the idea that a
   thread should spin if the mutex is held by a thread currently on a
   CPU, and block otherwise.

2. Most mutexes in the kernel can be adaptive (i.e., can block without
   causing scheduling trouble) and are not hotly contended; therefore
   most threads in most cases will find a mutex BOTH *adaptive* AND
   *not taken*. Hence this path -- adaptive && not taken -- and the
   mutex implementation data structure (see the union in mutex_impl.h)
   are *co-optimized* in assembly down to a single lock-prefixed
   (atomic) instruction plus a return (see sketch 1 at the end of these
   notes for a C rendering of this fast path):

   http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/intel/ia32/ml/lock_prim.s#512 -- comment
   http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/intel/ia32/ml/lock_prim.s#558 -- code

Note: according to the AMD64 calling convention, RDI holds the first
argument to a function, which for mutex_enter is the pointer to the
mutex data structure from mutex_impl.h.

Question: what happens if kernel pre-emption hits on this path?

======

The Linux kernel was not pre-emptable until 2.6. The story:

http://www.linuxjournal.com/article/5600
http://lwn.net/Articles/22912/

http://lxr.linux.no/#linux+v2.6.32/include/linux/preempt.h -- kernel implementation
(sketch 2 at the end of these notes models the preempt_count scheme)

======

The Solaris kernel, in contrast, was designed to be pre-emptable:

https://blogs.oracle.com/sameers/entry/solaris_kernel_is_pre_emptible

In a pre-emptable kernel, spinlocks are the only choice for drivers and
other critical operations that cannot block. See chapter 17 for
explanations. For everything else there are adaptive locks, which
choose either to block or to spin based on a simple heuristic: if the
holder of the lock is running on another CPU, it won't hold the lock
for long, and it makes sense to spin; if the holder is blocked, then
blocking is the only option, because we don't know how long the owner
will remain blocked.

Spinlocks on multiprocessor machines ultimately depend on cache
coherency logic for their efficiency. Locking the memory bus on every
check of a state-holding memory byte would be too wasteful, yet the
result of a non-locked read cannot be relied on to persist even until
the next instruction in the current thread. Implementors of locks must
therefore deal with the case where the lock is snatched away just after
its state was read and found "free" (or convince themselves that this
can never happen on their particular platform). Keep this in mind when
reading the code; sketch 3 at the end of these notes shows the standard
way of handling it.

A few details in preparation. Getting the CPU a thread is running on
(recall that the kernel reaches its per-CPU data through %gs):

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/intel/ia32/ml/i86_subr.s

	ENTRY(curcpup)
	movl %gs:CPU_SELF, %eax
	ret

	#define CPU (curcpup())  /* Pointer to current CPU */

See also the definition of cpu_t:

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/sys/cpuvar.h

Suggestion: Trace the use of cpu_t and the various CPU* macros through
the Vmem and Kmem code, to see their use in the non-blocking allocation
of kmem cache objects; observe the limitations of that approach (in
particular, the occasions where global locks do get taken).
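
======

Sketch 1 (referenced above): the adaptive mutex_enter fast path, in C.
This is a minimal user-space model, NOT the Illumos code: the names
amutex_t, curthread(), owner_running(), and block_on_turnstile() are
made up for illustration, and the stubs exist only so the file
compiles. The point is the shape of the two paths: a single CAS for the
common "adaptive && not taken" case (what lock_prim.s does with one
lock-prefixed cmpxchg), and a spin-or-block loop for the contended case.

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdint.h>

	typedef struct {
	    /* 0 means "adaptive and not taken"; otherwise the owner. */
	    _Atomic(uintptr_t) m_owner;
	} amutex_t;

	/* Toy stubs so the sketch is self-contained; a real kernel
	 * consults the scheduler here. */
	static uintptr_t curthread(void) { static int me; return (uintptr_t)&me; }
	static bool owner_running(uintptr_t t) { (void)t; return true; }
	static void block_on_turnstile(amutex_t *m) { (void)m; /* would sleep */ }

	static void amutex_vector_enter(amutex_t *m)  /* contended slow path */
	{
	    for (;;) {
	        uintptr_t owner = atomic_load_explicit(&m->m_owner,
	                                               memory_order_relaxed);
	        if (owner == 0) {
	            uintptr_t expected = 0;
	            /* The mutex may be snatched between the load and the
	             * CAS; only the CAS decides ownership. */
	            if (atomic_compare_exchange_weak_explicit(&m->m_owner,
	                    &expected, curthread(),
	                    memory_order_acquire, memory_order_relaxed))
	                return;
	        } else if (owner_running(owner)) {
	            continue;               /* owner on a CPU: spin briefly */
	        } else {
	            block_on_turnstile(m);  /* owner off CPU: go to sleep   */
	        }
	    }
	}

	static void amutex_enter(amutex_t *m)
	{
	    uintptr_t expected = 0;
	    /* Fast path: one atomic CAS and return; most calls end here. */
	    if (!atomic_compare_exchange_strong_explicit(&m->m_owner,
	            &expected, curthread(),
	            memory_order_acquire, memory_order_relaxed))
	        amutex_vector_enter(m);
	}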
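
======

Sketch 2 (referenced above): a toy user-space model of the
preempt_count scheme from include/linux/preempt.h -- a sketch, not the
kernel code. The real kernel keeps the count in thread_info, folds
hardirq/softirq state into it, and puts compiler barriers in the
macros; this model shows only the core rule: preemption is legal iff
the per-thread nesting count is zero.

	#include <assert.h>
	#include <stdbool.h>

	static _Thread_local int preempt_count;   /* 0 => preemptible */

	static bool preemptible(void) { return preempt_count == 0; }

	static void preempt_disable(void)
	{
	    ++preempt_count;   /* the real macro also adds a compiler barrier */
	}

	static void preempt_enable(void)
	{
	    assert(preempt_count > 0);
	    if (--preempt_count == 0) {
	        /* The real kernel checks TIF_NEED_RESCHED here and calls
	         * preempt_schedule() if a reschedule is pending. */
	    }
	}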
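
======

Sketch 3 (referenced above): the standard test-and-test-and-set answer
to the "snatched just after it looked free" problem, written with C11
atomics (a sketch, not the Illumos lock_set code; spl_t is a made-up
name). The waiting CPU spins on plain reads, which stay in its own
cache while the line is shared; only when the lock looks free does it
pay for a locked read-modify-write -- and that RMW, not the read, is
what decides ownership, so a lost race just means another trip around
the loop.

	#include <stdatomic.h>

	typedef struct { _Atomic int locked; } spl_t;

	static void spl_lock(spl_t *l)
	{
	    for (;;) {
	        /* Locked RMW: the only step that can actually take the lock. */
	        if (!atomic_exchange_explicit(&l->locked, 1,
	                                      memory_order_acquire))
	            return;
	        /* Test loop: ordinary reads; finding the lock "free" here
	         * proves nothing durable -- we still have to win the
	         * exchange above. (Real code adds a pause/cpu_relax hint.) */
	        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
	            ;
	    }
	}

	static void spl_unlock(spl_t *l)
	{
	    atomic_store_explicit(&l->locked, 0, memory_order_release);
	}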