Lecture 7: please see 2009/l6.txt and Chapter 17 for the details of
	   adaptive lock implementations. We also touched upon
	   cache coherence logic as it applies to spinlocks;
	   see the video at   cliff-click-on-x86-url.txt   from the 36-minute mark.

Lecture 8: 

OpenSolaris optimizes its mutexes based on two assumptions:
 
1. Critical sections are short, and once entered by a CPU, will
   be over very soon (faster than the context switches involved
   in blocking a waiting thread on a turnstile so that some other
   thread could run, and then waking it up).

   Hence the idea that a thread should spin if the mutex is
   held by a thread currently on a CPU, and block otherwise.

2. Most mutexes in the kernel can be adaptive (i.e., can block
   without causing scheduling trouble) and are not hotly contested, so
   in most cases a thread will find a mutex BOTH
   *adaptive* and *not taken*.

   Hence this path -- adaptive && not taken -- and the mutex implementation
   data structure (see the union in mutex_impl.h) are *co-optimized* in
   assembly to a single lock-ed (atomic) instruction and return:

   http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/intel/ia32/ml/lock_prim.s#512  -- comment

   http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/intel/ia32/ml/lock_prim.s#558  -- code 

   Note: according to the AMD64 calling convention, RDI holds the first
         argument to a function, which for mutex_enter is the pointer
         to the mutex data structure from mutex_impl.h
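
   The two assumptions above can be sketched in user-space C. This is a
   toy model, not the OpenSolaris code: toy_thread_t, toy_mutex_t and all
   toy_* functions are hypothetical names, and the on_cpu flag stands in
   for whatever the kernel consults to tell whether the holder is running.
   The point is that when the lock word is 0 while free and holds the
   owner's address while taken, the uncontended path is one atomic
   compare-and-swap, and the slow path only has to choose between
   spinning and blocking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel thread; on_cpu models "running now". */
typedef struct toy_thread {
    atomic_bool on_cpu;
} toy_thread_t;

/* Toy adaptive mutex: owner is 0 when free, else the owning thread's address. */
typedef struct toy_mutex {
    _Atomic uintptr_t owner;
} toy_mutex_t;

typedef enum { TOY_ACQUIRED, TOY_SPIN, TOY_BLOCK } toy_action_t;

/* Fast path: a single atomic compare-and-swap from "free" to "owned by self". */
static bool toy_mutex_tryenter(toy_mutex_t *m, toy_thread_t *self)
{
    uintptr_t expected = 0;
    return atomic_compare_exchange_strong(&m->owner, &expected,
                                          (uintptr_t)self);
}

/*
 * One step of the slow path: try the fast path again, and if the mutex is
 * still held, report whether the waiter should spin or block.
 * (A real kernel must also cope with the holder exiting while we look at
 * it; this toy model ignores that.)
 */
static toy_action_t toy_mutex_enter_step(toy_mutex_t *m, toy_thread_t *self)
{
    if (toy_mutex_tryenter(m, self))
        return TOY_ACQUIRED;

    toy_thread_t *holder = (toy_thread_t *)atomic_load(&m->owner);
    if (holder == NULL)
        return TOY_SPIN;            /* released in the meantime: retry */

    /* Holder on a CPU: it should finish soon, so spin. Otherwise block. */
    return atomic_load(&holder->on_cpu) ? TOY_SPIN : TOY_BLOCK;
}

/* Release: only the owner may clear the lock word. */
static void toy_mutex_exit(toy_mutex_t *m, toy_thread_t *self)
{
    assert(atomic_load(&m->owner) == (uintptr_t)self);
    atomic_store(&m->owner, 0);
}
```

   A caller would loop on toy_mutex_enter_step until it returns
   TOY_ACQUIRED, spinning on TOY_SPIN and going to sleep (on a turnstile,
   in the real kernel) on TOY_BLOCK.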

Question: what happens if pre-emption hits on this path? 

----

We were looking at the usage and allocation of turnstiles.
Turnstiles, like many other kinds of objects of a single type _and_ size,
are managed by a kmem "slab cache"-type allocator
(for the background:
  http://www.usenix.org/publications/library/proceedings/bos94/full_papers/bonwick.ps -- original paper, bonwick-slab-allocator-usenix.pdf local copy
  http://www.ibm.com/developerworks/linux/library/l-linux-slab-allocator/ -- in Linux).
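
The idea of an object cache can be sketched as follows. This is a
deliberately tiny user-space model, not the kmem interface: toy_cache_t,
the toy_cache_* functions and toy_turnstile_t are hypothetical names, and
real slabs, per-CPU magazines and locking are all left out. What it does
show is the property the slab paper stresses: freed objects go back to
the cache still constructed, so the constructor runs once per piece of
memory rather than once per allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header kept in front of each object so the object body is never
 * overwritten by free-list bookkeeping. */
typedef union toy_hdr {
    union toy_hdr *next;
    max_align_t align;              /* keeps the object suitably aligned */
} toy_hdr_t;

/* Hypothetical toy cache: one cache per object type, one object size. */
typedef struct toy_cache {
    size_t objsize;                 /* size of every object in the cache */
    void (*ctor)(void *obj);        /* runs when memory is first obtained */
    toy_hdr_t *freelist;            /* constructed objects awaiting reuse */
    size_t constructed;             /* number of constructor calls so far */
} toy_cache_t;

static void toy_cache_init(toy_cache_t *c, size_t objsize,
                           void (*ctor)(void *))
{
    c->objsize = objsize;
    c->ctor = ctor;
    c->freelist = NULL;
    c->constructed = 0;
}

static void *toy_cache_alloc(toy_cache_t *c)
{
    toy_hdr_t *h = c->freelist;
    if (h != NULL) {                /* reuse: object is already constructed */
        c->freelist = h->next;
        return h + 1;
    }
    h = malloc(sizeof(*h) + c->objsize);
    if (h == NULL)
        return NULL;
    if (c->ctor != NULL)
        c->ctor(h + 1);
    c->constructed++;
    return h + 1;
}

/* Caller must return the object in its constructed state. */
static void toy_cache_free(toy_cache_t *c, void *obj)
{
    toy_hdr_t *h = (toy_hdr_t *)obj - 1;
    h->next = c->freelist;
    c->freelist = h;
}

/* Example object type and constructor (both hypothetical). */
typedef struct toy_turnstile { int waiters; } toy_turnstile_t;

static void toy_turnstile_ctor(void *obj)
{
    ((toy_turnstile_t *)obj)->waiters = 0;
}
```

Allocating, freeing and allocating again from such a cache hands back the
same object without a second constructor call, which is what makes this
scheme attractive for objects that are created and destroyed constantly.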

Things to spot: 

1. Static data structures allocated at boot, such as the
   first process's proc struct  p0,  are found in
   http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/conf/param.c

2. Hash table structures for collections of objects (the hash array
   and collision lists, or "chains"), but not the objects themselves,
   are initialized at boot time (e.g., in kern_setup1):

  http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/intel/ia32/os/sundep.c#197

   which in turn calls thread_init:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/disp/thread.c#166

   In which we find the creation of the turnstile_cache (l. 209) and
   then the allocation of the first turnstile object in the all-ancestor
   thread (l. 233). This is boot-time run-once code.

   Finally, in the same file, new turnstiles are allocated in
   thread_create (l. 345).

---

Turnstiles are acquired with turnstile_lookup *with the associated
mutex's pointer used as a hash key* and released in turnstile_exit;
the two calls respectively acquire and release the high-priority
dispatcher lock over the hash table's collision list:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/turnstile.c#turnstile_lookup

These two functions are used as "balanced parentheses" in the code
(e.g., mutex_vector_enter) and must be balanced on each code path.
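
The lookup/exit pairing can be sketched in user-space C. Everything here
is a simplified assumption: the toy_* names, the table size, the shift in
the hash function and the spin flag that stands in for the per-chain
dispatcher lock are all made up for illustration and are not taken from
turnstile.c. The sketch keeps the two properties discussed above: the
lock's address is the hash key, and lookup returns with the chain lock
held, so every lookup must be matched by an exit on every code path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_SIZE 64            /* assumption: any power of two */

typedef struct toy_ts {
    const void *sobj;               /* the lock this turnstile belongs to */
    struct toy_ts *next;            /* collision chain link */
} toy_ts_t;

typedef struct toy_chain {
    atomic_bool lock;               /* stands in for the chain's lock */
    toy_ts_t *first;                /* head of the collision list */
} toy_chain_t;

/* Static storage: zero-initialized, i.e. all chains empty and unlocked. */
static toy_chain_t toy_table[TOY_HASH_SIZE];

/* Hash on the lock's address, dropping low bits that alignment keeps zero. */
static size_t toy_hash(const void *sobj)
{
    return ((uintptr_t)sobj >> 4) & (TOY_HASH_SIZE - 1);
}

/* "Open parenthesis": returns with the chain lock HELD.
 * The result is NULL when no turnstile is attached to sobj. */
static toy_ts_t *toy_ts_lookup(const void *sobj)
{
    toy_chain_t *c = &toy_table[toy_hash(sobj)];
    while (atomic_exchange(&c->lock, true))
        ;                           /* spin until the chain lock is ours */
    for (toy_ts_t *ts = c->first; ts != NULL; ts = ts->next)
        if (ts->sobj == sobj)
            return ts;
    return NULL;
}

/* Attach a turnstile; only legal between toy_ts_lookup and toy_ts_exit. */
static void toy_ts_insert_locked(toy_ts_t *ts)
{
    toy_chain_t *c = &toy_table[toy_hash(ts->sobj)];
    assert(atomic_load(&c->lock));
    ts->next = c->first;
    c->first = ts;
}

/* "Close parenthesis": drops the chain lock taken by toy_ts_lookup. */
static void toy_ts_exit(const void *sobj)
{
    toy_chain_t *c = &toy_table[toy_hash(sobj)];
    assert(atomic_load(&c->lock));
    atomic_store(&c->lock, false);
}
```

Forgetting toy_ts_exit on an early-return path would leave the chain
locked forever, which is why the real code treats the pair like balanced
parentheses.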

The innards of the hash table:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/turnstile.c#160 -- 174. Also see the comment above it:

/*         In general, when two turnstile locks
 * must be held at the same time, the lock order must be the address order.
 * Therefore, to prevent deadlock in turnstile_pi_waive(), we must ensure
 * that upimutextab[] locks *always* hash to lower addresses than any
 * other locks.  You think this is cheesy?  Let's see you do better.
 */
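
The address-order rule from that comment can be illustrated with a small
user-space sketch. The toy_lock_* names and the spin flag are
hypothetical and are not the turnstile code; the sketch only shows why a
global order removes the deadlock: whichever order the caller names the
two locks in, the lower address is always taken first, so two threads
can never each hold one lock while waiting for the other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy spin lock. */
typedef struct toy_lock {
    atomic_bool held;
} toy_lock_t;

static void toy_lock_enter(toy_lock_t *l)
{
    while (atomic_exchange(&l->held, true))
        ;                           /* spin until we flip false -> true */
}

static void toy_lock_exit(toy_lock_t *l)
{
    atomic_store(&l->held, false);
}

/* Which of the two locks must be taken first: the lower address. */
static toy_lock_t *toy_lock_first(toy_lock_t *a, toy_lock_t *b)
{
    return (uintptr_t)a < (uintptr_t)b ? a : b;
}

/* Take both locks in ascending address order, whatever the argument order. */
static void toy_lock_pair_enter(toy_lock_t *a, toy_lock_t *b)
{
    if (a == b) {                   /* same lock named twice: take it once */
        toy_lock_enter(a);
        return;
    }
    toy_lock_t *lo = toy_lock_first(a, b);
    toy_lock_t *hi = (lo == a) ? b : a;
    toy_lock_enter(lo);
    toy_lock_enter(hi);
}

/* Release order does not matter for deadlock avoidance. */
static void toy_lock_pair_exit(toy_lock_t *a, toy_lock_t *b)
{
    toy_lock_exit(a);
    if (a != b)
        toy_lock_exit(b);
}
```

With this discipline, toy_lock_pair_enter(x, y) in one thread and
toy_lock_pair_enter(y, x) in another contend for the same first lock
instead of deadlocking.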