Assigned reading: bonwick01.pdf in the course directory.

========== OpenSolaris Virtual Memory Design Overview ==========

The textbook covers different aspects of the VM system in Chapters 9--12. The Solaris kernel developers introduced several important new ideas into how memory is managed. Fundamental papers include "bonwick94" and "bonwick01". Please make sure you read the latter!

We will look at OpenSolaris memory management both from the OS abstractions (address space objects, Vmem resource allocation, segment/VFS mappings) down to the hardware mechanisms (page tables, HAT layer, physical page identity as a <vnode, offset> pair), and in the reverse direction. In particular:

1. From a virtual address space as seen by a process or the kernel context (cf. "kas", the kernel's global "struct as" for the kernel's virtual address space, as in Fig. 11.2 p. 533):

     proc_t.p_as -> "struct as" -> an AVL tree of "struct seg"

   "struct seg"s are managed by "segment drivers", of which the seg_vn driver does the work of mmap-ing files and anonymous memory allocations. These segment drivers operate on driver-specific "s_data" members; seg_vn's s_data format is "struct segvn_data".

   Observe that "struct seg" contains both the pointer to driver-specific data ("s_data") *and* pointers to the driver-specific functions that operate on this data (and thus know its format, and do the right thing), via "s_ops". In OO terms, these are the "instance members" and "methods" of the segment driver class. Moreover, each segment driver such as seg_vn can be viewed as a derived class of the abstract class "seg", with s_ops as its "virtual methods" (whereas functions acting on "seg"s generically would be non-virtual). A user-space sketch of this pattern appears at the end of this overview.

   For seg_vn mappings of *files*, the file (more precisely, its vnode) is located through seg.s_data -> segvn_data.vp, and the offset into this file is segvn_data.offset -- Fig. 9.10 p. 483.

   For seg_vn mappings of *anonymous* memory (described in Ch. 9.6--7) the picture is much more complicated, because of the need to keep

   (a) the information about the logical extent of the allocated memory chunks -- cf. "struct anon" for each page chunk, "struct anon_map" for sharing whole anonymous segments between application processes, and everything in between;

   (b) the information about its location in the "swapfs" (see Ch. 9.8) where it could be swapped out; and

   (c) the uniform scheme of a unique "identity" for every allocated physical page, i.e., the ability to locate that physical page uniquely by the unique <vnode, offset> pair associated with it via "page_hash" (a toy model of this lookup also appears at the end of this overview).

   Note: if the "identity" vnode is not naturally present, one is created specifically for the purpose of identifying and referencing a group of physical pages. See Fig. 9.11 p. 486.

Thus we get from virtual addresses and semantically different contiguous areas of virtual memory (.text vs .data vs heap vs stack, etc., which all have different purposes and need to be treated differently by the OS according to their intended functionality) to vnodes and offsets. This can be considered an expression of the fundamental UNIX philosophy that "everything is a file" (more precisely, that the resources programs act on are all accessible through a system-global namespace with "paths" and "filenames", and allow a universal set of operations such as "open", "read", "write", "seek" and so on that treat the resource as a stream of bytes, plus some special operations performed via the "ioctl" interface).
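To make the OO analogy concrete, here is a minimal user-space sketch of the s_ops/s_data pattern. Only the field names s_base, s_size, s_ops, and s_data mirror the real "struct seg"; the reduced operation set and the toy "segvn_data" are simplifications for illustration, not the kernel's definitions:

#include <stdio.h>
#include <stddef.h>

struct seg;                             /* the abstract "base class" */

struct seg_ops {                        /* the driver's "virtual methods" */
    int  (*fault)(struct seg *, size_t off);
    void (*describe)(struct seg *);
};

struct seg {
    size_t s_base, s_size;              /* virtual range covered */
    const struct seg_ops *s_ops;        /* driver entry points */
    void *s_data;                       /* driver-private "instance data" */
};

/* Toy stand-in for struct segvn_data, the seg_vn driver's private state. */
struct segvn_data {
    const char *vp;                     /* stands in for the backing vnode */
    size_t offset;                      /* offset of s_base within the file */
};

static int
segvn_fault(struct seg *seg, size_t off)
{
    struct segvn_data *svd = seg->s_data;   /* only seg_vn knows this layout */

    printf("fault: page at %s + %zu\n", svd->vp, svd->offset + off);
    return (0);
}

static void
segvn_describe(struct seg *seg)
{
    struct segvn_data *svd = seg->s_data;

    printf("segvn mapping of %s, offset %zu, length %zu\n",
        svd->vp, svd->offset, seg->s_size);
}

static const struct seg_ops segvn_ops = { segvn_fault, segvn_describe };

int
main(void)
{
    struct segvn_data svd = { "/lib/libc.so", 0 };
    struct seg seg = { 0x10000, 0x2000, &segvn_ops, &svd };

    /* Generic VM code: dispatch through s_ops, never touching s_data. */
    seg.s_ops->describe(&seg);
    return (seg.s_ops->fault(&seg, 0x1000));
}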
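And a toy model of the page-identity scheme from point (c): physical pages hashed and found by their <vnode, offset> pair. Only the field names p_vnode, p_offset, and p_hash correspond to the real page_t; the hash function, table size, and lock-free lookup are simplifying assumptions (the kernel's page_hash and page_lookup() do much more):

#include <stdio.h>
#include <stddef.h>

#define PAGE_HASH_SIZE 64

struct vnode { const char *path; };     /* reduced stand-in for a vnode */

struct page {
    struct vnode *p_vnode;              /* first half of the page's identity */
    size_t p_offset;                    /* second half of the identity */
    struct page *p_hash;                /* chain within one hash bucket */
};

static struct page *page_hash[PAGE_HASH_SIZE];

static unsigned
hashfn(struct vnode *vp, size_t off)
{
    return ((unsigned)(((size_t)vp >> 4) ^ (off >> 12)) % PAGE_HASH_SIZE);
}

/* Give a page its identity: enter it into the hash under <vnode, offset>. */
static void
page_enter(struct page *pp)
{
    unsigned h = hashfn(pp->p_vnode, pp->p_offset);

    pp->p_hash = page_hash[h];
    page_hash[h] = pp;
}

/* Locate the unique physical page backing <vp, off>, if it exists. */
static struct page *
page_find(struct vnode *vp, size_t off)
{
    struct page *pp;

    for (pp = page_hash[hashfn(vp, off)]; pp != NULL; pp = pp->p_hash)
        if (pp->p_vnode == vp && pp->p_offset == off)
            return (pp);
    return (NULL);
}

int
main(void)
{
    struct vnode vn = { "/lib/libc.so" };
    struct page pg = { &vn, 0x2000, NULL };

    page_enter(&pg);
    printf("page for <%s, 0x2000>: %s\n", vn.path,
        page_find(&vn, 0x2000) != NULL ? "found" : "missing");
    return (0);
}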
This <vnode, offset> identity of memory segments/regions/pages is where the virtual-space-side view of memory meets the physical-page-side view; it is what connects the hardware memory management with the OS abstractions.

==================== Kernel's Memory Space ====================

The kernel is loaded into a set of fixed, platform-specific virtual address ranges. To reiterate: all addresses are virtual, including any address encoded as a part of an instruction.

Kernel (virtual) address layouts:
  http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/i86pc/os/startup.c
  lines 350--530

Global variables/symbols reflecting that layout:
  http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/vm/seg_kmem.h
  lines 45--70 (actual definitions in
  http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/vm/seg_kmem.c)

Within these kernel address ranges, address mappings are created and managed by several kernel "segment drivers".

Kernel's segments (Ch. 11.1):

  seg_kmem -- normal non-pageable kernel memory management
  seg_kp   -- kernel pageable memory management
  seg_nf   -- ???
  seg_map  -- file cache pages mapped into kernel space (::addr2smap DCMD)
  seg_kpm  -- all physical memory pages mapped into kernel space (only in x64)

Read more about these in 11.1.5--6.

==================== KMem Caches ====================

The two most interesting features of the kmem caches design are:

1. Avoidance of global structures that would need to be locked for every allocation from the slab (such as a global freelist). Instead, per-CPU "magazines" are used.

2. The object-oriented style of a cache: constructor, destructor, and reclaim method pointers are passed as arguments to kmem_cache_create() and are called automatically when objects are allocated and freed (see "bonwick01" for more details). In "design patterns" terms, this makes a named cache a "factory" for objects of that type.

Ch. 11.2.3.2 explains the theory of "caching"; 11.2.3.1 gives the background on the design. I found the explanations in 11.2.3.4--7 somewhat confusing. Keep in mind that despite all the military terminology, we are merely dealing with a list of finite-depth stacks (a toy model of such a stack appears at the end of these notes). The "magazines", "rounds", etc. are best understood from the logic of kmem_cache_alloc():

  http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/os/kmem.c

(Read the comment from the top down -- it shares parts with Bonwick's Vmem paper.)

Note the *per-CPU shift* on line XXXX, by the KMEM_CPU_CACHE macro from:
  http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/sys/kmem_impl.h#200

The "magazines" themselves are just arrays (stacks) of pointers into the main slab structure, linked in a single list. Note that their depth is not statically set in the definition: it is adjusted dynamically based on contention metrics.

If the "magazine" and "depot" logic falls through (to line XXXX), the "slab" layer will take the global cache lock (and cause threads on all other CPUs to block on this critical section). This lock is taken on line XXXX for the whole cache, not just for a per-CPU part.

Finally, note the constructor call (if one is defined for the "class" of managed objects) on line XXXX. This is code typical of an OO-language interpreter (e.g., a Ruby implementation). A runnable sketch of this constructor/destructor API closes these notes.
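To demystify the military vocabulary, here is the promised toy model of a "magazine" as a finite-depth stack of pointers ("rounds") to constructed objects. The fixed depth and the helper names are assumptions for illustration; in the real allocator the depth is tracked in the per-CPU cache state and tuned at runtime:

#include <stdio.h>

#define MAG_DEPTH 3                     /* real depth is adjusted at runtime */

struct magazine {
    struct magazine *next;              /* the depot links magazines in a list */
    int rounds;                         /* how many rounds are currently loaded */
    void *round[MAG_DEPTH];             /* the stack of object pointers */
};

/* Pop a round: the common allocation path, no global lock needed. */
static void *
mag_alloc(struct magazine *mp)
{
    if (mp->rounds == 0)
        return (NULL);                  /* empty: fall through to depot/slab */
    return (mp->round[--mp->rounds]);
}

/* Push a round back: the common free path. */
static int
mag_free(struct magazine *mp, void *buf)
{
    if (mp->rounds == MAG_DEPTH)
        return (-1);                    /* full: swap magazines with the depot */
    mp->round[mp->rounds++] = buf;
    return (0);
}

int
main(void)
{
    int obj1, obj2;
    struct magazine mag = { NULL, 0, { NULL } };
    void *first, *second;

    (void) mag_free(&mag, &obj1);
    (void) mag_free(&mag, &obj2);
    first = mag_alloc(&mag);            /* pops &obj2: the stack is LIFO */
    second = mag_alloc(&mag);           /* pops &obj1 */
    printf("popped %p then %p\n", first, second);
    return (0);
}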
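And the runnable sketch of the "factory" API. It uses libumem, the user-level port of the kmem allocator presented in "bonwick01" (its umem_cache_* calls mirror the kernel's kmem_cache_* interfaces); the conn_t object type is a made-up example. Build on illumos/Solaris with: cc conn.c -lumem

#include <umem.h>
#include <stdio.h>
#include <string.h>

typedef struct conn {
    int  refcnt;
    char peer[64];
} conn_t;

/*
 * Constructor: runs when a slab buffer is first brought into the cache,
 * not on every allocation -- that is the point of object caching.
 */
static int
conn_construct(void *buf, void *private, int flags)
{
    conn_t *cp = buf;

    cp->refcnt = 0;
    cp->peer[0] = '\0';
    return (0);
}

/* Destructor: runs only when the buffer is returned to the VM system. */
static void
conn_destruct(void *buf, void *private)
{
    /* nothing to tear down in this toy object */
}

int
main(void)
{
    umem_cache_t *conn_cache;
    conn_t *cp;

    conn_cache = umem_cache_create("conn_cache", sizeof (conn_t),
        0,                              /* natural alignment */
        conn_construct, conn_destruct,
        NULL, NULL, NULL, 0);           /* no reclaim, private, vmem source */

    cp = umem_cache_alloc(conn_cache, UMEM_NOFAIL);
    (void) strlcpy(cp->peer, "192.0.2.1", sizeof (cp->peer));
    printf("allocated conn to %s (refcnt %d)\n", cp->peer, cp->refcnt);

    cp->peer[0] = '\0';                 /* return the object in constructed state */
    umem_cache_free(conn_cache, cp);
    umem_cache_destroy(conn_cache);
    return (0);
}

Note how the cache, not each call site, owns the object's initialization; in the common case an allocation is just a magazine pop of an already-constructed conn_t.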