==================== Kernel's Memory Space ====================

The starting point for creating the kernel memory image is here:
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/i86pc/os/startup.c#670

Also peek at the constant definitions setting the platform limit of
linear addresses, starting at line 189. These will be used to compute
everything else in the address layout (see line 383 for an example 
layout).

Note also lines 289--363 for the important kernel symbols: these
variables are defined here and are referred to throughout other code
via 'extern' declarations.

(Notice that this is x86 platform-specific startup code: startup has to
stay platform-specific until higher-level abstractions like Vmem can
be used. Note that such abstractions still have to work around the
so-called "memory hole", a range of 64-bit addresses that cannot be
used on most systems. You will find checks for the memory hole even
in high-level objects like AS: e.g., seg_alloc, the function that builds
the segments of an address space,
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/vm/vm_seg.c#seg_alloc
calls valid_va_range:
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/i86pc/vm/vm_machdep.c#valid_va_range
)
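
To make the "memory hole" concrete, here is a minimal sketch of the kind
of check involved. It is not the illumos valid_va_range code. It assumes
48-bit virtual addresses (the common x86-64 case), where the unusable
range runs from 0x0000800000000000 up to 0xffff800000000000. The TOY_*
names and toy_range_avoids_hole are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses, as on most x86-64 CPUs. */
#define	TOY_HOLE_START	0x0000800000000000ULL	/* first unusable address */
#define	TOY_HOLE_END	0xffff800000000000ULL	/* first usable one above */

/* Returns 1 if [base, base + len) lies entirely outside the hole, else 0. */
static int
toy_range_avoids_hole(uint64_t base, uint64_t len)
{
	uint64_t last;

	assert(len > 0);
	last = base + len - 1;
	if (last < base)
		return (0);	/* the range wraps around the address space */
	if (last < TOY_HOLE_START)
		return (1);	/* entirely in the low half */
	if (base >= TOY_HOLE_END)
		return (1);	/* entirely in the high (kernel) half */
	return (0);		/* touches the hole */
}
```

With these assumptions, the kvseg base shown later in these notes
(0xffffff0149400000) lies in the high half, above the hole.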

The kernel is loaded into a set of fixed, platform-specific virtual address
ranges. To reiterate, all addresses are virtual, including any address that
is encoded as part of an instruction.

Kernel (virtual) address layouts:
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/i86pc/os/startup.c#380

Within these kernel address ranges, address mappings are created and managed 
by several kernel "segment drivers":

Kernel's segments (Ch. 11.1):

seg_kmem     -- normal non-pageable kernel memory management

seg_kp       -- kernel pageable memory management

seg_map      -- file cache pages mapped into kernel space  (::addr2smap DCMD)

seg_kpm      -- all physical memory pages mapped into kernel space (only in x64)

Read more about these in 11.1.5-6.

The Smap layer establishes a mapping from a virtual address to 
a "page identity" <sm_vp, sm_off>: 
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/vm/seg_map.h#75
In MDB, it is reported by the ::addr2smap  command.

An important observation about "large pages" is found on p. 530, 11.1.2, regarding
the effect of large pages on the kernel's efficiency: a 10% improvement is a lot.

Suggestion: examine 'kas' in kernel space and draw the full tree of kernel segments.
Observe their different *_ops tables and compare them with the segment drivers above.


=========== AVL trees, embedding & offsets ===========

Address spaces ('struct as') of processes ('proc_t') -- as well as of
the kernel's address space 'kas' -- are made up of segments ('struct
seg'). These segments are arranged in an AVL tree (a balanced kind of
a binary search tree) to make finding a segment for a faulting address
efficient; see the definitions for 'avl_tree_t', 'avl_node_t', and
their containing 'struct as' and 'struct seg' respectively.

Note that the tree and node structure are contained rather than
pointed to in the respective OS abstractions -- note the offset
manipulation used for translation between embedded nodes and their
containing data structures.

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/vm/vm_as.c#as_findseg
 (note caching of the last looked up segment)

hands off to
http://src.illumos.org/source/xref/illumos-gate/usr/src/common/avl/avl.c#avl_find
 (essentially, a simple binary search)
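
The lookup path above can be sketched in a few lines. This is a
simplified illustration, not the illumos code: toy_seg, toy_as,
toy_segcompar and toy_findseg are hypothetical names, the tree is a
plain (unbalanced) binary search tree, and locking is ignored. It shows
the two ideas worth noticing: a comparator that treats a segment as the
address range [base, base + size), and a one-entry cache of the last
segment found.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for 'struct seg' and 'struct as'. */
struct toy_seg {
	uintptr_t	base;
	size_t		size;
	struct toy_seg	*child[2];	/* [0] lower addresses, [1] higher */
};

struct toy_as {
	struct toy_seg	*root;
	struct toy_seg	*seglast;	/* last segment found, or NULL */
};

/* -1 if addr is below the segment, 0 if inside it, 1 if above it. */
static int
toy_segcompar(uintptr_t addr, const struct toy_seg *seg)
{
	if (addr < seg->base)
		return (-1);
	if (addr - seg->base < seg->size)
		return (0);
	return (1);
}

static struct toy_seg *
toy_findseg(struct toy_as *as, uintptr_t addr)
{
	struct toy_seg *seg;
	int c = 0;

	assert(as != NULL);

	/* Fast path: consecutive lookups often hit the same segment. */
	seg = as->seglast;
	if (seg != NULL && toy_segcompar(addr, seg) == 0)
		return (seg);

	/* Slow path: ordinary binary-search-tree descent. */
	for (seg = as->root; seg != NULL; seg = seg->child[c > 0]) {
		c = toy_segcompar(addr, seg);
		if (c == 0) {
			as->seglast = seg;
			return (seg);
		}
	}
	return (NULL);	/* addr is not covered by any segment */
}
```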

For every kernel-space data structure, the question "how is access to
it synchronized?" is essential (one cannot write any kernel code using a
data structure without understanding its synchronization model). The
comment at line 26 explains the synchronization rules:
http://src.illumos.org/source/xref/illumos-gate/usr/src/common/avl/avl.c#26

---- Kernel AS (kas) example ----

> kas::print
{
    a_contents = {
        _opaque = [ 0 ]
    }
    a_flags = 0
    a_vbits = 0
    a_cv = {
        _opaque = 0
    }
    a_hat = 0xffffff0149476e78
    a_hrm = 0
    a_userlimit = 0
    a_seglast = kmapseg
    a_lock = {
        _opaque = [ 0 ]
    }
    a_size = 0xff3814b000
    a_lastgap = 0
    a_lastgaphl = 0
    a_segtree = {
        avl_root = kvseg+0x20
        avl_compar = as_segcompar
        avl_offset = 0x20
        avl_numnodes = 0x9            
        avl_size = 0x60               
    }                                 
...

Observe the 9 segments in the kernel address space (avl_numnodes = 0x9).
At the root of the kernel AS tree is the kvseg seg_t structure.

> kvseg::print                        
{
    s_base = 0xffffff0149400000
    s_size = 0xfe76c00000
    s_szc = 0
    s_flags = 0
    s_as = kas
    s_tree = {
        avl_child = [ kpseg+0x20, ktextseg+0x20 ]
        avl_pcb = 0
    }
    s_ops = segkmem_ops
    s_data = kvps

...

Abbreviated:

> kvseg::seg
             SEG             BASE             SIZE             DATA OPS
fffffffffbc31530 ffffff0149400000       fe76c00000 fffffffffbceea30 segkmem_ops

From the child nodes kpseg and ktextseg you can explore the full AVL tree
of the kernel address space. [Do it! Note the different *_ops and 
s_data structs for segments -- their interplay makes up the "segment
drivers" described in the textbook.]

Observe the switching between the avl_node_t embedded into the
respective seg_t structs (at offset 0x20, the a_segtree's avl_offset)
and the actual seg_t objects. In avl.c it is done by the macros
AVL_NODE2DATA and AVL_DATA2NODE. The implicit assumption is that
in any avl_tree_t, the offset is the same for all tree nodes; the
same holds for the comparator function applied to tree nodes in
avl.c.
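
The node/data switching can be tried out in user space. The following is
a minimal sketch, not the illumos headers: toy_node, toy_seg, the TOY_*
macros and toy_child_seg are hypothetical names that only mimic the idea
behind AVL_NODE2DATA and AVL_DATA2NODE. The tree links point at embedded
nodes (which is why MDB prints children as kpseg+0x20), and the
containing object is recovered by subtracting the offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of an embedded tree node. */
struct toy_node {
	struct toy_node	*child[2];
	uintptr_t	pcb;
};

/* Hypothetical container: the node is embedded, not pointed to. */
struct toy_seg {
	uintptr_t	base;
	size_t		size;
	struct toy_node	tree;
	const char	*name;
};

/* Same idea as the avl macros: add or subtract one fixed offset. */
#define	TOY_NODE2DATA(n, o)	((void *)((uintptr_t)(n) - (o)))
#define	TOY_DATA2NODE(d, o)	((struct toy_node *)((uintptr_t)(d) + (o)))

/* One offset per tree, so every object in the tree must share it. */
static const size_t toy_offset = offsetof(struct toy_seg, tree);

/* Follow a child link and return the containing object, or NULL. */
static struct toy_seg *
toy_child_seg(const struct toy_seg *seg, int which)
{
	struct toy_node *n;

	assert(which == 0 || which == 1);
	n = TOY_DATA2NODE(seg, toy_offset)->child[which];
	return (n == NULL ? NULL : TOY_NODE2DATA(n, toy_offset));
}
```

Because the tree code only ever sees nodes plus one offset, the same
tree routines can serve any containing type.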


============= Legacy BSD kernel memory allocator =============

Before we start on OpenSolaris' Vmem allocator, it will be instructive
to look at the legacy BSD and the comparatively recent Linux
kernel memory allocator interfaces.

The legacy BSD "malloc" code has been made famous by the SCO case
(in which SCO laid a claim to "intellectual property" in the Linux kernel):

The story: 
      http://www.lemis.com/grog/SCO/code-comparison.html

The code:

/*
 * Allocate 'size' units from the given
 * map.  Return the base of the allocated space.
 * In a map, the addresses are increasing and the list is terminated by a 0 size.
 * The core map unit is 64 bytes; the swap map unit is 512 bytes.
 * Algorithm is first-fit.
 */
malloc(mp, size)
struct map *mp;
{
        register unsigned int a;
        register struct map *bp;

        for(bp=mp;bp->m_size && ((bp-mp) < MAPSIZ);bp++) {
                if (bp->m_size >= size) {
                        a = bp->m_addr;
                        bp->m_addr += size;
                        if ((bp->m_size -= size) == 0) {
                                do {
                                        bp++;
                                        (bp-1)->m_addr = bp->m_addr;
                                } while ((bp-1)->m_size = bp->m_size);
                        }
                        return(a);
                }
        }
        return(0);
}

The sequence of "struct map"s traversed by incrementing bp acts as a
free list, in which the first chunk whose size is greater than or equal
to the requested size is found.
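
For experimenting, the same first-fit algorithm can be restated in
present-day C. This is a sketch for study, not a replacement for the
historical listing: toy_map and toy_map_alloc are made-up names, the
explicit nmap argument stands in for MAPSIZ, and the array is assumed
to end with an entry whose m_size is 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical modern spelling of the historical 'struct map'. */
struct toy_map {
	unsigned int	m_size;	/* 0 terminates the list */
	unsigned int	m_addr;
};

/*
 * First-fit: take 'size' units from the first free chunk that is large
 * enough. Returns the base of the allocation, or 0 on failure.
 * Assumption: the array ends with an m_size == 0 entry.
 */
static unsigned int
toy_map_alloc(struct toy_map *mp, size_t nmap, unsigned int size)
{
	struct toy_map *bp;
	unsigned int a;

	assert(size > 0);
	for (bp = mp; (size_t)(bp - mp) < nmap && bp->m_size != 0; bp++) {
		if (bp->m_size < size)
			continue;
		a = bp->m_addr;
		bp->m_addr += size;
		bp->m_size -= size;
		if (bp->m_size == 0) {
			/* Chunk used up: shift later entries down a slot. */
			do {
				bp++;
				(bp - 1)->m_addr = bp->m_addr;
				(bp - 1)->m_size = bp->m_size;
			} while (bp->m_size != 0);
		}
		return (a);
	}
	return (0);
}
```

Note that 0 doubles as the failure value, so a chunk based at address 0
cannot be told apart from "no space".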
 
Note that the "struct map" pointed to by  bp  can be allocated either "in-band"
or in a separate memory area. OpenSolaris chooses to allocate similar structures
out-of-band, as explained in 11.3.4.1.

============= Linux generic kernel memory allocator API =============

Linux malloc with in-band boundary tags is explained in 

    http://www.dent.med.uni-muenchen.de/~wmglo/malloc-slides.html

The 2.6 Linux kernel memory allocator API is described here:

                  http://www.linuxjournal.com/article/6930

Note that kmalloc() is a function shared by all of the kernel's non-slab
allocations (slabs are handled differently and are closer to the OpenSolaris
KMEM allocator (Ch. 11.2), without the extra "magazine" and "depot" layers).

Flags from Table 4 in the above link determine whether this particular allocation
can or cannot block, and also distinguish between several purposes of allocated
memory. 

============= OpenSolaris kernel's VMEM allocators =============

By contrast, OpenSolaris interfaces allow multiple named pools of
memory with uniform properties per pool ("Slabs" aka "Kmem caches";
Vmem "arenas"). Essentially, a pool becomes a named object, in which
allocation and deallocation functions become methods. Pools can be
nested and configured to obtain new allocations from an enclosing pool
object when necessary.
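
The "pool as a named object that imports from an enclosing pool" idea
can be sketched as follows. This is a toy, not the Vmem or Kmem
interface: toy_pool and toy_pool_alloc are hypothetical, allocation is a
simple bump pointer, nothing is ever freed, and the unused tail of a
span is abandoned when a new span is imported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy pool; this is not the Vmem or Kmem interface. */
struct toy_pool {
	const char	*name;
	uintptr_t	next;		/* next unallocated value */
	uintptr_t	end;		/* one past the current span */
	size_t		import_size;	/* request size towards the source */
	struct toy_pool	*source;	/* enclosing pool, or NULL */
};

/* Returns the base of a 'size'-unit range, or 0 if nothing is left. */
static uintptr_t
toy_pool_alloc(struct toy_pool *p, size_t size)
{
	uintptr_t base;

	assert(size > 0);
	if ((size_t)(p->end - p->next) < size) {
		size_t want = size > p->import_size ? size : p->import_size;

		if (p->source == NULL)
			return (0);
		/* Not enough left here: import a new span from the parent. */
		base = toy_pool_alloc(p->source, want);
		if (base == 0)
			return (0);
		p->next = base;
		p->end = base + want;
	}
	base = p->next;
	p->next += size;
	return (base);
}
```

A small pool created empty, with a larger pool as its source, grows on
demand in import_size steps until the outer pool itself runs out.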

The textbook stresses the generalized character of the VMEM allocator in
Ch. 11.3, pp. 552--553. As described, VMEM allocates subranges 
of integers of requested size within the initial range allocated
at system boot. The integers are primarily meant to be address
ranges (in particular, nested), but can also be integer ID ranges.

This is stressed by calling the allocated ranges "resources", not
"addresses". Although the allocator includes some special
functions that are address-aware (vmem_xalloc, in particular,
controls address range "coloring" as in 10.2.7), it tries to
be as forgetful about the nature of the ranges as possible, and
treats the allocation as a general algorithmic problem of
allocating integer intervals economically.
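
One address-aware constraint of this kind is "aligned, plus a phase":
the result must sit at a given offset (phase) from an alignment
boundary. Choosing different phases for different allocations is one way
to spread ("color") them. The sketch below shows only the integer
arithmetic. toy_fit is a hypothetical helper, not vmem_xalloc, and it
assumes that align is a power of two and that phase < align.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Lowest addr >= start with (addr % align) == phase such that
 * [addr, addr + size) ends at or below 'limit'; 0 if there is none.
 * Assumptions: align is a power of two, phase < align.
 */
static uint64_t
toy_fit(uint64_t start, uint64_t limit, uint64_t size,
    uint64_t align, uint64_t phase)
{
	uint64_t addr;

	assert(align != 0 && (align & (align - 1)) == 0);
	assert(phase < align);

	/* Round (start - phase) up to a boundary, then add phase back. */
	addr = ((start - phase + align - 1) & ~(align - 1)) + phase;
	if (addr < start || addr > limit || limit - addr < size)
		return (0);	/* wrapped around, or does not fit */
	return (addr);
}
```

Nothing here depends on the integers being addresses, which is the point
made above: the same arithmetic works for any integer resource.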

The initial range is ultimately derived either from the static
per-platform kernel memory layout as in
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/i86pc/os/startup.c#383
or from a fixed permissible range of IDs.

Page 554 summarizes the VMEM interface, explained on pp. 555--560.
Read it before we start looking at the actual Vmem code.