============= Legacy BSD kernel memory allocator =============

Before we start on OpenSolaris' Vmem allocator, it will be instructive to look at the legacy BSD allocator and the comparatively recent Linux kernel memory allocator interfaces.

The legacy BSD "malloc" code was made famous by the SCO case (in which SCO laid claim to "intellectual property" in the Linux kernel).

The story: http://www.lemis.com/grog/SCO/code-comparison.html

The code:

    /*
     * Allocate 'size' units from the given map.
     * Return the base of the allocated space.
     * In a map, the addresses are increasing and the list
     * is terminated by a 0 size.
     * The core map unit is 64 bytes; the swap map unit is 512 bytes.
     * Algorithm is first-fit.
     */
    malloc(mp, size)
    struct map *mp;
    {
        register unsigned int a;
        register struct map *bp;

        for (bp = mp; bp->m_size && ((bp - mp) < MAPSIZ); bp++) {
            if (bp->m_size >= size) {
                a = bp->m_addr;
                bp->m_addr += size;
                if ((bp->m_size -= size) == 0) {
                    do {
                        bp++;
                        (bp-1)->m_addr = bp->m_addr;
                    } while ((bp-1)->m_size = bp->m_size);
                }
                return(a);
            }
        }
        return(0);
    }

The sequence of "struct map"s traversed by incrementing bp acts as a free list, in which the first chunk of size greater than or equal to the requested size is found. Note that the "struct map" pointed to by bp can be allocated either "in-band" or in a separate memory area. OpenSolaris chooses to allocate similar structures out-of-band, as explained in Ch. 11.3.4.1.

============= Linux generic kernel memory allocator API =============

Linux malloc with in-band boundary tags is explained in http://www.dent.med.uni-muenchen.de/~wmglo/malloc-slides.html

The 2.6 Linux kernel memory allocator API is described here: http://www.linuxjournal.com/article/6930

Note that kmalloc() is the function shared by all of the kernel's non-slab allocations (slabs are handled differently and are closer to the OpenSolaris KMEM allocator of Ch. 11.2, without the extra "magazine" and "depot" layers).
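For readers who want to experiment, the first-fit map algorithm quoted in the legacy BSD section above can be restated as modern, compilable C. This is a hypothetical reconstruction for illustration only: the struct field names follow the original, but MAPSIZ's value and the name map_malloc are chosen here.

```c
#include <stddef.h>

#define MAPSIZ 16   /* illustrative map capacity, not the historical value */

/* One free extent: a base address and a size, in allocation units.
 * The list is kept sorted by address and terminated by a 0 size,
 * exactly as in the legacy code. */
struct map {
    unsigned int m_addr;
    unsigned int m_size;
};

/* First-fit allocation from the map; returns the base of the
 * allocated extent, or 0 on failure (as the original does). */
unsigned int map_malloc(struct map *mp, unsigned int size)
{
    struct map *bp;

    for (bp = mp; bp->m_size && (bp - mp) < MAPSIZ; bp++) {
        if (bp->m_size >= size) {
            unsigned int a = bp->m_addr;
            bp->m_addr += size;
            if ((bp->m_size -= size) == 0) {
                /* Extent fully consumed: shift the rest of the list
                 * down one slot, copying up to the terminating 0 size. */
                do {
                    bp++;
                    (bp - 1)->m_addr = bp->m_addr;
                } while (((bp - 1)->m_size = bp->m_size) != 0);
            }
            return a;
        }
    }
    return 0;
}
```

Tracing it on a two-extent map shows the behavior described above: an exact-size allocation deletes its extent by sliding the tail of the list down.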
Flags from Table 4 in the above link determine whether a particular allocation can or cannot block, and also distinguish between several purposes of allocated memory.

============= OpenSolaris kernel's VMEM allocators =============

By contrast, the OpenSolaris interfaces allow multiple named pools of memory with uniform properties per pool. Essentially, a pool becomes a named object, in which the allocation and deallocation functions become methods. Pools can be nested and configured to obtain new allocations from an enclosing pool object when necessary.

The textbook stresses the generalized character of the VMEM allocator in Ch. 11.3, pp. 552--553. As described, VMEM allocates subranges of integers of the requested size within the initial range allocated at system boot. The integers are primarily meant to be address ranges (in particular, nested ones), but can also be integer ID ranges. This is stressed by calling the allocated ranges "resources", not "addresses". Although the allocator includes some special functions that are address-aware (vmem_xalloc, in particular, controls address range "coloring" as in Ch. 10.2.7), the interfaces try to be as agnostic about the nature of the ranges as possible, and treat allocation as a general algorithmic problem of handing out integer intervals economically.

The initial range is ultimately derived either from the static per-platform kernel memory layout, as in http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/i86pc/os/startup.c#358 -- 457, or from a fixed permissible range of IDs.

Page 554 summarizes the VMEM interface, explained in pp. 555-560. Read it before looking at the actual code.
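The idea of a named pool whose allocation function is a method can be sketched in a few lines of C. The sketch below is purely illustrative and much simpler than real vmem: all names (toy_arena, toy_alloc, etc.) are invented here, there is no segment coalescing, no nesting, and only first-fit over a small extent list; it only shows how a pool carries a name, a quantum, and its own range of integers.

```c
#include <string.h>

#define MAXSEGS 32

/* One free extent of the arena's integer range. */
struct toy_seg { unsigned long addr, size; };

/* A toy "arena": a named pool of integers [base, base+size),
 * handing out subranges first-fit.  Hypothetical, illustration only. */
struct toy_arena {
    char name[32];
    unsigned long quantum;          /* all sizes rounded up to this */
    struct toy_seg free[MAXSEGS];   /* free extents; size 0 = unused slot */
};

void toy_arena_init(struct toy_arena *a, const char *name,
                    unsigned long base, unsigned long size,
                    unsigned long quantum)
{
    memset(a, 0, sizeof *a);
    strncpy(a->name, name, sizeof a->name - 1);
    a->quantum = quantum;
    a->free[0].addr = base;
    a->free[0].size = size;
}

/* First-fit, like the legacy BSD map allocator; returns 0 on failure,
 * which is one reason a real ID arena would not start its range at 0. */
unsigned long toy_alloc(struct toy_arena *a, unsigned long size)
{
    int i;
    size = (size + a->quantum - 1) / a->quantum * a->quantum;
    for (i = 0; i < MAXSEGS; i++) {
        if (a->free[i].size >= size && size > 0) {
            unsigned long addr = a->free[i].addr;
            a->free[i].addr += size;
            a->free[i].size -= size;
            return addr;
        }
    }
    return 0;
}
```

In object-oriented terms, toy_arena_init plays the constructor role that vmem_create() plays for "struct vmem", and toy_alloc is the allocation method bound to a particular named pool.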
============= VMEM code walk-through =============

In the boot-time kernelheap_init(), observe the call to vmem_init() and then a series of calls to vmem_create() creating individual VMEM pool objects of type "struct vmem": http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/vm/seg_kmem.c#212

The pools are called "arenas" (arenas are created from ranges of consecutive integers/addresses).

The vmem_init() function in http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/vmem.c#1707 shows a very interesting feature of the VMEM objects created by vmem_create() (in OO terms, a constructor for VMEM instances): they are nested. At line 1723, "heap" is created from "heap_start" and "heap_size", which are static and platform-specific. At line 1728, "heap" is used as the source by vmem_metadata_arena, nested inside heap (both as an object and as an integer range). Note that the base and size arguments of vmem_metadata_arena are set to NULL and 0, because they are to be determined dynamically by calling "vmem_alloc" on "heap". Then vmem_seg_arena and vmem_hash_arena are similarly nested within vmem_metadata_arena, except that their ranges will be obtained from vmem_metadata_arena by calling another function, "heap_alloc".

This trick, in which a function is packaged together with the arguments (or other environment) for its subsequent calls, is known in Programming Languages as a "closure". Closures are a mainstay of dynamic interpreted languages. See http://www.perl.com/doc/FAQs/FAQ/oldfaq-html/Q3.14.html about closures. In the words of the Perl creator Larry Wall, "This is a notion out of the Lisp world that says if you define an anonymous function in a particular lexical context, it pretends to run in that context even when it's called outside of the context. In human terms, it's a funny way of passing arguments to a subroutine when you define it as well as when you call it. It's useful for setting up little bits of code to run later, such as callbacks."
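C has no closures, but the (function pointer, source arena) pair stored in an arena emulates one: the "environment" is captured when the arena is created and supplied automatically at every later call. A minimal sketch of this idiom, with all names (closure_call, pool_alloc, etc.) invented here for illustration:

```c
#include <stddef.h>

/* A poor man's closure in C: a function pointer bundled with the
 * argument it should later be applied to.  This mirrors how a vmem
 * arena stores its import function together with its source arena
 * and invokes them as a pair when it needs more resources. */

typedef size_t (*alloc_fn)(void *source, size_t size);

struct closure {
    alloc_fn func;   /* the code */
    void *source;    /* the captured environment */
};

/* "Call" the closure: apply the stored function to the stored
 * environment plus the call-time argument. */
size_t closure_call(struct closure *c, size_t size)
{
    return c->func(c->source, size);
}

/* A trivial source pool: a bump pointer over a fixed integer range. */
struct pool { size_t next; size_t limit; };

size_t pool_alloc(void *source, size_t size)
{
    struct pool *p = source;
    size_t base;
    if (p->next + size > p->limit)
        return 0;              /* range exhausted */
    base = p->next;
    p->next += size;
    return base;
}
```

The caller of closure_call never names the pool: it was bound once, at creation time, just as a nested arena never names its source arena again after vmem_create().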
The combination of *afunc and source in vmem_create() closely resembles a closure (pun intended).

The vmem_create() function, http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/vmem.c#1416 , which creates new named and possibly nested vmem objects, looks like a typical constructor that first allocates a chunk of RAM (vmp) and then fills it in. Note that some of the arenas (VMEM_INITIAL of them) are pre-allocated at boot as members of the vmem0[] array (cf. line 1443).

Freelists in a vmem instance correspond to powers of 2 (see Ch. 11.3.4.2, p. 558). Observe their initialization in lines 1464-1473. The hash table for allocated "resources" and their sizes is initialized immediately after.

Definition of struct vmem: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/vmem_impl.h#113

Observe the logic of vmem_alloc() as it accommodates the different types of allocations encoded by the flags: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/vmem.c#1236

Note: VM_NEXTFIT ignores the freelists entirely and just gets the next free ID. "flist" is the index in the freelist array of the freelist appropriate to the requested size. Recall that highbit() of an integer value returns the position of its highest set bit + 1.

Note that line 1544 is the special case for vmem allocators nested inside others, as shown in the vmem_init() example above (with an initial NULL for the base address/resource and 0 for the length). vmem_add() will cause the "closure-like" allocation function *afunc to be called on source to grab the initial interval boundaries.

Finally, observe the use of the VMEM instance "ptms_minor" to manage an ID range: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/io/ptms_conf.c#281

In particular, the "base address" is (void *)1, the overall size is ptms_nslots, and the quantum is 1. The base is 1 rather than 0 because a return of 0 (NULL) from vmem_alloc() indicates failure; the cast of the integer resource to (void *) avoids type conversion warnings.
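The highbit() convention and the power-of-2 freelist indexing are easy to get wrong, so a small self-contained sketch may help. The freelist_index helper below is an invented name for illustration; the real vmem code also consults a bitmap of non-empty lists rather than computing the index alone.

```c
/* highbit(x): position of the highest set bit, counting from 1,
 * i.e. highbit(1) == 1, highbit(5) == 3; returns 0 for x == 0.
 * This matches the "highest set bit + 1" convention (bits counted
 * from 0) described in the walk-through. */
int highbit(unsigned long x)
{
    int h = 0;
    while (x) {
        h++;
        x >>= 1;
    }
    return h;
}

/* In a power-of-2 freelist scheme, a free segment of 'size' units
 * hangs off list highbit(size) - 1, so list n holds segments with
 * 2^n <= size < 2^(n+1).  Sketch of the index computation only. */
int freelist_index(unsigned long size)
{
    return highbit(size) - 1;
}
```

For example, a 4096-unit segment (2^12) lands on list 12, and so does a 4097-unit one, since both satisfy 2^12 <= size < 2^13.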