=====[ From Doug Lea's malloc() to Vmem ]=====

Once a virtual address space is subdivided into regions by intended use (code, stack, heap, etc.), heaps must be managed at the more granular level of _chunks_ or _objects_. Chunks used to be the only way memory allocation worked: when needed, a program (or kernel) called malloc() to allocate a chunk of the size given as the argument, and called free() when ready to release that chunk back into the heap.

Care must be taken to avoid memory fragmentation: imagine that you fill a 2M heap with lots of 16-byte structs in a row, then release every second one. Even though you will have 1M worth of free bytes, you won't be able to allocate any structure larger than 16 bytes unless you have a way of moving the structs still in use without breaking the pointers that point to them.

This is still the best one can do when we don't know how much memory will be needed next. However, an OS knows that it will need many proc_t, vnode_t, kthread_t, and other structures of known sizes. So instead of mixing chunks of these known and frequently used sizes in a single heap, we can allocate them in dedicated heaps, and know that the next struct of a given size will always fit if a free slot is available (and if not, we'll grab another dedicated page). This method of allocation is called _slab_ allocation (the slab is the page where structs of the same size and layout lie back to back). In OpenSolaris/Illumos and Linux, the allocator that handles slabs (also called "object caches") is called the KMEM allocator. The same method can also serve any application that creates many instances of a particular struct.

We will first review the history of heaps, then examine the VMEM allocator that improved on KMEM.

============= Legacy BSD kernel memory allocator =============

Before we start on OpenSolaris' Vmem allocator, it will be instructive to look at the legacy BSD and the comparatively recent Linux kernel memory allocator interfaces.
The legacy BSD "malloc" code has been made famous by the SCO case (in which SCO laid claim to "intellectual property" in the Linux kernel).

The story: http://www.lemis.com/grog/SCO/code-comparison.html

The code:

    /*
     * Allocate 'size' units from the given map. Return the base
     * of the allocated space. In a map, the addresses are increasing
     * and the list is terminated by a 0 size.
     * The core map unit is 64 bytes; the swap map unit is 512 bytes.
     * Algorithm is first-fit.
     */
    malloc(mp, size)
    struct map *mp;
    {
            register unsigned int a;
            register struct map *bp;

            for (bp = mp; bp->m_size && ((bp - mp) < MAPSIZ); bp++) {
                    if (bp->m_size >= size) {
                            a = bp->m_addr;
                            bp->m_addr += size;
                            if ((bp->m_size -= size) == 0) {
                                    do {
                                            bp++;
                                            (bp - 1)->m_addr = bp->m_addr;
                                    } while ((bp - 1)->m_size = bp->m_size);
                            }
                            return (a);
                    }
            }
            return (0);
    }

The sequence of "struct map"s traversed by incrementing bp acts as a free list, scanned for the first chunk of size greater than or equal to the requested size. When a free extent is fully consumed, the do/while loop shifts the remaining map entries down over the emptied slot; note that its condition, (bp-1)->m_size = bp->m_size, is an intentional assignment, so the loop stops only after the terminating zero-size entry has been copied.

Note also that the "struct map" pointed to by bp can be allocated either "in-band" or in a separate memory area. OpenSolaris chooses to allocate similar structures out-of-band, as explained in 11.3.4.1.

To understand 'double free' attacks on malloc-managed heaps, read:
http://www.phrack.com/issues.html?issue=57&id=9
http://www.phrack.org/issues.html?issue=61&id=6

============= Linux generic kernel memory allocator API =============

Linux malloc with in-band boundary tags is explained in http://www.dent.med.uni-muenchen.de/~wmglo/malloc-slides.html

The 2.6 Linux kernel memory allocator API is described in http://www.linuxjournal.com/article/6930

Note that kmalloc() is the function shared by all of the kernel's non-slab allocations; slabs are handled differently, and are closer to the OpenSolaris KMEM allocator (Ch. 11.2), but without the extra "magazine" and "depot" layers.
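Since both Linux slabs and the OpenSolaris KMEM caches revolve around carving a page-sized slab into equal-size slots, with a free list threaded through the unused slots themselves, a minimal userland sketch of that core mechanism may help. This is an illustration only: the names and sizes are invented, and neither kernel's real bookkeeping (per-slab headers, magazines, depots, cache coloring) is modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy "slab": one page-sized region carved into equal-size slots.
 * Each free slot's first word points at the next free slot, so the
 * free list costs no extra memory. Assumes obj_size is at least
 * sizeof(void *) and a multiple of the pointer alignment. */
#define SLAB_SIZE 4096

struct slab {
        char    mem[SLAB_SIZE];
        void   *freelist;       /* head of the free-slot list */
        size_t  obj_size;
};

static void
slab_init(struct slab *s, size_t obj_size)
{
        size_t nobj = SLAB_SIZE / obj_size;

        s->obj_size = obj_size;
        s->freelist = NULL;
        /* Thread the free list through the unused slots. */
        for (size_t i = 0; i < nobj; i++) {
                void *slot = s->mem + i * obj_size;
                *(void **)slot = s->freelist;
                s->freelist = slot;
        }
}

static void *
slab_alloc(struct slab *s)
{
        void *slot = s->freelist;

        if (slot != NULL)
                s->freelist = *(void **)slot;
        return (slot);          /* NULL means the slab is full */
}

static void
slab_free(struct slab *s, void *slot)
{
        *(void **)slot = s->freelist;
        s->freelist = slot;
}
```

Because every slot has the same size and layout, allocation and free are constant-time pointer swaps, and the fragmentation scenario from the first section cannot arise: any freed 16-byte slot can satisfy the next 16-byte request.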
Flags from Table 4 in the Linux Journal article above determine whether a particular allocation can or cannot block, and also distinguish between several purposes of allocated memory.

============= OpenSolaris kernel's VMEM allocators =============

By contrast, OpenSolaris interfaces allow multiple named pools of memory with uniform properties per pool ("slabs", a.k.a. "kmem caches", and Vmem "arenas"). Essentially, a pool becomes a named object, on which allocation and deallocation functions become methods. Pools can be nested and configured to obtain new allocations from an enclosing pool object when necessary.

The textbook stresses the generalized character of the VMEM allocator in Ch. 11.3, pp. 552--553. As described, VMEM allocates subranges of integers of a requested size within the initial range set up at system boot. The integers are primarily meant to be address ranges (in particular, nested ones), but can also be ranges of integer IDs.

Read the bonwick01.pdf and bonwick94.pdf papers in the class directory (they overlap with the textbook, but make some points better). The generality is stressed by calling the allocated ranges "resources", not "addresses". Although the allocator includes some special functions that are address-aware (vmem_xalloc, in particular, controls address range "coloring" as in 10.2.7), it tries to be as forgetful about the nature of the ranges as possible, and treats allocation as a general algorithmic problem of handing out integer intervals economically.

The initial range is ultimately derived either from the static per-platform kernel memory layout, as in http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/i86pc/os/startup.c#383 , or from a fixed permissible range of IDs.

Page 554 summarizes the VMEM interface, explained in pp. 555--560. Read it before we start looking at the actual Vmem code.

To see how frequently & broadly this mechanism is used, search Illumos for vmem_init(), vmem_create(), and kmem_cache_create().
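The nesting of pools can be made concrete with a toy model. In the real interface, vmem_create() takes (among other things) a source arena from which the new arena imports fresh spans when it runs dry; the sketch below is not the Illumos vmem_t and invents its own minimal structure purely to show that import-from-source behavior on integer "resources".

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of nested vmem-style arenas: an arena hands out integer
 * "resources" from its current span [start, end); when the span is
 * exhausted and a source arena is configured, it imports a fresh span
 * from the source. Invented names; not the Illumos implementation.
 * Assumes size <= import_size for arenas that have a source, and uses
 * 0 as the failure value (so real resource ranges must not start at 0). */
struct arena {
        unsigned int  start, end;       /* current span: [start, end) */
        unsigned int  import_size;      /* span size to import at a time */
        struct arena *source;           /* enclosing arena, or NULL */
};

static unsigned int
arena_alloc(struct arena *a, unsigned int size)
{
        if (a->end - a->start < size) {
                unsigned int span;

                if (a->source == NULL)
                        return (0);             /* out of resources */
                /* Import a fresh span from the enclosing arena. */
                span = arena_alloc(a->source, a->import_size);
                if (span == 0)
                        return (0);
                a->start = span;
                a->end = span + a->import_size;
        }
        unsigned int r = a->start;
        a->start += size;
        return (r);
}
```

A child arena created "empty" thus serves many small allocations out of each span it imports, and only bothers its source once per span; the real allocator layers quantum caches and kmem caches on top of the same idea.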
Note the hierarchical structure of the Vmem and Kmem objects being created; check that it corresponds to what you see with the ::vmem MDB command.

Core kernel memory allocation happens in http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/vm/seg_kmem.c . Read the top comment about the seg_kmem driver and the parts of the kernel heap, and follow the creation of the vmem_t objects declared at lines 103--122 inside kernelheap_init().

Also note the methods table (seg_ops) for this driver being declared & defined in seg_kmem.c at line 776. These methods working in concert with each other _are_ the driver. Note how seg->s_data is treated throughout these methods! (If you wonder what "kvp" is and why you cannot find this symbol in MDB, look at http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/vm/seg_kmem.h#70 and use the actual symbol "kvps" instead.)

There are also some vmem arenas created in http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/i86pc/os/startup.c (look for vmem_create()).

----[ Walkers and commands ]----

A trick for finding relevant commands in MDB is to grep the output of the ::dcmds and ::dmods -l commands, e.g., "::dmods -l ! grep mem". You will see many "walkers", which allow a printing command to iterate over a series of related, connected structures by piping their results to other dcmds (this is different from piping to the shell via "!", as with "! grep" above).

Read about walkers in https://blogs.oracle.com/jwadams/entry/an_mdb_1_cheat_sheet, mdb-reference-chapter.pdf, or the full MDB Guide.

[Note: there exist many versions of the MDB Guide:
http://docs.oracle.com/cd/E18752_01/html/816-5041/
http://docs.oracle.com/cd/E19455-01/806-5194/806-5194.pdf
and http://illumos.org/books/mdb/preface.html (latest, but some links damaged).]