Assigned reading: bonwick01.pdf in the course directory.

========== OpenSolaris Virtual Memory Design Overview ==========

The textbook covers different aspects of the VM system in Chapters 9--12. The Solaris kernel developers introduced several important new ideas into how memory is managed. Fundamental papers include "bonwick94" and "bonwick01". Please make sure you read the latter!

We will look at OpenSolaris memory management both from the OS abstractions (address space objects, Vmem resource allocation, segment/VFS mappings) down to the hardware mechanisms (page tables, HAT layer, physical page identity as a <vnode, offset> pair), and in the reverse direction. In particular:

1. From a virtual address space as seen by a process or the kernel context (cf. "kas", the kernel's global "struct as" for the kernel's virtual address space, as in Fig. 11.2 p. 533):

     proc_t.p_as -> "struct as" -> an AVL tree of "struct seg"

   "struct seg"s are managed by "segment drivers", of which the seg_vn driver does the work of mmap-ing files and anonymous memory allocations. These segment drivers operate on driver-specific "s_data" members; seg_vn's s_data format is "struct segvn_data".

   Observe that "struct seg" contains both the pointer to driver-specific data ("s_data") *and* pointers to the driver-specific functions that operate on this data (and thus know its format, and do the right thing), via "s_ops". In OO terms, these are the "instance members" and "methods" of the segment driver class. Moreover, each segment driver such as seg_vn can be viewed as a derived class of the abstract class "seg", with s_ops as its "virtual methods" (whereas functions acting on "seg"s generically would be non-virtual). A user-space sketch of this pattern appears at the end of this overview.

   For seg_vn mappings of *files*, the file (more precisely, its vnode) is located through seg.s_data -> segvn_data.vp, and the offset into this file is segvn_data.offset -- Fig. 9.10 p. 483.

   For seg_vn mappings of *anonymous* memory (described in Ch. 9.6--7) the picture is much more complicated, because of the need to keep

   (a) the information about the logical extent of the allocated memory chunks -- cf. "struct anon" for each page chunk, "struct anon_map" for sharing whole anonymous segments between application processes, and everything in between;

   (b) the information about its location in the "swapfs" (see Ch. 9.8) where it could be swapped out; and

   (c) the uniform scheme of a unique "identity" for every allocated physical page, i.e., the ability to locate that physical page uniquely by the unique <vnode, offset> pair associated with it via "page_hash" (a toy model of this lookup also appears at the end of this overview).

   Note: if the "identity" vnode is not naturally present, one is created specifically for the purpose of identifying and referencing a group of physical pages. See Fig. 9.11 p. 486.

Thus we get from virtual addresses and semantically different contiguous areas of virtual memory (.text vs .data vs heap vs stack, etc., which all have different purposes and need to be treated differently by the OS according to their intended functionality) to vnodes and offsets. This can be considered an expression of the fundamental UNIX philosophy that "everything is a file" (more precisely, that the resources programs act on are all accessible through a system-global namespace with "paths" and "filenames", and allow a universal set of operations such as "open", "read", "write", "seek" and so on that treat the resource as a stream of bytes, plus some special operations performed via the "ioctl" interface).
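To make the OO analogy concrete, here is a minimal user-space sketch of the s_ops/s_data pattern. Only the field names s_base, s_size, s_ops, and s_data mirror the real "struct seg"; the reduced operation set and the toy "segvn_data" are simplifications for illustration, not the kernel's definitions:

#include <stdio.h>
#include <stddef.h>

struct seg;                             /* the abstract "base class" */

struct seg_ops {                        /* the driver's "virtual methods" */
    int  (*fault)(struct seg *, size_t off);
    void (*describe)(struct seg *);
};

struct seg {
    size_t s_base, s_size;              /* virtual range covered */
    const struct seg_ops *s_ops;        /* driver entry points */
    void *s_data;                       /* driver-private "instance data" */
};

/* Toy stand-in for struct segvn_data, the seg_vn driver's private state. */
struct segvn_data {
    const char *vp;                     /* stands in for the backing vnode */
    size_t offset;                      /* offset of s_base within the file */
};

static int
segvn_fault(struct seg *seg, size_t off)
{
    struct segvn_data *svd = seg->s_data;   /* only seg_vn knows this layout */

    printf("fault: page at %s + %zu\n", svd->vp, svd->offset + off);
    return (0);
}

static void
segvn_describe(struct seg *seg)
{
    struct segvn_data *svd = seg->s_data;

    printf("segvn mapping of %s, offset %zu, length %zu\n",
        svd->vp, svd->offset, seg->s_size);
}

static const struct seg_ops segvn_ops = { segvn_fault, segvn_describe };

int
main(void)
{
    struct segvn_data svd = { "/lib/libc.so", 0 };
    struct seg seg = { 0x10000, 0x2000, &segvn_ops, &svd };

    /* Generic VM code: dispatch through s_ops, never touching s_data. */
    seg.s_ops->describe(&seg);
    return (seg.s_ops->fault(&seg, 0x1000));
}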
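And a toy model of the page-identity scheme from point (c): physical pages hashed and found by their <vnode, offset> pair. Only the field names p_vnode, p_offset, and p_hash correspond to the real page_t; the hash function, table size, and lock-free lookup are simplifying assumptions (the kernel's page_hash and page_lookup() do much more):

#include <stdio.h>
#include <stddef.h>

#define PAGE_HASH_SIZE 64

struct vnode { const char *path; };     /* reduced stand-in for a vnode */

struct page {
    struct vnode *p_vnode;              /* first half of the page's identity */
    size_t p_offset;                    /* second half of the identity */
    struct page *p_hash;                /* chain within one hash bucket */
};

static struct page *page_hash[PAGE_HASH_SIZE];

static unsigned
hashfn(struct vnode *vp, size_t off)
{
    return ((unsigned)(((size_t)vp >> 4) ^ (off >> 12)) % PAGE_HASH_SIZE);
}

/* Give a page its identity: enter it into the hash under <vnode, offset>. */
static void
page_enter(struct page *pp)
{
    unsigned h = hashfn(pp->p_vnode, pp->p_offset);

    pp->p_hash = page_hash[h];
    page_hash[h] = pp;
}

/* Locate the unique physical page backing <vp, off>, if it exists. */
static struct page *
page_find(struct vnode *vp, size_t off)
{
    struct page *pp;

    for (pp = page_hash[hashfn(vp, off)]; pp != NULL; pp = pp->p_hash)
        if (pp->p_vnode == vp && pp->p_offset == off)
            return (pp);
    return (NULL);
}

int
main(void)
{
    struct vnode vn = { "/lib/libc.so" };
    struct page pg = { &vn, 0x2000, NULL };

    page_enter(&pg);
    printf("page for <%s, 0x2000>: %s\n", vn.path,
        page_find(&vn, 0x2000) != NULL ? "found" : "missing");
    return (0);
}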
This <vnode, offset> identity of memory segments/regions/pages is where the virtual-space-side view of memory meets the physical-page-side view; it is what connects the hardware memory management with the OS abstractions.

==================== Kernel's Memory Space ====================

The kernel is loaded into a set of fixed, platform-specific virtual address ranges. To reiterate: all addresses are virtual, including any address encoded as a part of an instruction.

Kernel (virtual) address layouts:
  http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/i86pc/os/startup.c
  lines 350--530

Global variables/symbols reflecting that layout:
  http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/vm/seg_kmem.h
  lines 45--70 (actual definitions in
  http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/vm/seg_kmem.c)

Within these kernel address ranges, address mappings are created and managed by several kernel "segment drivers".

Kernel's segments (Ch. 11.1):

  seg_kmem -- normal non-pageable kernel memory management
  seg_kp   -- kernel pageable memory management
  seg_nf   -- ???
  seg_map  -- file cache pages mapped into kernel space (::addr2smap DCMD)
  seg_kpm  -- all physical memory pages mapped into kernel space (only in x64)

Read more about these in 11.1.5--6.

==================== KMem Caches ====================

The two most interesting features of the kmem caches design are:

1. Avoidance of global structures that would need to be locked for every allocation from the slab (such as a global freelist). Instead, per-CPU "magazines" are used.

2. The object-oriented style of a cache: constructor, destructor, and reclaim method pointers are passed as arguments to kmem_cache_create() and are called automatically when objects are allocated and freed (see "bonwick01" for more details). In "design patterns" terms, this makes a named cache a "factory" for objects of that type.

Ch. 11.2.3.2 explains the theory of "caching"; 11.2.3.1 gives the background on the design. I found the explanations in 11.2.3.4--7 somewhat confusing. Keep in mind that despite all the military terminology, we are merely dealing with a list of finite-depth stacks (a toy model of such a stack appears at the end of these notes). The "magazines", "rounds", etc. are best understood from the logic of kmem_cache_alloc():

  http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/os/kmem.c

(Read the comment from the top down -- it shares parts with Bonwick's Vmem paper.)

Note the *per-CPU shift* on line XXXX, by the KMEM_CPU_CACHE macro from:
  http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/sys/kmem_impl.h#200

The "magazines" themselves are just arrays (stacks) of pointers into the main slab structure, linked in a single list. Note that their depth is not statically set in the definition: it is adjusted dynamically based on contention metrics.

If the "magazine" and "depot" logic falls through (to line XXXX), the "slab" layer will take the global cache lock (and cause threads on all other CPUs to block on this critical section). This lock is taken on line XXXX for the whole cache, not just for a per-CPU part.

Finally, note the constructor call (if one is defined for the "class" of managed objects) on line XXXX. This is code typical of an OO-language interpreter (e.g., a Ruby implementation). A runnable sketch of this constructor/destructor API closes these notes.
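To demystify the military vocabulary, here is the promised toy model of a "magazine" as a finite-depth stack of pointers ("rounds") to constructed objects. The fixed depth and the helper names are assumptions for illustration; in the real allocator the depth is tracked in the per-CPU cache state and tuned at runtime:

#include <stdio.h>

#define MAG_DEPTH 3                     /* real depth is adjusted at runtime */

struct magazine {
    struct magazine *next;              /* the depot links magazines in a list */
    int rounds;                         /* how many rounds are currently loaded */
    void *round[MAG_DEPTH];             /* the stack of object pointers */
};

/* Pop a round: the common allocation path, no global lock needed. */
static void *
mag_alloc(struct magazine *mp)
{
    if (mp->rounds == 0)
        return (NULL);                  /* empty: fall through to depot/slab */
    return (mp->round[--mp->rounds]);
}

/* Push a round back: the common free path. */
static int
mag_free(struct magazine *mp, void *buf)
{
    if (mp->rounds == MAG_DEPTH)
        return (-1);                    /* full: swap magazines with the depot */
    mp->round[mp->rounds++] = buf;
    return (0);
}

int
main(void)
{
    int obj1, obj2;
    struct magazine mag = { NULL, 0, { NULL } };
    void *first, *second;

    (void) mag_free(&mag, &obj1);
    (void) mag_free(&mag, &obj2);
    first = mag_alloc(&mag);            /* pops &obj2: the stack is LIFO */
    second = mag_alloc(&mag);           /* pops &obj1 */
    printf("popped %p then %p\n", first, second);
    return (0);
}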
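And the runnable sketch of the "factory" API. It uses libumem, the user-level port of the kmem allocator presented in "bonwick01" (its umem_cache_* calls mirror the kernel's kmem_cache_* interfaces); the conn_t object type is a made-up example. Build on illumos/Solaris with: cc conn.c -lumem

#include <umem.h>
#include <stdio.h>
#include <string.h>

typedef struct conn {
    int  refcnt;
    char peer[64];
} conn_t;

/*
 * Constructor: runs when a slab buffer is first brought into the cache,
 * not on every allocation -- that is the point of object caching.
 */
static int
conn_construct(void *buf, void *private, int flags)
{
    conn_t *cp = buf;

    cp->refcnt = 0;
    cp->peer[0] = '\0';
    return (0);
}

/* Destructor: runs only when the buffer is returned to the VM system. */
static void
conn_destruct(void *buf, void *private)
{
    /* nothing to tear down in this toy object */
}

int
main(void)
{
    umem_cache_t *conn_cache;
    conn_t *cp;

    conn_cache = umem_cache_create("conn_cache", sizeof (conn_t),
        0,                              /* natural alignment */
        conn_construct, conn_destruct,
        NULL, NULL, NULL, 0);           /* no reclaim, private, vmem source */

    cp = umem_cache_alloc(conn_cache, UMEM_NOFAIL);
    (void) strlcpy(cp->peer, "192.0.2.1", sizeof (cp->peer));
    printf("allocated conn to %s (refcnt %d)\n", cp->peer, cp->refcnt);

    cp->peer[0] = '\0';                 /* return the object in constructed state */
    umem_cache_free(conn_cache, cp);
    umem_cache_destroy(conn_cache);
    return (0);
}

Note how the cache, not each call site, owns the object's initialization; in the common case an allocation is just a magazine pop of an already-constructed conn_t.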