============== File System Architecture ==============

Ch. 14 of the textbook starts with the general theory of the Solaris file
system design. Linux is similar in many respects and adopted the VFS design
years ago; however, algorithmic optimizations for massive multi-threading
(e.g., file descriptor allocation) are still largely Solaris-specific.

Consider the VFS design from several angles, per layer:

Thread: many concurrent threads may want to call open() on different files
  (think of a multi-threaded webserver).

Process: the file descriptor number and the file pointer (the current
  position for reads or writes) are per-process.

VFS: a unique vnode for each open file. A file's pathname gets resolved to
  that unique vnode, which is created if no other process has open-ed the
  file yet. Resolved path components are cached in the DNLC layer.

FS impl.: a unique inode for each file. The inode's methods do the
  device-dependent work. The VOP_* methods dispatch to the actual FS and
  device driver methods that know how to treat the file's representation
  on the device.

Ch. 14.2--14.2.2 describes the chain of structures through these layers,
from the proc_t to the vnode (see Fig. 14.2). In class, we observed these
data structures as follows:

1. With a DTrace script, "vim /etc/passwd" got suspended on return from
   read(). We could also have suspended it on entry to read(), but I wanted
   to see the file pointer position on return from read(). Predictably, for
   such a small file, it was at the end of the file.

   The script: suspend-on-file-read.d (in the course dir). See also:
   http://blogs.sun.com/LetTheSunShineIn/entry/dtrace_suspend_a_process_when

2. Then we found the proc_t struct by PID, then the file descriptor list
   fi_list and the open file's entry. Useful command: "pgrep vim" to get
   vim's PID.
   Say it is 2822. In "mdb -k":

   > 0t2822::pid2proc
       -- PID to the address of the proc descriptor block
   > 0xe0a8b8d8::print -t proc_t
       -- print the whole proc_t for this process (many pages)
   > 0xe0a8b8d8::print -t proc_t !grep fi_
       -- filter the output down to the file-descriptor-related members
   > 0xe0a8b8d8::pfiles
       -- print a representation of the file descriptor table
   > 0xe0a8b8d8::pfiles -pf
       -- print hex addresses of the file descriptor table elements

   Also: ::print -t uf_entry_t for pointers to elements of fi_list. Each
   uf_entry_t in turn points to a "struct file"; see p. 660. "struct file"
   then contains the vnode pointer and the file offset.

   Suggestion: examine all of these structures, starting from proc_t. Look
   for the fi_list array in proc_t.p_user.u_finfo, then for the "struct
   file" members under uf_entry_t. See also pp. 666--668. You should be
   able to locate the vnode for /etc/passwd (hint: !grep path) and the
   offset after /etc/passwd got read. See 14.2.5 for step-by-step MDB
   exploration commands.

============== Multi-thread optimization for FD lookup ==============

To minimize contention at the thread level, allocating the next free file
descriptor (FD) is handled in an elaborately unique way. Note that POSIX
systems must return the lowest FD available at the moment of the open()
call. To avoid a global counter lock, infix binary trees are used, as
described in 14.2.3. Read it -- it's a neat algorithm! Then try to follow
it in the code (fd_find and fd_reserve, in the figure in 14.2.2). Good luck
following all the bit-level macros :-)

============== VFS descriptor structures ==============

Ch. 14.4 describes how file system kernel modules are written. In
particular, Ch. 14.2.2 shows the linkage of data structures that the
kernel expects.
Suggestion: follow this through in
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/tmpfs/tmp_vfsops.c
and, for the rest of the tmpfs structures, in
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/tmpfs/

_init is the execution entry point for any module; mod_install is the
kernel's linker-loader interface, which gets the description of what is to
be linked.

Ch. 14.5 describes the VFS interface (struct vfs is pointed to by any
vnode), and 14.5.6 specifically shows the structures observable from
"mdb -k".

============== Vnode recap ==============

See Ch. 14.4--14.6.4 and follow the discussion throughout the code.
Observe that
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/vnode.c#3069
is the *essence* of the object-oriented vnode design: the right function,
aliased by a C macro to VOP_*, is called through the object instance
pointer and the class method table v_ops. So it is with all the other
functions, which, in their turn, redispatch to the physical FS inode's
methods.

============== Registering FS methods with the kernel ==============

This is the standard kernel interface described in 14.5. Example for tmpfs:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/tmpfs/tmp_vfsops.c#189
and
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/tmpfs/tmp_vfsops.c#195
Consider also the logic of tmp_mount, line 230.

============== Vmem heap debugging ==============

Filler patterns for RAM:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/kmem_impl.h#60
(lines 60--82).

Ch. 11.4.3 summarizes the four most frequent errors that occur when using
the heap; the filler patterns are designed to catch these four conditions.
Explanations of each pattern's use: Ch. 11.4, pp. 562--572.