... We interrupt our programming to talk about ...

========== OpenSolaris pragmatics ==========

So far we have been looking at OpenSolaris from the kernel programmer's
point of view. Today we are going to switch to the system administrator's
angle.

The practice of system administration has been known to generate problems
that have a deep impact on programming languages (think Perl, and the
subsequent explosion of scripting languages such as Python and Ruby, in
which a lot of production programming is done these days) and on operating
systems. In general, when a system administration task turns out to be
both frequently needed and laborious, this suggests the need for a new OS
feature or mechanism change to take care of it -- either automate it or
obviate it. Some of these changes need to go surprisingly deep.
OpenSolaris "boot environments" and the "snapshots" provided by ZFS are a
good example -- they required a revolutionary change in the primary (root)
filesystem.

When you break your installation, it is likely that you will encounter
reactions from the following OpenSolaris subsystems:

1. Services management
2. ZFS boot environments
3. GRUB boot configuration

For each of these features, we first look at how the corresponding task is
handled in Linux, to put its OpenSolaris behavior in context.

======= 1. Services (technically, daemon processes or groups thereof) ====

Debian Linux's init process reads /etc/inittab on startup and gets the
"default runlevel" and the "respawn" clauses ("if this process dies,
restart it"). Thus init is responsible for restarting daemon/server
processes that die accidentally. For example, to respawn the Linux
virtual consoles:

# Format:
# <id>:<runlevels>:<action>:<process>
1:2345:respawn:/sbin/getty 38400 tty1
2:23:respawn:/sbin/getty 38400 tty2
3:23:respawn:/sbin/getty 38400 tty3
...

All scripts to start and stop server processes are in /etc/init.d/ , and
they expect the arguments "start", "stop", and often also "restart" and
"reload" (to load new configs into a running process).

The Linux "runlevel" is a collection of processes to be started in a fixed
order (so that their mutual dependencies are satisfied). This order is
observed as follows: the script /etc/init.d/rc uses the directory listing
order to start the scripts in turn. It boils down to

for s in /etc/rc$runlevel.d/S*
do
    $s start
done

----
Suggestion: read the startup script on your Linux system and see the shell
scripting tricks involved. Start with "ls -l /etc/rc2.d/" . What is the
role of the K* symlinks? How is concurrency handled? That is, if the
system has multiple CPUs, how are they used to speed up the startup
process? The Debian /etc/init.d/rc script is available in the course dir
as debian_rc_script.
----

Thus in Linux (more precisely, in the "System V" UNIX style) the runlevels
are represented as directories of symlinks, and the symlink naming
controls the order of execution and provides for correct ordering of
dependencies. Restarting crashed daemons is done by init, based on the
executable name. This system was powerful and flexible for its time, but
it makes expressing mutual dependencies an exercise in arranging symlinks,
and does not express much else.

OpenSolaris gets rid of all that, and describes services and their
dependencies in XML files, located in /var/svc/manifest/* . Each
dependency is an XML element, and refers to another XML file for the
actions to perform to activate the dependency. Shell scripts are retained,
but now they are called "methods", and are invoked by a separate daemon,
svc.startd. The scripts themselves now reside in /lib/svc/method/ .

So the task of restarting crashed daemons *and*, if need be, all their
dependencies is broken out of the init process into a separate daemon.
This daemon, /lib/svc/bin/svc.startd , is one of the few remaining
processes started and respawned by init itself (see /etc/inittab and the
svc.startd man page).

Suggestion: read the XML manifests for ssh and other familiar daemons.
Find their shell scripts, and read them too.

The dependency system for the services is described in detail in
http://www.sun.com/bigadmin/content/selfheal/smf-quickstart.jsp .
"Predictive self-healing" here means that the system knows (from the XML
manifests and scripts) how to restart a daemon that failed, reloading or
reinitializing as necessary any other OS components that it relies on.

Linux "runlevels" in this scheme correspond to "milestones" -- both are
groups of processes to be started together. For a brief summary, see
http://wiki.genunix.org/wiki/index.php/OpenSolaris_Cheatsheet

NOTE: Should you find yourself in "maintenance mode", disabling the
problem services will get you out of it.
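The manifests, methods, and dependencies can all be explored from the
shell with the SMF tools. A minimal sketch, using the stock ssh service
(the shorthand "ssh" resolves to the full FMRI svc:/network/ssh:default;
exact paths may differ slightly between releases):

svcs -a | grep ssh        # list all services; find ssh's state and FMRI
svcs -l ssh               # long listing: state, enabled?, dependencies
svcs -d ssh               # the services that ssh depends on
svcs -D ssh               # the services that depend on ssh
svcprop ssh | head        # the service's properties as SMF stores them
svcadm restart ssh        # ask svc.startd to restart the daemon
less /var/svc/manifest/network/ssh.xml   # the XML manifest itself
less /lib/svc/method/sshd                # its method script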
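And to make the maintenance-mode note concrete, a hedged sketch of
getting a service out of the "maintenance" state; "mysvc" is a
hypothetical broken service standing in for whichever one svcs -x
points at:

svcs -x                   # explain which services are broken, and why
svcadm disable mysvc      # take the problem service out of the picture,
                          # or, after fixing its configuration:
svcadm clear mysvc        # clear the maintenance state so svc.startd
                          # tries to start the service again
tail /var/svc/log/*mysvc* # per-service logs kept by svc.startd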
=============== ZFS and "Boot Environments" ====================

Traditional UNIX filesystems on x86 lived in a hard disk partition,
primary or extended/logical. The format of the partition table is very
simple and goes back to the times of DOS; an entry contains the starting
sector and the number of consecutive sectors that form the partition.
Initially, the disk's partition table had only 4 entries (and thus allowed
up to 4 partitions per disk). Eventually it was extended, by allowing one
or more of the partition entries to be marked as "extended", which meant
that an additional partition table was contained at that partition's start
(the partitions it described were called "logical").

More info about PC partition tables:
http://www.win.tue.nl/~aeb/partitions/partition_tables.html
(see also http://en.wikipedia.org/wiki/Disk_partitioning)

Traditional filesystems like EXT2 or UFS then placed their superblock and
inode metadata at the beginning of their assigned partition. While this
scheme was simple, changing the disk's layout -- in particular, adding
storage space to an existing filesystem, and then growing or shrinking
individual filesystems to use it -- was hard (cf. the Gparted LiveCD,
http://gparted.sourceforge.net/livecd.php).

The Logical Volume Manager (LVM) tried to alleviate the problem of
reconfiguring storage by providing the illusion of a physical disk to the
OS (via a driver), while in fact using non-contiguous block areas from one
or more actual hard drives. The driver was loaded early on boot, so that
the rest of the kernel could then be loaded from the pseudo-device. For a
summary, see http://www.tldp.org/HOWTO/LVM-HOWTO/ and
http://en.wikipedia.org/wiki/Logical_Volume_Manager_(Linux) .

Although LVM solved the problem of adding physical storage, the problem of
resizing partitions to take advantage of the added blocks remained. In
particular, filesystems had to be unmounted in order to be resized, i.e.
the system had to be shut down for the duration.

OpenSolaris did away with all that in ZFS. The ZFS filesystem has no
"rigid" layout that matches a stretch of physically or virtually
consecutive blocks. Instead, ZFS allows blocks to be added to its "pools",
and allows the datasets within a pool to be nested.

BTW: This should remind you of the nesting of VMEM allocators, made
possible by the OpenSolaris object-oriented style.

Filesystem metadata is handled as "datasets", and ZFS allows the
co-existence of several datasets in the same pool. The latter is
revolutionary, because it allows keeping the state of the filesystem
"before" and "after" operations that can trash the system, such as
"pkg image-update", and reverting to the "before" state if need be.

More information on snapshots and clones (read it!):
http://developers.sun.com/developer/technicalArticles/opensolaris/boot-environments.html
and
http://dlc.sun.com/osol/docs/content/2008.11/snapupgrade/gentextid-173.html

Cheat-sheet that gives some idea of ZFS capabilities:
http://wiki.genunix.org/wiki/index.php/OpenSolaris_Cheatsheet#ZFS
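A sketch of how pools and nested datasets look from the shell. The
"demo" dataset name below is made up, but rpool is the standard root
pool on an OpenSolaris install, and rpool/export exists by default:

zpool list                    # pools, their sizes and free space
zpool status rpool            # the physical devices behind the pool
zfs list -r rpool             # all datasets in rpool, nested by name
zfs create rpool/export/demo  # a new dataset, carved out of the pool's
                              # free space -- no partitioning involved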
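Snapshots and boot environments, continuing with the hypothetical
rpool/export/demo dataset from the previous sketch (the snapshot and BE
names are likewise made up):

zfs snapshot rpool/export/demo@before  # constant-time snapshot
zfs list -t snapshot                   # snapshots coexist with live data
zfs rollback rpool/export/demo@before  # revert dataset to the snapshot
zfs destroy -r rpool/export/demo       # clean up demo and its snapshots

beadm list                    # boot environments present on the system
beadm create test-be          # clone the active BE (uses ZFS clones)
beadm activate test-be        # boot into it on the next reboot;
                              # "pkg image-update" does this for you
beadm destroy test-be         # remove the experimental BE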
Note the nesting of datasets within pools -- this is a bit confusing to
those who are used to the strictures of BIOS partitions, but it is natural
for object-oriented thinking: one object gets a subset of the containing
object's resources, and manages them autonomously.

ZFS can use either raw disks or primary partitions (you *cannot* yet
install OpenSolaris in a logical/extended partition; there is no specific
technical reason for this, but Sun chose to allocate its effort elsewhere
and stopped at supporting primary partitions).

OpenSolaris uses its own naming scheme for physical disks under
/dev/dsk/ , explained in http://multiboot.solaris-x86.org/iv/3.html#vtoc .
In VirtualBox I get /dev/dsk/c3d0p0 as the entire HD: p1--p4 are the
primary (BIOS) partitions, and p0 is the entire disk.

Whether allocated a primary partition or the entire disk, OpenSolaris uses
its own partitioning within it. These sub-partitions, called "slices",
came from SPARC systems, and are numbered according to function, not
position in the table as in Linux. E.g., slice 0 holds the
filesystem/dataset intended to be mounted on "/", slice 1 contains swap,
and slice 2 represents the bounds of the entire disk. More info on these
conventions: http://www.joho.com/sun/ch03/101-103.html

Ordinary Linux GRUB is not aware of these sub-partitions, and sees only
one partition of type "Linux swap" (due to an unfortunate numbering
collision between the partition type IDs).

================= Boot environments & GRUB =================

The beadm utility generates the GRUB menu (located in
/rpool/boot/grub/menu.lst) automatically. OpenSolaris' version of GRUB
includes support for ZFS and, therefore, for its boot environments.
Essentially, you can have several versions of the "/" filesystem metadata,
consistent with their respective file contents.

This has some downsides: Linux EXT3 filesystem support is not available in
OpenSolaris itself, and, although this GRUB will still boot EXT2 and EXT3
partitions, the additional entries describing them must be re-added by
hand after each rewrite of menu.lst by beadm (a sketch of such a
hand-added entry is given at the end of these notes).

More tips:
http://sites.google.com/site/solarium/how-to-install-opensolaris

================= Modules and Drivers =================

See drivers.txt
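Appendix to the GRUB section above: what a hand-added Linux entry in
/rpool/boot/grub/menu.lst might look like. This is a sketch only -- the
partition numbers, kernel paths, and root= device are hypothetical and
must match your own disk layout; remember that beadm will drop the entry
on its next rewrite of the menu:

# Chainload whatever boot loader lives in the boot sector of the
# second primary partition (GRUB counts partitions from 0):
title Debian Linux (chainloaded)
rootnoverify (hd0,1)
chainloader +1

# Or boot a kernel from an ext2/ext3 partition directly (the
# /boot/vmlinuz and /boot/initrd.img symlinks are Debian conventions):
title Debian Linux (direct)
root (hd0,1)
kernel /boot/vmlinuz root=/dev/sda2 ro
initrd /boot/initrd.img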