... We interrupt our programming to talk about ...

========== OpenSolaris pragmatics ==========

So far we have been looking at OpenSolaris from the kernel programmer's
point of view. Today we are going to switch to the system administrator's
angle.

The practice of system administration has been known to generate problems
that have a deep impact on programming languages (think Perl, and the
subsequent explosion of scripting languages such as Python and Ruby, in
which a lot of production programming is done these days) and on operating
systems. In general, when a system administration task turns out to be
both frequently needed and laborious, this suggests the need for a new OS
feature or mechanism change to take care of it -- either automate it or
obviate it. Some of these changes need to go surprisingly deep.
OpenSolaris "boot environments" and the "snapshots" provided by ZFS are a
good example -- they required a revolutionary change in the primary (root)
filesystem.

When you break your installation, it is likely that you will encounter
reactions from the following OpenSolaris subsystems:

1. Services management
2. ZFS boot environments
3. GRUB boot configuration

For each of these features, we first look at how the corresponding task is
handled in Linux, to put its OpenSolaris behavior in context.

======= 1. Services (technically, daemon processes or groups thereof) ====

Debian Linux's init process reads /etc/inittab on startup and gets the
"default runlevel" and the "respawn" clauses ("if this process dies,
restart it"). Thus init is responsible for restarting daemon/server
processes that die accidentally. For example, to respawn the Linux
virtual consoles:

# Format:
# <id>:<runlevels>:<action>:<process>
1:2345:respawn:/sbin/getty 38400 tty1
2:23:respawn:/sbin/getty 38400 tty2
3:23:respawn:/sbin/getty 38400 tty3
...

All scripts to start and stop server processes are in /etc/init.d/ , and
they expect the arguments "start", "stop", and often also "restart" and
"reload" (to load new configs into a running process).

The Linux "runlevel" is a collection of processes to be started in a fixed
order (so that their mutual dependencies are satisfied). This order is
observed as follows: the script /etc/init.d/rc uses the directory listing
order to start the scripts in turn. It boils down to

for s in /etc/rc$runlevel.d/S*
do
    $s start
done

----
Suggestion: read the startup script on your Linux system and see the shell
scripting tricks involved. Start with "ls -l /etc/rc2.d/" . What is the
role of the K* symlinks? How is concurrency handled? That is, if the
system has multiple CPUs, how are they used to speed up the startup
process? The Debian /etc/init.d/rc script is available in the course dir
as debian_rc_script.
----

Thus in Linux (more precisely, in the "System V" UNIX style) the runlevels
are represented as directories of symlinks, and the symlink naming
controls the order of execution and provides for correct ordering of
dependencies. Restarting crashed daemons is done by init, based on the
executable name. This system was powerful and flexible for its time, but
it makes expressing mutual dependencies an exercise in arranging symlinks,
and does not express much else.

OpenSolaris gets rid of all that, and describes services and their
dependencies in XML files, located in /var/svc/manifest/* . Each
dependency is an XML element, and refers to another XML file for the
actions to perform to activate the dependency. Shell scripts are retained,
but now they are called "methods", and are invoked by a separate daemon,
svc.startd. The scripts themselves now reside in /lib/svc/method/ .

So the task of restarting crashed daemons *and*, if need be, all their
dependencies is broken out of the init process into a separate daemon.
This daemon, /lib/svc/bin/svc.startd , is one of the few remaining
processes started and respawned by init itself (see /etc/inittab and the
svc.startd man page).

Suggestion: read the XML manifests for ssh and other familiar daemons.
Find their shell scripts, and read them too.

The dependency system for the services is described in detail in
http://www.sun.com/bigadmin/content/selfheal/smf-quickstart.jsp .
"Predictive self-healing" here means that the system knows (from the XML
manifests and scripts) how to restart a daemon that failed, reloading or
reinitializing as necessary any other OS components that it relies on.

Linux "runlevels" in this scheme correspond to "milestones" -- both are
groups of processes to be started together. For a brief summary, see
http://wiki.genunix.org/wiki/index.php/OpenSolaris_Cheatsheet

NOTE: Should you find yourself in "maintenance mode", disabling the
problem services will get you out of it.
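The manifests, methods, and dependencies can all be explored from the
shell with the SMF tools. A minimal sketch, using the stock ssh service
(the shorthand "ssh" resolves to the full FMRI svc:/network/ssh:default;
exact paths may differ slightly between releases):

svcs -a | grep ssh        # list all services; find ssh's state and FMRI
svcs -l ssh               # long listing: state, enabled?, dependencies
svcs -d ssh               # the services that ssh depends on
svcs -D ssh               # the services that depend on ssh
svcprop ssh | head        # the service's properties as SMF stores them
svcadm restart ssh        # ask svc.startd to restart the daemon
less /var/svc/manifest/network/ssh.xml   # the XML manifest itself
less /lib/svc/method/sshd                # its method script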
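And to make the maintenance-mode note concrete, a hedged sketch of
getting a service out of the "maintenance" state; "mysvc" is a
hypothetical broken service standing in for whichever one svcs -x
points at:

svcs -x                   # explain which services are broken, and why
svcadm disable mysvc      # take the problem service out of the picture,
                          # or, after fixing its configuration:
svcadm clear mysvc        # clear the maintenance state so svc.startd
                          # tries to start the service again
tail /var/svc/log/*mysvc* # per-service logs kept by svc.startd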
=============== ZFS and "Boot Environments" ====================

Traditional UNIX filesystems on x86 lived in a hard disk partition,
primary or extended/logical. The format of the partition table is very
simple and goes back to the times of DOS; an entry contains the starting
sector and the number of consecutive sectors that form the partition.
Initially, the disk's partition table had only 4 entries (and thus allowed
up to 4 partitions per disk). Eventually it was extended, by allowing one
or more of the partition entries to be marked as "extended", which meant
that an additional partition table was contained at that partition's start
(the partitions it described were called "logical").

More info about PC partition tables:
http://www.win.tue.nl/~aeb/partitions/partition_tables.html
(see also http://en.wikipedia.org/wiki/Disk_partitioning)

Traditional filesystems like EXT2 or UFS then placed their superblock and
inode metadata at the beginning of their assigned partition. While this
scheme was simple, changing the disk's layout -- in particular, adding
storage space to an existing filesystem, and then growing or shrinking
individual filesystems to use it -- was hard (cf. the Gparted LiveCD,
http://gparted.sourceforge.net/livecd.php).

The Logical Volume Manager (LVM) tried to alleviate the problem of
reconfiguring storage by providing the illusion of a physical disk to the
OS (via a driver), while in fact using non-contiguous block areas from one
or more actual hard drives. The driver was loaded early on boot, so that
the rest of the kernel could then be loaded from the pseudo-device. For a
summary, see http://www.tldp.org/HOWTO/LVM-HOWTO/ and
http://en.wikipedia.org/wiki/Logical_Volume_Manager_(Linux) .

Although LVM solved the problem of adding physical storage, the problem of
resizing partitions to take advantage of the added blocks remained. In
particular, filesystems had to be unmounted in order to be resized, i.e.
the system had to be shut down for the duration.

OpenSolaris did away with all that in ZFS. The ZFS filesystem has no
"rigid" layout that matches a stretch of physically or virtually
consecutive blocks. Instead, ZFS allows blocks to be added to its "pools",
and allows the datasets within a pool to be nested.

BTW: This should remind you of the nesting of VMEM allocators, made
possible by the OpenSolaris object-oriented style.

Filesystem metadata is handled as "datasets", and ZFS allows the
co-existence of several datasets in the same pool. The latter is
revolutionary, because it allows keeping the state of the filesystem
"before" and "after" operations that can trash the system, such as
"pkg image-update", and reverting to the "before" state if need be.

More information on snapshots and clones (read it!):
http://developers.sun.com/developer/technicalArticles/opensolaris/boot-environments.html
and
http://dlc.sun.com/osol/docs/content/2008.11/snapupgrade/gentextid-173.html

Cheat-sheet that gives some idea of ZFS capabilities:
http://wiki.genunix.org/wiki/index.php/OpenSolaris_Cheatsheet#ZFS
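A sketch of how pools and nested datasets look from the shell. The
"demo" dataset name below is made up, but rpool is the standard root
pool on an OpenSolaris install, and rpool/export exists by default:

zpool list                    # pools, their sizes and free space
zpool status rpool            # the physical devices behind the pool
zfs list -r rpool             # all datasets in rpool, nested by name
zfs create rpool/export/demo  # a new dataset, carved out of the pool's
                              # free space -- no partitioning involved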
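Snapshots and boot environments, continuing with the hypothetical
rpool/export/demo dataset from the previous sketch (the snapshot and BE
names are likewise made up):

zfs snapshot rpool/export/demo@before  # constant-time snapshot
zfs list -t snapshot                   # snapshots coexist with live data
zfs rollback rpool/export/demo@before  # revert dataset to the snapshot
zfs destroy -r rpool/export/demo       # clean up demo and its snapshots

beadm list                    # boot environments present on the system
beadm create test-be          # clone the active BE (uses ZFS clones)
beadm activate test-be        # boot into it on the next reboot;
                              # "pkg image-update" does this for you
beadm destroy test-be         # remove the experimental BE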
Note the nesting of datasets within pools -- this is a bit confusing to
those who are used to the strictures of BIOS partitions, but it is natural
for object-oriented thinking: one object gets a subset of the containing
object's resources, and manages them autonomously.

ZFS can use either raw disks or primary partitions (you *cannot* yet
install OpenSolaris in a logical/extended partition; there is no specific
technical reason for this, but Sun chose to allocate its effort elsewhere
and stopped at supporting primary partitions).

OpenSolaris uses its own naming scheme for physical disks under
/dev/dsk/ , explained in http://multiboot.solaris-x86.org/iv/3.html#vtoc .
In VirtualBox I get /dev/dsk/c3d0p0 as the entire HD: p1--p4 are the
primary (BIOS) partitions, and p0 is the entire disk.

Whether allocated a primary partition or the entire disk, OpenSolaris uses
its own partitioning within it. These sub-partitions, called "slices",
came from SPARC systems, and are numbered according to function, not
position in the table as in Linux. E.g., slice 0 holds the
filesystem/dataset intended to be mounted on "/", slice 1 contains swap,
and slice 2 represents the bounds of the entire disk. More info on these
conventions: http://www.joho.com/sun/ch03/101-103.html

Ordinary Linux GRUB is not aware of these sub-partitions, and sees only
one partition of type "Linux swap" (due to an unfortunate numbering
collision between the partition type IDs).

================= Boot environments & GRUB =================

The beadm utility generates the GRUB menu (located in
/rpool/boot/grub/menu.lst) automatically. OpenSolaris' version of GRUB
includes support for ZFS and, therefore, for its boot environments.
Essentially, you can have several versions of the "/" filesystem metadata,
consistent with their respective file contents.

This has some downsides: Linux EXT3 filesystem support is not available in
OpenSolaris itself, and, although this GRUB will still boot EXT2 and EXT3
partitions, the additional entries describing them must be re-added by
hand after each rewrite of menu.lst by beadm (a sketch of such a
hand-added entry is given at the end of these notes).

More tips:
http://sites.google.com/site/solarium/how-to-install-opensolaris

================= Modules and Drivers =================

See drivers.txt
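Appendix to the GRUB section above: what a hand-added Linux entry in
/rpool/boot/grub/menu.lst might look like. This is a sketch only -- the
partition numbers, kernel paths, and root= device are hypothetical and
must match your own disk layout; remember that beadm will drop the entry
on its next rewrite of the menu:

# Chainload whatever boot loader lives in the boot sector of the
# second primary partition (GRUB counts partitions from 0):
title Debian Linux (chainloaded)
rootnoverify (hd0,1)
chainloader +1

# Or boot a kernel from an ext2/ext3 partition directly (the
# /boot/vmlinuz and /boot/initrd.img symlinks are Debian conventions):
title Debian Linux (direct)
root (hd0,1)
kernel /boot/vmlinuz root=/dev/sda2 ro
initrd /boot/initrd.img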