Abstract for M. Ranganathan
=========================================================================
ETCL: A fault-tolerant agent infrastructure for customizable conferencing
=========================================================================
M. Ranganathan (NIST and UMCP Dept. of CS) and Joel Saltz (UMCP Dept. of CS and Johns Hopkins Dept. of Pathology)
------------------------------------------------------------------
The driving application domain for the agent system we describe in this abstract is multi-media conferencing - for example, Teleconferencing, Telepathology and joint exploration of large scientific data-sets. A conferencing system is characterized by frequent and unpredictable changes in message traffic patterns and participant presence. Usually, such systems have the notion of a shared workspace involving multi-media tools. For example, in a Telepathology session, participants may jointly manipulate a shared viewing tool that mimics a high-resolution microscope and allows users to explore a digital representation of a high-resolution image.
Conferencing systems usually incorporate a "floor control policy" that decides who gets control over input to the shared workspace when different users contend for control. Typically, the multi-media tools that comprise a conferencing system are independent of the floor control system and take enabling and disabling commands from it. Various policies for floor control are possible - for example, first come first served, priority based, frequency based, role based and so on. The number of policies one may envision is large. From a software engineering viewpoint, it would be undesirable to hard-code any particular policy. Moreover, there may be multiple floors active at any given time, with a need for some level of mutual awareness of ongoing activities in concurrent sessions (for example, "birds of a feather" sessions). Hence our first requirement is to develop a system that allows installation of arbitrary policies in a flexible manner. Second, to improve the "bandwidth" of human interaction, it is important to have low latency in moving the floor from user to user - especially in highly interactive situations such as voice-activated floor control. Third, scalability is important: the responsiveness of the system should not degrade unacceptably as more users join the conference. Fourth, fault tolerance and reliability are important - for example, the conference control system should not deadlock when the floor owner shuts off her machine. We would like to achieve these goals with a minimum of assumed reliable infrastructure. Flexibility and heterogeneity requirements argue for an agent-based approach to building such systems. Low latency, scalability and minimizing assumed infrastructure argue for using client (participant) resources to construct such systems. Since these resources are variable and participants can leave without warning, we need mobility and fault tolerance to react when participants join and leave.
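As a rough illustration of the first requirement, the sketch below shows one way a floor controller could accept interchangeable policies instead of hard-coding one. It is written in Python purely for illustration and is not ETCL code. The names FloorController, Request, install_policy, grant_next and participant_left are hypothetical, and the two policies are only simple readings of "first come first served" and "priority based".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Request:
    user: str
    arrival: int       # sequence number assigned when the request was made
    priority: int = 0  # larger value means more important (assumption)


# A policy is any function that picks one pending request, or None if empty.
Policy = Callable[[List[Request]], Optional[Request]]


def first_come_first_served(pending: List[Request]) -> Optional[Request]:
    return min(pending, key=lambda r: r.arrival, default=None)


def priority_based(pending: List[Request]) -> Optional[Request]:
    # Highest priority wins; ties go to the earlier request.
    return max(pending, key=lambda r: (r.priority, -r.arrival), default=None)


class FloorController:
    """Hypothetical controller that delegates every decision to a policy."""

    def __init__(self, policy: Policy):
        self.policy = policy
        self.pending: List[Request] = []
        self.owner: Optional[str] = None
        self._clock = 0

    def install_policy(self, policy: Policy) -> None:
        # Swap the policy at run time; pending requests are kept.
        self.policy = policy

    def request(self, user: str, priority: int = 0) -> None:
        self._clock += 1
        self.pending.append(Request(user, self._clock, priority))

    def grant_next(self) -> Optional[str]:
        chosen = self.policy(self.pending)
        if chosen is None:
            self.owner = None
        else:
            self.pending.remove(chosen)
            self.owner = chosen.user
        return self.owner

    def participant_left(self, user: str) -> None:
        # Drop the user's requests; if she held the floor, pass it on
        # so that the conference does not stall.
        self.pending = [r for r in self.pending if r.user != user]
        if self.owner == user:
            self.grant_next()
```

Keeping the policy a plain function means a new policy can be installed without touching the controller, and the participant_left hook is one simple way to avoid the stalled-floor situation described in the fourth requirement.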
We are developing an agent infrastructure called ETCL that is "event oriented" and incorporates these features. We have included only the minimum set of features we feel are necessary for the job. Hence, we avoided incorporating complexities such as a general "go" capability for migration of state at arbitrary points in the program. We now give an overview of our agent infrastructure and programming model.
The programming model consists of several mobile streams. Each stream is uniquely identified by a name, and messages are posted to a stream by name. Messages are guaranteed to arrive reliably and are processed in the order in which they were posted. A stream can be housed on any machine that is running an ETCL "engine". An ETCL engine can house several streams, and the stream-engine mapping can change over time; however, at any given time a stream is mapped to only one engine. We assume the presence of at least one reliable machine that remains up for the entire length of the conference. Each stream is assigned a reliable "home" machine from among the set of reliable machines. The reliable machine(s) serve as a directory service and as a repository of state information for fault recovery.
Each stream can be associated with zero or more handlers. Each handler has its own TCL interpreter, and each stream is assigned a separate thread to run its handlers. A handler has four parts: an initialization part, which is executed once when the handler is installed; an "on_append" iterative part, which executes each time a message is posted to the stream; an "on_move" part, which is executed at the destination when the stream-engine mapping changes; and an "on_failure" part, which executes when the stream recovers from failure at its home location.
Migration (i.e., changing the stream-engine association) can happen only after a handler completes execution. The system is obliged to complete the movement before the next message is appended to the stream. Migration involves moving all variables that each handler associated with the stream declares, as part of its script, to be in its "briefcase". A handler may write its briefcase out to its reliable home machine at any time with a "synch" command. If the machine housing the stream fails, the "on_failure" script is invoked at the reliable machine that has been assigned to the stream.
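The stream and handler model described above can be approximated by the following single-process sketch. It is written in Python for illustration only and does not show the ETCL API: the class and method names (Engine, Stream, Handler, post, request_move, run, fail) are hypothetical, engines are plain objects rather than machines, and the sketch assumes that queued messages survive a failure, which the description does not specify.

```python
import copy
from collections import deque


class Engine:
    """Stand-in for a machine running an engine (hypothetical)."""

    def __init__(self, name):
        self.name = name


class Handler:
    """Four-part handler; subclasses override the parts they need."""

    def __init__(self):
        self.briefcase = {}  # the only state that moves with the stream

    def on_init(self, stream): pass
    def on_append(self, stream, message): pass
    def on_move(self, stream): pass
    def on_failure(self, stream): pass


class Stream:
    def __init__(self, name, home, engine):
        self.name = name
        self.home = home          # reliable home engine
        self.engine = engine      # engine currently housing the stream
        self.handlers = []
        self.inbox = deque()      # messages are handled in posting order
        self.saved = {}           # briefcases written to the home by synch
        self.pending_move = None

    def install(self, handler):
        self.handlers.append(handler)
        handler.on_init(self)     # runs once, at installation

    def post(self, message):
        self.inbox.append(message)

    def synch(self, handler):
        # Checkpoint a copy of the briefcase at the reliable home.
        self.saved[handler] = copy.deepcopy(handler.briefcase)

    def request_move(self, engine):
        self.pending_move = engine

    def run(self):
        while self.inbox:
            message = self.inbox.popleft()
            for h in self.handlers:
                h.on_append(self, message)
            # Migrate only after the handlers have finished this message
            # and before the next message is handled.
            if self.pending_move is not None:
                self.engine, self.pending_move = self.pending_move, None
                for h in self.handlers:
                    h.on_move(self)

    def fail(self):
        # The housing engine is gone: restart at the home engine from the
        # last synched briefcase, then run the on_failure part.
        self.engine = self.home
        self.pending_move = None
        for h in self.handlers:
            h.briefcase = copy.deepcopy(self.saved.get(h, {}))
            h.on_failure(self)


class FloorLog(Handler):
    """Example handler: counts messages and records moves and recoveries."""

    def on_init(self, stream):
        self.briefcase["count"] = 0
        self.briefcase["events"] = []

    def on_append(self, stream, message):
        self.briefcase["count"] += 1
        if message == "checkpoint":
            stream.synch(self)

    def on_move(self, stream):
        self.briefcase["events"].append("moved to " + stream.engine.name)

    def on_failure(self, stream):
        self.briefcase["events"].append("recovered at " + stream.engine.name)
```

In this sketch any briefcase update made after the last synch is lost on failure, so how often a handler synchs is a trade-off between recovery accuracy and traffic to the home machine.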
Our system borrows features from several other systems such as TACOMA and IBM Aglets while leaving out what we consider to be "unnecessary complexity" for our application domain. We like to think of our system as a combination of "active messages" and "resource-aware agents" with fault tolerance.
We are implementing ETCL as a TCL extension without any modifications to TCL and hope to make our infrastructure freely redistributable.
=======================================================================