This page presents summaries of the workshop sessions. The summaries are brief -- we cover the main points of the speaker without any "editorial" comments. If you feel that any of the summaries are misrepresentative, please send mail to David Kotz.
Gray started by defining a transportable agent.
A transportable agent is an executing program that can migrate from machine to machine in a heterogeneous network under its own control.

Gray then moved on to the advantages, disadvantages, and potential applications of transportable agents. The main advantage is that, by allowing the movement of both code and data, a transportable agent system allows the programmer to minimize the use of a low-bandwidth, high-latency, or unreliable network connection. Disadvantages are the difficulties associated with providing security (for both the machines and the agents), fault tolerance, and effective development tools such as debuggers. Applications for transportable agents include workflow, information retrieval, network and platform management, and distributed simulation -- i.e., environments in which hard-to-predict operations must be performed against large volumes of data.
Gray concluded with a list of the services that are (or might be) needed in a full-strength agent system and of the contrasts among existing systems. Services include security, fault tolerance, resource discovery and other navigation tools, scheduling and planning, billing and electronic cash, etc. Contrasts among existing systems include commercial versus research development, imperative versus declarative languages, interpreted languages versus software fault isolation of compiled languages, stand-alone operation versus tight WWW integration, and the presence or absence of a jump primitive that transparently captures the agent's complete state.
The discussion period touched on several of these issues. First, many people felt that the definition of a transportable agent was incomplete, and needed to explicitly mention such things as security mechanisms, communication facilities, and appropriate languages. Other people disagreed and said that although these things are all important in a production system, they are not an essential characteristic of a transportable agent.
Second, many people felt that the advantage of transportable agents was not that you can minimize the use of a poor network connection, but that you can easily minimize the use of such a connection. They pointed out that, for the applications under consideration, connection use could be minimized with a range of other techniques, but that these techniques are more involved and that different techniques would be required for different applications. In addition, some of the commercial attendees pointed out that the problem of securely performing billing and administration tasks was at least as important as any of the other problems.
Third, it was interesting to note the periods of silence during the discussion on potential applications. We all seem to be at the same stage where we strongly suspect that transportable agents are superior in many applications but are struggling to prove this fact to the world at large (and perhaps ourselves).
Finally, the question of standardization sparked a large audience response. It was generally agreed that standardization is needed to speed the acceptance of agent systems. However, there was disagreement on three points. First, can standardization happen now, or do we not yet know enough? Second, do we need new standards, or can agents fit into existing Internet standards? Finally, what exactly should be standardized, and what should be left as an implementation choice?
The presentation slides are available in compressed Postscript (528K).
Hammond started with a disheartening statement.
Planning is impossible.

He softened this statement a little, however, on the next slide.
Planning is very hard indeed.

He argued that it is impossible to generate provably correct plans for arbitrary tasks, or to generate any reasonable plan at all in a timely manner. Thus there is no single solution to the planning problem, and we instead must rely on a range of specific techniques, selecting and tailoring one or more of these techniques until we have a planner that is appropriate to the task at hand.
Hammond then discussed a specific planning technique called case-based reasoning. In this technique the current state of the world is used to index into a set of precomputed plans. Once one of the available plans has been selected, this plan is tailored by examining the small differences between the exact state of the world and the ideal state under which the plan was computed. Once the plan has been tailored, it is used until the changing world state makes a different plan more appropriate. Case-based reasoning can be viewed as a combination of classical and pure reactive planning. Classical planning involves a state-space search that is directed by the differences between the current state and the desired end state. Pure reactive planning involves a set of rules that specify the next action solely as a function of the current state. Case-based reasoning reactively indexes into a set of classically-computed plans.
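The select-and-tailor loop described above can be made concrete with a small Python sketch. This is an illustration of the general idea rather than Hammond's system; the world model, case library, similarity measure, and adaptation step are hypothetical placeholders passed in as functions.

```python
# Minimal sketch of a case-based planning loop (hypothetical interfaces;
# not Hammond's actual implementation).

def select_case(world_state, case_library, similarity):
    """Index into the library of precomputed plans, returning the case
    whose ideal state is most similar to the current world state."""
    return max(case_library,
               key=lambda case: similarity(world_state, case.ideal_state))

def case_based_loop(world, case_library, similarity, adapt, poor_fit):
    """Select, tailor, and execute stored plans until the goal is reached."""
    while not world.goal_reached():
        case = select_case(world.state(), case_library, similarity)
        # Tailor the stored plan to the small differences between the
        # actual state and the ideal state for which it was computed.
        plan = adapt(case.plan, world.state(), case.ideal_state)
        for action in plan:
            world.execute(action)
            # Re-index when the changing world makes this case a poor fit.
            if poor_fit(world.state(), case.ideal_state):
                break
```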
Hammond concluded with an automated shopping application. In this application a (simulated) robot must go to the grocery store and find all of the food items on its list within a certain time limit - the time limit is such that the robot does not have time to do a brute-force search of every aisle in the store. The robot accomplishes this task using a case-based planner that is driven by its current position within the store, the remaining items on its list, and simple rules about which items are next to each other (e.g., syrup goes on pancakes, cheese and milk both require refrigeration, etc.). The robot knows the floor-plan of the store and can "read" the signs above each aisle. The robot performs well and illustrates that specific techniques tailored to a specific task can be successful.
The discussion period revolved around how planning could effectively be used in a transportable-agent environment. It was generally agreed that many agents need planning capabilities, particularly when locating and accessing some set of necessary resources. Hammond argued that this resource-location task is tractable as long as we are willing to narrow the problem space as much as possible and are willing to abandon the logical aspects of planning (i.e., construct a plan that will probably work rather than one that will provably work).
A related question was whether classical or reactive planning was better suited to transportable agents. In other words, does an agent start with a complete plan of action or does it reactively adapt to existing conditions as it moves through the network and performs its task? It seems that neither one is generally suited to all agents, and that we must select one or the other (or some combination) as the task dictates. It was noted, however, that it will often be difficult for an agent to have sufficient knowledge to construct a complete plan.
A final question was whether agent mobility could be leveraged during the planning process to construct "better" plans. This was left as an open question, but it was clear that mobility would be an advantage during the planning process only if resource directories, etc., were widely distributed across the network and the planning process required multiple queries to each directory.
Schneider argued that a fault-tolerant agent system must satisfy two criteria.
safety: an agent must not report incorrect results
liveness: an agent must eventually report the correct result
The difficulty of satisfying these criteria depends on the kinds of failures that can occur within the agent's environment. Two extreme failure models are crash and Byzantine.
crash: A host might suddenly stop working but will not actively interfere with an agent.
Byzantine: A host can actively interfere with an agent in any way (including collusion with other hosts). In Schneider's words, an assumption of a Byzantine model is a non-assumption, since no behavior is excluded.
A more realistic failure model (in many environments) is to assume that the visited hosts might crash and that other hosts might be Byzantine.
Schneider then considered two schemes for tolerating failures. The first scheme is concerned with crash failures only. Imagine that the agent's task is divided into stages that must be performed in sequence. Each stage requires a different resource or service and is performed on a different machine. Since we want to tolerate crash failures, we cannot simply send out a single agent. Instead we must send out multiple agents, and each agent must perform the stages of the task on different machines than its siblings (which implies the need for replicated services). To tolerate t crash failures per stage, we must send out t + 1 agents. Rather than having each agent do all of the work, Schneider proposes a primary/backup scheme in which one agent is the primary agent on each stage. This primary agent will do the work. The other agents are backup agents that will wait at their respective machines. If the primary agent disappears due to a crash failure, one of the backup agents will take over. Subtle issues associated with this approach are the need for failure detectors (to detect the loss of the primary agent) and the need for garbage collectors (to reap the unused backup agents).
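As a toy illustration of the t + 1 primary/backup idea, here is a hedged Python sketch. The failure detector is reduced to a precomputed set of crashed hosts, backup reaping is only hinted at in a comment, and all names and the crash model are invented for the example.

```python
# Toy simulation of the primary/backup scheme for tolerating up to t
# crash failures per stage: t + 1 agent replicas are placed on distinct
# hosts, the first live one acts as primary, and a backup takes over if
# the primary's host crashes. All names here are illustrative.

import random

def run_stage(hosts, work, crashed):
    """hosts: t + 1 distinct hosts offering replicas of this stage's service.
    crashed: hosts that the failure detector reports as failed."""
    for host in hosts:              # the first live host acts as the primary;
        if host not in crashed:     # backups on the later hosts simply wait and
            return work(host)       # must eventually be garbage-collected (not shown)
    raise RuntimeError("more than t hosts crashed in this stage")

def run_task(stages, t, crash_probability=0.2):
    """stages: list of (candidate_hosts, work_function) pairs, one per stage."""
    results = []
    for candidates, work in stages:
        hosts = random.sample(candidates, t + 1)
        crashed = {h for h in hosts if random.random() < crash_probability}
        results.append(run_stage(hosts, work, crashed))
    return results
```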
The second scheme (which has several variations) is concerned with Byzantine failures. Here malicious hosts can use any available means to influence the agent's computed result. Again the agent's task is divided into stages, and each stage requires a different service or resource on a different machine. Schneider's basic approach is to again send out multiple copies of the agent, but to now have these agents compare their results at each stage and then use the majority result for the rest of the computation (more specifically, an agent sends its results to every machine in the next stage - a single agent continues from that machine with the majority result as its input). Clearly, if there can be t Byzantine hosts at each stage, we must send out at least 2t + 1 agents at each stage. The situation is slightly more complex than this, however, since a Byzantine host could send out multiple results and create a false majority at the next stage (or at any stage down the line). To prevent this spoofing attack, cryptographic authentication techniques are required. Schneider uses either secret-key sharing or authentication chains (it must be noted that the simplest key-sharing techniques do not work, since malicious hosts from different stages could steal different subsets of the secret-key fragments and then collude to recreate the complete secret - instead the secret-key fragments must be redivided at each vote, or the secret key must be renewed after each vote).
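The voting step in this scheme can be sketched in Python as follows. Authentication is omitted entirely (a real implementation needs the secret-sharing or authentication-chain machinery described above), and the per-replica stage functions are hypothetical stand-ins for the replicated services.

```python
# Sketch of stage-by-stage majority voting with 2t + 1 agent replicas.
# Authentication is omitted; without the secret-sharing or
# authentication-chain machinery described above, a malicious host
# could forge a false majority.

from collections import Counter

def majority(results, t):
    """Return the result reported by a strict majority of the 2t + 1
    replicas, which is correct if at most t replicas are Byzantine."""
    value, count = Counter(results).most_common(1)[0]
    if count < t + 1:
        raise RuntimeError("no majority; too many faulty replicas")
    return value

def run_byzantine_task(stages, t):
    """stages: a list of stages, each a list of at least 2t + 1 per-replica
    functions; each function takes the majority output of the previous
    stage and returns its replica's result for this stage."""
    current = None
    for replicas in stages:
        assert len(replicas) >= 2 * t + 1
        results = [replica(current) for replica in replicas]
        # In the real scheme each replica sends its result to every host
        # of the next stage, and each such host computes the majority itself.
        current = majority(results, t)
    return current
```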
Several issues were raised during the discussion period. The main issue was whether it was wise to assume that replicated services would be available for each stage. Schneider agreed that exact copies would be unlikely in many applications and in fact undesirable if these exact copies were under a single administrative control (since then if one of the copies was actively or purposefully malicious, they all would be). He pointed out, however, that there are many organizations that provide almost the same service. Thus the agents at each stage might use different services to obtain approximately the same results. In other words, if the agent can handle approximate service replicas and approximate voting schemes, then it could use the schemes above in today's Internet.
Another issue that Schneider mentioned and that sparked quite a bit of audience interest was the notion of a self-checking computation. Here only a single agent is sent out but each host attaches a cryptographically-signed proof of its portion of the computation to the migrating agent. The agent's destination checks the proof to make sure that the agent performed its task correctly without any interference from a malicious host. This scheme is useful only if the proof can be checked much faster than simply redoing the entire computation.
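A minimal sketch of the self-checking idea follows, using HMACs in place of digital signatures and assuming (unrealistically) that the destination already shares a key with every host. The per-step check is a placeholder; as noted above, the scheme only pays off when that check is much cheaper than redoing the computation.

```python
# Minimal sketch of a self-checking computation: each host appends an
# authenticated record of its step to the agent's trace, and the
# destination re-verifies the whole trace. HMACs stand in for digital
# signatures, and the destination is assumed to already share a key
# with every host (a large simplification).

import hashlib, hmac, json

def host_step(trace, host_id, host_key, input_value, output_value):
    """Run on each visited host: append an authenticated step record."""
    record = {"host": host_id, "input": input_value, "output": output_value}
    payload = json.dumps(record, sort_keys=True).encode()
    tag = hmac.new(host_key, payload, hashlib.sha256).hexdigest()
    trace.append((record, tag))

def verify_trace(trace, keys_by_host, check_step):
    """Run at the destination. check_step(record) verifies a single step;
    the scheme is only worthwhile if this is much cheaper than redoing
    the entire computation."""
    previous_output = None
    for record, tag in trace:
        payload = json.dumps(record, sort_keys=True).encode()
        expected = hmac.new(keys_by_host[record["host"]], payload,
                            hashlib.sha256).hexdigest()
        if not hmac.compare_digest(tag, expected):
            return False      # record was forged or altered in transit
        if previous_output is not None and record["input"] != previous_output:
            return False      # a host misreported the input it received
        if not check_step(record):
            return False      # the step itself was performed incorrectly
        previous_output = record["output"]
    return True
```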
A final issue was whether Schneider's fault-tolerance schemes would be useful when the agent modifies the state of each machine through which it passes - i.e., when the agent has an externally visible effect. There was no general consensus on this issue although it was agreed that it largely rests on the nature of the side effect, and that for some kinds of side effects, a full-blown transaction system would be required.
Morris started with an example of physical security. A box labeled "Genuine U.S. Encryption Machine" shows up at a U.S. Army loading dock. Before the Army personnel take the machine inside and start using it, they must ask (and answer) a series of questions.
Morris said that these are the exact same questions that must be asked in any security-sensitive environment. In the case of transportable agents, we have replaced a physical packing crate with electronically-transmitted code, but the machine that receives the agent must ask and answer these same questions.
Security in the electronic world revolves around cryptography. Morris suggested that cryptography serves two main purposes.
Secrecy: The contents of a message must be protected from disclosure during transit. However, since a message is only useful if it is delivered, total secrecy is impossible.
Authentication: It must be possible to determine who sent the message and whether the message was modified during transit.
Morris then suggested that it was wrong to view cryptographic systems as either breakable or unbreakable. He stated that any code can be broken, and that during his years at the National Security Agency (NSA), they often broke "unbreakable" codes. Instead he suggested that we should view cryptographic systems as a matter of expense versus reward. If the expense of breaking a code is greater than the potential value of the encrypted information, the cryptographic system is sufficient for the task at hand, even if it falls far short of providing "perfect" security (which is impossible anyway). In the case of transportable agents, we must ask ourselves what damage can be done to the machines, people, and organizations that are using the agents under a particular security regime, and then whether this potential damage justifies the expense of a more complex regime.
Morris illustrated this idea with the example of an ATM machine. If his bank is the Ledyard Bank in Hanover, New Hampshire, and he withdraws 200 dollars from an ATM machine in England, there are three parties involved in the transaction, each of whom has different needs and expectations.
Morris: He expects 200 dollars to come out of the machine and expects that exactly 200 dollars (no more and no less) will be debited from his account at the Ledyard Bank.
English bank: The English bank that owns the ATM machine expects the Ledyard Bank to send 200 dollars to replace the money that their machine has just given to Morris.
Ledyard Bank: The Ledyard Bank expects that Morris will not be able to make a false, convincing case that he did not really withdraw 200 dollars and should get his money back.
The important question is how much damage can be done to each of these parties in the worst case. Assuming that the banks are trustworthy and correctly enforce the 200-dollar-per-day withdrawal limit, the most money that anyone can lose is 6000 dollars per month (200 dollars per day over thirty days). The security mechanisms for the ATM network must be designed with this potential damage in mind. For example, the damage is probably not great enough to justify the expense of public-key cryptography, but from Morris's point of view, it is certainly great enough to justify more security than what is currently in place.
The discussion period was somewhat more general than during the other sessions, so there is less to summarize. There was quite a bit of discussion about Morris's assertion that security systems should be viewed solely from an economic standpoint - does the eliminated risk justify the expense of the cryptosystem? Although it was generally agreed that this economic viewpoint made sense, its exact implications for transportable agent systems were unclear. However, it was obvious that different agent applications would require dramatically different security levels, which led to the suggestion from several people that agents might pay more money for higher security levels.
There were several attempts to get Morris to suggest a particular cryptosystem for use in agent systems and to comment on the strengths and weaknesses of existing systems. Morris essentially refused to do this, although he did make the interesting statement that although the NSA does not use public-key cryptography to protect their own secrets, they have no objection to us doing so. Morris also reiterated his position that cryptosystems must be viewed as a matter of expense versus risk, and that any cryptosystem might be sufficient for a particular application, depending on the value of the information that it is being asked to protect.
Finally there was some discussion about the notion of software bonding where a company would provide guarantees about the behavior of their software, perhaps in combination with a cryptographic signature of that software. Although it was agreed that this was a good idea, several commercial attendees said that, from a liability standpoint, it would be difficult to get their own companies to do so.
White examined various aspects of navigation within the context of General Magic's Tabriz (or Telescript) system. Telescript has both agents and places. An agent is a mobile, named program. A place can be thought of as a virtual machine. There can be one or more places per physical machine; each place is immobile, has both a name and an address, and provides some service to the agents that visit it. To function effectively within this network of places, the agent system must provide four capabilities.
Travel. The basic travel mechanism is the go instruction, which an agent uses to migrate to a specific place. The go instruction transparently captures the complete state of the agent and sends the state to the Telescript server (or engine) that controls the destination place. The server authenticates the owner of the incoming agent; based on this authentication, the server and then the place itself decide whether to admit or reject the agent. If the agent is admitted, its state is restored and it continues execution immediately after the go instruction as if it had never stopped. (A rough sketch of this capture-and-resume idea appears after this list.) Other important travel mechanisms are the send, route, and sense instructions. The send instruction sends clones of the agent to multiple destination places; route sends the agent through a sequence of places (with the agent doing any desired processing at each place); and sense returns status information about a place.
Interaction. Agents communicate in two ways. If two agents are on the same machine, they use the meet instruction, which gives each of them references to the other's objects (each agent has the opportunity to reject the meeting before the references are given out). If two agents are on different machines, they use the connect instruction to establish a connection with each other, over which they can send copies of any Telescript objects (each agent again has the opportunity to reject the connection). When requesting a meeting or a connection, an agent presents a ticket that specifies the terms of the communication, such as quality, duration, and so on. The other agent and the Telescript system will accept or reject the communication attempt based on this ticket.
Institutions. Agents will be at best unwieldy and at worst useless unless there are places and agents that provide services for the agent's use. A service agent might be a specialist that provides a specific service such as information retrieval against a particular database, a broker that serves as a go-between for two or more agents, a recommender that recommends some other service agent based on the agent's current needs, or a guide that essentially leads an agent through the network. A service place might be a directory that provides lists of available service agents, an outpost that serves as an alternative to the home machine when an agent needs to go somewhere and wait for an event of interest to occur, or a trustworthy meeting hall to which two mutually suspicious agents can both travel when they need to conduct business.
Obstacles. White identified several problems that make it extremely difficult to use agents in large-scale Internet applications.
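The go primitive can only be approximated loosely in ordinary Python, since Telescript transparently captures the agent's full execution state (stack and program counter). The sketch below, referenced from the Travel item above, illustrates just the capture, ship, admission-check, and resume steps; the Engine class, admission policy, and agent state are all invented for the example.

```python
# Loose Python analogue of a go-style migration: the agent's state is
# captured, shipped to the destination engine, checked by an admission
# policy, and then resumed. Telescript's real go primitive captures the
# complete execution state; this sketch only moves explicit data.

import pickle

class Engine:
    """Stand-in for a Telescript engine controlling a destination place."""
    def __init__(self, name, admit):
        self.name = name
        self.admit = admit                 # authentication / admission policy

    def receive(self, payload):
        owner, state = pickle.loads(payload)
        if not self.admit(owner):
            raise PermissionError("agent rejected by " + self.name)
        return state                       # the agent resumes from this state

def go(owner, state, destination):
    """Capture the agent's state and hand it to the destination engine."""
    payload = pickle.dumps((owner, state))
    return destination.receive(payload)    # in reality this crosses the network

# Example: an agent carrying a shopping list migrates to a catalog place
# and continues executing "after" the go call with its state intact.
catalog = Engine("catalog-place", admit=lambda owner: owner == "alice")
state = {"wanted": ["pancake syrup"], "found": []}
state = go("alice", state, catalog)
```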
Talked about Knowbots, an agent system that focuses on system issues. Both the agents and the underlying system are written in Python.
Features:
Presented CyberAgents, a commercial agent system from FTP Software. The system is written in Java and is supported on a wide range of platforms.
Features:
Demonstrated a Netscape plugin that allows Tcl/Tk scripts to be embedded inside a web page and executed on the browser end when the page is downloaded. This is analogous to Java applets except for the use of Tcl/Tk rather than Java.
Features:
The presentation slides are available in Adobe Acrobat format (28K).
Talked about WAVE, a system for the creation of distributed knowledge networks and the solution of problems using these networks.
Features:
Talked about TACOMA, a mobile agent system whose most notable use is in a distributed system called StormCast that monitors changing weather conditions. What distinguishes TACOMA from other systems is the emphasis on operating system support for mobile agents and on support for multiple agent languages.
Features:
He talked about the need for a revolution in programming languages to make it easier to develop agent systems. Language is the right starting point, since it is better to protect at the specification level than at the implementation level; moreover, safe languages have had some good success, whereas secure kernels have been mostly failures.
Areas where a "language" can help are:
Directions for language design:
His main concern was with the lack of structure within HTML. The same information is often presented differently in HTML, which makes agent programming hard. He noted that HTML pages are often created on the fly by web servers that have direct access to an underlying database. He suggested having, in addition to the web server, an agent server that provides programmatic access to the underlying database.
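A hypothetical sketch of such an agent server follows: instead of scraping whatever HTML the web server renders, an agent sends a structured query and receives structured records back. The record fields, query format, and in-memory "database" are invented for illustration.

```python
# Hypothetical "agent server": programmatic, structured access to the
# same data that the web server renders as HTML. All fields and the
# query format are invented for illustration.

import json

PRODUCTS = [                               # stand-in for the real database
    {"sku": "1001", "name": "pancake syrup", "price": 3.49},
    {"sku": "1002", "name": "cheddar cheese", "price": 4.99},
]

def agent_server(request_json):
    """Answer a query such as {"field": "name", "contains": "syrup"}."""
    query = json.loads(request_json)
    matches = [row for row in PRODUCTS
               if query["contains"] in str(row[query["field"]])]
    return json.dumps(matches)

# The agent's query is unambiguous, unlike parsing whichever HTML layout
# the web server happens to emit this week.
print(agent_server(json.dumps({"field": "name", "contains": "syrup"})))
```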
He stressed the need for agents to communicate and suggested a framework for an Agent Communication Language (ACL). Its components are:
KIF (Knowledge Interchange Format): syntax
Ontolingua (a language for defining sharable ontologies): semantics
KQML (Knowledge Query and Manipulation Language): pragmatics
Protolingua (a language for defining a communication protocol as Augmented Transition Networks (ATNs) of communicative acts): pragmatics
The goal of the Agent Communication Language (ACL) is to allow agents to communicate effectively regardless of their developer or point of origin. Tim also suggested that the knowledge acquired from other agents could be used to modify an agent's internal rules and that mobile knowledge can be treated as mobile declarative code.
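To give a flavor of what an ACL message might look like on the wire, here is a small Python helper that composes a KQML-style performative. The parameter keywords follow commonly published KQML examples; the receiver, ontology, and KIF content shown are invented for illustration rather than taken from the talk.

```python
# Compose a KQML-style performative as a string. The keywords
# (:sender, :receiver, :language, :ontology, :content) follow commonly
# published KQML examples; the particular receiver, ontology, and KIF
# content below are invented for illustration.

def kqml(performative, **params):
    fields = " ".join(":{} {}".format(key, value) for key, value in params.items())
    return "({} {})".format(performative, fields)

message = kqml(
    "ask-one",
    sender="price-agent",
    receiver="stock-server",
    language="KIF",
    ontology="NYSE-TICKS",
    content="(PRICE IBM ?price)",
)
print(message)  # prints the performative on a single line:
# (ask-one :sender price-agent :receiver stock-server :language KIF :ontology NYSE-TICKS :content (PRICE IBM ?price))
```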
The presentation slides are available in HTML.
He stressed that the problem with current architectures is that:
His ending quote was:
The success of software agents will be directly related to the extent to which content becomes machine-parsable.
The presentation slides are available in Adobe Acrobat format (48K).
Talked about ongoing work at Dartmouth that uses agents to collect documents on the net and then build clusters of documents. Each cluster contains documents that are in some way related to each other. Different techniques are used to display the clusters and allow the user to browse the collection.