D'Agents 2.0: Known bugs

The most recent bug was identified on May 5, 2000.

There are currently twenty-four known bugs.

Solaris 2.6. If you are installing D'Agents on a Solaris 2.6 machine, read Bug Report 20 before proceeding.

Number  Status     Description
1       Not fixed  No agent_fork or agent_jump inside a Tk event handler
2       Not fixed  Lost upvar references to the env array
3       Not fixed  Unmatched curly brackets inside a comment
4       Not fixed  Incorrect error message when execing a program
5       Not fixed  Wrong return code from a return command in a sourced script
6       Not fixed  agent_jump from inside an uplevel might fail
7       Not fixed  Linux only. Race condition in meeting establishment
8       Not fixed  puts $fd "" does not work
9       Not fixed  agent_transfer does not work
10      Not fixed  configure script misses libraries with certain C++ compilers
11      Not fixed  agent_disk does not flush the output stream
12      Not fixed  Makefile does not install the shared libraries (FreeBSD and possibly some other platforms)
13      Not fixed  agent_jump can fail if a procedure contains unmatched curly brackets
14      Not fixed  The crypt library (-lcrypt) is incorrectly included during the linking process on some platforms
15      Not fixed  Incorrect compilation flag DPORTABLE (Solaris only)
16      Not fixed  If you use meeting_send and meeting_receive to transfer a file from one agent to another, all newlines in the file will be turned into carriage returns on the receiving end
17      Not fixed  If you give an agent a symbolic name that contains whitespace (using the agent_name command), the agent's identification will end up as an incorrectly formatted Tcl list
18      Not fixed  agent_jump sometimes fails if you invoke it from inside a procedure
19      Not fixed  There are two mistakes and one omission in the installation instructions
20      FIXED      Solaris 2.6 only. D'Agents works fine on Solaris 2.5 and some Solaris 2.6 machines. On some Solaris 2.6 machines, however, it crashes the machine
21      Not fixed  Random failures of agent_meet command
22      Not fixed  Compiler/linker reports the error "multiply defined symbol _sigchld_handler__FiPv"
23      Not fixed  mask_replace incorrectly says that a mask is in use as the current meeting, event or message mask
24      Not fixed  The agent and agent-tk interpreters crash with a segmentation fault immediately on startup


Bug 1:
Pre-release
You cannot use agent_fork or agent_jump inside a Tk event handler. Instead, set up your event handling so that tkwait returns when the agent wants to jump, and call agent_jump after tkwait returns.

Bug 2:
Pre-release
Lost upvar references to the env array

An upvar reference to an element of the env array is lost when an agent migrates and is not present in a forked child. Therefore you must recreate the upvar reference after a call to agent_jump or agent_fork. For example, the procedure

  proc run_around machines {

    upvar #0 env(DISPLAY) display

    set list ""

    foreach m $machines {
      agent_jump $m
      append list "The current display is $display.\n"
    }
  }
will fail with the error message
  can't read "display": no such variable
The following procedure, however, will work correctly.
  proc run_around machines {

    set list ""

    foreach m $machines {
      agent_jump $m
      upvar #0 env(DISPLAY) display
      append list "The current display is $display.\n"
    }
  }

Bug 3:
Pre-release
Tcl gets confused if there are unmatched curly brackets inside a comment. For example, the following Tcl fragment causes an error.
while {1} {
  # {
  puts HELLO
}
This is a long-standing problem with Tcl itself and will not be fixed as part of the D'Agents project. Do not use curly brackets inside a comment and you will be all set.

Bug 4:
March 13, 1998
When you open a pipe to an executable program (e.g., ping) with the following code,

open "| ping" r

you will get the error message

permission denied: standard input must be redirected if pipe is opened for writing only

This error message should be

permission denied: standard input must be redirected if pipe is opened for READING only

Incidentally, the correct way to open the pipe above is

open "| ping << {}" r

Bug 5:
March 15, 1998
If a child agent uses the source command to evaluate a Tcl script, and that Tcl script ends with a return command that returns a return code other than 0 (ok) or 1 (error), the source command will incorrectly throw a Tcl exception.

Bug 6:
March 16, 1998
If an agent_jump command is issued from inside an uplevel command, it will either work fine or cause a segmentation fault (depending on the machine architecture and compiler).

Bug 7:
March 17, 1998
For Linux only, there is a race condition in the code that establishes a meeting (i.e., the agent_meet, agent_accept, etc. commands). The race condition prevents correct meeting establishment, but it only occurs if the agent is not communicating heavily with other agents.

Bug 8:
March 23, 1998
Outputting an empty line to a file is typically done with the command

puts $fd ""

In D'Agents 2.0, this command will fail inside a child agent, since the empty string confuses the security procedures. Instead use the command

puts -nonewline $fd "\n"

Bug 9:
April 24, 1998
agent_transfer does not work.

Bug 10:
April 24, 1998
With certain C++ compilers (such as some versions of the SGI CC compiler), the configure script fails to detect available system libraries. For example, it might fail to detect the math library -lm and leave -lm out of the compilation flags. If you run into this problem, please contact Bob Gray for an updated configure script.

Bug 11:
April 24, 1998
One of the arguments to the agent_disk command is a file descriptor that has been opened for writing. agent_disk writes the agent's state image to this file descriptor. However, agent_disk does not flush the file's output buffers before returning. Therefore, you must close the file descriptor before you try to use the file (i.e., before you use the agent_transfer command).

Bug 12:
April 30, 1998
On FreeBSD (and possibly some other platforms), the Makefile does not correctly install the shared libraries when you type
make install
We fixed this, but we inadvertently left the fix out of the source code release.

Quick workaround: Install the shared libraries yourself with the command

cp BUILD_PREFIX/lib/* LIB_INSTALL
where BUILD_PREFIX and LIB_INSTALL are the build and library-installation directories that you specified in the Makefile.

Bug 13:
June 25, 1998
If a Tcl procedure contains unmatched curly brackets (e.g., the normal curly brackets required by Tcl syntax plus one extra opening bracket inside a regular expression), agent_jump might fail on the destination machine. The root agent of the migrating agent will receive the error message
unable to restore state
If you run into this problem, please contact Bob Gray for the fix.

Bug 14:
June 25, 1998
The crypt library (-lcrypt) is incorrectly included during the linking process on some platforms. The linker produces several error messages about undefined symbols in the crypt library.

Workaround: Edit the Makefile (e.g., build/obj/agent-tcl/Makefile) that is causing the problem and remove all occurrences of

-lcrypt
Then recompile.
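
If several Makefiles are affected, the edit can be scripted. The following is a minimal sketch and is not part of the D'Agents distribution: the function name strip_lcrypt and the .orig backup suffix are assumptions chosen for illustration. It removes -lcrypt only where the flag is followed by whitespace or ends a line, so a different flag such as -lcrypto is left alone.

```shell
# Remove the -lcrypt flag from one Makefile, keeping a backup copy.
# strip_lcrypt and the .orig suffix are illustrative names, not part of D'Agents.
strip_lcrypt() {
    makefile=$1
    cp "$makefile" "$makefile.orig" || return 1
    # First expression: -lcrypt followed by whitespace (the whitespace is kept).
    # Second expression: -lcrypt at the very end of a line.
    sed -e 's/-lcrypt\([[:space:]]\)/\1/g' \
        -e 's/-lcrypt$//' "$makefile.orig" > "$makefile"
}

# Example: strip_lcrypt build/obj/agent-tcl/Makefile
```

Compare the Makefile with its .orig copy before recompiling to confirm that only the intended flag was removed.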

Bug 15:
June 25, 1998
For Solaris only, one of the compilation flags is specified incorrectly. The compilation flag is currently DPORTABLE, but it should be -DPORTABLE with the normal leading dash.

Workaround: If you get any error message that includes the word DPORTABLE during compilation, edit the Makefile (e.g., build/obj/random/Makefile) that is causing the problem and replace every occurrence of

DPORTABLE
with
-DPORTABLE
Then recompile.
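
The replacement can also be scripted. The sketch below is an illustration only: the function name fix_dportable and the .orig backup suffix are assumptions, not part of D'Agents. A plain search-and-replace would turn a flag that is already correct into --DPORTABLE, so the sketch first strips any existing dash and then adds exactly one.

```shell
# Give every DPORTABLE flag in one Makefile exactly one leading dash.
# fix_dportable and the .orig suffix are illustrative names, not part of D'Agents.
fix_dportable() {
    makefile=$1
    cp "$makefile" "$makefile.orig" || return 1
    # Step 1 removes the dash from flags that are already correct, so that
    # step 2 can add a single dash everywhere without producing --DPORTABLE.
    sed -e 's/-DPORTABLE/DPORTABLE/g' \
        -e 's/DPORTABLE/-DPORTABLE/g' "$makefile.orig" > "$makefile"
}

# Example: fix_dportable build/obj/random/Makefile
```

Because of the two-step substitution, running the function a second time on the same file leaves it unchanged.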

Bug 16:
June 25, 1998
If you use meeting_send and meeting_receive to transfer a file from one agent to another, all newlines in the file will be turned into carriage returns on the receiving end.

Workaround: Use the Tcl command fconfigure to turn off all automatic file translations, e.g.,

set fd [open file r]
fconfigure $fd -translation binary
You should do this in both the sending agent (before the meeting_send command) and in the receiving agent (before the meeting_receive command).

Bug 17:
July 13, 1998
If you give an agent a symbolic name that contains whitespace (using the agent_name command), the agent's identification will end up as an incorrectly formatted Tcl list. This improperly formatted list will cause problems with nearly all the other agent commands.

Workaround: Do not use whitespace in the symbolic names. Use an underscore (or some other character) as a separator.


Bug 18:
August 4, 1998
If you invoke agent_jump from inside a procedure (rather than from the script's top level), the agent will randomly fail with the error message
can't read agent(maxprecision): no such variable

Workaround: Edit agent-tcl/tclBasicCmd.cc, find the line that says

                $s eval uplevel #0 $code\n\
change the line to
                $s eval [list uplevel #0 $code]\n\
recompile, and reinstall. Or add the line
global agent
to the Tcl procedure in which you are calling agent_jump.

Bug 19:
August 24, 1998
There are two mistakes and one omission in the installation instructions.

  • Wherever you see SERVER_TCP_PORT, you should substitute PORT.

  • The Makefile does not contain the variables TCL_INSTALL or TK_INSTALL. You can ignore the paragraph that refers to them. (These two variables were removed from the Makefile because they are redundant.)

  • The installation instructions state that you do not need to recompile the system if you are installing on a second machine that has the same architecture as an earlier machine (e.g., if you have compiled and installed on one Solaris 2.5.1 machine, and now want to install on a second Solaris 2.5.1 machine). Although this is correct, it is likely that when you install on the second machine, you will use an installation directory that has a different path than the installation directory on the first machine. In this case, before running the agent interpreters, you must set three environment variables:

    • AGENT_UNIX_SOCKET
    • TCL_LIBRARY
    • TK_LIBRARY

    AGENT_UNIX_SOCKET should be the full name of the Unix domain socket that is used for local agent communication (the full name of this socket is specified in the UnixSocket field in the agentd.conf file). TCL_LIBRARY and TK_LIBRARY should have the same values as the corresponding variables in the Makefile.

    To make things easier, you might want to set these three variables inside a login script.
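
As a rough illustration of the login-script approach, the lines below set the three variables for a Bourne-compatible shell (sh, ksh, or bash). All three paths are hypothetical placeholders: take the real socket name from the UnixSocket field of your agentd.conf and the two library paths from your Makefile.

```shell
# Hypothetical values: replace each path with the one from your own
# agentd.conf (UnixSocket field) and Makefile (TCL_LIBRARY, TK_LIBRARY).
AGENT_UNIX_SOCKET=/usr/local/agent/agentd.sock
TCL_LIBRARY=/usr/local/agent/lib/tcl
TK_LIBRARY=/usr/local/agent/lib/tk

# Export the variables so that the agent interpreters inherit them.
export AGENT_UNIX_SOCKET TCL_LIBRARY TK_LIBRARY
```

Users of csh-style shells would need the equivalent setenv commands instead.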

Bug 20:
August 24, 1998
Solaris 2.6 only. D'Agents works fine on Solaris 2.5 and some Solaris 2.6 machines. On some Solaris 2.6 machines, however, D'Agents crashes the machine. The machine crash is due to a bug in the kernel's socket code. Fortunately, this bug has been fixed in the most recent kernel patches.

Short version: To make D'Agents work on Solaris 2.6, you must install kernel patch 105181-09 or later.

Long version: "The agent system seems to trigger a kernel bug, the whole system crashes, with this kernel message in syslog:

unix: panic: recursive mutex_enter, lp=f5e38364 owner=f612e6e0 thread=f612e6e0 type=0 tsid=0

I then enabled savecore (man savecore) to save an image of the kernel across the reboot.

A stack trace showed me that the recursive mutex_enter happened somewhere in the socket code. I searched Sun's patch database and found a description of this bug in the README for patch:

105214-01: SunOS 5.6: /kernel/fs/sockfs patch

It seems like this was only a part of the fix, as I got one more crash afterwards. The patch was later integrated into the kernel patch (105181-08).

Right now I have 105181-09 installed, and the problem seems to be gone. So I'd propose you tell the users of your agent system to install the newest kernel patch (release 10 as of now), and try again."

Thanks: Many thanks to Martin Paul for tracking down the problem, and providing the "long version" text above. And many thanks to everyone for their debugging output and their patience.
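
To check whether a machine already meets the "105181-09 or later" requirement, the patch ID can be compared in a small shell function. This is a sketch under stated assumptions: the function name patch_is_sufficient is made up for illustration, and it expects to be given a patch ID in the form 105181-NN. On Solaris the installed patches can usually be listed with showrev -p, but extracting the ID from that output is left to the reader.

```shell
# Succeed if the argument is kernel patch 105181 at revision 9 or later.
# patch_is_sufficient is an illustrative name, not part of D'Agents or Solaris.
patch_is_sufficient() {
    id=$1
    # Require the form BASE-REVISION.
    case $id in
        *-*) ;;
        *) return 1 ;;
    esac
    base=${id%-*}      # part before the last dash, e.g. 105181
    rev=${id##*-}      # part after the last dash, e.g. 09
    [ "$base" = 105181 ] || return 1
    # Reject an empty or non-numeric revision.
    case $rev in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Drop leading zeros so that 09 is compared as the decimal number 9.
    rev=$(printf '%s\n' "$rev" | sed 's/^0*\([0-9]\)/\1/')
    [ "$rev" -ge 9 ]
}

# Example: patch_is_sufficient 105181-10 && echo "kernel patch level is sufficient"
```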

Bug 21:
October 27, 1998
Depending on which C++ compiler you used to compile the system, the agent_meet command will fail randomly with an empty error message. This is due to a type-casting error that some C++ compilers apparently correct invisibly.

Workaround: Edit agent-tcl/tclBasicCmd.cc and make the following replacements:

Procedure: Agent_TransferCmd
Line to replace:
if ((agentId = Agent_Transfer (interp, seconds, &machine, channel)) == NULL) {
Replacement line:
if ((agentId = Agent_Transfer (interp, seconds, &machine, channel)) == (AgentId *) NULL) {

Procedure: Agent_MeetCmd
Line to replace:
if ((id = Agent_SplitAgentId (interp, argv[0])) == NULL) {
Replacement line:
if ((id = Agent_SplitAgentId (interp, argv[0])) == (AgentId *) NULL) {

Procedure: Agent_RequestCmd
Line to replace:
if ((id = Agent_SplitAgentId (interp, argv[0])) == NULL) {
Replacement line:
if ((id = Agent_SplitAgentId (interp, argv[0])) == (AgentId *) NULL) {

Then recompile and re-install.

Bug 22:
December 10, 1998
During compilation, some compilers/linkers will fail with the error:

agentdSupport.o: Definition of symbol `_sigchld_handler__FiPv' (multiply defined)
servProcess.o: Definition of symbol `_sigchld_handler__FiPv' (multiply defined)

Workaround: Edit server/servProcess.cc and replace every occurrence of

sigchld_handler

with

sigchld_handler_background

Then recompile.
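
The rename can be scripted as well. The following sketch is illustrative only: the function name rename_sigchld_handler and the .orig backup suffix are assumptions, not part of D'Agents. It first undoes any earlier rename so that a second run cannot produce sigchld_handler_background_background.

```shell
# Rename sigchld_handler to sigchld_handler_background in one source file.
# rename_sigchld_handler and the .orig suffix are illustrative names.
rename_sigchld_handler() {
    src=$1
    cp "$src" "$src.orig" || return 1
    # Step 1 restores the short name wherever the long one already appears,
    # so that step 2 appends the suffix exactly once to every occurrence.
    sed -e 's/sigchld_handler_background/sigchld_handler/g' \
        -e 's/sigchld_handler/sigchld_handler_background/g' "$src.orig" > "$src"
}

# Example: rename_sigchld_handler server/servProcess.cc
```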

Bug 23:
March 4, 1999
If (for example) you try to replace the current message mask with itself, the mask_replace command incorrectly throws the error "mask is in use as the current message, meeting, or event mask", instead of just doing nothing.

Workaround: Add an if statement around the mask_replace command, e.g.,

if {$handle != $mask(message)} {
    mask_replace message $handle
}

Bug 24:
May 5, 2000

The agent and agent-tk interpreters crash with a segmentation fault immediately on startup.

This problem appears to be caused by an incorrect initialization of the global environ variable. Try adding the line
char **tclDummyEnviron = environ;
to the top of agent-tcl/tclAppInit.cc and agent-tk/tkAppInit.cc, then recompiling and reinstalling. This line forces a reference to environ to appear in the main executable, which fixes the initialization problem.


Last updated on May 5, 2000

The D'Agents Project ended in approximately 2003. The software and this web site are no longer maintained. For questions, please contact David Kotz.