Pipeline Production Problems



As you all should know by now, 9 computers are involved in the pipeline
processing.

There are 3 Windows machines collecting data, one per telescope.

There are 6 Linux processors reducing the data.  These are in two
dual-processor machines and two single-processor machines.

From time to time one of the Windows machines hangs.  They are running
Windows 98.  A reboot usually fixes the offending computer, but sometimes
the Linux end also hangs.

The way I have set up the pipeline, when the night's data taking is
complete I start 6 programs: one each for the early-evening and
late-evening data from each telescope.
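As a rough sketch of the job layout (3 telescopes times 2 halves of the night gives 6 jobs), the Python below just enumerates the jobs. The telescope labels "tel1" to "tel3" are hypothetical placeholders, since the post does not name the telescopes.

```python
from itertools import product

# Hypothetical labels: the post does not name the telescopes.
TELESCOPES = ["tel1", "tel2", "tel3"]
HALVES = ["early", "late"]


def nightly_jobs():
    """Return one (telescope, half) pair per pipeline program to start."""
    return list(product(TELESCOPES, HALVES))


if __name__ == "__main__":
    for tel, half in nightly_jobs():
        print(f"start pipeline for {tel}, {half} evening")
```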

Each program copies the images over the Ethernet to a local work area and
then starts processing.

From time to time one of the programs hangs.  This is almost always due to
the Windows machine hanging.  A reboot fixes the Windows machine, but
sometimes the Linux end is then hung as well.  Attempts to communicate with
the formerly offending device get a "device busy" error.
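One way to make such a hang show up as a reported failure, rather than a silently stuck program, is to do each copy with a time limit. The Python below is a minimal sketch under that assumption; `copy_with_timeout` and the timeout value are hypothetical, not part of the existing pipeline. It only detects the hang: a thread blocked on a dead network share cannot be killed from Python, so it is abandoned and the copy is reported as failed, while the underlying read may stay stuck until the share recovers.

```python
import shutil
import threading


def copy_with_timeout(src, dst, timeout_s):
    """Copy src to dst, giving up after timeout_s seconds.

    Returns True if the copy finished, False if it was still running when the
    time limit expired (for example because the Windows machine hung).
    Re-raises any OSError raised by the copy itself.
    """
    outcome = {}

    def worker():
        try:
            shutil.copyfile(src, dst)
            outcome["done"] = True
        except OSError as exc:
            outcome["error"] = exc

    # Daemon thread: a copy stuck on a dead share cannot be killed from
    # Python, so it is abandoned rather than waited on forever.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)
    if t.is_alive():
        return False
    if "error" in outcome:
        raise outcome["error"]
    return True
```

A pipeline job would call this for each image and, on False, log which image it was and stop, so the hung job is noticed at once instead of the next morning.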

I can fix this by rebooting the Linux computer.  However, on the
dual-processor machines the other program may still be running fine, so I
am forced to wait for the working half to complete before I can reboot and
restart the hung pipeline.
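On the dual-processor machines the wait can at least be automated: given the process ID of the pipeline that is still running (found with `ps`, say), the sketch below polls until it exits, after which the machine can be rebooted. `pid_alive` and `wait_for_exit` are hypothetical helpers, and the approach assumes a POSIX system where sending signal 0 only checks that a process exists.

```python
import os
import time


def pid_alive(pid):
    """True if a process with this PID exists (signal 0 sends nothing).

    A zombie (exited but not yet reaped by its parent) still counts as alive.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def wait_for_exit(pid, poll_s=60.0):
    """Block until the given process has exited, checking every poll_s seconds."""
    while pid_alive(pid):
        time.sleep(poll_s)
```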

Is there a way to get communication with a Windows 98 computer restarted
without rebooting the Linux machine?

I have looked at fstab and mtab, and they are OK.  Where else should I
look?  Closing the offending window on the Linux machine does not help.
The network itself is working, because I can communicate with the other
computers on the network.  It is just that I can no longer communicate with
the computer where Windows hung, even though it is now OK.
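For the mtab check, a small helper can list which entries are mounted from the Windows machine. This sketch assumes the share appears in mtab in the usual `//host/share` form (as smbfs mounts typically do); `mounts_from_host` and the host name are hypothetical placeholders.

```python
def mounts_from_host(mtab_text, host):
    """Return (mount_point, fs_type) for every mtab line whose source is a
    //host/share network share on the given host (case-insensitive)."""
    prefix = f"//{host.lower()}/"
    found = []
    for line in mtab_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or malformed line
        source, mount_point, fs_type = fields[0], fields[1], fields[2]
        if source.lower().startswith(prefix):
            # mtab escapes spaces in paths as \040
            found.append((mount_point.replace("\\040", " "), fs_type))
    return found
```

On the Linux box this would be called as `mounts_from_host(open("/etc/mtab").read(), "tel1")`. For any mount point it reports, `fuser -m <mount point>` lists the processes still holding it, and a lazy unmount (`umount -l`) followed by a fresh mount may be worth trying before a reboot; whether that clears the "device busy" state on this setup is untested.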

Any suggestions would be appreciated.  My present plan is to set up a spare
computer which will have all the pipelines available.  I can then restart
the Windows machine and move the hung pipeline job to the spare computer.

Tom Droege