
Re: External Queue



Nicholas Beser wrote:
> 
> Chris,
> 
> I had planned to use the shell script program that TASS uses to start up a
> conversion and copy task residing on mira. Do you think that is the best way
> to start a remote task off? I was also going to detach the task from the tass
> data collection. The potential script file is:
> 
> #!/bin/sh
> #
> # Put commands to reduce data here.
> echo Script called with argument $1
> rcp $1 mira:/data3/beser/tass/latest/$1
> rsh mira /data3/beser/tass/latest/convert $1 &
> rm $1

The only problem here is if mira, your data reduction computer,
is slow.  (I think you said it is a SPARC 2, which is kind of like a
loaded 486 but with better I/O and graphics.)

So, if 3 images come off the camera every 12 minutes, that is one
image per four minutes.  The danger is that if the processing takes
longer than four minutes you will be creating processes faster than
they complete, and you will come in the next morning to find 100 data
conversion processes all running at a snail's pace on your SPARC,
assuming you have enough swap space to run 100 copies of the program.
On the other hand, if the conversion takes only a few seconds then
you have no problem.  If all you are doing is conversion then it
will be OK.
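
A rough sketch of that arithmetic in shell, for anyone who wants to
plug in their own numbers.  The function name and the idealized model
are my assumptions: it treats the machine as finishing one conversion
per PROC minutes in aggregate no matter how many copies are running,
and ignores swapping, which would only make things worse.

```sh
#!/bin/sh
# backlog INTERVAL PROC MINUTES  (hypothetical helper, illustration only)
# Prints how many conversions are still unfinished after MINUTES,
# when one image arrives every INTERVAL minutes and the machine
# finishes one conversion per PROC minutes in aggregate.
backlog() {
    arrived=$(( $3 / $1 ))
    finished=$(( $3 / $2 ))
    # The machine cannot finish more images than have arrived.
    if [ "$finished" -gt "$arrived" ]; then
        finished=$arrived
    fi
    echo $(( arrived - finished ))
}
```

For example, "backlog 4 5 600" models a made-up 5 minute conversion
over a 10 hour night, and "backlog 4 1 600" a 1 minute conversion.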

I actually just finished working on a data collection system that
collects data from instrumentation attached to a military command
and control network.  The data got stuffed into a database then
made its way onto a bunch of web pages.  This was part of a bigger
training system.  We had some of the same concerns.  The traditional
way to address this is with a queue.  What we did is dump the data
as it is collected in raw form into a disk directory.  The raw data
collector did NOT start up the data processor.  Instead, the data
processor would periodically poll the raw data directory and pull
out the oldest file and work on it.  When it finished it would 
move (mv) the file to a directory called "processed".
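
Here is a minimal sketch of that kind of polling processor in shell.
This is not the code from that project.  The function names, the
directory arguments and the 30 second default poll interval are all
assumptions for illustration, and it assumes file names contain no
newlines and that the raw directory holds only data files.

```sh
#!/bin/sh
# Sketch of a polling queue worker (illustrative names throughout).

# Print the name of the oldest entry in directory $1 (empty if none).
oldest_file() {
    ls -tr "$1" 2>/dev/null | head -n 1
}

# process_one RAW_DIR PROCESSED_DIR COMMAND
# Run COMMAND on the oldest file in RAW_DIR, then mv it to PROCESSED_DIR.
# Returns 1 if the queue is empty, 2 if COMMAND failed.
# A file that fails stays in the queue and will be retried.
process_one() {
    raw=$1
    processed=$2
    cmd=$3
    f=$(oldest_file "$raw")
    [ -n "$f" ] || return 1
    if "$cmd" "$raw/$f"; then
        mv "$raw/$f" "$processed/$f"
    else
        echo "processing failed, leaving in queue: $f" >&2
        return 2
    fi
}

# run_worker RAW_DIR PROCESSED_DIR COMMAND [SLEEP_SECONDS]
# Poll forever: keep working while files are queued, sleep when the
# queue is empty or the last command failed.
run_worker() {
    while :; do
        process_one "$1" "$2" "$3" || sleep "${4:-30}"
    done
}
```

The collector only ever writes into the raw directory, and something
like "run_worker raw processed ./convert" runs on the processing
machine, so neither side has to know whether the other is up.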

One advantage of this setup, even if your data processing computer
is fast, is that you can shut it down and restart it or add a second
(or third) data processing computer.  The systems are "loosely 
coupled" and the system scales:  You can add more real-time data
collectors or more data reduction computers as required.  We don't
need this -- yet.  But I was thinking down the line, when
someone will be running a four-color Mark IV system and maybe a
Mark III triplet all at once.
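
If more than one processor polls the same directory, each one needs a
way to claim a file so that two of them don't work on the same image.
One common approach, sketched below, is to have each worker mv the
file into its own private directory before touching it: whichever mv
succeeds owns the file.  The function name and directory layout are
assumptions, file names are assumed to contain no whitespace, and the
rename is only atomic when both directories are on the same
filesystem.

```sh
#!/bin/sh
# claim_oldest QUEUE_DIR MY_DIR  (hypothetical helper, illustration only)
# Try to move the oldest file from QUEUE_DIR into MY_DIR.  If another
# worker grabbed it first the mv fails and we try the next oldest.
# Prints the path of the claimed file; returns 1 if nothing was claimed.
claim_oldest() {
    queue=$1
    mine=$2
    for f in $(ls -tr "$queue" 2>/dev/null); do
        if mv "$queue/$f" "$mine/$f" 2>/dev/null; then
            echo "$mine/$f"
            return 0
        fi
    done
    return 1
}
```

Each data reduction computer would call this with its own private
directory, process the file it gets back, and then move the result to
"processed" as before.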

Using the built-in printing facility to handle queuing is a hack
that would save development time but still would take effort to set
up.  What I'd do is use your "directly coupled" design first.  If
all you are doing is a simple image conversion, even a SPARC 2
could keep up in real time.  Once we start processing images all
the way to the Postgres DBMS I'd like to see queues, log files,
the ability to back out errors and so on, but I think we can get
there in an evolutionary manner.
> 
> The convert script on mira will convert the fits into pgm, and then compress
> it via cjpeg, and finally copy the jpg to the web server, and generate an
> updated report file for the web server.
> 
> My concern is that I don't want the TASS data collection to slow down while
> the conversion process is going on. By detaching the task, I think the effect
> will be minimized. Do you think it may cause a zombie process? I am going to
> experiment and find out.


As I said, I think it will work fine unless you overload the SPARC,
and I don't see that happening until we get a batch mode data reduction
pipeline.  One other thing: those script files that the Mark III controller
starts off already run as detached processes.  The real-time software
does not wait for them.  The scripts run in parallel with the Mark III
software.

-- 
--Chris Albertson

  chris@topdog.logicon.com                Voice:  626-351-0089  X127
  Logicon RDA, Pasadena California          Fax:  626-351-0699