Re: Work to Do (was: Lots of Data)
I do agree that in the end the data will be processed into star lists
at the camera location.
I did do a calculation and convinced myself
that I could transmit image data as it came off the camera from a remote
site to my home in real time. Each exposure yields 16,000,000 bytes.
If you do 5-minute (= 300 second) exposures, that's
16,000,000/300 = 53.33 Kbytes/second. My internet connection can do
about 70 Kbytes/second. I would not even need much disk space at the
camera. Even though I could, there are reasons why I wouldn't want
to operate that way. I have always argued for moving the computer to
where the data is rather than moving the data. So I'd put a computer
at the remote site.
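The bandwidth arithmetic above can be checked with a few lines of Python. This is a minimal sketch: it takes 1 Kbyte as 1,000 bytes (the convention the 53.33 figure implies) and treats the 70 Kbytes/second link rate as sustained, which a real connection may not deliver.

```python
# Back-of-the-envelope check: can a remote camera stream raw frames home in real time?

BYTES_PER_EXPOSURE = 16_000_000   # bytes per frame, as stated in the post
EXPOSURE_SECONDS = 300            # 5-minute exposures
LINK_BYTES_PER_SECOND = 70_000    # "about 70 Kbytes/second" (1 KB taken as 1000 bytes)


def required_rate(bytes_per_exposure: int, exposure_seconds: int) -> float:
    """Average upload rate (bytes/s) needed to keep pace with the camera."""
    return bytes_per_exposure / exposure_seconds


def headroom(required: float, available: float) -> float:
    """Fraction of the link left over; negative means the link cannot keep up."""
    return 1.0 - required / available


if __name__ == "__main__":
    need = required_rate(BYTES_PER_EXPOSURE, EXPOSURE_SECONDS)
    print(f"required: {need / 1000:.2f} KB/s")
    print(f"link:     {LINK_BYTES_PER_SECOND / 1000:.2f} KB/s")
    print(f"headroom: {headroom(need, LINK_BYTES_PER_SECOND):.0%}")
```

With these numbers the camera needs about 53.33 KB/s, leaving roughly a quarter of the link spare, which is why real-time transfer looks feasible on paper.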
Processing data into star lists is a fairly well understood problem.
There are details to be worked out and scripts to write but the problem
has been solved so many times that one of the biggest jobs is evaluating
the existing software.
The big question is "what to do with all the star lists?" Yes, I did
say that the current database design would not scale up to the Mark IV
system's numbers. I did not mean that we could not build a database.
What I meant was that the current design would not work. I think we
all do want a common "TASS Database". In the best world, I think we
want the common database to hold only the "good stuff", with all
the noise hits, "seen once" stars and other crud removed. Michael's
"tenxcat" is this. The assumption is that if something was seen
ten times it must be real. "Ten" is an arbitrary number, so I'll call
it "Nxcat".
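The "seen N times" rule behind tenxcat/Nxcat can be sketched as a simple count-and-filter. This sketch assumes each detection has already been cross-matched to a star identifier (the hypothetical `star_id` field); the positional matching that produces that identifier, which is the hard part in practice, is not shown. The threshold `n` defaults to 10 to mirror tenxcat.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Observation:
    star_id: str   # hypothetical: assumes detections are already matched to a star
    mjd: float     # time of the exposure
    mag: float     # measured magnitude


def nxcat(observations: List[Observation], n: int = 10) -> List[Observation]:
    """Keep only observations of stars that were detected at least n times."""
    counts = Counter(obs.star_id for obs in observations)
    return [obs for obs in observations if counts[obs.star_id] >= n]
```

Noise hits and "seen once" stars fall out automatically, since their counts never reach `n`.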
Our current database approach is exactly contrary to my suggestion to
"put the computer where the data is." Having everyone send their data
to Michael is conceptually simple but puts an unreasonable burden on his
one PC. I think we can decentralize the database. The idea is that
everyone (every camera site, that is) keeps a database of that site's
data. They filter the data and send only the "best" 1/M of that data
to the central site. M is determined by how much data the operator of
the central Nxcat database is willing to accept.
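On the site side, picking the "best" 1/M of a site's data amounts to ranking rows by some score and keeping the top fraction. In this sketch the row layout and the `n_obs` score used in the example are hypothetical; each central database would supply its own scoring rule and its own value of M.

```python
import heapq
import math
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def best_fraction(rows: List[Row], m: int, score: Callable[[Row], float]) -> List[Row]:
    """Return the highest-scoring 1/m of rows (rounded up), best first."""
    if m < 1:
        raise ValueError("m must be >= 1")
    k = math.ceil(len(rows) / m)
    return heapq.nlargest(k, rows, key=score)


# Example: keep the top fifth of a site's star list, ranked by a hypothetical
# "number of times observed" column.
site_rows = [{"star_id": f"s{i}", "n_obs": i} for i in range(10)]
to_send = best_fraction(site_rows, m=5, score=lambda r: r["n_obs"])
```

`heapq.nlargest` avoids sorting the whole site database when only a small fraction is being shipped.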
I think the above is about the only "big picture" we can draw. Am I
right? My opinion is that this is what we will do because that is about
all we can do, so there is little to argue about. Now the details.
We can argue over these. Here are some proposals I'll throw out:
1) The database associated with each camera site need not be physically
at the camera site nor need it be operated by the camera operator.
Someone could volunteer to work with a camera operator and put all of
his star lists into a big site database. I think some camera operators
will want to do the database work themselves but I am guessing some
would welcome help.
2) I think there can be more than one common database. If the "Nxcat"
is made up of the 1/M "best" data, and there is more than one way to
define "best", then there can be more than one "Nxcat". What's "best"
depends on your goals/interests. Let's say I want to look for planets
transiting a stellar disk and someone else wants to look at very red
stars to hunt for variables. We'd each have a different Nxcat. I can
see there being four to six of these specialized "central" databases.
This means that each operator of a site-specific database will periodically
send off four to six data sets, one to each of the operators of the
specialized central databases.
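With several specialized central databases, a site operator would run one selection per subscriber, each with its own idea of "best" and its own M. The subscriber names, the `mag_err` and `color_vi` fields, and the M values below are all hypothetical, chosen only to echo the transit and red-variable examples.

```python
import heapq
import math
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Score = Callable[[Row], float]

# Hypothetical subscribers: each central "Nxcat" defines its own "best"
# (a score, higher is better) and its own M (it accepts 1/M of a site's rows).
SUBSCRIBERS: Dict[str, Tuple[Score, int]] = {
    "transits": (lambda r: -r["mag_err"], 20),       # most precise photometry first
    "red_variables": (lambda r: r["color_vi"], 50),  # reddest stars first (larger V-I)
}


def prepare_shipments(
    rows: List[Row], subscribers: Dict[str, Tuple[Score, int]]
) -> Dict[str, List[Row]]:
    """Build one data set per central database: the top 1/M of rows by its score."""
    shipments = {}
    for name, (score, m) in subscribers.items():
        k = math.ceil(len(rows) / m)
        shipments[name] = heapq.nlargest(k, rows, key=score)
    return shipments
```

Adding a new specialized database would then mean adding one entry to the table rather than changing the site's pipeline.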
3) Even with the above, we can still have a true "central" database
in effect, even if not a physically central one. I can imagine a
web page where I could query for _all_ observations from all TASS
cameras of a certain square box in the sky. The query could be entered
into a web form. The form would then go to each database and pull
information and then combine it and present the TASS wide result to
the user. The site that hosts this web page need not be one of the
database sites.
"Gutzwiller, Michael" wrote:
>
> We've been down this road before with the Mark III and Tom is right, there
> is a strong tendency for us programmer types to want to do it all ourselves.
> That's the way it started with the Mark III with at least three analysis
> programs (Star, IRAF and Sextractor) and two databases developed (one for
> Oracle and one for PostgreSQL). In the end only one combination resulted in
> published data, the Star - PostgreSQL chain that resulted in the tenxcat.
>
> So will the same thing happen with the Mark IV? Is this the right way to do
> it? Offhand I don't know. One thing is obvious to me, only those with
> cameras will process any significant amount of raw data. This is due simply
> to the extreme amount of data involved. With each camera producing 3 to 4
> Gb of data per night there isn't much of an alternative. Shipping out a
> night or two is certainly possible but keeping up with the demand would be
> quite difficult. Will we all end up using the same tool? I doubt it.
> Arne, Michael and myself have different reasons for choosing the tools we
> want to use.
>
> Will all the processed data end up in a single database? This is still
> doubtful. The reason is the same, the tremendous amount of data. Data
> reduction may "compress" the data by about a factor of 10 but that still
> gives us about 300 to 400 Mb per night per camera. Even with high speed
> internet connections it would take hours to transfer that amount of data.
> Chris has already said the current database design couldn't import the
> amount of data in a reasonable amount of time.
>
> So where does that leave us? I for one certainly don't want to lose the
> group effort part of TASS. I certainly don't want to Balkanize ourselves
> into single camera efforts. This suggests an area to direct new effort;
> that is, solve the problem of distributed data. If the raw or even
> processed data never leaves a single place, how do we combine and analyze
> the data from several sites simultaneously? This is the area I think will
> need the most work. It's also an area where there isn't a readily available
> solution.
>
> Thanks,
>
> Mike G.
>
--
Chris Albertson home: chris@albertson-home.net
Redondo Beach, California work: calbertson@primeadvantage.com