
Re: TASS and AAVSO



Stupendous Man wrote (in part):
> 
>   I think that we can do a better job with the Mark IV data than we
> did with the Mark III data, and I agree with most of what Chris Albertson
> has written on the subject.

"...I agree with _most_ of..."  OK, what are the parts you don't
like?

I keep posting these ideas mainly to see if anyone objects.  I
take a lack of comments as either silent agreement or at least
lack of any objections.

Not a lot of Mk IV data has been reduced yet, so there
is no pressing need for a Mk IV database, but I have a need to
learn the Perl DBD interface and Java JDBC.  Working on a TASS
Mk IV database could kill two birds with one stone.  If I am
lucky I'll learn something new and build a useful product at the
same time, but I'd like to hear about any problems with my plan
before I start.

What I see as the main flaw in the current system is that every
time you want to add more data you essentially need to re-build
from scratch.  My goal this time around is that we should not
have to take the database off-line just to add more data to it.
It sounds easy, but the last time I tried it, it was too slow
to be of use.
Matching data from multiple star lists, it turns out, is a hard
problem if you want the process to go fast when your lists are
100 million or so lines long.

My best idea so far is this:  When you want to merge a star list
of, say, 1E4 stars into an existing catalog of (say) 1E9 stars, you
know one thing: all the stars in your list are contained in a
polygon (in RA, Dec space) that matches the camera's field of view,
because the star list was derived from one frame of data.  So I can
query a DBMS for "all stars within a polygon".  Yes, the square CCD
maps to a polygon when you transform it into RA, Dec.  We can then
do an in-memory NxM match, then write back all the catalog stars
whose statistics (i.e. Number_of_Observations) were changed by the
match process.  Next we dump all the stars in the list into the
"raw observations" table.  The result is that we make only one pass
through the star list file on its way to the database.
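
To make the merge steps concrete, here is a rough Python sketch of one
pass over a single frame's star list.  It is only an illustration under
stated assumptions, not a worked-out design: the names merge_frame,
in_polygon and tol_deg, the dictionary fields, and the nearest-neighbor
matching rule are all made up for this example.  The polygon query is a
plain linear scan standing in for the DBMS query, the geometry is treated
as flat (no handling of RA wrap at 0/360 or of the poles), and what to do
with unmatched stars is left open.

```python
import math

def in_polygon(ra, dec, poly):
    """Ray-casting test: is (ra, dec) inside poly, a list of (ra, dec) vertices?"""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through dec can be crossed.
        if (y1 > dec) != (y2 > dec):
            x_cross = x1 + (dec - y1) * (x2 - x1) / (y2 - y1)
            if ra < x_cross:
                inside = not inside
    return inside

def merge_frame(catalog, raw_obs, star_list, poly, tol_deg=0.001):
    """Merge one frame's star list; return the catalog rows that changed."""
    # Step 1: "all stars within a polygon".  A linear scan here (assumption);
    # the real thing would be an indexed DBMS query.
    candidates = [s for s in catalog if in_polygon(s["ra"], s["dec"], poly)]

    changed = {}
    for obs in star_list:
        # Step 2: in-memory NxM match, nearest candidate within tol_deg.
        best, best_d = None, tol_deg
        for s in candidates:
            d_ra = (s["ra"] - obs["ra"]) * math.cos(math.radians(obs["dec"]))
            d = math.hypot(d_ra, s["dec"] - obs["dec"])
            if d <= best_d:
                best, best_d = s, d
        if best is not None:
            best["n_obs"] += 1
            changed[best["id"]] = best
        # Step 3: every star in the list goes to the raw observations table.
        # Unmatched stars get catalog_id None here (assumption).
        raw_obs.append({"ra": obs["ra"], "dec": obs["dec"],
                        "catalog_id": best["id"] if best else None})
    # Step 4: only these rows would need to be written back.
    return list(changed.values())
```

The star list is read once, and only the rows returned by merge_frame
need to go back to the catalog table.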

The thing is, when you need to determine which catalog stars are
inside a polygon you do NOT want to examine all 1E9 stars. You'd
like to read ONLY the "correct" stars and never have to read data
off the disk that would fail the "contained within polygon" test. 
It seems to me that something like a quad tree index on the
catalog will be required.  With that, it looks like we should be
able to add new data to an on-line database, with the time required
to add a star list staying about constant with respect to the size
of the catalog.
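
As a rough illustration of the kind of index I mean, here is a minimal
in-memory point quad tree in Python.  Treat it as a sketch only: the
class name QuadTree, the capacity and max_depth parameters, and the
choice to query by bounding box are all assumptions for this example,
and a real catalog index would live on disk inside the DBMS.  The idea
is that a query skips every node whose bounds do not overlap the search
box, so stars far from the frame are never examined.  To answer a
polygon query you would search with the polygon's bounding box and then
filter the (few) results with an exact point-in-polygon test.

```python
class QuadTree:
    """Point quad tree over a half-open box [x0, x1) x [y0, y1)."""

    def __init__(self, x0, y0, x1, y1, capacity=4, depth=0, max_depth=16):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth   # stops endless splitting on duplicate points
        self.points = []             # (x, y, item) tuples, held in leaves only
        self.children = None

    def insert(self, x, y, item):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= x < x1 and y0 <= y < y1):
            return False             # point belongs to some other node
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((x, y, item))
                return True
            self._split()
        return any(c.insert(x, y, item) for c in self.children)

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.children = [
            QuadTree(x0, y0, mx, my, *args), QuadTree(mx, y0, x1, my, *args),
            QuadTree(x0, my, mx, y1, *args), QuadTree(mx, my, x1, y1, *args),
        ]
        # Push this node's points down into the new children.
        pts, self.points = self.points, []
        for px, py, item in pts:
            any(c.insert(px, py, item) for c in self.children)

    def query(self, qx0, qy0, qx1, qy1, out=None):
        """Collect items with qx0 <= x <= qx1 and qy0 <= y <= qy1."""
        out = [] if out is None else out
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1:
            return out               # no overlap: skip this whole subtree
        for x, y, item in self.points:
            if qx0 <= x <= qx1 and qy0 <= y <= qy1:
                out.append(item)
        if self.children is not None:
            for c in self.children:
                c.query(qx0, qy0, qx1, qy1, out)
        return out
```

For example, a tree built with QuadTree(0, -90, 360, 90) covers RA in
[0, 360) and Dec in [-90, 90); because the bounds are half-open, a star
at exactly Dec = +90 would need a slightly larger root box.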

--
   Chris Albertson             home: chris@albertson-home.net
   Redondo Beach, California   work: calbertson@primeadvantage.com