[TASS] TASS Database. Was:photometric vs. non-photometric
You have it pretty close. I'll take credit/blame for many
of the defects and have been wanting to make another pass
at the design for some time now. It will not be easy.
Speed is an issue. With the Mk IV systems the current system
just will not cut it.
1) Conceptually there is one table called "observations" that is
just the superset of all star lists ever output by STAR.EXE,
plus one more column called "TASS_ID".
2) A second table called "tass_catalog".
It contains essentially "TASS_ID", Number_of_observations, mean_ra,
mean_dec and mean_magnitude.
Yes, the "tass_catalog" is seeded with tassm16 data, and yes, you
can do a join on TASS_ID.
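
A minimal sketch of the two tables and the join, using Python's built-in sqlite3 as a stand-in for the real database. The tass_catalog columns are the ones listed above; the observation columns (obs_ra, obs_dec, obs_magnitude) are placeholder names, since the real table carries whatever STAR.EXE outputs, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tass_catalog (
    tass_id INTEGER PRIMARY KEY,
    number_of_observations INTEGER,
    mean_ra REAL,
    mean_dec REAL,
    mean_magnitude REAL
);
-- Placeholder columns: the real table holds every STAR.EXE column.
CREATE TABLE observations (
    tass_id INTEGER REFERENCES tass_catalog(tass_id),
    obs_ra REAL,
    obs_dec REAL,
    obs_magnitude REAL
);
""")

# One catalog entry (made-up values) with two observations attached.
conn.execute("INSERT INTO tass_catalog VALUES (1, 2, 150.0, 20.0, 11.5)")
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?, ?)",
    [(1, 150.0001, 20.0001, 11.4), (1, 149.9999, 19.9999, 11.6)],
)

# One-to-many join on TASS_ID: one catalog row, many observation rows.
rows = conn.execute("""
    SELECT c.tass_id, c.mean_ra, o.obs_ra
    FROM tass_catalog AS c
    JOIN observations AS o ON o.tass_id = c.tass_id
    ORDER BY o.obs_ra
""").fetchall()
```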
Alan McCallum wrote:
>
> Hello all.
>
> I am trying to understand the TASS database, so I may rehash the thoughts of
> others here. Sorry.
>
> Am I right in assuming:
> Initially there are two tables similar to the outline below. The index table
> is seeded with catalog (Tassm16) positions. When an initial match of
> observation to catalog is found, a new TASS ID is assigned to both tables,
> thus allowing a join between the two tables (one to many) on the TASS id#.
> The database now has one linked star in both tables:
>
> INDEX TABLE
> Catalog ID ??
No catalog ID. There are more fields, but that does not matter now.
> TASS ID (one)
> Catalog RA
> Catalog Dec
> ...
>
> OBSERVATIONS TABLE
> TASS ID (many)
> Obs RA
> Obs Dec
> ...
>
> Then when the second object within the merge radius of our first star is
> found, a mean (including the original catalog position?) was taken, and the
> following was the situation:
> INDEX TABLE
> Catalog ID ??
> TASS ID (one)
> Mean RA
> Mean Dec
> ...
>
> OBSERVATIONS TABLE
> TASS ID (many)
> Obs RA
> Obs Dec
> ...
>
> Firstly. What happens to objects that are not matched to a seed value?
> Dumped forever? I don't think this point has been spelled out.
We add observations one at a time. After adding one we search the tass_catalog
(your "index"). If no match is found we create a new entry in the table.
We now have a "perfect" match, so there _is_ no "unmatched" case to handle.
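
The match-or-create step could be sketched roughly as below. This is an illustration only, not the actual TASS code: the function names, the dictionary layout, the linear scan, and the flat small-angle separation (which ignores RA wrap-around at 0/360 and takes the first match rather than the nearest) are all assumptions.

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Approximate separation in degrees, valid for small angles away from the poles."""
    dra = (ra1 - ra2) * math.cos(math.radians((dec1 + dec2) / 2.0))
    return math.hypot(dra, dec1 - dec2)

def add_observation(catalog, observations, ra, dec, radius_deg):
    """Attach one observation to a catalog entry, creating the entry if none matches."""
    for tass_id, entry in catalog.items():
        if angular_sep_deg(ra, dec, entry["ra"], entry["dec"]) <= radius_deg:
            break  # first entry within the merge radius wins (hypothetical rule)
    else:
        # No match: create a new entry, so every observation ends up matched.
        tass_id = max(catalog, default=0) + 1
        catalog[tass_id] = {"ra": ra, "dec": dec, "n": 0}
    catalog[tass_id]["n"] += 1
    observations.append((tass_id, ra, dec))
    return tass_id

# Usage with made-up positions: one seeded entry, then two observations.
catalog = {1: {"ra": 150.0, "dec": 20.0, "n": 0}}
observations = []
matched = add_observation(catalog, observations, 150.0002, 20.0001, 0.001)
created = add_observation(catalog, observations, 200.0, -5.0, 0.001)
```

The second call finds nothing within the radius and so creates entry 2, which is why there is never an "unmatched" observation left over.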
One important fact we learned is that ++almost half++ of all observations
are NOT matched to the seeded catalog. This was surprising (to me).
I think we are detecting a lot of noise hits. A lot of our data is
this kind of stuff.
Yes, it was a dumb error to compute the mean so soon. What was intended
was to continue using the tassm16 location until the number of observations
became "large".
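
A sketch of that intended behaviour: keep a running mean, but match against the seed position until enough observations have accumulated. The threshold min_obs is a made-up parameter (nothing above says what "large" should be), and the field names are hypothetical.

```python
def update_entry(entry, ra, dec):
    """Fold one observation into the running mean without storing the whole history."""
    entry["n"] += 1
    entry["mean_ra"] += (ra - entry["mean_ra"]) / entry["n"]
    entry["mean_dec"] += (dec - entry["mean_dec"]) / entry["n"]

def matching_position(entry, min_obs=20):
    """Position to match new observations against.

    Seeded entries use the seed position until min_obs observations exist.
    Entries created from an unmatched observation have no seed and use the mean.
    """
    if entry["seed_ra"] is not None and entry["n"] < min_obs:
        return entry["seed_ra"], entry["seed_dec"]
    return entry["mean_ra"], entry["mean_dec"]

# Usage with made-up numbers and a deliberately tiny threshold.
entry = {"seed_ra": 10.05, "seed_dec": -3.0,
         "mean_ra": 0.0, "mean_dec": 0.0, "n": 0}
update_entry(entry, 10.0, -3.0)
update_entry(entry, 10.2, -3.0)
early = matching_position(entry, min_obs=3)   # still the seed position
update_entry(entry, 10.1, -3.0)
late = matching_position(entry, min_obs=3)    # now the running mean
```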
>
> Secondly. A variation on the above method (I think this is what Arne & Jure
> have both implied recently), is to leave the catalog positions in the index
> table untouched (as in the first example), and simply populate the
> observation table under a particular TASS id# if it falls within the
> inclusion radius. And then only use the index table positions to refer to
> the star. This only gives an _apparently_ steady position, ie the catalog
> position. The observations dance around anyway. Would there be less spurious
> pairs? I am not sure. But I think there would be a small gain in accuracy.
> (There would have been a very definite gain in database usefulness &
> efficiency if the seed Catalog ID was retained in the index table along with
> the positions. If the Catalog ID _has_ been retained in an index table, it
> is still possible to do a one to one join to another instance of a table
> loaded with Tassm16 to recall the relationship between the seed positions
> and the current positions.)
>
> In general, it is not advisable to store calculated results in database
> tables.
That's debatable. In general, yes, you are right, but pre-computing some
values makes some operations faster. The current design is the second
generation, and speed was the number one concern. It was taking hundreds
of hours to make an update run, as I was using a transactional model with
each observation being one transaction. Trouble was, we had batches of
100,000 transactions to handle and are stuck with Pentiums and IDE disk
drives. This hardware is hard pressed to handle 100 transactions per
minute.
The current system uses a batch processing model and is 10x
faster. I'd very much like to see a third generation re-design.
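
To illustrate the difference between the two models, here is a sketch with sqlite3 standing in for the real database. It shows only the shape of the two loading styles (one commit per observation versus one commit per batch); the table layout and function names are assumptions, and it makes no claim about the actual speedup on any hardware.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE observations (tass_id INTEGER, obs_ra REAL, obs_dec REAL)")
    return conn

def load_per_observation(conn, rows):
    """Transactional model: every observation is its own committed transaction."""
    for row in rows:
        conn.execute("INSERT INTO observations VALUES (?, ?, ?)", row)
        conn.commit()

def load_batch(conn, rows):
    """Batch model: the whole star list goes in under a single commit."""
    with conn:  # commits once on success, rolls back on error
        conn.executemany("INSERT INTO observations VALUES (?, ?, ?)", rows)

# Usage with made-up rows: both models store the same data.
rows = [(1, 150.0001, 20.0001), (1, 149.9999, 19.9999), (2, 200.0, -5.0)]
conn_a, conn_b = make_db(), make_db()
load_per_observation(conn_a, rows)
load_batch(conn_b, rows)
count_a = conn_a.execute("SELECT COUNT(*) FROM observations").fetchone()[0]
count_b = conn_b.execute("SELECT COUNT(*) FROM observations").fetchone()[0]
```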
Did you see Glenn's recent message? A Mk IV system will generate
more than 1000 observations per minute. We have six Mk IV systems.
I have said this many times: IMO we will be forced to move to
a "continuous process" data processing model. Data will have to be
processed as it comes off the camera and put into the database
frequently.
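
A back-of-the-envelope version of that argument, using only the figures quoted above. Two assumptions are mine and may be wrong: that all six Mk IV systems observe at the same time, and that the "10x faster" batch model can be read as ten times the 100 transactions per minute figure.

```python
MK4_OBS_PER_MIN = 1000        # lower bound per Mk IV system, as quoted above
MK4_SYSTEMS = 6
TRANSACTIONAL_PER_MIN = 100   # per-observation transaction rate, as quoted above
BATCH_SPEEDUP = 10            # "10x faster" batch model

# Assumption: all six systems produce data simultaneously.
incoming_per_min = MK4_OBS_PER_MIN * MK4_SYSTEMS

# Assumption: the 10x factor applies directly to the 100/min figure.
batch_per_min = TRANSACTIONAL_PER_MIN * BATCH_SPEEDUP

# How many times faster the data arrives than the batch model absorbs it.
shortfall_factor = incoming_per_min / batch_per_min
```

Under those assumptions the cameras would outrun even the batch loader several times over.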
> If there is another ID (or combination of fields) in the Obs table
> that links each observation back to an original image, and a camera has a
> bad hair day, or even develops a serious systematic error, it is easier to
> pull the bad observations if no calculated fields are involved. Otherwise
> each mean, for instance, must be recalculated. OK. Storing calculated
> values can and is done. But it is still not advisable.
You are correct, but it is a matter of paying for the calculation
up front like we do or paying for it each time you query the database.
We do many "selects" on the mean and standard deviations. In fact, searches
of the computed values are the primary method most people use to
access the data. I would hate to compute these on the fly. Another
advantage of computed values in the table is that you can build a
b-tree index. Queries then go at "log(n)" speed rather than at "n"
speed, which is a big deal when n equals about ten million.
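
For scale, log2 of ten million is a little over 23, against ten million rows for a full scan. The sketch below shows the idea with sqlite3 as a stand-in: because mean_magnitude is stored in the table, it can be indexed, and the query planner can use that index for a range select. The index name, the synthetic data, and the query are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tass_catalog (
        tass_id INTEGER PRIMARY KEY,
        number_of_observations INTEGER,
        mean_ra REAL,
        mean_dec REAL,
        mean_magnitude REAL
    )
""")

# Synthetic catalog rows; the magnitudes are arbitrary.
conn.executemany(
    "INSERT INTO tass_catalog VALUES (?, ?, ?, ?, ?)",
    [(i, 10, i * 0.01, 0.0, 8.0 + (i % 600) * 0.01) for i in range(1, 5001)],
)

# A stored (pre-computed) column can carry a b-tree index.
conn.execute("CREATE INDEX idx_mean_mag ON tass_catalog (mean_magnitude)")

query = ("SELECT tass_id FROM tass_catalog "
         "WHERE mean_magnitude BETWEEN 10.0 AND 10.1")

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
hits = conn.execute(query).fetchall()
```

If the mean were computed on the fly from the observations table, there would be no stored column to index, and a select like this would have to aggregate every star's observations first.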
> It is an extra step
> that can be overlooked, _particularly_ ten years later. It is also less
> obvious when calculated values are means, because the values are in the same
> units. Also truly necessary calculated result tables, such as Tenxcat, could
> be corrected using fewer steps.
>
> In passing, it seems obvious to me that a TASS ID should be generated and
> used, if only internally. To run matches table-to-table using floating point
> values, would likely create more recognition problems than we already have.
>
> On 9 Sep 99 Arne suggested "retain the seed coordinates for future matching,
> but have two additional columns of TASS RA & Dec for comparison". In which
> table? The index table? A follow-on question then is: which Tass position
> gets stored in the index table? The latest mean?
I think what we could do is make a third table called "tassM16" or maybe
"GSC" or whatever catalog we use. This table would be static; it never
changes.
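
One way such a third table might look, again with sqlite3 as a stand-in. How the static table links back to tass_catalog is not settled above; this sketch assumes it carries the TASS_ID of the entry it seeded, and the column names seed_ra and seed_dec are hypothetical. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tass_catalog (
    tass_id INTEGER PRIMARY KEY,
    number_of_observations INTEGER,
    mean_ra REAL,
    mean_dec REAL
);
-- Static seed table: loaded once, never updated afterwards.
CREATE TABLE tassm16 (
    tass_id INTEGER PRIMARY KEY,   -- assumed link to the entry it seeded
    seed_ra REAL,
    seed_dec REAL
);
INSERT INTO tass_catalog VALUES (1, 40, 150.0002, 20.0001);
INSERT INTO tass_catalog VALUES (2, 3, 200.0, -5.0);
INSERT INTO tassm16 VALUES (1, 150.0, 20.0);
""")

# LEFT JOIN keeps entries that were never seeded; their seed columns are NULL.
rows = conn.execute("""
    SELECT c.tass_id, s.seed_ra, c.mean_ra
    FROM tass_catalog AS c
    LEFT JOIN tassm16 AS s ON s.tass_id = c.tass_id
    ORDER BY c.tass_id
""").fetchall()
```

With this layout the seed positions stay available for matching and comparison while the means in tass_catalog are free to change.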
>
> Is a complete TASS database schema published somewhere?
Yes. You can download the "tass database software" from my web site.
This has the source code that built the database, with comments.
Unpack the file and go to the "SQL" directory.
I wrote this stuff and am also its biggest critic, it seems.
Also, you can log into the live database and examine the tables directly.
You may have seen my last proposal for a second experimental tass
database. That's because I think we can do better, but I also think
stability is important. Let's keep the current system until we have
a better one.
--
Chris Albertson
calbertson@logicon.com Voice: 626-351-0089 X127
Logicon, Pasadena California Fax: 626-351-0699