[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: new tech note




Hey Chris,

Thanks for taking the time to review and comment.  One thing I wanted to do was stay above implementation details.  I think what I'll do is add a section for implementation suggestions.  I had envisioned linking each table with foreign keys, and having cascading rules on so you could delete a night and have all the associated data removed also.  I like to do it the right (my? :-)) way before I implement for speed.

>
> OK, some comments.  I went through this procedure, asking for
> comments, for the Mk III database.  Got lots of good comments.
> I'll pass them on plus some things I learned.  I used Postgresql
> also.  Seems everyone here like it.
>
> 1) The orientation of the chip "belongs" to an __image__ not a
> telescope.  Telescops are likely to be modified, fixed, moved

While I agree, I don't believe that the data would change for a night, so I'd be more likely to keep it in a separate grouping.  This seems to be more of an implementation detail though, as long as the information is stored.  Personally, I tend to normalize the data.

> or whatever.  Also IMO the best way to descibe the orientation
> is to copy the data direct from the FITS header.

I just look at 1 picture, hira2246788.fits, and could not find chip orientation data, but I may not know what I'm looking for :-)  What info is this, or are we not currently putting this in the header?

> The other option is to assign each telescope a new serial number
> whenever some part is swapped out,  You'll never keep track.

Not a serial number, but a new entry into one table (telescope), linked to the next (detector).

> Best to just remember the FITS keywords from the image.
> The pipeline __should__ (must?) pass the image header through
> so that it shows up in the starlist header.

How much though.  For instance, do we keep the ra/dec in the header, even when it's a suggestion of where the telescope is thought to be pointed?  In general though, I agree that the fits header info should be in the dB in some form or another.

>
> 2) Again, where you say "for each detector".  It will be easyer to
> keep this data for each image.  (quick calculation:  10 year

Why to you say easier?  And, that normalization thing pops up again.  I'd rather keep 300 copies of the information (if the detector info changed nightly), rather than 6 million copies.  That will save lots of space (maybe at to large of a computational penalty for the extra joins though)..

> project lifetime, 20 cameras, 300 nights per year, 100 images
> per night = 6 million images = no big deal.  6M row tables are
> easy and we will not be at 6M for ten years.
>
> 3) Again, in place of "for each night" use "for each frame".  I'd
> keep the information on processing along with the dataset.
> There is always the option of deleting data, re-proceesing it and
> adding it back in.  Say we find a bug in version 2.3 or the
> data reduction software.  We will want to delete all data proceesed
> by the software.

I'd go the normalizing route again.

>
> So, given my comments above.  I'd have a table called "FRAMES"
> that lists eveything we know about a frame.  Which telescope
> was used, what software was used, clouds cover, the filter
> the image scale and so on.  Also each frame is given an ID.
>
> Now, where you say "for each star", I think you want to say
> "for each observation of an object"  because we will likely

I need to spell this out a little better in the TN.  While I wasn't showing a schema, I was implicitly linking each "For each" to the previous one, so each "Star" would be linked to one "Image file", linked to one "night", etc...

> see the same star many times.  OK, next I think you need a
> "FRAME ID" to be stored along with each row.  Now you can trace
> every observation back to the camera and FITS image file.

Implementation section - soon to come.

>
> Not done yet.
>
> Next we need a table called "stars" or "catalog" or "index".
> This has one row for ever object TASS has ever seen.  (That
> is, one row for each physical star.)  We assign a arbitrary
> ID to each start (use 1, 2, 3, or a name made up from the
> location.  But I like 1, 2, 3 because the exact location will
> change as we average more observations or as we "split"
> dobles.) Call it "TASS Catalog ID".

Yup, I gottcha there.

>
> Going back to the table called "for each observation of an
> object"  We need to assign a "TASS Catalog ID" to each
> observation.  Now the "catalog" will contain summary statistics
> for each star while the "observations" table contains the
> data points that make up a light curve.

Yup, implementation, or maybe a derived data section.

>
> ++++++
>
> The algorithm I used to process a star list first added one row
> to the "frames" table, just copy the file header.  next for
> each row in the starlist I added it to the big "observation"
> table.  Next I had to asign a "TASS Catalog ID".  So I looked
> in the catalog for a close match.  If none was found I figured
> this was the first time this star was seen an added a new
> catalog entry.  If I found a match I'd recompute the summary
> statistics for that star.

Similar to what I'm doing now.  I update the star_id in the observation table, and the trigger on that table will update the statistics for that linked star.

>
> The above was not easy to implement and I still don't like the
> way I did it.  A run could take many hours.  I would make a
> frame a "transaction" in DBMS terms.

Hmm...  The imports I do have are transactioned.  When I process a night (or many nights), I find every observation which isn't matched to a star.  Ah heck.  I've attached the perl/sql script I currently use.  Amazingly, I actually commented this one as I went along.  Hopefully, it's readable enough to get the general idea.  I transaction each set of observations which update or create a star.  Ignore the CVS comments in the header, they came from a bad commit and I have not removed them yet...

>
> I also had a set of "standard" catalogs.  Get as many as you
> want.  Tycho, variable star catalogs and so on.  The more the
> better.  Now for each catalog you make a "Cross ID Table"
> that has two colums listing matches between the Tycho ID and
> the TASS Catalog ID.

Not doing this one (yet?).

>
> With the above you can see that with one SQL query I could
> quicky make a light curve for any object I could name or even
> querry for stars with a summary statistic within some range.
> like "SELECT ... where ColorIndex between 2.0 and 3.0;"

Again, thanks for the input.  New revision coming sometime...  Tom also indicated (and I agree) that a lot of data from his ADC system should be added, although I'm thinking that this would be in a completely separate and unlinked (except for site, which I've forgotten to add) as the reading of these values can (and should?) take place independently of actually using the CCD's.

I've more work to do, that's for sure.

Thanks,
Rob
#!/usr/bin/perl
#
# $Name:  $
#
# $Log: mergeStars.pl,v $
# Revision 1.2  2001/12/11 06:01:09  robc
# Add CVS header data and log from last commit
#
# Add compile time ability to do mean, median or inter quartile mean
# Add setting of bscale/bzero for SHORT_IMG so values are always positive
# 'Fix' asci column to accept all file names used in createion of flat
# Remove file before writing
# General clean up
#
# Revision 1.1  2001/11/23 16:11:29  robc
# First version
#

use warnings;
use strict;
use IO;
use DBI;

use constant ID => q$Id: mergeStars.pl,v 1.2 2001/12/11 06:01:09 robc Exp $;
use constant MIN_STARS => 11;
use constant VACUUM => 10000;
my $min_smag = .1;
my $radius = 0.004;

STDOUT->autoflush;

my $version = join ( ' ', ( split ( ' ', ID ) )[ 1 .. 3 ] );
$version =~ s/,v\b//;
$version =~ s/(\S+)$/($1)/;

my $file = "/tass/src/tassiv_reduce/TASSIV.data";
unless ( my $return = do $file )
   {
   die "Couldn't parse $file: $@" if $@;
   die "Couldn't do $file: $!"  unless defined $return;
   die "Couldn't run $file: $!" unless $return;
   }

my $dbh =
   DBI->connect( "DBI:Pg:dbname=tassiv", "", "",
                 { RaiseError => 1, AutoCommit => 0 } );

sub done;
$SIG{ INT } = \&done;

my $vacuum = <<__VACUUM__;

VACUUM ANALYZE comp_loc;
VACUUM ANALYZE comp_u;
VACUUM ANALYZE comp_b;
VACUUM ANALYZE comp_v;
VACUUM ANALYZE comp_r;
VACUUM ANALYZE comp_i;
VACUUM ANALYZE observations;
__VACUUM__

#
# This one looks for stars in the observations which haven't matched yet but
# are within the radius of 'this' star
my $sql = <<__END_FIND_MATCHING_STARS__;

SELECT ra, decl, obs_id
FROM observations
WHERE circle_contain_pt(
         circle( point( ?, ? ), $radius ),
         point( ra, decl ) )
   AND ra   BETWEEN (? - $radius) AND (? + $radius)
   AND decl BETWEEN (? - $radius) AND (? + $radius)
	AND star_id ISNULL
	AND smag < $min_smag

__END_FIND_MATCHING_STARS__
my $sth_find_matching_stars = $dbh->prepare( $sql );

#
# This one looks into the frozen stars to make sure the 'current' star doesn't
# match an existing one
$sql = <<__END_FIND_EXISTING_STARS__;

SELECT star_id, ra, decl
FROM frozen_loc
WHERE circle_contain_pt(
         circle( point( ra, decl ), $radius ),
         point( ?, ? ) )
   AND ra   BETWEEN (? - $radius) AND (? + $radius)
   AND decl BETWEEN (? - $radius) AND (? + $radius)

__END_FIND_EXISTING_STARS__
my $sth_find_existing_stars = $dbh->prepare( $sql );
#
# Set up some inserts now...
my $sth_insert_comp_loc =
   $dbh->prepare( "INSERT INTO comp_loc( ra, decl ) VALUES( ?, ? )" );
my $sth_insert_comp_u =
   $dbh->prepare( "INSERT INTO comp_u( star_id ) VALUES( ? )" );
my $sth_insert_comp_b =
   $dbh->prepare( "INSERT INTO comp_b( star_id ) VALUES( ? )" );
my $sth_insert_comp_v =
   $dbh->prepare( "INSERT INTO comp_v( star_id ) VALUES( ? )" );
my $sth_insert_comp_r =
   $dbh->prepare( "INSERT INTO comp_r( star_id ) VALUES( ? )" );
my $sth_insert_comp_i =
   $dbh->prepare( "INSERT INTO comp_i( star_id ) VALUES( ? )" );

#
# and a select...
my $sth_star_id_select =
   $dbh->prepare( "SELECT currval('comp_loc_star_id_seq')" );

#
# and another select...
my $sth_ra_decl_select =
   $dbh->prepare( "SELECT star_id, ra, decl, count FROM comp_loc" .
	               " WHERE star_id = ?" );

#
# and another insert...
my $sth_frozen_loc_insert =
   $dbh->prepare( "INSERT INTO frozen_loc( star_id, ra, decl, count ) " .
	               "VALUES( ?, ?, ?, ? )" );

#
# and an update
my $sth_observation_update =
   $dbh->prepare( "UPDATE observations SET star_id = ? WHERE obs_id = ?" );

#
# and a delete
my $sth_welch_index_delete = 
  $dbh->prepare( "DELETE FROM welch_index WHERE star_id = ?" );

#
# First off, lets find all the stars which have no star_id
my $sth_null_stars = $dbh->prepare(
   "SELECT ra, decl, obs_id FROM observations WHERE star_id ISNULL".
	" AND smag < $min_smag" );
print "Finding unassigned stars: ";
$sth_null_stars->execute;

#
#  With the following bind, after each fetch, we'll end up with a hash of
#  the columns from the select.  If we add to the select, we don't have
#  to change the bind either...
my %null_select;
$sth_null_stars->bind_columns(
   \( @null_select{ @{ $sth_null_stars->{ NAME_lc } } } ) );

printf "found %d of them\n", $sth_null_stars->rows;

#
# This hash will keep track of obs_id's which have already been assigned
# to a star in this session, thus reducing the number of queries we need
my %seen;
my $new = 0;
my $new_total = 0;
my $update = 0;
my $many = 0;
my $not_enough = 0;
my $count = 0;
my $stats = 0;
my $last_vacuum = 0;
print "         ";
while( $sth_null_stars->fetch )
   {
   #
   # If we've seen it, remove it (obs_id's are unique, so we'll never see
   # it again), and continue with the next one
   if ( exists $seen{ $null_select{ obs_id } } )
      {
      delete $seen{ $null_select{ obs_id } };
      next;
      }

   my $ra = $null_select{ ra };
   my $dec = $null_select{ decl };
   #
   # If this star matches 1 existing one (from the frozen table),
   # then we can add this star to it's influence.  While we're at it,
	# lets go ahead and add all appropriate stars to it's influence
   $sth_find_existing_stars->execute( $ra, $dec, $ra, $ra, $dec, $dec );
   if ( $sth_find_existing_stars->rows == 1 )
      {
      #
		# Grab the id of what we're matching to
      my ( $star_id, $lra, $ldec ) = $sth_find_existing_stars->fetchrow_array;
		#
		# Since we're adding a star, delete the welch index so it'll be
		# calculated again
		$sth_welch_index_delete->execute( $star_id );
		#
		# Here, we'll find all the stars to update
      $sth_find_matching_stars->execute( $lra, $ldec, $lra, $lra, $ldec, $ldec );
      my %matching_select;
      $sth_find_matching_stars->bind_columns(
         \( @matching_select{
               @{ $sth_find_matching_stars->{ NAME_lc } } } ) );
      #
      # Now, go update all the stars which match.  This will fire the
      # trigger in the database to actually do the work of bringing the
      # data from the observations table into the comp_X tables
      while ( $sth_find_matching_stars->fetch )
         {
         #
         # If we've seen it, remove it (obs_id's are unique, so we'll never see
         # it again), and continue with the next one
         if ( exists $seen{ $matching_select{ obs_id } } )
            {
            delete $seen{ $matching_select{ obs_id } };
            next;
            }
         $seen{ $matching_select{ obs_id } } = 1;
         ++$count;
         ++$update;
         printf "\b\b\b\b\b\b\b\b\b%8.4f%%",
                100 * $count / $sth_null_stars->rows;
         $sth_observation_update->execute( $star_id,
                                           $matching_select{ obs_id } );
         }
		$dbh->commit;
      next;
      }
   #
   # If it matches more than one, then we don't update anything (yet)
   elsif ( $sth_find_existing_stars->rows > 1 )
      {
      $seen{ $null_select{ obs_id } } = 1;
      ++$count;
      ++$many;
      printf "\b\b\b\b\b\b\b\b\b%8.4f%%",
             100 * $count / $sth_null_stars->rows;
      next;
      }
   #
   # Here, we matched no existing stars, so we'll see if we can create
   # a new one, given we have enough stars matched.  Note that this is
   # total stars matching in all bands.
   $sth_find_matching_stars->execute( $ra, $dec, $ra, $ra, $dec, $dec );
   if ( $sth_find_matching_stars->rows >= MIN_STARS )
      {
      ++$new;
      my %matching_select;
      $sth_find_matching_stars->bind_columns(
         \( @matching_select{
               @{ $sth_find_matching_stars->{ NAME_lc } } } ) );
      #
      # Here, we create the entries in the comp_X tables so the update
      # trigger can do it's work
      $sth_insert_comp_loc->execute( $ra, $dec );
      $sth_star_id_select->execute;
      my $star_id = $sth_star_id_select->fetchrow_array;
      $sth_insert_comp_u->execute( $star_id );
      $sth_insert_comp_b->execute( $star_id );
      $sth_insert_comp_v->execute( $star_id );
      $sth_insert_comp_r->execute( $star_id );
      $sth_insert_comp_i->execute( $star_id );
      #
      # Now, go update all the stars which match.  This will fire the
      # trigger in the database to actually do the work of bringing the
      # data from the observations table into the comp_X tables
      while ( $sth_find_matching_stars->fetch )
         {
         #
         # If we've seen it, remove it (obs_id's are unique, so we'll never see
         # it again), and continue with the next one
         if ( exists $seen{ $matching_select{ obs_id } } )
            {
            delete $seen{ $matching_select{ obs_id } };
            next;
            }
         $seen{ $matching_select{ obs_id } } = 1;
         ++$new_total;
         ++$count;
         printf "\b\b\b\b\b\b\b\b\b%8.4f%%",
                100 * $count / $sth_null_stars->rows;
         $sth_observation_update->execute( $star_id,
                                           $matching_select{ obs_id } );
         }
      $sth_ra_decl_select->execute( $star_id );
		my @frozen_loc = $sth_ra_decl_select->fetchrow_array( );
      $sth_frozen_loc_insert->execute( @frozen_loc );
      $dbh->commit;
      }
   else
      {
      $seen{ $null_select{ obs_id } } = 1;
      ++$count;
      ++$not_enough;
      printf "\b\b\b\b\b\b\b\b\b%8.4f%%",
             100 * $count / $sth_null_stars->rows;
      }
   #
   # All done with this round
	if ( ($count - $last_vacuum) > VACUUM )
	   {
      $last_vacuum = $count;
		print " Vacuuming";
      local $dbh->{AutoCommit} = 1;
		$dbh->do( $vacuum );
		print "\b\b\b\b\b\b\b\b\b\b";
		print "          ";
		print "\b\b\b\b\b\b\b\b\b\b";
	   }
   }

{
	print " Vacuuming";
   local $dbh->{AutoCommit} = 1;
	$dbh->do( $vacuum );
	print "\b\b\b\b\b\b\b\b\b\b";
	print "          ";
	print "\b\b\b\b\b\b\b\b\b\b";
}

done( );
sub done
   {
   $sth_find_matching_stars->finish;
   $sth_find_existing_stars->finish;
	$sth_star_id_select->finish;
	$sth_ra_decl_select->finish;
   $sth_null_stars->finish;
   $dbh->disconnect;

   print "\nCreated $new new stars from $new_total\n";
   print "Didn't use $not_enough stars because there were't enough to make new ones\n";
   print "Updated $update stars\n";
   print "Couldn't update $many stars because they matched more than one\n";

	};