
Mining Data



I have been working on a program to extract variables from the data 
accumulated so far.  It looks as though it will yield a top 1000 or 
so.  So far it has produced 100% good data: everything that comes out 
really looks variable.  The program starts with a WS output list and 
then looks at the data for each candidate.  I throw out the top two and 
bottom two measurements, compute the sigma of what is left, and compare 
it to the expected sigma for that magnitude.  This does a good job of 
removing the bad days and the "one high or one low" data.  It runs 
pretty slowly.  OK, I know that this is properly done in the database.  
But I will always have a little more data, and I plan to take advantage 
of my position. :-)
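
For anyone who wants to try the same test, here is a rough sketch of the 
trimmed-sigma check in Python.  It is not the actual program.  The 
function names, the 3x threshold, and especially the expected_sigma() 
noise model are all made up for illustration; the real expected sigma 
would have to come from the survey's own error-versus-magnitude data.

```python
import statistics

def trimmed(mags, trim=2):
    """Return the measurements with the `trim` highest and lowest dropped."""
    return sorted(mags)[trim:len(mags) - trim]

def trimmed_sigma(mags, trim=2):
    """Sample standard deviation of what is left after trimming.

    Returns None when fewer than two points would survive the trim.
    """
    if len(mags) < 2 * trim + 2:
        return None
    return statistics.stdev(trimmed(mags, trim))

def expected_sigma(mag):
    # HYPOTHETICAL noise model, for illustration only: a 0.02 mag floor
    # that grows linearly for stars fainter than magnitude 10.
    return 0.02 + 0.01 * max(0.0, mag - 10.0)

def looks_variable(mags, threshold=3.0):
    """Flag a star whose trimmed scatter exceeds `threshold` times the
    expected scatter at its mean magnitude (threshold is an assumption)."""
    sigma = trimmed_sigma(mags)
    if sigma is None:
        return False
    mean_mag = statistics.fmean(trimmed(mags))
    return sigma > threshold * expected_sigma(mean_mag)

# A steady star with one wild point: the outlier is trimmed away.
steady = [12.0, 12.01, 11.99, 12.0, 12.02, 11.98, 12.0, 12.01, 9.0]
# A star that really wanders by a magnitude or so.
wanderer = [11.0, 11.5, 12.0, 12.5, 13.0, 11.2, 12.8, 11.7, 12.3, 12.0]

print(looks_variable(steady), looks_variable(wanderer))
```

Dropping a fixed two points from each end is what removes the single 
bad night; a star has to scatter across most of its measurements to 
pass.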

In working on this I noticed that Doug Welch has put an optional 
bad-data-day table in the WS program.  Michael S., have you considered 
using this as the way to get around the problem of leaving "bad" data in 
the database?  I will try this myself once I work through the present 
scheme.  If it works, you might put up a top 1000 or so.

I think it is significant that this is already in the program.  It must 
be a common problem.

I have a little more work to do and then I will have a data disk for 
June.  I think it will be too big to send even after compression, but I 
am willing to try.  Michael S., you might suggest how to do it.  I would 
send it from a Linux system.  Just tell me what tool to fire up and what 
to do.  A good project for overnight, I think.

Tom Droege