Introduction and some comments ;-)
Greetings all,
This is my first post to the list, so let me begin with a brief (!) introduction. My name is Mark Pitts, and I work for Shands HealthCare at the University of Florida. Shands is a regional health care system, including two teaching hospitals and a variety of community hospitals and clinics within our region. I lead the team responsible for the financial systems for the entire enterprise. I went to engineering school at NMSU (EE) where I concentrated in computer engineering (mainly digital logic design). I chose software engineering as a career path, and I have been working in the field ever since (almost 20 years). I recently went back to school at UF, earned an undergraduate degree in accounting, continued to graduate school, and now I am also a certified accountant. So what am I doing here?
I'm on this list for a number of reasons. Above all, I love astronomy. I've been an amateur observer for many years...nothing sophisticated, I just enjoy looking at the sky and learning as much as I can. But that's not enough for me. TASS intrigues me because it is an opportunity to contribute to some real, publishable science, and also because I want to learn the nuts and bolts of CCD photometry and image analysis.
Applying computer science to real world problems is my area of expertise. I've been drafted into helping on a couple of research projects (in the areas of molecular genetics and virology) as favors to friends, so I'm no stranger to the issues of dealing with raw data and doing something useful with it. Hopefully I can help here, also.
Enough about me! On to business:
I just have a couple of comments for now (feel free to correct me if I've misunderstood an aspect of the project). First, according to my rough estimates, by the end of the year Tom will have almost a terabyte of data scattered across a stack of CDs sitting in his closet (1375 CDs * 650 MB = approx. 894 GB of data). And that's just at Tom's site. If everybody else with a camera has been (or will be) as busy, that's a lot of raw data. It's data that needs to be normalized across time, accounting for any number of potential issues, including variable observing conditions and equipment idiosyncrasies, such that a reliable analysis can be made of the same objects over time.
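The rough storage estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the CD count and capacity are the figures quoted above, decimal units (1 GB = 1000 MB) are assumed, and the number of sites in the second helper is a purely hypothetical parameter, since I don't know how many cameras are actually running.

```python
# Back-of-the-envelope storage estimate.
# Figures come from the post; decimal units (1 GB = 1000 MB) are an assumption.
CD_COUNT = 1375        # CDs expected at one site by the end of the year
CD_CAPACITY_MB = 650   # nominal capacity of one CD


def total_gb(cd_count=CD_COUNT, capacity_mb=CD_CAPACITY_MB):
    """Return the total raw data volume for one site, in GB."""
    return cd_count * capacity_mb / 1000


def total_gb_all_sites(site_count, cd_count=CD_COUNT, capacity_mb=CD_CAPACITY_MB):
    """Scale the single-site figure by a hypothetical number of equally busy sites."""
    return site_count * total_gb(cd_count, capacity_mb)


if __name__ == "__main__":
    print(f"One site: about {total_gb():.0f} GB")
```

Calling `total_gb_all_sites(n)` with whatever the real camera count turns out to be gives a quick upper bound on what the reduction pipeline would have to chew through.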
In the group discussion so far, the main objective seems to be to reduce the raw FITS images through a pipeline to star lists that can then be analyzed for variability over time. The reduction may indeed need to be an iterative process, in which we reduce the same data multiple times as we incorporate changes into the reduction pipeline based on what we learn from each pass. I can envision all sorts of different ways we might want/need to analyze and process the raw data and the intermediate results.
Really making something like this happen is going to take some planning and testing. I usually follow the "fast prototype" method for a problem like this. I would typically start by developing a fast spec for both the input and output datasets and their constituent elements, to include a description of the required transformations and required error-handling. Based on that, I would carefully select a representative subset of the data that would provide a test for each of the foreseeable issues that need to be handled. For example, if you need to deal with a random cosmic ray, your test dataset would include images in need of such handling. If you need to correct for instrument misalignment, you would include images resulting from a misalignment, etc. etc.
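As a rough illustration of picking a representative subset, here is a minimal sketch. It assumes someone has already tagged each image with the issues it exhibits; the file names and tag names are made up for the example, and the greedy selection is just one simple way to do it, not a worked-out plan.

```python
def pick_test_subset(catalog, required_issues):
    """Greedily pick images until every required issue is covered.

    catalog: dict mapping an image name to the set of issue tags it exhibits
             (hypothetical tags, assigned by hand or by an earlier pass).
    Returns (chosen image names, issues that no image covers).
    """
    remaining = set(required_issues)
    chosen = []
    while remaining:
        # Take the image covering the most still-uncovered issues;
        # sorting the names first makes ties resolve predictably.
        best = max(sorted(catalog), key=lambda name: len(catalog[name] & remaining))
        gained = catalog[best] & remaining
        if not gained:
            break  # nothing left in the catalog covers the remaining issues
        chosen.append(best)
        remaining -= gained
    return chosen, remaining


# Hypothetical example data, for illustration only.
example_catalog = {
    "img_001.fits": {"cosmic_ray"},
    "img_002.fits": {"cosmic_ray", "misalignment"},
    "img_003.fits": {"clouds"},
}

if __name__ == "__main__":
    subset, uncovered = pick_test_subset(
        example_catalog, {"cosmic_ray", "misalignment", "clouds"}
    )
    print(subset, uncovered)
```

Any issue that comes back in the second return value is one for which we still need to find (or fake) a test image before the prototype spec can be exercised fully.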
I would then design a basic data dictionary, create a sample database containing the test data, and then begin the iterative process of coding and testing to the prototype spec. I could go on, but you get the idea (and this post is getting WAY long).
Bottom line: the first thing you need is a project plan. With a good plan, we could farm out the work to those most qualified to perform each task. Collaboratively, we could get the data reduction flowing toward some usable results.
Thoughts?
Mark