
RE: data corruption




Chris (and all),

I just looked at CFITSIO, and indeed, there is a group of routines for
creating/checking checksums.  I can easily add these in (provided they work
with the Windows version) if desired.  Two new keywords would be added:
DATASUM, whose value is a quoted string of decimal digits, and CHECKSUM,
whose value is a quoted 16-character ASCII-encoded string.
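For reference, the FITS checksum convention computes DATASUM as a 32-bit ones' complement sum of the data unit, read as big-endian unsigned 32-bit integers, and stores the result as a string of decimal digits. The sketch below covers only that sum. It assumes the input is already the raw bytes of a data unit (FITS pads units to multiples of 2880 bytes, hence of 4), and it does not implement the 16-character ASCII encoding used for CHECKSUM. The helper names are hypothetical, not CFITSIO routines.

```python
import struct


def ones_complement_sum32(data: bytes) -> int:
    """32-bit ones' complement sum of big-endian 32-bit words (hypothetical helper)."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 (FITS pads units to 2880 bytes)")
    # Interpret the bytes as big-endian unsigned 32-bit integers, as FITS does.
    words = struct.unpack(f">{len(data) // 4}I", data)
    total = sum(words)
    # Fold any carry out of bit 31 back into the low bits (end-around carry).
    while total >> 32:
        total = (total & 0xFFFFFFFF) + (total >> 32)
    return total


def datasum_card_value(data: bytes) -> str:
    """DATASUM is recorded as a quoted string of decimal digits."""
    return f"'{ones_complement_sum32(data)}'"


if __name__ == "__main__":
    record = bytes(2880)  # one all-zero FITS record
    print(datasum_card_value(record))  # '0'
```

The end-around carry is what distinguishes a ones' complement sum from plain addition modulo 2**32.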

Let me know.

Cheers,
Rob

Robert Creager
Senior Software Engineer
Client Server Library
303.673.2365 V
303.661.5379 F
888.912.4458 P
StorageTek
INFORMATION made POWERFUL



> -----Original Message-----
> From: Chris Albertson [mailto:chrisalbertson90278@yahoo.com]
> Sent: Friday, June 08, 2001 10:54 AM
> To: hjohnson@pluto.njcc.com; tass@listserv.wwa.com
> Subject: Re: data corruption
> 
> 
> I believe there is a convention for checksumming FITS files.  The
> checksum is written as a line in the header.  You can checksum the
> whole file or just the data, or both.  In the software I write, I
> think I did both.  The CFITSIO library makes this simple to do.  Of
> course, you need to be able to read the file in order to compute the
> checksum, and the errors we are seeing are such that the files can't
> be read.  As Herb says, on CD-ROMs, floppies and hard drives each
> sector has a kind of checksum encoded on it by the hardware.  When the
> drive reads the sector back and the check fails, it reports an error
> to the operating system, which is typically passed back to the user
> as something like "Data Error - read failed".
> 
> I doubt a readable file would fail a FITS checksum test, but I still
> like the belt-and-suspenders method.
> 
> I think it is a good idea to use an external program to compute a
> checksum.  The best one out there is "md5sum".  On UNIX or in a DOS
> box you can run "md5sum *.fits > sums.md5" to create a checksum file
> for all files in the directory.  Later on you can run
> "md5sum -c sums.md5" to check all of the files.  It takes a while to
> scan 600MB of data, but the method is close to foolproof.
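The create-then-check workflow above can be sketched with Python's standard `hashlib` for systems that lack md5sum. It writes one `<hex digest>  <filename>` line per file, the layout GNU md5sum uses for plain filenames; names containing newlines or backslashes, which md5sum escapes, are not handled. The function names are hypothetical. As a design choice of this sketch, names in the manifest are resolved relative to the manifest's own directory, whereas md5sum resolves them relative to the current directory.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large files need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(directory: Path, pattern: str, manifest: Path) -> None:
    """Roughly `md5sum *.fits > sums.md5`: one '<digest>  <name>' line per file."""
    with open(manifest, "w", newline="\n") as out:
        for p in sorted(directory.glob(pattern)):
            if p.is_file():
                out.write(f"{md5_of_file(p)}  {p.name}\n")


def check_manifest(manifest: Path) -> dict:
    """Roughly `md5sum -c sums.md5`: map each listed name to 'OK' or 'FAILED'."""
    results = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        digest, name = line.split("  ", 1)
        target = manifest.parent / name
        if not target.is_file():
            results[name] = "FAILED"  # missing or unreadable file
        else:
            results[name] = "OK" if md5_of_file(target) == digest else "FAILED"
    return results
```

Usage: `write_manifest(Path("."), "*.fits", Path("sums.md5"))` once when the CD is made, then `check_manifest(Path("sums.md5"))` on the receiving end.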
> 
> With conventional polynomial checksums one can change the data such
> that the sum remains constant.  It is not too hard to do by hand with
> a binary editor.  No one has yet been able to do this with MD5,
> because MD5 is a "one-way" cryptographic hash function.
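To illustrate the point with the weakest case, the sketch below uses a plain byte-sum checksum (not a polynomial CRC; it is swapped in here because it is the easiest to fool). Swapping two bytes changes the data but leaves the sum unchanged, while the MD5 digest changes. Note that the remark about MD5 reflects 2001: practical MD5 collisions were published in 2004, so where deliberate tampering matters, SHA-256 (also in `hashlib`) is the usual choice today. `byte_sum32` is a hypothetical helper.

```python
import hashlib


def byte_sum32(data: bytes) -> int:
    """Naive checksum: sum of all bytes modulo 2**32."""
    return sum(data) & 0xFFFFFFFF


original = b"FITS data: 0123456789"
# Swap two bytes ('0' and '1'): the content changes, the multiset of bytes does not.
tampered = bytearray(original)
tampered[11], tampered[12] = tampered[12], tampered[11]
tampered = bytes(tampered)

print(byte_sum32(original) == byte_sum32(tampered))  # True: the sum is fooled
print(hashlib.md5(original).hexdigest() == hashlib.md5(tampered).hexdigest())  # False
```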
> 
> 
> --- Herbert R Johnson <hjohnson@pluto.njcc.com> wrote:
> > On Thu, 07 Jun 2001 22:39:09 -0500, Tom Droege
> > <tdroege@veriomail.com> wrote:
> > *>Does anyone have a good 18e?
> > *>
> > *>Tom Droege
> > 
> > Some TASS members may know that my business is supporting "very old"
> > personal computers: in particular, computers built with S-100 bus
> > cards manufactured between 1976 and the mid-1980s.  I mention this
> > because in those times, the reliability of hard drives, floppy drives,
> > and other media was questionable.  The issue seems to have come up
> > again.  It's still an issue whenever files are sent across networks,
> > as a transfer can be interrupted.
> > 
> > The traditional way that one verified a file was to check it against
> > the original.  This is not an option for TASS, as the "original"
> > CD-ROM is far away.  At the least, a separate list of filenames, sizes
> > and dates could be added to a distribution CD: a "DIR" or "ls" command
> > will produce such a file.  However, this will not verify CONTENTS,
> > just length.  But it's better than nothing.
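A size-only listing of the kind described above can be sketched as follows (the function names are hypothetical). It records each file's length and later reports files whose length changed or which are missing. As the test data shows, it catches truncation, such as an interrupted transfer, but not a same-length change to the contents.

```python
from pathlib import Path


def size_listing(directory: Path) -> dict:
    """Map each regular file's name to its size in bytes (like a DIR/ls listing)."""
    return {p.name: p.stat().st_size
            for p in sorted(directory.iterdir()) if p.is_file()}


def changed_sizes(listing: dict, directory: Path) -> list:
    """Names whose current size differs from the listing, or that are now missing."""
    bad = []
    for name, size in listing.items():
        p = directory / name
        if not p.is_file() or p.stat().st_size != size:
            bad.append(name)
    return bad
```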
> > 
> > The traditional way to verify contents was to use a list of files and
> > their checksums, and a program which would do one of the following:
> > read that list, read each file on the list, and compare the computed
> > checksum against the list; OR create a list of files and checksums.
> > A "checksum" is a single value that is computed by combining all the
> > bytes of a file in a prescribed way to create a "unique" number.  I
> > put unique in quotes because there is always a slight chance that two
> > files will have the same checksum, and a smaller chance that
> > corruption will change a file without changing its checksum.
> > Consequently, the checksum algorithm is usually some kind of
> > polynomial expression (such as a CRC), and the checksum value is often
> > a 32-bit value (setting the odds of an accidental match at about one
> > in 2**32).
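As a concrete instance of a 32-bit polynomial checksum, CRC-32 (the variant used by zip and gzip) is available in Python's standard `zlib` module. The sketch computes it over a file in chunks so that large files need not fit in memory; `crc32_of_file` is a hypothetical name. The CRC-32 of the ASCII string "123456789" is the standard check value 0xCBF43926, which gives a quick way to confirm that two implementations agree.

```python
import zlib


def crc32_of_file(path: str, chunk_size: int = 1 << 20) -> int:
    """CRC-32 of a file, fed to zlib.crc32 incrementally, chunk by chunk."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)  # continue from the running value
    return crc & 0xFFFFFFFF  # force an unsigned 32-bit result


if __name__ == "__main__":
    print(f"{zlib.crc32(b'123456789'):08X}")  # CBF43926
```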
> > 
> > (A side point: floppies, hard drives, and other media use checksums at
> > the sector level for verification.  This is how a storage device knows
> > to report a read error.)
> > 
> > A review of comprehensive references on computer algorithms will find
> > reasonable programming samples for checksum computation, or a search
> > of shareware/freeware archives will find such programs.  It's a smart
> > idea to include the programs themselves (and, for any long-term
> > archive, their source code) with a distribution of the files.  In the
> > case of TASS, it's fair to say that its checksum program set will have
> > to run in Windows, Unix/Linux (x86 and perhaps SPARC), and (if you
> > please) MS-DOS.  I'd suggest the source be in C and (perhaps) BASIC.
> > Least effort may be for one person to grab appropriate C source and
> > compile it for all of the above.  I myself cannot do this.
> > 
> > IT IS IMPORTANT THAT ALL VERSIONS IN ALL OPERATING SYSTEMS PRODUCE THE
> > SAME CHECKSUM PER FILE.
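One common way for implementations on different systems to disagree is byte order: a checksum that sums multi-byte words gives different answers on little-endian x86 and big-endian SPARC if it reads words in the machine's native order. (Opening files in text rather than binary mode on DOS/Windows is another source of mismatches.) The sketch below, with the hypothetical name `word_sum32`, fixes the byte order explicitly; the FITS convention, for example, always reads big-endian words. Checksums defined byte by byte, such as CRC-32 or MD5, avoid the word-order issue.

```python
import struct


def word_sum32(data: bytes, byte_order: str = ">") -> int:
    """Sum of unsigned 32-bit words modulo 2**32, with an explicit byte order.

    Use '>' (big-endian) or '<' (little-endian), never '@' or '=' (native),
    which would make the result depend on the machine it runs on.
    """
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    words = struct.unpack(f"{byte_order}{len(data) // 4}I", data)
    return sum(words) & 0xFFFFFFFF


sample = b"\x00\x00\x00\x01"
print(word_sum32(sample, ">"))  # 1
print(word_sum32(sample, "<"))  # 16777216
```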
> > 
> > Finally, I note the following from my old computer experience AND my
> > prior experience with radio astronomy data from 10 to 30 years ago:
> > it's actually a blessing that this problem emerged NOW, so it could be
> > solved, rather than 10 or 20 years from now, when the CD-ROMs start to
> > degrade.
> > 
> > Herb Johnson
> > 
> > Herbert R. Johnson              http://pluto.njcc.com/~hjohnson
> > hjohnson@pluto.njcc.com         voice 609-771-1503, New Jersey USA
> >              amateur astronomer and telescope tinkerer
> >    reseller of classic Macs & accessories from Plus to PowerMac
> >    S-100 & 8-inch drive manuals and parts, call for "Dr. S-100"
> > 
> 
> 
> =====
> Chris Albertson
>   chrisalbertson90278@yahoo.com
>   Home: 310-376-1029
>   Cell: 310-990-7550
> 