
Handling Large Quantities Of Data



FYI, 
I'll try to get some more information after the presentation and I'm
also trying to get this guy involved with TASS data.

Presentation, 10:00 am, 1/9/98, 7-257
Bayesian Automatic Classification of Data with Matlab
One relatively new problem faced by scientists and instrument designers
is handling large quantities of data. "Too much data" is not a problem
that a scientist expects to face, but as instruments become more
powerful and the number of instruments per observation platform
increases, it is likely to be a problem that many scientists encounter.
One solution is to automate the process of finding patterns in the data,
so that only these patterns are transmitted and stored. This would
reduce storage, access and transmission requirements and also the human
effort required to produce understanding out of data (the majority of
which are uninteresting). Automatic pattern finding has significant
advantages over data compression (another method of reducing
transmission and storage requirements), which does not aid the analyst
in finding significant patterns in the data, or reduce the storage
requirements for the analysis environment. 
Andrew Love has implemented (in MATLAB) an algorithm called "Bayesian
Classification" (first described in "AutoClass: A Bayesian
Classification System," Proceedings of the Fifth International
Conference on Machine Learning, pp. 54-64), which can classify
multivariate data into categories, each of which is a multivariate
normal distribution. The best number of categories is determined using
the Akaike Information Criterion (AIC). His paper presents details of
the implementation and test results using simulated data. 
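
To make the idea of choosing the number of categories by AIC concrete,
here is a rough sketch in Python. It is not Andrew Love's MATLAB code
and not AutoClass. It assumes a simplified univariate case (one
measurement per sample) rather than the multivariate case described
above, and it assumes the mixture is fitted by plain
expectation-maximization, which the announcement does not specify. All
function names (fit_mixture, select_by_aic, demo_data) are made up for
illustration. The score is AIC = 2p - 2 ln L, with p = 3k - 1 free
parameters for k one-dimensional categories; the lowest score wins.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a 1-D normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def e_step(data, weights, means, variances):
    """Return per-point category responsibilities and the total log-likelihood."""
    resp, loglik = [], 0.0
    for x in data:
        logs = [math.log(w) + log_normal_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)  # log-sum-exp trick to avoid underflow
        expd = [math.exp(l - top) for l in logs]
        total = sum(expd)
        loglik += top + math.log(total)
        resp.append([e / total for e in expd])
    return resp, loglik


def fit_mixture(data, k, iters=200):
    """Fit k normal categories by EM (assumed method); return (loglik, weights, means, variances)."""
    n = len(data)
    ordered = sorted(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    floor = 1e-3 * grand_var  # variance floor so a category cannot collapse onto one point
    # Deterministic start: spread the initial means over quantiles of the data.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        resp, _ = e_step(data, weights, means, variances)
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)  # guard against an empty category
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(floor, spread)
            weights[j] = nj / n
    _, loglik = e_step(data, weights, means, variances)
    return loglik, weights, means, variances


def aic(loglik, k):
    """AIC = 2p - 2 ln L, with k means + k variances + (k - 1) free weights."""
    n_params = 3 * k - 1
    return 2.0 * n_params - 2.0 * loglik


def select_by_aic(data, max_k=4):
    """Fit 1..max_k categories and return (best_k, {k: AIC score})."""
    scores = {}
    for k in range(1, max_k + 1):
        loglik, _, _, _ = fit_mixture(data, k)
        scores[k] = aic(loglik, k)
    return min(scores, key=scores.get), scores


def demo_data(seed=42):
    """Simulated data: two well-separated normal clusters of 150 points each."""
    rng = random.Random(seed)
    return ([rng.gauss(0.0, 1.0) for _ in range(150)]
            + [rng.gauss(10.0, 1.0) for _ in range(150)])


if __name__ == "__main__":
    best_k, scores = select_by_aic(demo_data(), max_k=3)
    print("AIC by number of categories:", scores)
    print("selected number of categories:", best_k)
```

On simulated data like this, a single category should score much worse
than two. AIC penalizes extra parameters only mildly, so it can
occasionally prefer more categories than were used to generate the
data.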

For more information on this subject, see Andy Love (phone 410-792-5000
Ext. 8568, loveAE1@central.SSD.JHUAPL.EDU). 