Atmospheric Aerosols
W. Zhu, K. Mueller, S. Schwartz
The purpose of this project is to develop data processing tools for the
classification and time series analysis of atmospheric aerosol particles.
Atmospheric aerosols play an important role in both climate and public
health. Current research focuses on determining the chemical composition
of aerosols, and thus the composition of air and its evolution over time.
Massive volumes of mass spectral data, on the scale of terabytes per week,
have been collected continuously by researchers in the Atmospheric Sciences
Division at Brookhaven using their cutting-edge, field-deployable,
high-precision mass spectrometer. The first task is to determine the
chemical composition of each aerosol particle from its mass spectrum. To
achieve this, we classify the particles by chemical composition, which in
turn determines the air composition at each time point. Finally, we study
the time series evolution of atmospheric composition, identifying normal as
well as abnormal patterns in real time.
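As a minimal sketch of the step from per-particle classes to the air composition at each time point, assume each classified particle is reduced to a (timestamp, class label) pair. Binning the particles into fixed-width time windows and normalizing the counts then gives one composition vector per bin, which is the kind of series the later time series analysis operates on. The function name composition_time_series, the input format and the fixed-width binning are illustrative assumptions, not part of the described system.

```python
from collections import Counter, defaultdict


def composition_time_series(particles, bin_width):
    """Return {bin_start: {class_label: fraction}} in time order.

    particles: iterable of (timestamp_seconds, class_label) pairs (assumed format).
    bin_width: width of each time bin in seconds (hypothetical parameter).
    """
    counts = defaultdict(Counter)
    for timestamp, label in particles:
        # Snap each particle to the start of its time bin.
        bin_start = (timestamp // bin_width) * bin_width
        counts[bin_start][label] += 1
    series = {}
    for bin_start in sorted(counts):
        total = sum(counts[bin_start].values())
        # Fraction of the bin's particles that fall into each class.
        series[bin_start] = {label: n / total for label, n in counts[bin_start].items()}
    return series


# Made-up classified particles: three in the first minute, one in the second.
particles = [(0, "organic"), (10, "organic"), (20, "inorganic"), (65, "inorganic")]
print(composition_time_series(particles, bin_width=60))
# bin 0: organic 2/3, inorganic 1/3; bin 60: inorganic 1.0
```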
To convey the analysis results to the scientists in an intuitive way and
to incorporate their expert knowledge into the data analysis phase, an
interactive graphical user interface utilizing modern scientific
visualization techniques is a necessity. Large-scale BNL data collections
have reached sizes at which straightforward visualization techniques are
starting to fail. Our proposed approach therefore couples a powerful
Bayesian classification and multivariate time series analysis engine with
an intuitive and responsive graphical user interface for fine-tuning the
underlying models. We call this application SpectrumMiner; it is a
significant component of a more general visual data mining framework and
application, called ViStA, which we are currently developing at BNL [1,2,3].
The statistics engine. We first classify aerosols based on their
mass spectra through iterative expert-machine interaction using a
Bayesian classifier. The Bayesian classifier can incorporate not only
explicit classification rules but also prior knowledge in the form of a
partial training set. We are also implementing automatic procedures to
elucidate the structure of the compounds. To study the evolution of aerosols
and the changes in atmospheric composition over time, we employ both
univariate and multivariate time series analyses to unravel trends and
patterns.
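As one simple illustration of flagging abnormal points in a univariate composition series (for example, the organic fraction per time bin), the sketch below applies a trailing-window z-score test. This is a swapped-in textbook technique, not the time series method used in this project; flag_anomalies, window and threshold are hypothetical names, and the default values are arbitrary.

```python
import statistics


def flag_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value lies more than `threshold` population standard
    deviations from the mean of the preceding `window` values (rolling z-score)."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        std = statistics.pstdev(history)
        if std == 0:
            # Perfectly flat history: treat any change at all as abnormal.
            if values[i] != mean:
                flagged.append(i)
        elif abs(values[i] - mean) / std > threshold:
            flagged.append(i)
    return flagged


# Made-up organic fraction per time bin, with one spike at index 6.
organic_fraction = [0.50, 0.52, 0.49, 0.51, 0.50, 0.51, 0.90, 0.50]
print(flag_anomalies(organic_fraction))  # [6]
```

Because a spike stays in the trailing window for the next few bins, it inflates the local standard deviation and can mask a follow-on anomaly; a multivariate variant would replace the z-score with a distance that accounts for correlations between classes, such as the Mahalanobis distance.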
A traditional statistical classifier takes only explicit classification
rules, while machine learning techniques such as neural networks take only a
training set. The semi-supervised clustering approach we are taking,
however, requires the classifier to learn from both the explicit rules and
the implicit rules embedded in a partial training set, which the experts
establish as they survey the current clustering results and adjust the
clusters. For this we have implemented an interactive Bayesian
classification framework that absorbs up-to-date prior information in both
explicit and implicit forms and produces updated posterior classification
results.
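A minimal sketch of how explicit rules and a partial training set can both enter a Bayesian classifier: class-conditional likelihoods are estimated from the expert-labeled spectra (here a Gaussian naive Bayes model over intensity bins), while explicit rules act as multiplicative weights on the class priors, with a weight of 0 ruling a class out. The model choice, the rule interface and all names (RuleAwareNaiveBayes, add_rule, var_floor) are assumptions for illustration, not the framework described above. The sketch also ignores unlabeled particles, which a fuller semi-supervised scheme would fold in, for instance with an EM step.

```python
import math
from collections import defaultdict


class RuleAwareNaiveBayes:
    """Gaussian naive Bayes over binned spectra whose class priors can be
    reweighted by explicit expert rules (hypothetical interface)."""

    def __init__(self, var_floor=1e-3):
        self.var_floor = var_floor  # lower bound keeps per-bin variances positive
        self.stats = {}             # class label -> (means, variances, count)
        self.rules = []             # callables (spectrum, label) -> weight >= 0

    def fit(self, spectra, labels):
        """Estimate per-class, per-bin means and variances from labeled spectra."""
        groups = defaultdict(list)
        for spectrum, label in zip(spectra, labels):
            groups[label].append(spectrum)
        self.stats = {}
        for label, rows in groups.items():
            n = len(rows)
            columns = list(zip(*rows))  # one tuple of intensities per bin
            means = [sum(col) / n for col in columns]
            variances = [max(sum((v - m) ** 2 for v in col) / n, self.var_floor)
                         for col, m in zip(columns, means)]
            self.stats[label] = (means, variances, n)
        return self

    def add_rule(self, rule):
        self.rules.append(rule)

    def posterior(self, spectrum):
        """Return {label: P(label | spectrum)} over classes not ruled out."""
        total = sum(n for _, _, n in self.stats.values())
        k = len(self.stats)
        log_scores = {}
        for label, (means, variances, n) in self.stats.items():
            prior = (n + 1) / (total + k)  # Laplace-smoothed prior from labels
            for rule in self.rules:
                prior *= rule(spectrum, label)  # explicit knowledge reweights prior
            if prior == 0:
                continue  # a rule excluded this class outright
            log_lik = sum(-0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                          for x, m, var in zip(spectrum, means, variances))
            log_scores[label] = math.log(prior) + log_lik
        if not log_scores:
            return {}
        top = max(log_scores.values())  # log-sum-exp normalization
        norm = sum(math.exp(s - top) for s in log_scores.values())
        return {label: math.exp(s - top) / norm for label, s in log_scores.items()}


# Toy two-bin spectra labeled by an expert (made-up values).
spectra = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["organic", "organic", "inorganic", "inorganic"]
model = RuleAwareNaiveBayes().fit(spectra, labels)
print(model.posterior([0.95, 0.05]))  # organic dominates

# Hypothetical explicit rule: no signal in bin 1 rules out "inorganic".
model.add_rule(lambda s, label: 0.0 if label == "inorganic" and s[1] < 0.01 else 1.0)
print(model.posterior([0.5, 0.0]))  # {'organic': 1.0}
```

In this sketch, an expert's correction would simply append the reassigned (spectrum, label) pairs to the training set and call fit again, which yields updated posteriors on the next query.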
For efficient classification and automatic structure elucidation, we have
been constructing a molecule library in which the signature profile and
class membership of each molecule are established. The molecules are
classified along a natural chemical classification tree whose root node
splits into two categories, organic and inorganic. The organics are further
divided into classes of carboxylic acids, aldehydes, ketones, alkenes,
alkanes, aromatics, etc. We begin with the spectra of known molecules (from
the NIST library or generated in the lab).
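The descent through such a classification tree can be sketched as follows, assuming each library molecule is stored as a reference spectrum binned the same way as the particle spectra and attached to its class node. At each level the particle follows the child whose sub-tree contains the most similar reference spectrum, scored here by cosine similarity. The similarity measure, the greedy descent, the names ClassNode and classify, and the three-bin toy spectra are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ClassNode:
    name: str
    signatures: list = field(default_factory=list)  # reference spectra of library molecules
    children: list = field(default_factory=list)

    def all_signatures(self):
        """Signatures of this node and every descendant."""
        sigs = list(self.signatures)
        for child in self.children:
            sigs.extend(child.all_signatures())
        return sigs


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify(spectrum, node):
    """Descend the tree greedily, at each level following the child that holds
    the library signature most similar to `spectrum`; return the path taken."""
    path = [node.name]
    while node.children:
        node = max(node.children,
                   key=lambda c: max((cosine(spectrum, s) for s in c.all_signatures()),
                                     default=-1.0))
        path.append(node.name)
    return path


# Toy tree following the organic/inorganic split; spectra are made up.
root = ClassNode("aerosol", children=[
    ClassNode("inorganic", signatures=[[0.0, 0.1, 1.0]]),
    ClassNode("organic", children=[
        ClassNode("carboxylic acids", signatures=[[1.0, 0.2, 0.0]]),
        ClassNode("aromatics", signatures=[[0.2, 1.0, 0.1]]),
    ]),
])
print(classify([0.9, 0.3, 0.0], root))  # ['aerosol', 'organic', 'carboxylic acids']
```

Greedy descent is cheap, since only one branch is examined per level, but an early wrong turn cannot be undone; scoring every leaf directly and reporting its path would avoid that at a higher cost.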
Visualization. The statistical analysis engine is combined with a
highly visual interface to facilitate interactive exploration, mining,
classification and survey of these large, high-dimensional data collections.
In order to empower scientists to control and fine-tune the mining and
classification process in an intuitive and interactive way, SpectrumMiner’s
hierarchical classification algorithm is user-steerable via a novel
multi-modal visual interface. An important component of this interface is
the interactive dendrogram, in which hierarchy nodes are placed on
concentric circles whose radii are determined by the dissimilarity of each
node’s sub-tree. We chose a circular layout for the dendrogram because it
makes better use of space than its linear counterpart: it inherently
dedicates less drawing space to the few higher-level nodes and more space
to the many leaf nodes along the circumference of the circle. Figure 1
shows a screen capture of SpectrumMiner, with the interactive dendrogram at
the bottom right. Edges are colored using a rainbow colormap to indicate the
number of data items they carry. To the left of the dendrogram is the node
viewer. Selecting a node displays the average spectrum of all data items
classified into that node (window with white background), as well as the
node’s data composition (window with blue background just below).
We are currently incorporating classification steering capabilities into
the system. By inspecting the present classification in the node viewer and
the dendrogram, scientists may decide that some particles, or an entire
node, have been misclassified. To correct such an error, they simply drag
the affected particles or node to the proper location within the hierarchy,
which triggers a refinement of the classification rules.
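The circular dendrogram layout and edge coloring described above can be sketched as follows, assuming each hierarchy node stores the dissimilarity (merge height) of its sub-tree. The radius mapping r = R(1 - h/h_max), which puts the root at the center and the leaves (h = 0) on the outer circle, the equal angular share per leaf, and the blue-to-red hue ramp are illustrative assumptions; SpectrumMiner's actual mapping and colormap may differ, and the focus + context distortion of [1] is not modeled. The names DendroNode, radial_layout, item_count and edge_color are hypothetical.

```python
import colorsys
import math
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be dict keys
class DendroNode:
    height: float                 # dissimilarity of this sub-tree (0 for leaves)
    count: int = 0                # data items in a leaf; internal nodes sum children
    children: list = field(default_factory=list)


def item_count(node):
    """Number of data items carried by the edge leading into `node`."""
    if not node.children:
        return node.count
    return sum(item_count(c) for c in node.children)


def radial_layout(root, radius=1.0):
    """Map each node to (x, y): radius shrinks as sub-tree dissimilarity grows,
    so the root sits at the center and leaves lie on the outer circle."""
    leaves = []

    def collect(node):
        if not node.children:
            leaves.append(node)
        for child in node.children:
            collect(child)

    collect(root)
    # Each leaf gets an equal share of the circumference.
    angle = {leaf: 2 * math.pi * i / len(leaves) for i, leaf in enumerate(leaves)}
    h_max = root.height or 1.0  # guard against a single-leaf tree
    pos = {}

    def place(node):
        if node.children:
            for child in node.children:
                place(child)
            # An internal node sits at the mean angle of its children.
            angle[node] = sum(angle[c] for c in node.children) / len(node.children)
        r = radius * (1.0 - node.height / h_max)
        pos[node] = (r * math.cos(angle[node]), r * math.sin(angle[node]))

    place(root)
    return pos


def edge_color(count, max_count):
    """Rainbow colormap as RGB in [0, 1]: few items -> blue, many -> red."""
    t = count / max_count if max_count else 0.0
    return colorsys.hsv_to_rgb((1.0 - t) * 2.0 / 3.0, 1.0, 1.0)


# Toy hierarchy: leaves a and b merge at dissimilarity 0.4, then join c at 1.0.
a, b, c = DendroNode(0.0, 3), DendroNode(0.0, 1), DendroNode(0.0, 2)
ab = DendroNode(0.4, children=[a, b])
root = DendroNode(1.0, children=[ab, c])
pos = radial_layout(root)
total = item_count(root)
for parent in (root, ab):
    for child in parent.children:
        print(pos[parent], "->", pos[child], edge_color(item_count(child), total))
```

Placing internal nodes at the mean angle of their children keeps each sub-tree inside its own wedge, so the leaves, which get equal angular shares, receive most of the drawing space, as the text motivates.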
References
[1] Imrich, P., Mueller, K., Imre, D., Zelenyuk, A., and Zhu, W. A
Hardware-Accelerated Rubbersheet Focus + Context Technique for Radial
Dendrograms. IEEE Information Visualization Symposium’03; Seattle,
October 2003.
[2] Imrich, P., Mueller, K., Imre, D., Zelenyuk, A., and Zhu, W. 3D
ThemeRiver. IEEE Information Visualization Symposium’03; Seattle,
October 2003.
[3] Yoon, C. and McGraw, R. Representation of generally mixed
multivariate aerosols by the quadrature method of moments: I.
Statistical foundation. J. Aerosol Sci. 35: 561-576 (2004).

Last Modified: January 31, 2008