Atmospheric Aerosols
 W. Zhu, K. Mueller, S. Schwartz

The purpose of this project is to develop data processing tools for the classification and time series analysis of atmospheric aerosol particles. Atmospheric aerosols play an important role in both climate and public health. Current research is focused on determining the chemical composition of aerosols, and thus the composition of air and its evolution over time. Researchers in the Atmospheric Sciences Division at Brookhaven have been continuously collecting mass spectra, on the scale of terabytes per week, using their cutting-edge, field-deployable, high-precision mass spectrometer. The first task is to determine the chemical composition of each aerosol based on its mass spectrum. We then classify aerosols by their chemical compositions and thus determine the air composition at each time point. Finally, we study the time series evolution of atmospheric composition, identifying normal as well as abnormal patterns in real time.

To convey the analysis results to the scientists in an intuitive way and to incorporate their expert knowledge into the data analysis phase, an interactive graphical user interface that uses modern scientific visualization techniques is necessary. Large-scale BNL data collections have reached sizes at which straightforward visualization techniques are starting to fail. Our proposed approach therefore couples a powerful Bayesian classification and multivariate time series analysis engine with an intuitive and responsive graphical user interface for fine-tuning the underlying models. We call this application SpectrumMiner; it is a significant component of ViStA, a more general visual data mining framework and application that we are currently developing at BNL [1,2,3].

The statistics engine. We first classify aerosols based on their mass spectra through iterative expert-machine interaction using a Bayesian classifier. The Bayesian classifier can incorporate not only explicit classification rules but also prior knowledge in the form of a partial training data set. We have also been implementing automatic procedures to elucidate the structure of the compounds. To study the evolution of aerosols and the changes in atmospheric composition over time, we employ both univariate and multivariate time series analyses to unravel trends and patterns.
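
As a rough illustration of what flagging abnormal patterns in a univariate series can look like, the sketch below applies a trailing-window z-score test to the fraction of particles assigned to one class at each time point. This is not the method used in SpectrumMiner; the function name flag_anomalies, the window and threshold parameters, and the sample values are all assumptions made up for the example.

```python
from statistics import mean, stdev

def flag_anomalies(series, window=5, threshold=3.0):
    """Return indices whose value deviates from the mean of the preceding
    `window` points by more than `threshold` sample standard deviations."""
    flagged = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        mu, sigma = mean(past), stdev(past)
        # Skip flat windows, where the z-score is undefined.
        if sigma > 0 and abs(series[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged

# Made-up class fractions at eight consecutive time points (illustrative only).
fractions = [0.30, 0.31, 0.29, 0.30, 0.32, 0.31, 0.80, 0.30]
anomalies = flag_anomalies(fractions)
```

A trailing window only looks backward, which is what a real-time setting allows; a multivariate analysis would have to treat all class fractions jointly rather than one series at a time.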

A traditional statistical classifier takes only explicit classification rules. Machine learning techniques such as neural networks take only a training set. The semi-supervised clustering approach we are taking, however, requires the classifier to learn from both the explicit rules and the implicit rules embedded in a partial training set that the experts establish as they survey the current clustering results and adjust the clusters. For this we have implemented an interactive Bayesian classification framework that can absorb up-to-date prior information in both explicit and implicit form and produce updated posterior classification results.
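
One way to picture how explicit rules and a partial training set can feed the same posterior is a multinomial naive Bayes model whose per-class peak profiles start from expert pseudo-counts and are then updated with expert-labeled spectra. The sketch below is only an assumption about how such a framework might be organized, not the SpectrumMiner implementation; the class InteractiveBayes, the encoding of rules as pseudo-counts per m/z bin, and the toy spectra are hypothetical.

```python
import math

class InteractiveBayes:
    """Multinomial naive Bayes with rule-derived pseudo-counts (hypothetical)."""

    def __init__(self, n_bins, rule_pseudocounts, base=1.0):
        # rule_pseudocounts: {class_name: {bin_index: extra_weight}}, an assumed
        # encoding of explicit rules such as "class X shows a peak in bin b".
        self.counts = {}
        self.class_counts = {}
        for cls, boosts in rule_pseudocounts.items():
            row = [base] * n_bins
            for b, w in boosts.items():
                row[b] += w
            self.counts[cls] = row
            self.class_counts[cls] = 1.0  # uniform prior over classes

    def update(self, spectrum, cls):
        """Absorb one expert-labeled spectrum (the implicit rules)."""
        for b, intensity in enumerate(spectrum):
            self.counts[cls][b] += intensity
        self.class_counts[cls] += 1.0

    def posterior(self, spectrum):
        """Return {class_name: posterior probability} for a binned spectrum."""
        total_cls = sum(self.class_counts.values())
        log_post = {}
        for cls, row in self.counts.items():
            total = sum(row)
            lp = math.log(self.class_counts[cls] / total_cls)
            for b, intensity in enumerate(spectrum):
                if intensity:
                    lp += intensity * math.log(row[b] / total)
            log_post[cls] = lp
        # Normalize in log space for numerical stability.
        m = max(log_post.values())
        z = sum(math.exp(v - m) for v in log_post.values())
        return {cls: math.exp(v - m) / z for cls, v in log_post.items()}

# Toy example: two classes, four m/z bins, one rule-based peak per class.
clf = InteractiveBayes(4, {"sulfate": {0: 5.0}, "organic": {2: 5.0}})
query = [0, 3, 0, 1]
before = clf.posterior(query)          # the rules alone cannot separate the query
clf.update([0, 4, 0, 1], "organic")    # the expert labels a similar particle
after = clf.posterior(query)           # the posterior now favors "organic"
```

Each expert correction simply adds counts, so the posterior can be recomputed immediately after every adjustment, which suits an interactive loop.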

For efficient classification and automatic structure elucidation, we have been constructing a molecule library in which the signature profile and class membership of each molecule are established. The molecules are classified along a natural chemical classification tree with two categories, organic and inorganic, at the initial node. The organics are then further divided into classes such as carboxylic acids, aldehydes, ketones, alkenes, alkanes, and aromatics. We begin with the spectra of known molecules (NIST library or lab-generated).
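
A library of this kind can be pictured as a list of entries, each pairing a signature profile with a path through the classification tree. The sketch below matches an unknown spectrum to its closest entry by cosine similarity and reads the class path off that entry. Cosine matching is an assumption chosen for brevity, not necessarily the measure used in the project, and the entry names and intensity vectors are placeholders rather than real NIST data.

```python
import math

# Placeholder library: (name, class path from the root, toy intensity vector).
LIBRARY = [
    ("inorganic_X", ("inorganic",), [9.0, 1.0, 0.0, 0.0]),
    ("acid_A", ("organic", "carboxylic acid"), [0.0, 8.0, 2.0, 0.0]),
    ("ketone_B", ("organic", "ketone"), [0.0, 1.0, 7.0, 2.0]),
]

def cosine(u, v):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def classify(spectrum, library=LIBRARY):
    """Return (name, class path, similarity) of the best-matching entry."""
    name, path, ref = max(library, key=lambda e: cosine(spectrum, e[2]))
    return name, path, cosine(spectrum, ref)

def members(prefix, library=LIBRARY):
    """List the entries that fall under a node of the classification tree."""
    prefix = tuple(prefix)
    return [name for name, path, _ in library if path[:len(prefix)] == prefix]

best_name, best_path, score = classify([0.0, 7.0, 3.0, 0.0])
organics = members(("organic",))
```

Storing the full path with each entry means a match can be reported at any level of the tree, from the organic/inorganic split down to the individual class.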

Visualization. The statistical analysis engine is combined with a highly visual interface to facilitate interactive exploration, mining, classification and survey of these large, high-dimensional data collections. In order to empower scientists to control and fine-tune the mining and classification process in an intuitive and interactive way, SpectrumMiner’s hierarchical classification algorithm is user-steerable via a novel multi-modal visual interface. An important component of this interface is the interactive dendrogram, where hierarchy nodes are placed on concentric circles whose radii are determined by the dissimilarity of the node’s sub-tree. We chose a circular layout of the dendrogram since it makes better use of space than its linear counterpart. It inherently dedicates less drawing space to the higher-level, less numerous nodes, and distributes more space to the many leaf nodes along the circumference of the circle. See Figure 1, where we show a screen capture of SpectrumMiner, with the interactive dendrogram located on the bottom right. Edges are colored using a rainbow colormap to indicate the number of data items they carry. To the left of the dendrogram is the node viewer. Selecting a particular node will display the average spectrum of all data items classified into the node (window with white background), as well as the node’s data composition (window with blue background just below). We are currently incorporating the classification steering capabilities into our system. By inspecting the present classification in the node viewer and the dendrogram, scientists may decide that some particles, or the entire node, have been misclassified. To correct this error, they then simply drag the concerned particles or node into the proper location within the hierarchy, which subsequently triggers a refinement of the classification rules.

Fig. 1. SpectrumMiner's interactive dendrogram interface.
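
The concentric-circle placement can be sketched as follows, assuming each node records the dissimilarity (height) of its sub-tree: the root sits at the center, leaves sit on the outer circle, and a node's radius shrinks linearly as its dissimilarity grows. The linear mapping, the even angular spacing of leaves, and the nested-dictionary tree format are assumptions for illustration; the actual SpectrumMiner layout may differ.

```python
import math

def radial_layout(tree, max_radius=1.0):
    """Place dendrogram nodes on concentric circles.

    tree: nested dict with "name", optional "height" (sub-tree dissimilarity,
    0 for leaves) and optional "children". Returns {name: (x, y)}.
    """
    def leaves(node):
        kids = node.get("children", [])
        return [node] if not kids else [l for k in kids for l in leaves(k)]

    # Spread the leaves evenly around the full circle, in traversal order.
    leaf_names = [l["name"] for l in leaves(tree)]
    step = 2 * math.pi / len(leaf_names)
    leaf_angle = {n: i * step for i, n in enumerate(leaf_names)}
    root_height = tree.get("height") or 1.0
    pos = {}

    def place(node):
        kids = node.get("children", [])
        if kids:
            # An internal node sits at the mean angle of its children.
            child_angles = [place(k) for k in kids]
            angle = sum(child_angles) / len(child_angles)
        else:
            angle = leaf_angle[node["name"]]
        # Higher dissimilarity means closer to the center.
        r = max_radius * (1 - node.get("height", 0.0) / root_height)
        pos[node["name"]] = (r * math.cos(angle), r * math.sin(angle))
        return angle

    place(tree)
    return pos

# Toy hierarchy: the root merges cluster A (two leaves) with leaf b.
toy = {"name": "root", "height": 4.0, "children": [
    {"name": "A", "height": 2.0, "children": [{"name": "a1"}, {"name": "a2"}]},
    {"name": "b"},
]}
positions = radial_layout(toy)
```

Because every leaf lands on the outermost circle, the many leaf nodes share the full circumference while the few high-level nodes cluster near the center, which is the space advantage described above.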

 

 

References

  • [1] Imrich, P., Mueller, K., Imre, D., Zelenyuk, A., and Zhu, W. A Hardware-Accelerated Rubbersheet Focus + Context Technique for Radial Dendrograms. IEEE Information Visualization Symposium’03; Seattle, October 2003.
  • [2] Imrich, P., Mueller, K., Imre, D., Zelenyuk, A., and Zhu, W. 3D ThemeRiver. IEEE Information Visualization Symposium’03; Seattle, October 2003.
  • [3] Yoon, C. and McGraw, R. Representation of generally mixed multivariate aerosols by the quadrature method of moments: I. Statistical foundation. J. Aerosol Sci. 35: 561-576 (2004).

 


Last Modified: January 31, 2008