Visualization and Data Mining for the Detection of Cancer
W. Zhu, R. Bennett, J. Kovach, and M. McGuigan
We address the statistical problem of cancer detection using diagnostic
data, namely blood samples from patients with a known diagnosis of either
ovarian cancer or normal [1,2]. When statistical tests are applied to the
ovarian cancer data, accuracy falls in the range of 80-95%. More recently,
we have applied the same technology to the detection of head and neck
cancer, again with accuracy in the range of 80-95% [3].
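The page does not specify which statistical tests or classifiers produce these accuracy figures. As a generic illustration only, and not the procedure used in [1,2,3], the Python sketch below ranks mass bins by a Welch two-sample t-statistic between cancer and normal training spectra, classifies new spectra with a nearest-centroid rule on the top-ranked bins, and reports accuracy as the fraction of correct calls. The data layout (each spectrum as a list of amplitudes on a shared mass grid), the function names, and the toy numbers are assumptions.

```python
import math
from statistics import mean, variance

# Assumed layout: each spectrum is a list of amplitudes on a shared mass grid.

def welch_t(a, b):
    """Welch two-sample t-statistic for one mass bin (a: cancer, b: normal)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def top_bins(cancer, normal, k):
    """Indices of the k mass bins with the largest |t| between the classes."""
    n_bins = len(cancer[0])
    scores = [abs(welch_t([s[i] for s in cancer], [s[i] for s in normal]))
              for i in range(n_bins)]
    return sorted(range(n_bins), key=lambda i: scores[i], reverse=True)[:k]

def centroid(spectra, bins):
    """Mean amplitude of each selected bin across a set of spectra."""
    return [mean([s[i] for s in spectra]) for i in bins]

def classify(spectrum, bins, c_cancer, c_normal):
    """Nearest-centroid rule (Euclidean distance) on the selected bins."""
    x = [spectrum[i] for i in bins]
    return "cancer" if math.dist(x, c_cancer) < math.dist(x, c_normal) else "normal"

def accuracy(predicted, actual):
    """Fraction of predictions that match the known diagnosis."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Toy data (hypothetical): 3 mass bins, only bin 1 separates the classes.
cancer_train = [[1.0, 5.0, 2.0], [1.2, 5.5, 2.1], [0.9, 4.8, 1.9]]
normal_train = [[1.1, 2.0, 2.0], [1.0, 2.2, 2.2], [1.2, 1.9, 1.8]]
test_spectra = [[1.0, 5.2, 2.0], [1.1, 2.1, 2.0]]
test_labels = ["cancer", "normal"]

bins = top_bins(cancer_train, normal_train, k=1)
c_cancer = centroid(cancer_train, bins)
c_normal = centroid(normal_train, bins)
predicted = [classify(s, bins, c_cancer, c_normal) for s in test_spectra]
acc = accuracy(predicted, test_labels)
print(bins, predicted, acc)  # prints: [1] ['cancer', 'normal'] 1.0
```

On real data with thousands of mass bins and relatively few patients, the bins should be selected using only the training spectra; ranking bins on the full dataset before splitting it inflates the reported accuracy.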
We are developing an open source toolkit for NIH/FDA/NCI to assist with the
processing and analysis of datasets derived from proteomic experiments
[4,5,6]. Open source software provides a mechanism for leveraging existing
toolkits, sharing expertise, and accelerating development. Our toolkit uses
components from several existing open source projects, including the
Visualization Toolkit (VTK) and the MLC++ machine learning library in C++.
It will initially be customized for serum proteomic pattern diagnostics,
specifically for monitoring ovarian cancer recurrence.
Figure 1 shows a Splat Visualization rendering of serum proteomic data from 22
patients after processing by a high-resolution mass spectrometer. The
majority of the spectra are from patients with known ovarian cancer. The
Splat Visualizer tool is an aggregate 3-D plotter that allows rapid
manipulation and viewing of large datasets. The tilted x-axis (labeled Mass)
represents the mass values of the peptides and protein fragments. The y-axis
(labeled Amp) shows normalized amplitude values (a relative measure of
abundance). The z-axis (labeled Patients) stacks the individual patient
spectra so that they can be compared. The rendering is also colored
automatically by support (the number of data points behind each voxel):
white and blue indicate the least support, and red the most. In this way, a
patient’s unique proteomic signature can be compared and contrasted with
known controls and calibrators.
Figure 1: Splat visualization displaying serum proteomic patterns from 22
patients with ovarian cancer.
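The support-based coloring can be sketched as voxel counting. This is an illustrative reconstruction under stated assumptions, not the Splat Visualizer's actual implementation: each data point is a (mass, normalized amplitude, patient) triple on the three axes described above, amplitudes are normalized by each spectrum's peak (the page does not say which normalization is used), voxels are fixed-size bins with hypothetical widths, and support is mapped to a hypothetical white-to-blue-to-red RGB ramp.

```python
from collections import Counter

def normalize(spectrum):
    """Scale amplitudes to [0, 1] by the spectrum's peak (an assumed normalization)."""
    peak = max(spectrum)
    return [a / peak for a in spectrum] if peak > 0 else list(spectrum)

def voxel_support(points, mass_step, amp_step):
    """Count the data points that fall in each (mass bin, amplitude bin, patient) voxel."""
    support = Counter()
    for mass, amp, patient in points:
        support[(int(mass // mass_step), int(amp // amp_step), patient)] += 1
    return support

def support_color(n, max_support):
    """Hypothetical ramp: white (least support) -> blue -> red (most), as RGB in [0, 1]."""
    t = n / max_support
    if t <= 0.5:
        s = t / 0.5                # 0 = white, 1 = blue
        return (1.0 - s, 1.0 - s, 1.0)
    s = (t - 0.5) / 0.5            # 0 = blue, 1 = red
    return (s, 0.0, 1.0 - s)

# Toy input (hypothetical): two patients on a shared four-point mass grid.
mass_grid = [1000.0, 1000.4, 1000.8, 2500.0]
raw = {0: [10.0, 12.0, 11.0, 2.0], 1: [4.0, 5.0, 20.0, 1.0]}
points = [(m, a, pid) for pid, spectrum in raw.items()
          for m, a in zip(mass_grid, normalize(spectrum))]

support = voxel_support(points, mass_step=1.0, amp_step=0.5)
peak = max(support.values())
colors = {voxel: support_color(n, peak) for voxel, n in support.items()}
```

Keeping the patient index in the voxel key leaves each spectrum in its own slice along the z-axis, matching the stacked layout; coarser mass or amplitude bins merge more points into each voxel and raise the support counts.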

Figure 2 displays output of the Multi-view visualization tool. This tool can
be thought of as a superset of the Splat Visualizer, since it allows either
synchronous or asynchronous coordination of multiple windows, each
containing a different proteomic dataset. In all four views the x- and
y-axes are defined as in Figure 1. However, each view uses a different
clinical parameter on the z-axis, so the proteomic patterns can be viewed
simultaneously, each in the context of that window’s parameter.
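Synchronous versus asynchronous coordination can be pictured with a small coordinator object. In this sketch, which assumes a simple observer-style design rather than describing the tool's internals, a navigation change made in one window is applied to every linked window when coordination is synchronous, and only to the originating window when it is asynchronous. The Camera, View, and Coordinator names and their fields are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Camera:
    """Hypothetical view state touched by zoom, pan, and rotate."""
    zoom: float = 1.0
    pan: tuple = (0.0, 0.0)
    rotation_deg: float = 0.0

class View:
    """One window, e.g. spectra grouped by a single clinical parameter."""
    def __init__(self, name):
        self.name = name
        self.camera = Camera()

class Coordinator:
    """Applies a camera change to every view (synchronous) or only its source (asynchronous)."""
    def __init__(self, views, synchronous=True):
        self.views = list(views)
        self.synchronous = synchronous

    def update(self, source, **changes):
        targets = self.views if self.synchronous else [source]
        for view in targets:
            # Each view keeps its own state; only the changed fields are overwritten.
            view.camera = replace(view.camera, **changes)

# Window order follows the figure: clockwise from top left.
views = [View(name) for name in ("age", "grade", "CA 125", "stage")]

Coordinator(views, synchronous=True).update(views[0], zoom=2.0)
# All four windows now share zoom 2.0.

Coordinator(views, synchronous=False).update(views[1], rotation_deg=45.0)
# Only the "grade" window is rotated.
```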
In each window, serum proteomic spectra from approximately 100 patients with
known ovarian cancer have been aggregated and grouped according to a
clinical parameter of interest. Beginning at the top left and moving
clockwise, the spectra are grouped by patient age, by tumor grade, by a lab
value [Cancer Antigen 125 (CA 125)], and finally by stage of disease. Any
window can be maximized for further operations such as zooming, panning,
rotating, drilling down, or executing a defined command.
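The grouping shown in each window can be sketched as follows: patients are placed into groups by one clinical parameter, continuous parameters such as age and CA 125 are first cut into bins, and the spectra in each group are averaged into one aggregate trace per group along the z-axis. The record fields, the bin edges, and the use of a simple per-bin mean are assumptions for illustration; the page does not state how the tool forms its groups.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import fmean

def group_spectra(patients, key):
    """Average the spectra of patients that share key(patient); one trace per group."""
    groups = defaultdict(list)
    for p in patients:
        groups[key(p)].append(p["spectrum"])
    # zip(*spectra) walks the shared mass grid one bin at a time.
    return {g: [fmean(col) for col in zip(*spectra)]
            for g, spectra in sorted(groups.items())}

# Hypothetical records; spectra share a two-point mass grid.
patients = [
    {"age": 45, "grade": 2, "ca125": 30.0, "stage": "II", "spectrum": [1.0, 3.0]},
    {"age": 62, "grade": 3, "ca125": 410.0, "stage": "III", "spectrum": [2.0, 5.0]},
    {"age": 67, "grade": 3, "ca125": 250.0, "stage": "III", "spectrum": [4.0, 1.0]},
]
AGE_EDGES = [50, 60, 70]   # assumed bins: <50, 50-59, 60-69, >=70 -> 0..3
CA125_EDGES = [35.0]       # assumed cutoff in U/mL: below -> 0, at or above -> 1

# One grouping per window, clockwise from top left.
by_age = group_spectra(patients, lambda p: bisect_right(AGE_EDGES, p["age"]))
by_grade = group_spectra(patients, lambda p: p["grade"])
by_ca125 = group_spectra(patients, lambda p: bisect_right(CA125_EDGES, p["ca125"]))
by_stage = group_spectra(patients, lambda p: p["stage"])
```

Averaging is only one way to aggregate; a median or a support-weighted rendering like the Splat Visualizer's would fit the same grouping step.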
Figure 2: Multi-view visualization of aggregate proteomic patterns
(approximately 100 patients, most with ovarian cancer) viewed as a function
of a clinical parameter.
References
[1] Petricoin III, E.F., Ardekani, A.M., Hitt, B.A., Levine, P.J.,
Fusaro, V.A., Steinberg, S.M., Mills, G.B., Simone, C., Fishman, D.A.,
Kohn, E.C., and Liotta, L.A. Use of proteomic patterns in serum to
identify ovarian cancer. Lancet 359: 572-577 (2002) and Lancet 360:
169-171 (2002).
[2] Zhu, W., Wang, X., Ma, Y., Rao, M., Glimm, J., and Kovach, J.
Detection of cancer specific markers amidst massive mass spectral data.
Proc. Nat. Acad. Sci. 100: 14666-14671 (2003).
[3] Wang, X., Zhu, W., Pradhan, K., Ji, C., Ma, Y., Semmes, O.J., Glimm,
J., and Mitchell, J. Feature extraction in the analysis of proteomic
mass spectra. Proteomics 6(7): 2095-2100 (2006).
[4] Johann, D.J., McGuigan, M.D., Tomov, S., Fusaro, V.A., Ross, S.,
Conrads, T.P., Veenstra, T.D., Fishman, D.A., Whiteley, G.R., Petricoin,
E.F., and Liotta, L.A. Novel approaches to visualization and data mining
reveal diagnostic information in the low amplitude region of serum mass
spectra from ovarian cancer patients. Disease Markers 19: 197-207
(2004).
[5] Johann, D.J., McGuigan, M.D., Tomov, S., Blum, E., Whiteley, G.R.,
Petricoin, E.F., and Liotta, L.A. Toward a Systems Biology Software
Toolkit. 17th IEEE Symposium on Computer-Based Medical Systems, June
2004.
[6] Johann, D.J., McGuigan, M.D., Patel, A.R., Tomov, S., Ross, S.,
Conrads, T.P., Veenstra, T.D., Fishman, D.A., Whiteley, G.R., Petricoin,
E.F., and Liotta, L.A. Clinical proteomics and biomarker discovery.
Annals of the New York Academy of Sciences 1022: 295-306 (June 2004).

Last Modified: January 31, 2008. Please forward all questions about this site to: Claire Lamberti