Visualization and Data Mining for the Detection of Cancer
W. Zhu, R. Bennett, J. Kovach, and M. McGuigan

The statistical problem of cancer detection is addressed using diagnostic data, that is, blood samples from patients with a known diagnosis of either ovarian cancer or normal [1,2]. Statistical tests applied to the ovarian cancer data achieve accuracy in the range of 80-95%. More recently, we have applied the same technology to the detection of head and neck cancer, again with accuracy in the range of 80-95% [3].

We are developing an open source toolkit for NIH/FDA/NCI to assist with the processing and analysis of datasets derived from proteomic experiments [4,5,6]. Open source software provides a mechanism for leveraging existing toolkits, sharing expertise, and accelerating development. Our toolkit uses components from several existing open source projects, including the Visualization Toolkit (VTK) and the Machine Learning in C++ library (MLC++). It will initially be customized for serum proteomic pattern diagnostics involving the monitoring for ovarian cancer recurrence.

Figure 1 shows a Splat Visualization rendering of serum proteomic data from 22 patients following processing by a high-resolution mass spectrometer. The majority of the spectra are from patients with known ovarian cancer. The Splat Visualizer tool is an aggregate 3-D plotter that allows rapid manipulation and viewing of large datasets. The tilted x-axis (labeled Mass) represents the mass values of the peptides and protein fragments. The y-axis (labeled Amp) shows normalized amplitude values (a relative measure of abundance). The z-axis (labeled Patients) is used to stack individual patient spectra, allowing comparison across patients. The graphic is also automatically colored as a function of support (the number of data points behind the voxel), with white and blue indicating the least support and red the most. In this way, a patient’s unique proteomic signature can be compared and contrasted with known controls and calibrators.

Figure 1:  Splat visualization displaying Serum proteomic patterns from 22 patients with ovarian cancer.
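The support-based coloring described above can be illustrated with a minimal sketch. It is not the Splat Visualizer's actual implementation: the function names `support_grid` and `support_color`, the bin counts, the three-step color ramp, and the sample spectra are all assumptions made for illustration. Each data point is placed in a (mass bin, amplitude bin, patient) voxel, and the number of points in a voxel is treated as its support.

```python
from collections import Counter


def support_grid(spectra, mass_range, mass_bins, amp_bins):
    """Count data points per (mass bin, amplitude bin, patient) voxel.

    spectra: one list per patient of (mass, normalized amplitude) pairs,
    with amplitudes assumed to lie in [0, 1].
    """
    m_lo, m_hi = mass_range
    counts = Counter()
    for patient, spectrum in enumerate(spectra):
        for mass, amp in spectrum:
            if not (m_lo <= mass <= m_hi and 0.0 <= amp <= 1.0):
                continue  # skip points outside the plotted volume
            # Clamp so that points on the upper edge fall in the last bin.
            i = min(int((mass - m_lo) / (m_hi - m_lo) * mass_bins), mass_bins - 1)
            j = min(int(amp * amp_bins), amp_bins - 1)
            counts[(i, j, patient)] += 1
    return counts


def support_color(count, max_count):
    """Map support to a coarse color label (hypothetical three-step ramp)."""
    if count <= 0:
        return "white"
    return "blue" if count / max_count <= 0.5 else "red"


# Made-up spectra for two patients, stacked along the patient axis.
spectra = [
    [(1000.0, 0.10), (1010.0, 0.12), (2500.0, 0.80)],
    [(1005.0, 0.11), (2500.0, 0.75), (2510.0, 0.78)],
]
grid = support_grid(spectra, mass_range=(1000.0, 3000.0), mass_bins=4, amp_bins=4)
max_support = max(grid.values())
colors = {voxel: support_color(n, max_support) for voxel, n in grid.items()}
```

A real renderer would blend contributions across neighboring voxels rather than use hard bins; the sketch only shows how a support count can drive the color assignment.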

Figure 2 displays output of the Multi-view visualization tool. This tool can be thought of as a superset of the Splat Visualizer, since it allows for either synchronous or asynchronous coordination of multiple windows, each containing a different proteomic dataset. In all four views the x- and y-axes are defined as in Figure 1. However, each view employs a different clinical parameter on the z-axis, allowing simultaneous viewing of proteomic patterns, each in that window’s particular context.

In each window, serum proteomic spectra from approximately 100 patients with known ovarian cancer have been aggregated and grouped according to a clinical parameter of interest. Beginning at the top left and moving clockwise, spectra are rendered as a function of the patient’s age, then grouped and displayed according to tumor grade, then by a lab value [Cancer Antigen 125 (CA 125)], and finally by stage of disease. Any window can be maximized and subsequent operations performed, such as zoom, pan, rotate, drill, or execution of a defined command.

Figure 2:  Multi-view visualization of aggregate proteomic patterns (approximately 100 patients, most with ovarian cancer) viewed as a function of a clinical parameter.
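The grouping behind each window of the multi-view display can be sketched as follows. This is an illustration under stated assumptions, not the toolkit's code: the record layout, the function name `aggregate_by_parameter`, the sample values, and the choice of a per-group mean as the aggregate are all hypothetical. Continuous parameters such as age or CA 125 would need to be binned first, which the sketch omits.

```python
from collections import defaultdict


def aggregate_by_parameter(records, parameter):
    """Group spectra by one clinical parameter and average within each group.

    records: list of dicts with a "clinical" dict and a "spectrum" list of
    normalized amplitudes sampled on a common mass grid (assumed layout).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["clinical"][parameter]].append(rec["spectrum"])
    aggregated = {}
    for value, spectra in groups.items():
        n = len(spectra)
        # Average amplitude at each mass position across the group.
        aggregated[value] = [sum(column) / n for column in zip(*spectra)]
    return aggregated


# Made-up records; each view uses a different parameter for its z-axis.
records = [
    {"clinical": {"stage": "I", "grade": 1}, "spectrum": [0.2, 0.4]},
    {"clinical": {"stage": "I", "grade": 2}, "spectrum": [0.4, 0.6]},
    {"clinical": {"stage": "III", "grade": 2}, "spectrum": [0.1, 0.9]},
]
views = {p: aggregate_by_parameter(records, p) for p in ("stage", "grade")}
```

Each entry of `views` corresponds to one window: the keys of the inner dictionary are the positions along that window's z-axis, and the values are the aggregate spectra drawn there.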

 

References

  • [1] Petricoin III, E.F., Ardekani, A.M., Hitt, B.A., Levine, P.J., Fusaro, V.A., Steinberg, S.M., Mills, G.B., Simone, C., Fishman, D.A., Kohn, E.C., and Liotta, L.A. Lancet 359: 572-577 (2002) and Lancet 360: 169-171 (2002).
  • [2] Zhu, W., Wang, X., Ma, Y., Rao, M., Glimm, J., and Kovach, J. Detection of cancer specific markers amidst massive mass spectral data. Proc. Nat. Acad. Sci. 100: 14666-14671 (2003).
  • [3] Wang, X., Zhu, W., Pradhan, K., Ji, C., Ma, Y., Semmes, O.J., Glimm, J., and Mitchell, J. Feature extraction in the analysis of proteomic mass spectra. Proteomics 6: 2095-2100 (2006).
  • [4] Johann, D.J., McGuigan, M.D., Tomov, S., Fusaro, V.A., Ross, S., Conrads, T.P., Veenstra, T.D., Fishman, D.A., Whiteley, G.R., Petricoin, E.F., and Liotta, L.A. Novel approaches to visualization and data mining reveal diagnostic information in the low amplitude region of serum mass spectra from ovarian cancer patients. Disease Markers 19: 197-207 (2004).
  • [5] Johann, D.J., McGuigan, M.D., Tomov, S., Blum, E., Whiteley, G.R., Petricoin, E.F., and Liotta, L.A. Toward a Systems Biology Software Toolkit, 17th IEEE Symposium on Computer-Based Medical Systems, June 2004.
  • [6] Johann, D.J., McGuigan, M.D., Patel, A.R., Tomov, S., Ross, S., Conrads, T.P., Veenstra, T.D., Fishman, D.A., Whiteley, G.R., Petricoin, E.F., and Liotta, L.A. Clinical proteomics and biomarker discovery. Annals of the New York Academy of Sciences 1022: 295-306 (June 2004).

Last Modified: January 31, 2008