Supervised Multi-View Canonical Correlation Analysis (sMVCCA): Integrating histologic and proteomic features for predicting recurrent prostate cancer.

Title: Supervised Multi-View Canonical Correlation Analysis (sMVCCA): Integrating histologic and proteomic features for predicting recurrent prostate cancer.
Publication Type: Journal Article
Year of Publication: 2014
Authors: Lee G, Singanamalli A, Wang H, Feldman M, Master S, Shih N, Spangler E, Rebbeck T, Tomaszewski J, Madabhushi A
Journal: IEEE Transactions on Medical Imaging
Date Published: 2014 Sep 5

In this work, we present a new methodology to facilitate prediction of recurrent prostate cancer (CaP) following radical prostatectomy (RP) via the integration of quantitative image features and protein expression extracted from the excised prostate. Creating a fused predictor from big data streams comprising thousands of dimensions faces at least two challenges. Firstly, the classifier must account for the 'curse of dimensionality', which hinders classifier performance when the number of features is much larger than the number of patient studies. Secondly, the classifier must balance the possible mismatch in the number of features across the different data channels to avoid biasing the classifier towards channels with larger numbers of features. In this paper, we present a new data integration methodology, supervised Multi-View Canonical Correlation Analysis (sMVCCA), which aims to integrate an arbitrary number of views of high dimensional data to provide a more amenable data representation for classification of disease. We also explore a version of sMVCCA using Spearman's rank correlation which, unlike Pearson's correlation, can account for monotonic non-linear correlations and is less sensitive to outliers. A cohort of 40 prostate cancer patients with pathological Gleason scores 6-8 was considered for this study. Of these men, 21 were found to have biochemical recurrence (BCR) following RP, while 19 did not. The sMVCCA classifier combined a total of 189 quantitative histomorphometric attributes describing glandular morphology, architecture, and orientation with the expression levels of 650 proteins extracted from the site of the tumor for each patient. The fused histomorphometric and proteomic representation via sMVCCA, combined with a random forest classifier, was able to predict BCR with a mean area under the receiver operating characteristic curve (AUC) of 0.74 across all 400 classifications and a maximum AUC of 0.9286.
We found sMVCCA to perform statistically significantly (p < 0.05) better than comparative state-of-the-art data fusion strategies such as Principal Component Analysis (PCA), multi-view CCA (MVCCA), and supervised regularized CCA (SRCCA) for predicting BCR. Furthermore, Kaplan-Meier survival analysis demonstrated improved logrank p-values for the sMVCCA fused classifier as compared to histology or proteomic features alone.
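
As a small illustration of the Spearman versus Pearson distinction mentioned in the abstract, the sketch below compares the two coefficients on a monotonic but non-linear relationship. It is not the authors' implementation and does not involve sMVCCA itself; the toy data and the helper names `pearson`, `ranks`, and `spearman` are assumptions made purely for illustration.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: measures the strength of a linear relationship."""
    return float(np.corrcoef(x, y)[0, 1])

def ranks(v):
    """Ranks 1..n of the values in v (assumes no ties, for simplicity)."""
    order = np.argsort(v)
    r = np.empty(len(v), dtype=float)
    r[order] = np.arange(1, len(v) + 1)
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical toy data: y increases monotonically but non-linearly with x.
x = np.arange(1.0, 11.0)
y = np.exp(x)

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

Because Spearman's coefficient depends only on the ordering of the values, it reports a perfect monotonic association here, while Pearson's coefficient is pulled well below 1 by the curvature and the few very large values.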

Alternate Journal: IEEE Trans Med Imaging
