Machine Learning and Personalized Medicine

We have presented a number of machine learning and computerized decision support schemes that not only pursue more accurate detection and diagnosis of disease, but also use radio-omics approaches to interrogate disease characteristics more deeply.

A radiohistomorphometric approach for identifying in vivo imaging parameters correlated with ex vivo histological biomarkers characterizing prostate tumors: The objective of this work is to discover imaging markers of aggressive prostate cancer, with the aim of reducing overdiagnosis and overtreatment. Specifically, we seek to identify quantitative dynamic contrast-enhanced (DCE) MRI attributes that are predictive of Gleason grade through a novel computerized radiology-pathology correlation framework called radiohistomorphometry. The cluster heatmap facilitates identification of imaging biomarkers by visualizing the correlation between each pair of ex vivo microvessel features and in vivo DCE MRI kinetic features. In this figure, red corresponds to high and blue to low correlation values, and cells enclosed in a black boundary are statistically significant (p < 0.05). The red subcluster identifies candidate DCE MRI markers of microvessel architecture.
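
The pairwise correlation analysis behind such a heatmap can be sketched in a few lines. This is a minimal illustration, not the published pipeline: the feature names and values are hypothetical, Pearson correlation is assumed as the correlation measure, and a permutation test is assumed for the p-values because the text does not say how significance was computed.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the observed |r| (assumed test)."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

def cross_correlation_table(histology, imaging, alpha=0.05):
    """Map (histology feature, imaging feature) -> (r, p, significant)."""
    table = {}
    for h_name, h_vals in histology.items():
        for i_name, i_vals in imaging.items():
            r = pearson(h_vals, i_vals)
            p = permutation_p_value(h_vals, i_vals)
            table[(h_name, i_name)] = (r, p, p < alpha)
    return table

# Hypothetical per-patient measurements (12 patients), for illustration only.
histology = {
    "vessel_density": [3.1, 4.0, 4.8, 5.5, 6.1, 7.2, 7.9, 8.4, 9.6, 10.1, 11.0, 12.3],
    "vessel_tortuosity": [1.2, 0.8, 1.5, 1.1, 0.9, 1.4, 1.0, 1.3, 0.7, 1.6, 1.2, 0.9],
}
imaging = {
    "wash_in_rate": [0.31, 0.42, 0.47, 0.58, 0.60, 0.74, 0.77, 0.86, 0.95, 1.04, 1.08, 1.25],
    "wash_out_rate": [0.22, 0.19, 0.25, 0.21, 0.18, 0.24, 0.20, 0.23, 0.17, 0.26, 0.22, 0.19],
}

table = cross_correlation_table(histology, imaging)
for (h_name, i_name), (r, p, significant) in table.items():
    flag = "*" if significant else " "
    print(f"{h_name:18s} vs {i_name:14s} r={r:+.2f} p={p:.4f} {flag}")
```

Each entry of the table corresponds to one cell of the heatmap: the coefficient sets the color, and the significance flag plays the role of the black boundary.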


Variable Importance in Nonlinear Kernels (VINK): Classification of Digitized Histopathology: In this work, we present a simple yet elegant method for approximating the mapping between the data in the original feature space and the transformed data in the kernel PCA (KPCA) embedding space; this mapping provides the basis for quantification of variable importance in nonlinear kernels (VINK). We show how VINK can be implemented in conjunction with the popular Isomap and Laplacian eigenmap algorithms. VINK is evaluated in the context of three different problems in digital pathology: (1) predicting five-year PSA failure following radical prostatectomy, (2) predicting Oncotype DX recurrence risk scores for ER+ breast cancers, and (3) distinguishing good- and poor-outcome p16+ oropharyngeal tumors. We demonstrate that subsets of features identified by VINK provide similar or better classification or regression performance compared to the original high-dimensional feature sets.
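
The general idea of scoring features through an approximate feature-to-embedding mapping can be illustrated with a toy example. The sketch below is a simplified stand-in and not the published VINK formulation: it assumes an RBF kernel, keeps only the first KPCA component, fits a linear least-squares map from the centered features to that component, and scores each feature by the magnitude of its scaled coefficient. All function names, the kernel width, and the data are hypothetical.

```python
import math

def rbf_kernel(X, gamma):
    """Gram matrix of the RBF kernel exp(-gamma * ||xi - xj||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in X] for xi in X]

def center_kernel(K):
    """Double-center a symmetric Gram matrix, as kernel PCA requires."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]

def top_component(Kc, iters=200):
    """First KPCA coordinate of each sample, found by power iteration."""
    n = len(Kc)
    v = [float(i + 1) for i in range(n)]
    eigval = 0.0
    for _ in range(iters):
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = math.sqrt(sum(x * x for x in w))
        v = [x / eigval for x in w]
    # Projection of training sample i onto the component is sqrt(lambda) * v_i.
    return [math.sqrt(eigval) * x for x in v]

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w

def feature_importance(X, gamma):
    """Normalized |coefficient| * std of a linear fit to the KPCA coordinate."""
    n, d = len(X), len(X[0])
    y = top_component(center_kernel(rbf_kernel(X, gamma)))
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    gram = [[sum(r[a] * r[b] for r in Xc) for b in range(d)] for a in range(d)]
    rhs = [sum(r[a] * yi for r, yi in zip(Xc, y)) for a in range(d)]
    w = solve(gram, rhs)
    stds = [math.sqrt(gram[j][j] / n) for j in range(d)]
    raw = [abs(w[j]) * stds[j] for j in range(d)]
    total = sum(raw)
    return [r / total for r in raw]

# Toy data: feature 0 separates two groups; features 1 and 2 do not.
X = [[x0, x1, x2] for x0 in (0.0, 10.0) for x1 in (1.0, -1.0) for x2 in (1.0, -1.0)]
importance = feature_importance(X, gamma=0.05)
print([round(s, 3) for s in importance])
```

On this toy data the first KPCA component is driven by the group structure in feature 0, so that feature receives nearly all of the importance. Real histopathology features would need standardization and more than one embedding dimension.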

Supervised Multi-View Canonical Correlation Analysis: Fused Multimodal Predictors of Disease Prognosis: In this work, we introduce supervised multi-view canonical correlation analysis (sMVCCA), a novel data fusion method that attempts to find a common representation for multiscale, multimodal data in which class separation is maximized while noise is minimized. Although this method can be applied to any number of modalities, we demonstrate its application in the context of integrating up to four data streams to predict prostate cancer (CaP) aggressiveness before and after radical prostatectomy (RP) using two datasets. Kaplan-Meier curves generated from classifier predictions in the sMVCCA joint subspace showed significant (p < 0.05) differences between patients with and without biochemical recurrence (BcR), unlike those generated from classifier predictions in the feature spaces of individual modalities.
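
The survival comparison described above can be sketched as follows. This is an illustration under stated assumptions rather than the study's analysis: the follow-up times and event flags are made up, the two groups stand in for patients split by a classifier's prediction, and a log-rank test is assumed because the text does not name the test behind the reported p-value.

```python
import math

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct event time.

    times  : follow-up time per patient (e.g. months)
    events : 1 if the event (here, recurrence) was observed, 0 if censored
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    surv = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        failed = sum(1 for ti, e in zip(times, events) if ti == t and e)
        surv *= 1.0 - failed / at_risk
        curve.append((t, surv))
    return curve

def log_rank_p_value(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for ti in all_times if ti >= t)   # at risk, both groups
        n_a = sum(1 for ti in times_a if ti >= t)   # at risk, group A
        d = sum(1 for ti, e in zip(all_times, all_events) if ti == t and e)
        d_a = sum(1 for ti, e in zip(times_a, events_a) if ti == t and e)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with one degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical cohorts split by predicted risk (months to recurrence).
high_times, high_events = [6, 12, 18, 24, 30, 60], [1, 1, 1, 1, 1, 0]
low_times, low_events = [60, 60, 60, 48, 60, 60], [0, 0, 0, 1, 0, 0]

km_high = kaplan_meier(high_times, high_events)
km_low = kaplan_meier(low_times, low_events)
chi2, p_value = log_rank_p_value(high_times, high_events, low_times, low_events)
print("high-risk curve:", [(t, round(s, 3)) for t, s in km_high])
print("low-risk curve: ", [(t, round(s, 3)) for t, s in km_low])
print(f"log-rank chi2={chi2:.2f}, p={p_value:.4f}")
```

A small p-value here indicates that the two predicted-risk groups have different recurrence-free survival, which is the kind of separation the curves in the joint subspace are reported to show.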