Feature Importance in Nonlinear Embeddings (FINE): Applications in Digital Pathology.

Publication Type: Journal Article
Year of Publication: 2015
Authors: Ginsburg S, Lee G, Ali S, Madabhushi A
Journal: IEEE Transactions on Medical Imaging
Date Published: 07/2015

Quantitative histomorphometry (QH) refers to the process of computationally modeling disease appearance on digital pathology images. This procedure typically involves extraction of hundreds of features, which may be used to predict disease presence, aggressiveness, or outcome, from digitized images of tissue slides. Due to the "curse of dimensionality", constructing a robust and interpretable classifier is very challenging when the dimensionality of the feature space is high. Dimensionality reduction (DR) is one approach for reducing the dimensionality of the feature space to facilitate classifier construction. When DR is performed, however, it can be challenging to quantify the contribution of each of the original features to the final classification or prediction result. In QH it is often important not only to create an accurate classifier of disease presence and aggressiveness, but also to identify the features that contribute most substantially to class separability. This feature transparency is often a prerequisite for adoption of clinical decision support classification tools, since physicians are typically resistant to opaque "black box" prediction models. We have previously presented a method for scoring features based on their importance for classification on an embedding derived via principal components analysis (PCA). However, nonlinear DR (NLDR), which is more useful for many biomedical problems, involves the eigen-decomposition of a kernel matrix rather than the data itself, compounding the issue of classifier interpretability. In this paper we extend our PCA-based feature scoring method to kernel PCA (KPCA). We demonstrate that our KPCA approach for evaluating feature importance in nonlinear embeddings (FINE) applies to several popular NLDR algorithms that can be cast as variants of KPCA, such as Isomap and Laplacian eigenmaps.
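To illustrate why kernel-based embeddings are harder to interpret than PCA, the following is a minimal sketch of generic kernel PCA, not the FINE scoring method itself (whose details are not given in this abstract). The function names `rbf_kernel` and `kpca_embed`, the Gaussian kernel, the `gamma` value, and the synthetic data are all assumptions made for illustration. The point to notice is that the eigenvectors are indexed by samples rather than by features, so no per-feature loading can be read off directly.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def kpca_embed(K, n_components):
    """Embed samples by eigen-decomposing the double-centered kernel matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    Kc = H @ K @ H                            # centering in the implicit feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Each eigenvector has one entry per SAMPLE (not per feature).
    # Embedding coordinates are the eigenvectors scaled by sqrt(eigenvalue).
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0)), eigvals

# Synthetic stand-in data (hypothetical): 40 samples, 6 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
Y, lam = kpca_embed(rbf_kernel(X, gamma=0.1), n_components=2)
```

Here `Y` holds the two-dimensional embedding of the 40 samples. Replacing `rbf_kernel(X, gamma=0.1)` with the linear kernel `X @ X.T` would give the ordinary PCA scores up to sign, which is the sense in which KPCA generalizes PCA.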
FINE is applied to four digital pathology datasets with 53-2343 features to identify key QH features describing nuclear or glandular arrangements for predicting the risk of recurrence of breast and prostate cancers. Measures of nuclear and glandular architecture and clusteredness were found to play an important role in predicting the likelihood of recurrence of both breast and prostate cancers. Additionally, FINE was able to identify a stable set of features that provide good classification accuracy on four publicly available datasets from the NIPS 2003 Feature Selection Challenge. Compared to the t-test, Fisher score, and Gini index, FINE was found to yield more stable feature subsets that achieve higher classification accuracy for most of the datasets considered.
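For readers unfamiliar with the baselines named above, the sketch below computes one common form of the Fisher score: for each feature, the class-size-weighted variance of the class means divided by the class-size-weighted within-class variance. The abstract does not state which variant the authors used, so this definition, the function name `fisher_score`, and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def fisher_score(X, y):
    """Per-feature Fisher score: between-class scatter / within-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xk = X[y == label]
        n_k = Xk.shape[0]
        between += n_k * (Xk.mean(axis=0) - overall_mean) ** 2
        within += n_k * Xk.var(axis=0)
    # Assumes no feature is constant within every class (within > 0).
    return between / within

# Synthetic example (hypothetical): 100 samples, 5 features, 2 classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 5))
X[:, 2] += 3.0 * y            # only feature 2 separates the classes

scores = fisher_score(X, y)
ranking = np.argsort(scores)[::-1]   # feature indices, most discriminative first
```

In this toy setup feature 2 ranks first because it is the only one whose class means differ. Like the t-test, this score treats each feature independently, whereas an embedding-based score reflects how features act jointly in the reduced space.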



Alternate Journal: IEEE Trans Med Imaging
