Out-Of-Sample Extrapolation Using Semi-Supervised Manifold Learning (OSE-SSL): Content-Based Image Retrieval For Prostate Histology Grading

Title: Out-Of-Sample Extrapolation Using Semi-Supervised Manifold Learning (OSE-SSL): Content-Based Image Retrieval For Prostate Histology Grading
Publication Type: Conference Paper
Year of Publication: 2011
Authors: Sparks R, Madabhushi A
Conference Name: IEEE International Symposium on Biomedical Imaging (ISBI)
Date Published: 2011 Mar 30

In this paper, we present an out-of-sample extrapolation (OSE) scheme in the context of semi-supervised manifold learning (OSE-SSL). Manifold learning (ML) takes samples with high dimensionality and learns a set of low dimensional embeddings. Embeddings generated by ML preserve nonlinear relationships between samples, allowing dataset visualization, classification, or evaluation of object similarity. Semi-supervised ML (SSL), a recent ML extension, exploits known class labels to learn embeddings, which may result in greater separation between samples of different classes compared to unsupervised ML schemes. Most ML schemes use the eigenvalue decomposition (EVD) to learn embeddings. For instance, Graph Embedding (GE) learns embeddings by performing EVD on a similarity matrix that models high dimensional feature vector similarity between samples. In settings where new samples are continually acquired, such as a content-based image retrieval (CBIR) system, recalculating the EVD for every new sample is infeasible. OSE schemes obtain new embeddings without recalculating the EVD. The Nyström method (NM) is an OSE algorithm in which new embeddings are estimated as a weighted sum of known embeddings. The known embeddings must adequately describe the embedding space for NM to accurately estimate new embeddings. In this paper, NM and semi-supervised GE (SSGE) are combined to learn embeddings that cluster samples by class and to rapidly calculate embeddings for new samples without recalculating the EVD. OSE-SSL is compared to (i) NM paired with GE (NM-GE), and (ii) SSGE obtained for the full database, where the SSGE results represent ground truth embeddings. OSE-SSL, NM-GE, and SSGE are evaluated on (1) their ability to cluster samples by label, measured by Silhouette Index (SI), and (2) CBIR accuracy, measured by area under the precision-recall curve (AUPRC).
In a synthetic Swiss roll dataset of 2000 samples, OSE-SSL requires training on 50% of the dataset to achieve SI and AUPRC similar to SSGE, while NM-GE requires 70% of the dataset to achieve SI and AUPRC similar to GE. For a prostate histology dataset of 888 glands, a CBIR system was evaluated on its ability to retrieve images according to Gleason Grade. OSE-SSL had an AUPRC of 0.6 while NM-GE had an AUPRC of 0.3.
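
The Nyström idea described in the abstract, estimating a new embedding as a weighted sum of known embeddings, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a Gaussian similarity kernel with a hand-picked width `sigma`, an unnormalized similarity matrix, and purely unsupervised training embeddings (no class labels, so it is closer to NM-GE than to OSE-SSL). The function names `gaussian_similarity`, `fit_embedding`, and `nystrom_extend` are hypothetical.

```python
import numpy as np


def gaussian_similarity(A, B, sigma=1.0):
    """Pairwise Gaussian similarity between the rows of A and the rows of B.

    The Gaussian kernel and sigma are assumptions made for this sketch.
    """
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def fit_embedding(X_train, n_dims=2, sigma=1.0):
    """EVD of the training similarity matrix; keep the top n_dims eigenpairs."""
    W = gaussian_similarity(X_train, X_train, sigma)
    eigvals, eigvecs = np.linalg.eigh(W)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_dims]
    return eigvals[top], eigvecs[:, top]


def nystrom_extend(X_new, X_train, eigvals, eigvecs, sigma=1.0):
    """Embed new samples without recomputing the EVD.

    Each new embedding is a similarity-weighted sum of the known
    embeddings, rescaled by the corresponding eigenvalue.
    """
    W_new = gaussian_similarity(X_new, X_train, sigma)
    return (W_new @ eigvecs) / eigvals


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 5))
X_new = rng.normal(size=(3, 5))

eigvals, Y_train = fit_embedding(X_train, n_dims=2)
Y_new = nystrom_extend(X_new, X_train, eigvals, Y_train)
```

In this sketch, applying `nystrom_extend` to the training samples themselves reproduces their original embeddings, which is a convenient sanity check. The quality of `Y_new` depends on how well the training samples cover the embedding space, as the abstract notes.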

*IEEE COPYRIGHT NOTICE: 1997 IEEE.* Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

*COPYRIGHT NOTICE:* These materials are presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.