Publicly Available Datasets

Data from: Identifying the morphologic basis for radiomic features in distinguishing different Gleason grades of prostate cancer on MRI: Preliminary Findings

Location: http://haeckel.case.edu/data/PLOS2018.zip 

Package Description: The 2 subfolders correspond to the data from the two institutions used in this study. Corresponding regions were mapped between radiology and pathology through processing, reconstruction, and co-registration. Each subfolder contains the complete feature information, computed on a region-wise basis, in CSV format. All feature matrices have the form [nRegions x nFeatures].

- PathFeats: Histomorphometric features (nFeatures = 1024)
- PathFeatNames: Description of each histomorphometric feature
- RadFeats: Radiomic features (nFeatures = 2379)
- RadFeatNames: Description of each radiomic feature
- GleasonScores: Gleason score for each region
- PatientID: Anonymized Patient ID associated with each region.
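
As a minimal sketch of how one of these [nRegions x nFeatures] matrices might be read and sanity-checked, the Python snippet below parses comma-separated numeric rows and confirms the column count. It assumes the CSV files have no header row and contain only numeric values; the path in the usage comment is hypothetical and should be adjusted to the actual file names in the package.

```python
import csv


def parse_matrix(lines):
    """Parse an iterable of comma-separated text lines into rows of floats.

    Assumes no header row and purely numeric cells; blank lines are skipped.
    """
    return [[float(value) for value in row] for row in csv.reader(lines) if row]


def load_matrix(path):
    """Read a feature matrix from a CSV file on disk."""
    with open(path, newline="") as handle:
        return parse_matrix(handle)


def check_shape(matrix, n_features):
    """Return nRegions after confirming every row has n_features columns."""
    for index, row in enumerate(matrix):
        if len(row) != n_features:
            raise ValueError(
                f"row {index} has {len(row)} columns, expected {n_features}"
            )
    return len(matrix)


# Hypothetical usage (file name and folder are assumptions):
# path_feats = load_matrix("D1/PathFeats.csv")
# n_regions = check_shape(path_feats, 1024)
```

Running the same check on RadFeats with nFeatures = 2379 should give the same nRegions, since both matrices describe the same regions.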

Note that one PatientID can be associated with multiple regions, so the number of regions exceeds the number of patients:
- D1: 23 patients have 65 regions
- D2: 13 patients have 40 regions
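
The patient and region counts above can be checked with a short sketch like the following, which counts how many regions each PatientID contributes. It assumes the PatientID file has already been read into a plain Python list with one entry per region; the IDs in the example are made up for illustration.

```python
from collections import Counter


def regions_per_patient(patient_ids):
    """Map each PatientID to the number of regions associated with it."""
    return Counter(patient_ids)


def summarize(patient_ids):
    """Return (number of distinct patients, total number of regions)."""
    counts = regions_per_patient(patient_ids)
    return len(counts), sum(counts.values())


# Toy example with invented IDs: two patients, three regions.
example_ids = ["P01", "P01", "P02"]
n_patients, n_regions = summarize(example_ids)
print(n_patients, n_regions)  # 2 3
```

Under the counts stated above, the same call would be expected to give (23, 65) for D1 and (13, 40) for D2.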

Please cite the original paper: "Identifying the morphologic basis for radiomic features in distinguishing different Gleason grades of prostate cancer on MRI: Preliminary Findings"

Nuclei detection

Location: http://haeckel.case.edu/data/TMI2015.tgz

Package Description: 

(1)code:
MATLAB functions are provided to read the training data and the annotations of the testing data. For each image, they read the coordinates of the four corners of the square ROI (left_up, right_up, left_down, right_down) as well as the annotated coordinates of the nuclear centers within the ROI.
The MATLAB functions are:
Main_function.m and block_dots.m, which are used to obtain the coordinates of the ROI and nuclei in each histological image. Here (1)block means the coordinates of the four corners of the ROI (left_up, right_up, left_down, right_down), and (2)dots means the coordinates of all nuclear centers within the ROI, stored as an N*2 matrix, where N is the number of cells.
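
To illustrate the block and dots structures, here is a small Python sketch (not part of the provided MATLAB code) that derives the ROI bounding box from four corner points and keeps only the annotated centers that fall inside it. It assumes coordinates are stored as (x, y) pairs and uses plain lists in place of MATLAB matrices; the numbers are invented.

```python
def roi_bounds(block):
    """Return (x_min, y_min, x_max, y_max) from four (x, y) corner points."""
    xs = [x for x, _ in block]
    ys = [y for _, y in block]
    return min(xs), min(ys), max(xs), max(ys)


def dots_inside(block, dots):
    """Keep the nuclear centers that lie within the ROI, edges included."""
    x_min, y_min, x_max, y_max = roi_bounds(block)
    return [
        (x, y)
        for x, y in dots
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]


# Invented example: a 400 x 400 ROI and three candidate centers.
block = [(900, 900), (1300, 900), (900, 1300), (1300, 1300)]
dots = [(950, 1000), (1299, 901), (100, 100)]
print(len(dots_inside(block, dots)))  # 2
```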

(2)training:
The training data include 14421 nuclear and 28032 non-nuclear patches, which are saved as a MAT-file: training.mat
The function load_training_data.m is used to load the training data.
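
The two patch counts imply an imbalanced training set. The short Python sketch below works out the total and the share of each class from the numbers stated above; it does not read training.mat, and the function name is only illustrative.

```python
N_NUCLEAR = 14421
N_NON_NUCLEAR = 28032


def class_fractions(n_nuclear, n_non_nuclear):
    """Return (total patches, nuclear fraction, non-nuclear fraction)."""
    total = n_nuclear + n_non_nuclear
    return total, n_nuclear / total, n_non_nuclear / total


total, frac_nuclear, frac_non_nuclear = class_fractions(N_NUCLEAR, N_NON_NUCLEAR)
print(total)  # 42453
print(f"nuclear: {frac_nuclear:.1%}, non-nuclear: {frac_non_nuclear:.1%}")
```

Roughly one patch in three is nuclear, which may be worth keeping in mind when choosing evaluation metrics.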

(3)testing:
This dataset contains 516 testing data items. Each testing image is 2200*2200 pixels. In each image, a Region of Interest (ROI) of 400*400 pixels is chosen for validation.

Each study contains 3 files:
(1)*.tif:         the primary histological image;
(2)*_block.tif:   the square ROI from the primary histological image;
(3)*_cell.tif:    annotated nuclear centers within the square ROI;
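
As a sketch of how the three files of each study could be paired up, the Python snippet below groups file names by their shared stem using the suffixes listed above. The example file names are invented, and the code assumes that no primary image has a stem that itself ends in "_block" or "_cell".

```python
from collections import defaultdict

# Longer suffixes are checked first so "x_block.tif" is not read as a primary image.
SUFFIXES = (("_block.tif", "block"), ("_cell.tif", "cell"), (".tif", "image"))


def group_by_study(filenames):
    """Group file names into {stem: {"image": ..., "block": ..., "cell": ...}}."""
    studies = defaultdict(dict)
    for name in filenames:
        for suffix, role in SUFFIXES:
            if name.endswith(suffix):
                studies[name[: -len(suffix)]][role] = name
                break
    return dict(studies)


def incomplete_studies(studies):
    """List the stems that are missing one or more of the three files."""
    expected = {"image", "block", "cell"}
    return sorted(stem for stem, files in studies.items() if set(files) != expected)


# Invented file names for illustration.
names = ["case1.tif", "case1_block.tif", "case1_cell.tif", "case2.tif"]
studies = group_by_study(names)
print(incomplete_studies(studies))  # ['case2']
```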

Please cite: 
Jun Xu, Lei Xiang, Qingshan Liu, Hannah Gilmore, Jianzhong Wu, Jinghai Tang, and Anant Madabhushi, "Stacked Sparse Autoencoder (SSAE) for Nuclei Detection on Breast Cancer Histopathology Images", IEEE Trans. on Medical Imaging, 2015.

If you have any questions about the dataset, please email Dr. Jun Xu: xujung@gmail.com