Publicly Available Data



Entire Package

This package includes three folders:

MATLAB functions for reading the training data and the annotations of the testing data.

This is the full training dataset.

This is the full testing dataset.

The full training and testing datasets used in Jun Xu et al.'s paper:

Jun Xu, Lei Xiang, Qingshan Liu, Hannah Gilmore, Jianzhong Wu, Jinghai Tang, and Anant Madabhushi, "Stacked Sparse Autoencoder (SSAE) for Nuclei Detection on Breast Cancer Histopathology Images," IEEE Trans. on Medical Imaging, 2015.

If you have any questions about the dataset, please email Dr. Jun Xu:

Training Data
The training data consist of 14421 nuclear and 28032 non-nuclear patches, saved in a single MAT-file: training.mat
Use the function load_training_data.m to load the training data.
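
Because the two classes differ in size, it can help to know the class proportions before training. The Python sketch below only does arithmetic on the two counts stated above. The inverse-frequency weighting is an assumed, commonly used heuristic, not something taken from the paper or from load_training_data.m.

```python
# Patch counts as stated in this README.
N_NUCLEAR = 14421
N_NON_NUCLEAR = 28032


def class_summary(n_pos, n_neg):
    """Return total count, positive fraction, and inverse-frequency weights."""
    total = n_pos + n_neg
    pos_fraction = n_pos / total
    # Assumed heuristic: weight = total / (number of classes * class count).
    weights = {
        "nuclear": total / (2 * n_pos),
        "non_nuclear": total / (2 * n_neg),
    }
    return total, pos_fraction, weights


total, pos_fraction, weights = class_summary(N_NUCLEAR, N_NON_NUCLEAR)
print(total)                   # 42453
print(round(pos_fraction, 3))  # 0.34
```

Roughly one patch in three is nuclear, so a classifier that always predicts "non-nuclear" would already be about 66% accurate; plain accuracy should be read with that baseline in mind.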

Testing Data
There are 516 testing samples in this dataset. Each testing image is 2200*2200 pixels. In each image, a Region of Interest (ROI) of 400*400 pixels is chosen for validation.

Each study contains 3 files:
(1) *.tif:         the primary histological image;
(2) *_block.tif:   the square ROI from the primary histological image;
(3) *_cell.tif:    the annotated nuclear centers within the square ROI.
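
The naming scheme above can be used to group a folder listing into studies and to spot studies with missing files. The following Python sketch is one possible way to do this. The function names and the example file names are hypothetical, and it assumes that no primary image has a base name ending in "_block" or "_cell".

```python
from pathlib import PurePath


def group_study_files(filenames):
    """Group file names into studies keyed by the primary image's base name."""
    studies = {}
    for name in filenames:
        path = PurePath(name)
        if path.suffix.lower() != ".tif":
            continue  # ignore anything that is not a TIFF file
        stem = path.stem
        # Check the longer suffixes first: the pattern "*.tif" also matches them.
        if stem.endswith("_block"):
            key, role = stem[: -len("_block")], "block"
        elif stem.endswith("_cell"):
            key, role = stem[: -len("_cell")], "cell"
        else:
            key, role = stem, "image"
        studies.setdefault(key, {})[role] = name
    return studies


def incomplete_studies(studies):
    """Return the sorted keys of studies that lack any of the three files."""
    required = {"image", "block", "cell"}
    return sorted(k for k, files in studies.items() if set(files) != required)


# Hypothetical file names, for illustration only.
example = ["case01.tif", "case01_block.tif", "case01_cell.tif", "case02.tif"]
studies = group_study_files(example)
print(incomplete_studies(studies))  # ['case02']
```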

MATLAB Functions
MATLAB functions are provided for reading the coordinates of the four corners of the square ROI (left_up, right_up, left_down, right_down) and the annotated coordinates of the nuclear centers within the ROI in each image.
The MATLAB functions are Main_function.m and block_dots.m, which are used to obtain the coordinates of the ROI and nuclei in each histological image. Here, (1) block holds the coordinates of the four corners of the ROI (left_up, right_up, left_down, right_down), and (2) dots holds the coordinates of all nuclear centers within the ROI as an N*2 matrix, where N is the number of cells.
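
Once block and dots have been exported from MATLAB, a common next step is to express the nuclear centers relative to the ROI. The Python sketch below illustrates this under stated assumptions: each corner and each center is an (x, y) pair, both are given in the coordinate frame of the full image, and the ROI is axis-aligned. None of these conventions is confirmed by this README, so check them against the output of block_dots.m before relying on the result. The function names and example values are hypothetical.

```python
def roi_bounds(block):
    """Axis-aligned bounds (x_min, y_min, x_max, y_max) of the four ROI corners."""
    xs = [x for x, _ in block]
    ys = [y for _, y in block]
    return min(xs), min(ys), max(xs), max(ys)


def to_roi_coordinates(block, dots):
    """Shift nuclear centers so the ROI's top-left corner becomes (0, 0).

    Raises ValueError if a center lies outside the ROI bounds.
    """
    x_min, y_min, x_max, y_max = roi_bounds(block)
    local = []
    for x, y in dots:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            raise ValueError(f"center ({x}, {y}) lies outside the ROI")
        local.append((x - x_min, y - y_min))
    return local


# Hypothetical values: a 400*400 ROI and N = 2 nuclear centers.
block = [(900, 900), (1300, 900), (900, 1300), (1300, 1300)]
dots = [(950, 1000), (1299, 901)]
print(to_roi_coordinates(block, dots))  # [(50, 100), (399, 1)]
```

The bounds check doubles as a sanity test: if it raises for real data, the assumed coordinate order or reference frame is probably wrong.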