Deep Residual Learning for Image Recognition
Abstract 1
Deeper neural networks are more difficult to train. We
present a residual learning framework to ease the training
of networks that are substantially deeper than those used
previously. We explicitly reformulate the layers as learn-
ing residual functions with reference to the layer inputs, in-
stead of learning unreferenced functions. We provide com-
prehensive empirical evidence showing that these residual
networks are easier to optimize, and can gain accuracy from
considerably increased depth. On the ImageNet dataset we
evaluate residual nets with a depth of up to 152 layers—8
deeper than VGG nets [41] but still having lower complex-
ity. An ensemble of these residual nets achieves 3.57% error
on the ImageNet testset. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis
on CIFAR-10 with 100 and 1000 layers.The depth of representations is of central importancefor many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deepresidual nets are foundations of our submissions to ILSVRC& COCO 2015 competitions where we also won the 1stplaces on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.