Notes from “EfficientNet: Improving Accuracy and Efficiency through AutoML and Model Scaling” by Mingxing Tan and Quoc V. Le

These notes are based on the accompanying Google AI Blog post of the same title.

Convolutional neural networks (CNNs) are usually developed from a baseline model and then scaled up along certain dimensions of that model: width (more channels per layer), depth (more layers), or resolution (larger input images for training and evaluation), with the goal of improving the accuracy and efficiency of the CNN. Conventional methods scale each dimension independently and somewhat arbitrarily, which requires a tedious manual tuning step in which training parameters are redefined to accommodate the new network size. The resulting scaled-up CNN is often less accurate and less efficient than it could have been under better scaling conditions.
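To make the three dimensions concrete, here is a minimal sketch of conventional single-dimension scaling; the baseline numbers (channel list, layer count, 224-pixel input) are hypothetical, and each helper changes one dimension while holding the others fixed:

```python
# Conventional scaling: each dimension is adjusted independently.

def scale_width(channels, multiplier):
    # Widen the network: more channels per layer, same depth and input size.
    return [round(c * multiplier) for c in channels]

def scale_depth(num_layers, multiplier):
    # Deepen the network: more layers, same width and input size.
    return round(num_layers * multiplier)

def scale_resolution(image_size, multiplier):
    # Train and evaluate on larger inputs, same depth and width.
    return round(image_size * multiplier)

base_channels = [32, 64, 128]  # hypothetical baseline stage widths
print(scale_width(base_channels, 2.0))   # -> [64, 128, 256]
print(scale_depth(12, 1.5))              # -> 18
print(scale_resolution(224, 1.25))       # -> 280
```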

In this paper, the authors propose a simple but effective compound scaling method that uniformly scales all three dimensions of the CNN (depth, width, and input resolution) with a fixed set of scaling coefficients, chosen according to the computational budget that can be afforded. A small parameter sweep on the baseline determines how to optimally balance depth, width, and resolution against one another given the available resources; the resulting coefficients are then used to scale up the base model. Each scaled model is called an EfficientNet. The models share a simple baseline image classification architecture that is scaled to different sizes to generate the instances of the EfficientNet family.
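A minimal sketch of the compound scaling rule, assuming the formulation from the paper: depth scales as α^φ, width as β^φ, and resolution as γ^φ for a compound coefficient φ, under the constraint α·β²·γ² ≈ 2 (depth grows FLOPS linearly while width and resolution grow them quadratically, so total FLOPS grow roughly as 2^φ). The constants α=1.2, β=1.1, γ=1.15 are the values reported in the paper; the baseline depth and width below are hypothetical, while 224 is EfficientNet-B0's input resolution:

```python
# Compound scaling: one coefficient phi drives all three dimensions.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # from the paper's grid search; ALPHA * BETA**2 * GAMMA**2 ~= 2

def compound_scale(phi, base_depth=18, base_width=32, base_resolution=224):
    """Scale a baseline's depth/width/resolution by the compound coefficient phi."""
    depth = round(base_depth * ALPHA ** phi)            # more layers
    width = round(base_width * BETA ** phi)             # more channels per layer
    resolution = round(base_resolution * GAMMA ** phi)  # larger input images
    return depth, width, resolution

for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth={d} layers, width={w} channels, input={r}x{r}")
```

Larger values of φ trade more computation for more accuracy; the published B1 through B7 variants correspond roughly to increasing φ, with some hand-rounding of the resulting sizes.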

The EfficientNet models achieve state-of-the-art accuracy with up to an order of magnitude fewer parameters than competing networks. EfficientNet documentation and source code are available on GitHub.