This paper extends the use of convolutional neural networks (CNNs) in Fourier ptychographic microscopy (FPM) to video capture.
Deep learning approaches to FPM, which computationally enhances the resolution of an image while maintaining a large field of view, have been developed to replace the iterative, model-based design. The neural network is preferable because, once it has learned the relationship between low-resolution inputs and the corresponding high-resolution FPM output, it can transform input into output directly in a single forward pass. The model-based approach, in contrast, must iteratively refine its approximation of the high-resolution object, a much slower process.
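As a concrete illustration of this direct mapping, the sketch below shows a small convolutional network that turns a stack of low-resolution captures (one channel per LED illumination angle) into a single higher-resolution image in one forward pass. The layer sizes, the 4x upsampling factor, and the amplitude-only output are illustrative assumptions, not the architecture used in the paper.

```python
# Illustrative sketch: a small CNN that maps a stack of low-resolution FPM
# captures directly to one higher-resolution image in a single forward pass.
import torch
import torch.nn as nn

class DirectFPMNet(nn.Module):
    def __init__(self, n_leds: int = 25, upscale: int = 4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n_leds, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # PixelShuffle trades channels for spatial resolution (factor `upscale`).
            nn.Conv2d(64, upscale * upscale, kernel_size=3, padding=1),
            nn.PixelShuffle(upscale),
        )

    def forward(self, low_res_stack: torch.Tensor) -> torch.Tensor:
        # low_res_stack: (batch, n_leds, H, W) -> (batch, 1, H*upscale, W*upscale)
        return self.body(low_res_stack)

net = DirectFPMNet()
stack = torch.rand(1, 25, 64, 64)   # 25 low-res captures of one field patch
high_res = net(stack)               # one forward pass, no iterative refinement
print(high_res.shape)               # torch.Size([1, 1, 256, 256])
```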
Nguyen et al. train a conditional generative adversarial network (cGAN) to learn the relationship between the low-resolution inputs and the FPM output. One sub-network, the generator, produces a predicted FPM reconstruction, and the other, the discriminator, attempts to decide whether a given image is the ground-truth reconstruction or a prediction.
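A minimal sketch of one cGAN training step in this spirit is shown below, assuming a pix2pix-style objective (adversarial loss plus an L1 term). The tiny generator and discriminator, the conditioning scheme, and the loss weighting are illustrative assumptions, not the network described by Nguyen et al.

```python
# Illustrative cGAN training step: the generator predicts a high-res image from
# a low-res stack; the discriminator judges (input, image) pairs as real or fake.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_leds, upscale = 25, 4

# Generator: low-res stack -> high-res prediction.
G = nn.Sequential(
    nn.Conv2d(n_leds, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, upscale * upscale, 3, padding=1), nn.PixelShuffle(upscale),
)
# Discriminator: judges a high-res image paired with a summary of its input.
D = nn.Sequential(
    nn.Conv2d(2, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(32, 1, 4, stride=2, padding=1),   # patch-wise real/fake logits
)

def condition(low_res_stack, high_res_img):
    # Make the GAN conditional: pair each candidate image with an upsampled
    # summary (mean over LEDs) of the low-res stack it should explain.
    summary = F.interpolate(low_res_stack.mean(dim=1, keepdim=True),
                            size=high_res_img.shape[-2:], mode="bilinear",
                            align_corners=False)
    return torch.cat([summary, high_res_img], dim=1)

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

low_res = torch.rand(4, n_leds, 64, 64)    # training patches from frame 0
target = torch.rand(4, 1, 256, 256)        # their iterative FPM reconstructions

# Discriminator step: ground-truth pairs should score 1, predicted pairs 0.
fake = G(low_res).detach()
d_real, d_fake = D(condition(low_res, target)), D(condition(low_res, fake))
loss_D = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
opt_D.zero_grad(); loss_D.backward(); opt_D.step()

# Generator step: fool the discriminator while staying close to the target.
fake = G(low_res)
d_fake = D(condition(low_res, fake))
loss_G = bce(d_fake, torch.ones_like(d_fake)) + 100.0 * l1(fake, target)
opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```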
To apply this technique to video reconstruction, Nguyen et al. first assume that the samples on which they train contain enough cells that, within a single video frame, all cell states are represented somewhere. They then train their CNN on only the first frame of each video.
When the CNN is applied, frame-by-frame, to the remainder of the video, it is able to reconstruct each cell as it moves through many different states. This is because the original training frame exposed the CNN to all the possible conditions of a cell.
They treat each video frame as temporally independent of the preceding frames. Because the network has learned how cells in each state should be reconstructed (by training on the many states present in frame 0), it simply reconstructs each state wherever it appears.
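A sketch of this frame-by-frame inference is shown below; the generator is a stand-in for the trained network (same toy architecture as the earlier sketches), and the video dimensions are invented for illustration.

```python
# Illustrative frame-by-frame reconstruction with a generator trained on frame 0
# only: each frame's low-res stack is processed independently, with no temporal
# state carried between frames.
import torch
import torch.nn as nn

n_leds, upscale = 25, 4
generator = nn.Sequential(                  # stand-in for the trained generator
    nn.Conv2d(n_leds, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, upscale * upscale, 3, padding=1), nn.PixelShuffle(upscale),
)
generator.eval()

video = torch.rand(100, n_leds, 64, 64)     # 100 frames, each a low-res stack

frames = []
with torch.no_grad():
    for frame in video:                     # frames treated independently
        frames.append(generator(frame.unsqueeze(0)))
reconstruction = torch.cat(frames)          # (100, 1, 256, 256) high-res video
print(reconstruction.shape)
```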
Using transfer learning, they are able to quickly train their model on new types of cells.
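The sketch below illustrates one way such transfer learning could look: reuse the generator weights learned on the previous cell type and briefly fine-tune on a small set of patches from the new sample. The frozen first layer, the reduced learning rate, the L1-only fine-tuning loss, and the checkpoint file name are hypothetical choices, not details from the paper.

```python
# Illustrative transfer-learning step: start from previously learned generator
# weights and fine-tune briefly on a small dataset from a new cell type.
import torch
import torch.nn as nn

n_leds, upscale = 25, 4
generator = nn.Sequential(
    nn.Conv2d(n_leds, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, upscale * upscale, 3, padding=1), nn.PixelShuffle(upscale),
)
# In practice, load weights trained on the previous cell type, e.g.:
# generator.load_state_dict(torch.load("generator_celltype_A.pt"))  # hypothetical path

for p in generator[0].parameters():         # keep low-level features fixed
    p.requires_grad = False

opt = torch.optim.Adam(
    (p for p in generator.parameters() if p.requires_grad), lr=2e-5)
l1 = nn.L1Loss()

new_inputs = torch.rand(8, n_leds, 64, 64)  # small dataset from the new cell type
new_targets = torch.rand(8, 1, 256, 256)    # its iterative FPM reconstructions

for _ in range(50):                         # a few quick fine-tuning passes
    loss = l1(generator(new_inputs), new_targets)
    opt.zero_grad(); loss.backward(); opt.step()
```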
This technique for FPM video capture is faster and more practical than performing iterative FPM reconstruction on every frame of a video. It has limitations, however. For a sample whose cells are too sparse, or too large, to display all possible states within a single image, the strategy of training on only the first frame of the video is ineffective. For fast-moving samples, it will not be possible to collect a full low-resolution input dataset (one capture per LED illumination angle) before the cells change position.