Abstract

High-throughput microscopy of many single cells generates high-dimensional data that are far from straightforward to analyze. One important problem is automatically detecting the cellular compartment where a fluorescently tagged protein resides, a task relatively simple for an experienced human but difficult to automate on a computer. Here, we train an 11-layer neural network on data from mapping thousands of yeast proteins, achieving a per-cell localization classification accuracy of 91% and a per-protein accuracy of 99% on held-out images. We confirm that low-level network features correspond to basic image characteristics, while deeper layers separate localization classes. Using this network as a feature calculator, we train standard classifiers that assign proteins to previously unseen compartments after observing only a small number of training examples. Our results are the most accurate subcellular localization classifications to date, and demonstrate the usefulness of deep learning for high-throughput microscopy.

A deep neural network for protein subcellular classification.


Data

We constructed a large-scale labeled dataset based on high-throughput, proteome-scale microscopy images from Chong et al. Each image has two channels: a red fluorescent protein (mCherry) with cytosolic localization, which marks the cell contour, and a green fluorescent protein (GFP) tag fused to the 3' end of an endogenous gene, which reports the abundance and localization of the tagged protein. The data are split into 65,000 examples for training, 12,500 for validation, and 12,500 for testing.
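The split described above can be sketched as follows. This is an illustrative sketch, not the paper's released code: it shuffles 90,000 single-cell example indices and partitions them into the 65,000 / 12,500 / 12,500 train/validation/test sizes; all array names are hypothetical.

```python
import numpy as np

# Illustrative sketch (not the paper's code): reproduce the
# 65,000 / 12,500 / 12,500 train/validation/test split over
# 90,000 single-cell example indices.
rng = np.random.default_rng(0)
n_total = 65_000 + 12_500 + 12_500  # 90,000 examples

perm = rng.permutation(n_total)
train_idx, val_idx, test_idx = np.split(perm, [65_000, 77_500])

# The same index arrays would slice both channels, so each cell's
# mCherry (contour) and GFP (localization) images stay paired.
print(len(train_idx), len(val_idx), len(test_idx))  # -> 65000 12500 12500
```

Splitting by shuffled indices rather than by copying image arrays keeps the channels of each cell paired and avoids duplicating the image data in memory.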

Image examples.

Reference: Chong, Y.T., Koh, J.L., Friesen, H., Duffy, S.K., Cox, M.J., Moses, A., Moffat, J., Boone, C. and Andrews, B.J. (2015). Yeast proteome dynamics from single cell imaging and automated analysis. Cell, 161(6), pp. 1413-1424.


Code

We trained a deep convolutional neural network with 11 layers of learnable weights: 8 convolutional and 3 fully connected.
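A minimal sketch of such an architecture, assuming PyTorch: 8 convolutional layers followed by 3 fully connected layers, taking two-channel (mCherry + GFP) single-cell crops as input. The crop size (64 x 64), channel widths, and class count here are illustrative placeholders, not necessarily the paper's exact configuration.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, pool=False):
    """A 3x3 convolution + ReLU, optionally followed by 2x2 max pooling."""
    layers = [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1), nn.ReLU()]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return layers

model = nn.Sequential(
    # 8 convolutional layers, pooling after every second one
    *conv_block(2, 64),    *conv_block(64, 64, pool=True),
    *conv_block(64, 128),  *conv_block(128, 128, pool=True),
    *conv_block(128, 256), *conv_block(256, 256, pool=True),
    *conv_block(256, 256), *conv_block(256, 256, pool=True),
    nn.Flatten(),
    # 3 fully connected layers; 17 is a placeholder class count
    nn.Linear(256 * 4 * 4, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 17),
)

# A batch of one 2-channel 64x64 crop maps to one logit per class.
logits = model(torch.zeros(1, 2, 64, 64))
assert logits.shape == (1, 17)
```

The penultimate fully connected layer is also the natural place to read out features when, as in the abstract, the trained network is reused as a feature calculator for standard classifiers.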



Additional information

We explored many features of the data; these analyses are presented in the main text and figures of the paper, as well as in additional analyses.

The code for generating reports, as well as a subset of figures, is also available. Some path names need to be updated for your local machine; please contact us if you have problems doing so.

All the code is available under the MIT license.