Masked Autoencoders Are Scalable Vision Learners

Autoencoders are a useful tool extract strong latent features of data. This can be especially interesting for image tasks because images are very large but mostly contain very few interesting objects. This paper shows, that even with high input masking ratios, these models are still able to recreate the input images with high levels of detail.

Link to the paper: https://arxiv.org/abs/2111.06377