InfinityGAN: Towards Infinite-Resolution Image Synthesis


Abstract

We present InfinityGAN, a method to generate arbitrary-resolution images. The problem is associated with several key challenges. First, scaling existing models to a high resolution is resource-constrained, both in terms of computation and availability of high-resolution training data. InfinityGAN trains and infers patch-by-patch seamlessly with low computational resources. Second, large images should be locally and globally consistent, avoid repetitive patterns, and look realistic. To address these, InfinityGAN takes global appearance, local structure and texture into account. With this formulation, we can generate images with resolution and level of detail not attainable before. Experimental evaluation supports that InfinityGAN generates images with superior global structure compared to baselines while also featuring parallelizable inference. Finally, we show several applications unlocked by our approach, such as fusing styles spatially, multi-modal outpainting and image inbetweening at arbitrary input and output resolutions.

[Paper]

Main paper   |   Supplementary

[Codes]

PyTorch (TBA)

[Citation]

@article{lin2021infinity,
   title={InfinityGAN: Towards Infinite-Resolution Image Synthesis},
   author={Lin, Chieh Hubert and Cheng, Yen-Chi and Lee, Hsin-Ying and Tulyakov, Sergey and Yang, Ming-Hsuan},
   journal={arXiv preprint arXiv:2104.03963},
   year={2021}
}

Overview of the Method


The generator of InfinityGAN consists of two modules: a structure synthesizer based on a neural implicit function, and a fully-convolutional texture synthesizer with all positional information removed (see Figure 3 in the paper). The two networks take four sets of inputs: a global latent variable that defines the holistic appearance of the image, a local latent variable that represents the local and structural variation, a continuous coordinate for learning the neural implicit structure synthesizer, and a set of randomized noise inputs to model fine-grained texture. InfinityGAN synthesizes images of arbitrary resolution by learning spatially extensible representations.
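The following is a minimal sketch of how the four input sets could be assembled for one patch; it is not the authors' implementation. The helper names (`seeded_gauss`, `coordinate_at`, `patch_inputs`), the tiny latent size, and the coordinate encoding (a bounded vertical axis via `tanh`, a periodic horizontal axis via `cos`/`sin`) are all assumptions made for illustration. The sketch only shows the property that matters for seamless patch-by-patch generation: every position-dependent input is a function of the absolute position, so two overlapping patches agree on their overlap.

```python
import math
import random

LATENT_DIM = 4  # hypothetical size, kept tiny for readability


def seeded_gauss(tag, seed, y, x, size):
    """Gaussian vector that depends only on (tag, seed, y, x)."""
    rng = random.Random(f"{tag}:{seed}:{y}:{x}")
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def coordinate_at(y, x, period=4.0):
    """Assumed encoding: bounded vertical axis, periodic horizontal axis."""
    return (math.tanh(y), math.cos(x / period), math.sin(x / period))


def patch_inputs(global_latent, seed, top, left, height, width):
    """Collect the four input sets for the patch whose corner is (top, left)."""
    rows = range(top, top + height)
    cols = range(left, left + width)
    return {
        # One vector shared by every patch of the same image.
        "global": global_latent,
        # Local latents and coordinates are indexed by absolute position.
        "local": [[seeded_gauss("local", seed, y, x, LATENT_DIM) for x in cols]
                  for y in rows],
        "coords": [[coordinate_at(y, x) for x in cols] for y in rows],
        # Noise is also keyed by position here (an assumption of this sketch).
        "noise": [[seeded_gauss("noise", seed, y, x, 1)[0] for x in cols]
                  for y in rows],
    }
```

Two patches requested at different offsets with the same seed return identical local latents, coordinates, and noise wherever they overlap, which is what lets a generator be queried one patch at a time.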


Qualitative Results

Generation at extended resolution.   All samples are generated at 1024×1024 resolution with an InfinityGAN trained at 101×101 resolution.

Very-high-resolution generation.   We provide a 256×10240 resolution sample synthesized with InfinityGAN. The sample shows that (a) InfinityGAN can generalize to arbitrarily high resolution, and (b) the synthesized contents do not self-repeat while using the same global latent variable.


Generation diversity.   We show that the structure synthesizer and texture synthesizer separately model structure and texture by changing either the local latent or the textural latent while keeping all other variables fixed. The results also show that InfinityGAN can synthesize a diverse set of landscape structures at the same coordinate.

Comparison with related methods.   We show that InfinityGAN produces more favorable holistic appearances than related methods when tested at an extended resolution of 1024×1024 (NCI: Non-Constant Input; FCG: Fully-Convolutional Generator).


Applications: Spatial Style Fusion

Style A   |   Style B   |   Style C   |   Style D

Spatial style fusion.   We present a mechanism for fusing multiple styles to make the generation results more interesting and interactive. The 512×4096 image fuses four styles across 258 independently generated patches.
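As a rough illustration of what fusing styles spatially can mean, the sketch below blends one global latent per style with weights that depend on the horizontal position. The distance-based weighting rule, the `sharpness` parameter, and the function names are hypothetical choices for this example, not the fusion mechanism described in the paper.

```python
def fusion_weights(x, centers, sharpness=2.0):
    """Normalized weight of each style at horizontal position x.

    Assumed rule: a style's weight decays with distance to its center.
    """
    raw = [1.0 / (abs(x - c) + 1.0) ** sharpness for c in centers]
    total = sum(raw)
    return [w / total for w in raw]


def fused_latent(x, centers, style_latents):
    """Blend the per-style global latents with the weights at position x."""
    weights = fusion_weights(x, centers)
    dim = len(style_latents[0])
    return [sum(w * z[d] for w, z in zip(weights, style_latents))
            for d in range(dim)]
```

For a 4096-pixel-wide canvas with four styles, one could place the centers at columns 512, 1536, 2560, and 3584; each patch then queries `fused_latent` at its own position, so patches can still be generated independently.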


Applications: Image Outpainting

Our outpainting results.   We outpaint real images (red box, 256×128 pixels) into 256×768-pixel panoramas (equivalent to 5× outpainting).
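One reading of the "5×" figure is that it measures the newly synthesized width relative to the input width, since the full panorama is 6× as wide as the input. The helper below is a hypothetical illustration of that arithmetic only.

```python
def outpainting_ratio(input_width, output_width):
    """Newly synthesized width as a multiple of the input width."""
    return (output_width - input_width) / input_width


# A 128-pixel-wide input extended to 768 pixels adds 640 new columns.
ratio = outpainting_ratio(128, 768)
```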

Outpainting long-range area.   InfinityGAN synthesizes continuous and more plausible outpainting results for arbitrarily large outpainting areas. Unlike previous methods, InfinityGAN does not need to outpaint the results iteratively. The real image, annotated with a red box, is 256×128 pixels.

Multi-modal outpainting.   InfinityGAN can natively achieve multi-modal outpainting by sampling different local latents in the outpainted region. The real image, annotated with a red box, is 256×128 pixels.
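The idea of sampling different local latents only in the outpainted region can be sketched as follows. This is an assumption-laden toy: local latents are reduced to one scalar per column, the latents of the real image are taken as already inverted (the inversion step is not shown), and `outpaint_latents` is a hypothetical name.

```python
import random


def outpaint_latents(inverted, total_cols, start, seed):
    """Column-wise local latents for one outpainting sample.

    Columns covered by the real image reuse its inverted latents;
    every other column is freshly sampled, so each seed gives a
    different plausible continuation of the same input.
    """
    rng = random.Random(seed)
    cols = []
    for x in range(total_cols):
        if start <= x < start + len(inverted):
            cols.append(inverted[x - start])  # fixed: tied to the real image
        else:
            cols.append(rng.gauss(0.0, 1.0))  # free: varies between samples
    return cols
```

Calling the function with the same `inverted` latents but different seeds yields samples that agree on the real region and differ everywhere else.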

Outpainting quantitative performance.   The combination of In&Out and InfinityGAN achieves state-of-the-art FID (lower is better) performance on the image outpainting task.
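For readers unfamiliar with the metric, the standard definition of the Fréchet Inception Distance is reproduced below. The symbols are introduced here for illustration: $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ denote the mean and covariance of Inception features computed on real and generated images, respectively. The exact evaluation protocol used for the reported numbers is described in the paper, not here.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```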


Applications: Image Inbetweening

Image inbetweening with inverted latents.   We show that InfinityGAN can synthesize arbitrary-length cyclic panoramas and inbetweened images by inverting a real image at different positions. The top-row image size is 256×2080 pixels.

More inbetweening results.   We inbetween two real images (red and orange boxes, both 256×128 pixels) into a 256×1280-pixel panorama.

More cyclic inbetweening results.   We synthesize cyclic panoramas at 256×1280 resolution by inbetweening the same image (red boxes, 256×128 pixels) on two sides.


Applications: Parallel Batching

Inference speed-up with parallel batching.   Because its patches are generated spatially independently, InfinityGAN achieves up to a 7.20× inference speed-up with parallel batching. We conduct all experiments at a batch size of 1; OOM indicates out-of-memory. Note that the GPU time here accounts for pure GPU execution time and (if applicable) data-parallel scatter-aggregation time.
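The sketch below illustrates why spatially independent patches can be batched: if each patch depends only on its own index, splitting the work into batches and running them concurrently gives the same result as a sequential loop. `generate_patch` is a hypothetical stand-in for a generator call, and Python threads merely stand in for a GPU batch dimension or multiple devices; the sketch makes no claim about the measured speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_patch(index):
    """Stand-in for one generator call; depends only on the patch index."""
    return [index * 10 + k for k in range(4)]


def generate_sequential(num_patches):
    """Baseline: one patch after another."""
    return [generate_patch(i) for i in range(num_patches)]


def generate_batched(num_patches, batch_size):
    """Split patches into batches and process the batches concurrently."""
    batches = [range(start, min(start + batch_size, num_patches))
               for start in range(0, num_patches, batch_size)]

    def run(batch):
        return [generate_patch(i) for i in batch]

    with ThreadPoolExecutor() as pool:
        # map() returns results in the order the batches were submitted.
        results = list(pool.map(run, batches))
    # Flatten back to one list of patches, still in index order.
    return [patch for batch in results for patch in batch]
```

Because no patch reads the output of another, the only coordination needed is splitting the indices up front and concatenating the results afterwards.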


Acknowledgement

We sincerely thank the great power from OuO.


References

[1] COCO-GAN. Chieh Hubert Lin, Chia-Che Chang, Yu-Sheng Chen, Da-Cheng Juan, Wei Wei, and Hwann-Tzong Chen. "COCO-GAN: Generation by parts via conditional coordinating." In ICCV, 2019.

[2] SinGAN. Tamar Rott Shaham, Tali Dekel, and Tomer Michaeli. "SinGAN: Learning a generative model from a single natural image." In ICCV, 2019.

[3] StyleGAN2. Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. "Analyzing and improving the image quality of StyleGAN." In CVPR, 2020.

[4] In&Out. Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, and Ming-Hsuan Yang. "In&Out: Diverse Image Outpainting via GAN Inversion." arXiv preprint, 2021.

[5] Boundless. Piotr Teterwak, Aaron Sarna, Dilip Krishnan, Aaron Maschinot, David Belanger, Ce Liu, and William T Freeman. "Boundless: Generative adversarial networks for image extension." In ICCV, 2019.

[6] NS-Outpaint. Zongxin Yang, Jian Dong, Ping Liu, Yi Yang, and Shuicheng Yan. "Very long natural scenery image prediction by outpainting." In ICCV, 2019.