InfinityGAN: Towards Infinite-Pixel Image Synthesis


Abstract

We present a novel framework, InfinityGAN, for arbitrary-sized image generation. The task is associated with several key challenges. First, scaling existing models to an arbitrarily large image size is resource-constrained, in terms of both computation and availability of large-field-of-view training data. InfinityGAN trains and infers in a seamless patch-by-patch manner with low computational resources. Second, large images should be locally and globally consistent, avoid repetitive patterns, and look realistic. To address these, InfinityGAN disentangles global appearances, local structures, and textures. With this formulation, we can generate images with spatial size and level of details not attainable before. Experimental evaluation validates that InfinityGAN generates images with superior realism compared to baselines and features parallelizable inference. Finally, we show several applications unlocked by our approach, such as spatial style fusion, multi-modal outpainting, and image inbetweening. All applications can be operated with arbitrary input and output sizes.

[Paper]

ICLR 2022

[Code]

PyTorch

[Citation]

@inproceedings{lin2021infinity,
   title={Infinity{GAN}: Towards Infinite-Pixel Image Synthesis},
   author={Lin, Chieh Hubert and Cheng, Yen-Chi and Lee, Hsin-Ying and Tulyakov, Sergey and Yang, Ming-Hsuan},
   booktitle={International Conference on Learning Representations},
   year={2022},
   url={https://openreview.net/forum?id=ufGMqIM0a4b}
}

Overview of the Method


The generator of InfinityGAN consists of two modules: a structure synthesizer based on a neural implicit function, and a fully-convolutional texture synthesizer with all positional information removed (see Figure 3 in the paper). The two networks take four sets of inputs: a global latent variable that defines the holistic appearance of the image, a local latent variable that represents local and structural variation, a continuous coordinate for learning the neural implicit structure synthesizer, and a set of randomized noises that model fine-grained texture. InfinityGAN synthesizes images of arbitrary sizes by learning spatially extensible representations.
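The two-module design above can be illustrated with a toy PyTorch sketch. This is not the paper's actual architecture (layer widths, depths, and the coordinate encoding are placeholders chosen for brevity); it only shows how a coordinate-conditioned implicit MLP can feed a padding-free convolutional decoder.

```python
import torch
import torch.nn as nn

class StructureSynthesizer(nn.Module):
    """Toy implicit-function MLP: maps (global latent, local latent,
    coordinate) at each spatial location to a structure feature."""
    def __init__(self, z_global=8, z_local=8, coord_dim=2, feat_dim=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(z_global + z_local + coord_dim, 32),
            nn.LeakyReLU(0.2),
            nn.Linear(32, feat_dim),
        )

    def forward(self, z_global, z_local, coords):
        # z_global: (B, Zg); z_local: (B, H, W, Zl); coords: (B, H, W, 2)
        B, H, W, _ = coords.shape
        g = z_global[:, None, None, :].expand(B, H, W, -1)
        x = torch.cat([g, z_local, coords], dim=-1)
        return self.mlp(x).permute(0, 3, 1, 2)  # (B, feat, H, W)

class TextureSynthesizer(nn.Module):
    """Toy fully-convolutional, padding-free decoder: valid convolutions
    only, so each output pixel depends solely on its receptive field."""
    def __init__(self, feat_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_dim, 16, 3, padding=0),  # no padding: border shrinks
            nn.LeakyReLU(0.2),
            nn.Conv2d(16, 3, 3, padding=0),
        )

    def forward(self, feat, noise):
        return self.net(feat + noise)  # noise models fine-grained texture

# Forward pass on one 9x9 coordinate patch; two valid 3x3 convs yield 5x5 RGB.
struct, tex = StructureSynthesizer(), TextureSynthesizer()
zg = torch.randn(1, 8)            # global latent: holistic appearance
zl = torch.randn(1, 9, 9, 8)      # local latent: structural variation
ys, xs = torch.meshgrid(torch.linspace(0, 1, 9),
                        torch.linspace(0, 1, 9), indexing="ij")
coords = torch.stack([ys, xs], dim=-1)[None]   # continuous coordinates
feat = struct(zg, zl, coords)
img = tex(feat, torch.randn_like(feat) * 0.1)
print(img.shape)
```

Because the decoder uses no padding, the same weights produce identical pixels wherever the coordinate window is placed, which is what makes patch-wise synthesis seamless.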


Qualitative Results

Generation at extended sizes.   All samples are generated at 1024×1024 pixels with an InfinityGAN trained at 101×101 pixels.


Generation at a higher-resolution setting.   We synthesize 4096×4096-pixel images with a model trained at 397×397 pixels. Note that the images shown here are 1024×1024 pixels due to file-size limits; please click on an image to access the raw version.

Very-long landscape generation.   We provide a 256×10240-pixel sample synthesized with InfinityGAN. The sample shows that (a) InfinityGAN can generalize to arbitrarily large image sizes, and (b) the synthesized contents do not self-repeat while using the same global latent variable.
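The seamlessness of patch-by-patch synthesis can be demonstrated with a minimal sketch. A single valid (padding-free) convolution stands in for the generator here, which is an assumption for brevity, not the paper's network: because every output pixel depends only on its local input window, generating a canvas in overlapping patches reproduces the full-canvas result exactly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in padding-free generator: a valid 3x3 conv, so each output pixel
# depends on a 3x3 input window (a 1-pixel receptive-field margin).
gen = nn.Conv2d(1, 1, 3, padding=0)

# A deterministic "latent field" over an 8x16 canvas, indexed by global coords.
field = torch.arange(8 * 16, dtype=torch.float32).reshape(1, 1, 8, 16)

with torch.no_grad():
    full = gen(field)                    # full canvas in one pass: (1,1,6,14)
    # Generate in two patches whose inputs overlap by the receptive margin.
    left = gen(field[:, :, :, :9])       # covers output columns 0..6
    right = gen(field[:, :, :, 7:])      # covers output columns 7..13
    stitched = torch.cat([left, right], dim=3)

print(torch.allclose(full, stitched))    # seams are exact
```

This independence between patches is also what allows training at small sizes (e.g. 101×101) while inferring at arbitrarily large ones.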


Generation diversity.   We show that the structure synthesizer and texture synthesizer separately model structure and texture by changing either the local latent or the texture latent while all other variables are fixed. The results also show that InfinityGAN can synthesize a diverse set of landscape structures at the same coordinates.

Comparison with related methods.   We show that InfinityGAN produces more favorable holistic appearances than related methods when tested at an extended size of 1024×1024 pixels. (NCI: Non-Constant Input, PFG: Padding-Free Generator.)


Applications: Spatial Style Fusion

Style A

Style B

Style C

Style D

Spatial style fusion.   We present a mechanism for fusing multiple styles together to increase the diversity and interactivity of the generation results. The 512×4096 image fuses four styles across 258 independently generated patches.
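One simple way to realize such fusion is to interpolate global latents spatially before conditioning the generator. The sketch below is a hypothetical illustration with a linear horizontal ramp between two styles, not the paper's exact fusion scheme:

```python
import torch

# Two global latents ("styles") to fuse across the width of a feature grid.
style_a = torch.randn(8)
style_b = torch.randn(8)

H, W = 4, 16
# Blend weight ramps 0 -> 1 from left to right; each column mixes the styles.
w = torch.linspace(0.0, 1.0, W).view(1, W, 1)     # (1, W, 1)
style_map = (1 - w) * style_a + w * style_b       # broadcasts to (1, W, 8)
style_map = style_map.expand(H, W, 8)             # per-pixel style latent

# Leftmost column is pure style A, rightmost is pure style B.
print(torch.allclose(style_map[:, 0], style_a.expand(H, 8)))
print(torch.allclose(style_map[:, -1], style_b.expand(H, 8)))
```

Since each patch is generated independently, each patch simply reads its own slice of `style_map`, so fusing four styles over 258 patches needs no extra machinery.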


Applications: Image Outpainting

Our outpainting results.   We outpaint real images (red box, 256×128 pixels) into 256×768-pixel panoramas (equivalent to 5× outpainting).

Outpainting long-range areas.   InfinityGAN synthesizes continuous and more plausible outpainting results for arbitrarily large outpainting areas. Unlike previous methods, InfinityGAN does not need to outpaint the results iteratively. The real image annotated with the red box is 256×128 pixels.

Multi-modal outpainting.   InfinityGAN natively achieves multi-modal outpainting by sampling different local latents in the outpainted region. The real image annotated with the red box is 256×128 pixels.
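The multi-modal behavior follows from pinning the latents recovered for the observed region while resampling everywhere else. A minimal sketch of that sampling pattern (the inverted latents and canvas sizes are placeholders):

```python
import torch

torch.manual_seed(0)
# Local latent grid for a 4x12 canvas; columns 0..3 cover the observed
# real image, the rest is the outpainted region.
inverted = torch.randn(4, 4, 8)  # latents assumed recovered by GAN inversion

def sample_canvas(inverted):
    z = torch.randn(4, 12, 8)    # fresh local latents everywhere
    z[:, :4] = inverted          # pin the observed region
    return z

a, b = sample_canvas(inverted), sample_canvas(inverted)
print(torch.equal(a[:, :4], b[:, :4]))   # observed region is fixed
print(torch.equal(a[:, 4:], b[:, 4:]))   # outpainted region varies
```

Each draw of the unpinned columns yields a different yet consistent outpainting, which is what produces the diverse panoramas shown above.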

Outpainting quantitative performance.   The combination of In&Out and InfinityGAN achieves state-of-the-art FID (lower is better) on the image outpainting task.


Applications: Image Inbetweening

Image inbetweening with inverted latents.   We show that InfinityGAN can synthesize arbitrary-length cyclic panoramas and inbetweened images by inverting a real image at different positions. The top-row image is 256×2080 pixels.

More inbetweening results.   We inbetween two real images (red and orange boxes, both 256×128 pixels) into a 256×1280 pixels panorama.

More cyclic inbetweening results.   We synthesize cyclic panoramas at 256×1280 pixels by inbetweening the same image (red boxes, 256×128 pixels) on both sides.


Applications: Parallel Batching

Inference speed-up with parallel batching.   Benefiting from its spatially independent generation, InfinityGAN achieves up to a 7.20× inference speed-up with parallel batching. We conduct all experiments at a batch size of 1, and OOM indicates out-of-memory. Note that the GPU time here accounts for pure GPU execution time and (if applicable) data-parallel scatter-aggregation time.
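Because patches are mutually independent, they can be stacked along the batch dimension and generated in a single forward pass instead of one pass per patch. A toy sketch (again using a valid convolution as a stand-in generator) showing that the batched and sequential results coincide:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-in padding-free generator; patches are independent, so they can be
# generated one by one or stacked into a single batched forward pass.
gen = nn.Conv2d(1, 3, 3, padding=0)
patches = [torch.randn(1, 1, 9, 9) for _ in range(4)]

with torch.no_grad():
    sequential = [gen(p) for p in patches]       # 4 forward passes
    batched = gen(torch.cat(patches, dim=0))     # 1 batched forward pass

same = all(torch.allclose(batched[i:i + 1], sequential[i]) for i in range(4))
print(same)
```

The speed-up comes from amortizing per-call overhead and better GPU utilization; the outputs themselves are unchanged.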


Acknowledgement

We sincerely thank the great power from OuO.


References

[1]

COCO-GAN

Chieh Hubert Lin, Chia-Che Chang, Yu-Sheng Chen, Da-Cheng Juan, Wei Wei, and Hwann-Tzong Chen. "Coco-gan: Generation by parts via conditional coordinating." In ICCV, 2019.

[2]

SinGAN

Tamar Rott Shaham, Tali Dekel, and Tomer Michaeli. "Singan: Learning a generative model from a single natural image." In ICCV, 2019.

[3]

StyleGAN2

Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. "Analyzing and improving the image quality of stylegan." In CVPR, 2020.

[4]

In&Out

Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, and Ming-Hsuan Yang. "In&Out: Diverse Image Outpainting via GAN Inversion." arXiv preprint, 2021.

[5]

Boundless

Piotr Teterwak, Aaron Sarna, Dilip Krishnan, Aaron Maschinot, David Belanger, Ce Liu, and William T Freeman. "Boundless: Generative adversarial networks for image extension." In ICCV, 2019.

[6]

NS-Outpaint

Zongxin Yang, Jian Dong, Ping Liu, Yi Yang, and Shuicheng Yan. "Very long natural scenery image prediction by outpainting." In ICCV, 2019.