 |
InfinityGAN: Towards Infinite-Resolution Image Synthesis
Chieh Hubert Lin, Hsin-Ying Lee, Yen-Chi Cheng, Sergey Tulyakov, Ming-Hsuan Yang
arXiv 2021
[abs]
[paper]
[project page]
[codes (TBA)]
We present InfinityGAN, a method to generate arbitrary-resolution images. The problem is associated with several key challenges.
First, scaling existing models to a high resolution is resource-constrained, both in terms of computation and availability of
high-resolution training data. InfinityGAN trains and infers patch-by-patch seamlessly with low computational resources.
Second, large images should be locally and globally consistent, avoid repetitive patterns, and look realistic.
To address these, InfinityGAN takes global appearance, local structure, and texture into account. With this formulation,
we can generate images with resolution and level of detail not attainable before. Experimental evaluation supports that
InfinityGAN generates images with superior global structure compared to baselines, while also featuring parallelizable inference.
Finally, we show several applications unlocked by our approach, such as fusing styles spatially, multi-modal outpainting, and image
inbetweening at arbitrary input and output resolutions.
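For intuition, the following PyTorch sketch illustrates the patch-wise synthesis idea described above: one global appearance code is shared across all patches, while each patch receives its own local code and spatial coordinate. The class name PatchGenerator, the tiny MLP, and all dimensions are illustrative assumptions, not the actual InfinityGAN architecture.

import torch
import torch.nn as nn

class PatchGenerator(nn.Module):  # illustrative toy model, not the paper's network
    def __init__(self, global_dim=128, local_dim=64, coord_dim=2, patch_size=32):
        super().__init__()
        self.patch_size = patch_size
        self.net = nn.Sequential(
            nn.Linear(global_dim + local_dim + coord_dim, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 3 * patch_size * patch_size),
            nn.Tanh(),
        )

    def forward(self, z_global, z_local, coord):
        # All patches share one global (scene-level) code; each patch has its own
        # local code and (x, y) coordinate, so patches can be synthesized
        # independently and tiled into an arbitrarily large image.
        out = self.net(torch.cat([z_global, z_local, coord], dim=1))
        return out.view(-1, 3, self.patch_size, self.patch_size)

G = PatchGenerator()
z_global = torch.randn(1, 128).expand(4, -1)   # shared global appearance code
z_local = torch.randn(4, 64)                   # per-patch structure/texture codes
coords = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
patches = G(z_global, z_local, coords)         # (4, 3, 32, 32), one patch per coordinate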
|
 |
In&Out : Diverse Image Outpainting via GAN Inversion
Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Ming-Hsuan Yang
arXiv 2021
[abs]
[paper]
[project page]
[codes (TBA)]
Image outpainting seeks a semantically consistent extension of the input image beyond its available content.
Compared to inpainting -- filling in missing pixels in a way coherent with the neighboring pixels -- outpainting
can be achieved in more diverse ways since the problem is less constrained by the surrounding pixels. Existing
image outpainting methods pose the problem as a conditional image-to-image translation task, often generating
repetitive structures and textures by replicating the content available in the input image. In this work, we formulate
the problem from the perspective of inverting generative adversarial networks. Our generator renders micro-patches
conditioned on their joint latent code as well as their individual positions in the image. To outpaint an image, we
seek multiple latent codes that not only recover the available patches but also synthesize diverse outpainted content through
patch-based generation. This leads to richer structure and content in the outpainted regions. Furthermore, our
formulation allows for outpainting conditioned on the categorical input, thereby enabling flexible user controls.
Extensive experimental results demonstrate the proposed method performs favorably against existing in- and outpainting
methods, featuring higher visual quality and diversity.
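To make the inversion-based formulation concrete, the sketch below (in PyTorch) optimizes one latent code per micro-patch so that patches covering the input reconstruct it, while patches outside remain unconstrained; different initializations then yield diverse outpaintings. The ToyG generator, the L1 reconstruction loss, and the optimizer settings are illustrative assumptions, not the released In&Out code.

import torch

class ToyG(torch.nn.Module):
    """Toy coordinate-conditioned patch generator: (latent, coord) -> 3x32x32 patch."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.fc = torch.nn.Linear(latent_dim + 2, 3 * 32 * 32)
    def forward(self, z, coords):
        return torch.tanh(self.fc(torch.cat([z, coords], dim=1))).view(-1, 3, 32, 32)

def invert_for_outpainting(G, known_patches, known_coords, unknown_coords,
                           latent_dim=128, steps=200, lr=0.05):
    # One latent per micro-patch; only patches overlapping the input are pulled
    # toward the observed pixels, so the outpainted patches stay diverse.
    z = torch.randn(len(known_coords) + len(unknown_coords), latent_dim, requires_grad=True)
    coords = torch.cat([known_coords, unknown_coords], dim=0)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        patches = G(z, coords)
        loss = torch.nn.functional.l1_loss(patches[:len(known_coords)], known_patches)
        loss.backward()
        opt.step()
    return G(z, coords).detach()   # known region reconstructed + new patches beyond it

G = ToyG()
known = torch.rand(2, 3, 32, 32) * 2 - 1                  # "observed" input patches
known_coords = torch.tensor([[0., 0.], [0., 1.]])
unknown_coords = torch.tensor([[0., 2.], [0., 3.]])       # positions to outpaint
result = invert_for_outpainting(G, known, known_coords, unknown_coords)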
|
Pre-Ph.D. |
 |
InstaNAS: Instance-aware Neural Architecture Search
An-Chieh Cheng*, Chieh Hubert Lin*, Da-Cheng Juan, Wei Wei, Min Sun
AAAI 2020
[abs]
[paper]
[project page (w/ demo)]
[codes]
Conventional Neural Architecture Search (NAS) aims at finding a single architecture that
achieves the best performance, usually optimizing task-related learning objectives such
as accuracy. However, a single architecture may not be representative enough for a whole
dataset with high diversity and variety. Intuitively, selecting domain-expert architectures
that are proficient in domain-specific features can further benefit architecture-related
objectives such as latency. In this paper, we propose InstaNAS, an instance-aware NAS
framework that employs a controller trained to search for a "distribution of architectures"
instead of a single architecture; this allows the model to use sophisticated architectures
for difficult samples, which usually come with a large architecture-related cost, and
shallow architectures for easy samples. During the inference phase, the controller
assigns each unseen input sample a domain-expert architecture that achieves
high accuracy at a customized inference cost. Experiments within a search space inspired by
MobileNetV2 show that InstaNAS achieves up to a 48.8% latency reduction over MobileNetV2
on a series of datasets without compromising accuracy.
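The core idea, a controller that selects a per-instance subset of candidate blocks, can be sketched in PyTorch as follows. The gating scheme, the convolutional blocks, and all sizes are illustrative assumptions rather than the InstaNAS search space, and the hard thresholding stands in for the reward-driven controller training used during search.

import torch
import torch.nn as nn

class InstanceAwareNet(nn.Module):
    def __init__(self, num_blocks=4, channels=16, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        # Candidate blocks; the controller decides per sample which ones to run.
        self.blocks = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_blocks)]
        )
        # Tiny controller: looks at the sample and outputs one on/off gate per block.
        self.controller = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels, num_blocks)
        )
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x):
        h = torch.relu(self.stem(x))
        # Hard per-sample gates for illustration only; during search the controller
        # would be trained with accuracy/latency rewards rather than thresholded.
        gates = (torch.sigmoid(self.controller(h)) > 0.5).float()
        for i, block in enumerate(self.blocks):
            g = gates[:, i].view(-1, 1, 1, 1)
            h = g * torch.relu(block(h)) + (1.0 - g) * h   # skip the block when g == 0
        return self.head(h.mean(dim=(2, 3)))

net = InstanceAwareNet()
logits = net(torch.randn(2, 3, 32, 32))   # each sample may execute a different subset of blocks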
|
 |
COCO-GAN: Generation by Parts via Conditional Coordinating
Chieh Hubert Lin, Chia-Che Chang, Yu-Sheng Chen, Da-Cheng Juan, Wei Wei, Hwann-Tzong Chen
ICCV 2019 (oral)
[abs]
[paper (low resolution)]
[paper (high resolution)]
[project page]
[codes]
Humans can only interact with part of the surrounding environment due to biological restrictions.
Therefore, we learn to reason the spatial relationships across a series of observations to piece
together the surrounding environment. Inspired by such behavior and the fact that machines also have
computational constraints, we propose COnditional COordinate GAN (COCO-GAN), whose generator
generates images by parts, conditioned on their spatial coordinates.
On the other hand, the discriminator learns to justify realism across multiple assembled patches by
global coherence, local appearance, and edge-crossing continuity. Although the full images are never
generated during training, we show that COCO-GAN can produce state-of-the-art-quality
full images during inference. We further demonstrate a variety of novel applications enabled by
teaching the network to be aware of coordinates. First, we perform extrapolation to the learned
coordinate manifold and generate off-the-boundary patches. Combining with the originally generated
full image, COCO-GAN can produce images that are larger than training samples, which we call
"beyond-boundary generation". We then showcase panorama generation within a cylindrical coordinate
system that inherently preserves horizontally cyclic topology. On the computation side, COCO-GAN
has a built-in divide-and-conquer paradigm that reduces memory requisition during training and
inference, provides high-parallelism, and can generate parts of images on-demand.
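The generation-by-parts mechanism, including patch assembly and coordinate extrapolation for beyond-boundary generation, can be illustrated with the toy PyTorch sketch below; the network, the 2x2 patch grid, and the coordinate range are assumptions for illustration, not COCO-GAN's actual design.

import torch
import torch.nn as nn

class CoordPatchGenerator(nn.Module):
    def __init__(self, z_dim=64, patch=16):
        super().__init__()
        self.patch = patch
        self.net = nn.Sequential(
            nn.Linear(z_dim + 2, 256), nn.ReLU(),
            nn.Linear(256, 3 * patch * patch), nn.Tanh())

    def forward(self, z, coord):
        # Each patch is generated only from the shared latent and its own coordinate.
        return self.net(torch.cat([z, coord], dim=1)).view(-1, 3, self.patch, self.patch)

G = CoordPatchGenerator()
z = torch.randn(1, 64)
# A 2x2 grid of patch coordinates within the range seen during training ([-1, 1]).
coords = torch.tensor([[-1., -1.], [-1., 1.], [1., -1.], [1., 1.]])
patches = G(z.expand(coords.size(0), -1), coords)          # (4, 3, 16, 16)
top = torch.cat([patches[0], patches[1]], dim=2)           # assemble the full image
bottom = torch.cat([patches[2], patches[3]], dim=2)
full = torch.cat([top, bottom], dim=1)                     # (3, 32, 32)
# "Beyond-boundary generation": query a coordinate outside the training range.
extra_patch = G(z, torch.tensor([[1., 2.]]))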
|
|
Point-to-Point Video Generation
Tsun-Hsuan Wang*, Yen-Chi Cheng*, Chieh Hubert Lin, Hwann-Tzong Chen, Min Sun
ICCV 2019
[abs]
[paper]
[project page]
[codes]
While image manipulation has achieved tremendous breakthroughs (e.g., generating realistic faces) in
recent years, video generation remains much less explored and harder to control, which limits its applications
in the real world. For instance, video editing requires temporal coherence across multiple clips and
thus poses both start and end constraints within a video sequence. We introduce point-to-point video
generation that controls the generation process with two control points: the targeted start- and end-frames.
The task is challenging since the model must not only generate a smooth transition of frames, but also
plan ahead to ensure that the generated end-frame conforms to the targeted end-frame for videos of
various lengths. We propose to maximize a modified variational lower bound of the conditional data
likelihood under a skip-frame training strategy. Our model can generate sequences such that their
end-frame is consistent with the targeted end-frame without loss of quality and diversity. Extensive
experiments are conducted on Stochastic Moving MNIST, Weizmann Human Action, and Human3.6M to evaluate
the effectiveness of the proposed method. We demonstrate our method under a series of scenarios
(e.g., dynamic length generation) and the qualitative results showcase the potential and merits of
point-to-point generation.
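As a rough illustration of the two-control-point setup, the PyTorch sketch below initializes a recurrent decoder from both the start and the targeted end frame (treated as feature vectors) and unrolls it for an arbitrary length. The plain GRU decoder and the dimensions are assumptions for illustration; the variational objective and skip-frame training from the paper are omitted.

import torch
import torch.nn as nn

class P2PGenerator(nn.Module):
    def __init__(self, frame_dim=64, hidden=128):
        super().__init__()
        self.init = nn.Linear(2 * frame_dim, hidden)   # fuse the start and end control points
        self.rnn = nn.GRU(frame_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, frame_dim)

    def forward(self, start, end, length):
        # The initial hidden state carries both control points, so every generated
        # frame can "plan ahead" toward the targeted end frame for any requested length.
        h = torch.tanh(self.init(torch.cat([start, end], dim=1))).unsqueeze(0)
        frame, frames = start, []
        for _ in range(length):
            step, h = self.rnn(frame.unsqueeze(1), h)
            frame = self.out(step.squeeze(1))
            frames.append(frame)
        return torch.stack(frames, dim=1)

gen = P2PGenerator()
start, end = torch.randn(2, 64), torch.randn(2, 64)    # encoded start/end frames
video = gen(start, end, length=10)                      # (2, 10, 64) generated frame features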
|
 |
3D LiDAR and Stereo Fusion using Stereo Matching Network with Conditional Cost Volume Normalization
Tsun-Hsuan Wang, Hou-Ning Hu, Chieh Hubert Lin, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun
IROS 2019
[abs]
[paper]
[project page]
[codes]
The complementary characteristics of active and passive depth sensing techniques motivate the
fusion of the LiDAR sensor and stereo camera for improved depth perception. Instead of directly
fusing estimated depths across the LiDAR and stereo modalities, we take advantage of the stereo
matching network with two enhanced techniques: Input Fusion and Conditional Cost Volume
Normalization (CCVNorm) on the LiDAR information. The proposed framework is generic and closely
integrated with the cost volume component that is commonly utilized in stereo matching neural networks.
We experimentally verify the efficacy and robustness of our method on the KITTI Stereo and Depth
Completion datasets, obtaining favorable performance against various fusion strategies. Moreover,
we demonstrate that, with a hierarchical extension of CCVNorm, the proposed method brings only slight
overhead to the stereo matching network in terms of computation time and model size.
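The flavor of conditioning cost-volume normalization on sparse LiDAR can be sketched as follows in PyTorch: the cost volume is normalized along the disparity dimension, and pixels with a LiDAR return have the corresponding disparity bin re-weighted. The normalization form and the gain value are illustrative assumptions, not the exact CCVNorm formulation.

import torch

def lidar_conditioned_norm(cost_volume, lidar_disp, num_disp):
    # cost_volume: (B, D, H, W) stereo matching costs; lidar_disp: (B, H, W) sparse
    # disparities, with 0 where no LiDAR return exists.
    assert cost_volume.size(1) == num_disp
    mean = cost_volume.mean(dim=1, keepdim=True)
    std = cost_volume.std(dim=1, keepdim=True) + 1e-5
    normed = (cost_volume - mean) / std            # normalize along the disparity axis
    # Per-pixel gain that emphasizes the disparity bin indicated by the LiDAR hint.
    gain = torch.ones_like(cost_volume)
    bins = lidar_disp.clamp(0, num_disp - 1).long().unsqueeze(1)   # (B, 1, H, W)
    gain.scatter_(1, bins, 2.0)
    valid = (lidar_disp > 0).unsqueeze(1)          # only where LiDAR actually fired
    gain = torch.where(valid, gain, torch.ones_like(gain))
    return normed * gain

cost = torch.rand(1, 48, 8, 8)
lidar = torch.zeros(1, 8, 8)
lidar[0, 4, 4] = 12.0                              # a single sparse LiDAR return
out = lidar_conditioned_norm(cost, lidar, num_disp=48)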
|
 |
Escaping from Collapsing Modes in a Constrained Space
Chia-Che Chang*, Chieh Hubert Lin*, Che-Rung Lee, Da-Cheng Juan, Wei Wei, Hwann-Tzong Chen
ECCV 2018
[abs]
[paper]
[codes]
Generative adversarial networks (GANs) often suffer from unpredictable mode-collapsing during training.
We study the issue of mode collapse of Boundary Equilibrium Generative Adversarial Network (BEGAN),
which is one of the state-of-the-art generative models. Despite its potential to generate high-quality images,
we find that BEGAN tends to collapse at some modes after a period of training. We propose a new model,
called BEGAN with a Constrained Space (BEGAN-CS), which includes a latent-space constraint in the
loss function. We show that BEGAN-CS can significantly improve training stability and suppress mode collapse
without either increasing the model complexity or degrading the image quality. Further, we visualize the distribution
of latent vectors to elucidate the effect of the latent-space constraint. The experimental results show that our method has
the additional advantages of being able to train on small datasets and to generate images similar to a given real
image, yet with variations of designated attributes, on the fly.
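The latent-space constraint itself is compact enough to write down; the PyTorch sketch below shows the extra term, which asks the discriminator's encoder to map a generated image back to the latent code that produced it. The toy networks, the L1 distances, and the weight alpha are illustrative assumptions, and how the term enters the full BEGAN objective follows the paper.

import torch
import torch.nn as nn

z_dim = 32
G = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, 3 * 16 * 16), nn.Tanh())
# A BEGAN-style discriminator is an autoencoder; it is split into encoder/decoder here.
encoder = nn.Sequential(nn.Linear(3 * 16 * 16, 128), nn.ReLU(), nn.Linear(128, z_dim))
decoder = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, 3 * 16 * 16))

z = torch.randn(8, z_dim)
fake = G(z)
ae_loss = (decoder(encoder(fake)) - fake).abs().mean()   # usual autoencoder reconstruction term
cs_loss = (encoder(fake) - z).abs().mean()               # latent-space constraint: Enc(G(z)) ~ z
alpha = 0.1                                              # assumed weight for the constraint
constrained_term = ae_loss + alpha * cs_loss             # combined alongside the BEGAN equilibrium terms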
|