BallGAN: 3D-aware Image Synthesis with a Spherical Background

Minjung Shin1,2, Yunji Seo1, Jeongmin Bae1, Youngsun Choi1, Hyunsu Kim2, Hyeran Byun1, Youngjung Uh1
1Yonsei University, 2NAVER AI Lab

ICCV 2023

BallGAN is a 3D-aware GAN framework that approximates the background with an opaque spherical surface and represents the foreground with conventional 3D features. Our method generates a highly detailed foreground and background separately, without requiring post-processing. The ability to generate high-quality foreground objects on their own is essential for effective 3D content creation.

Abstract

3D-aware GANs aim to synthesize realistic 3D scenes that can be rendered from arbitrary camera viewpoints, generating high-quality images with well-defined geometry. As 3D content creation becomes more popular, the ability to generate foreground objects separately from the background has become a crucial property. Existing methods have focused on overall image quality, but they cannot generate foreground objects alone and often exhibit degraded 3D geometry.

In this work, we propose to represent the background as a spherical surface, for multiple reasons inspired by computer graphics. Our method naturally provides foreground-only 3D synthesis, facilitating easier 3D content creation. Furthermore, it improves both the foreground geometry of 3D-aware GANs and training stability on datasets with complex backgrounds.

Proposed method

Motivation

BallGAN is inspired by a popular approach for video games and movies in the graphics community: representing salient objects with detailed 3D models and approximating peripheral scenery with simple surfaces. This drastically reduces the complexity of scenes by devoting fewer resources to the background. Because the human visual system focuses more on salient objects than on the background, such a technique does not degrade the user's experience.

BallGAN generator overview

BallGAN represents a 3D scene as the union of a conventional foreground and a background on a spherical surface, following our intuition that approximating the unbounded background as a thin surface removes unnecessary degrees of freedom from the scene. The generator consists of two backbone networks, one for the foreground and one for the background. Representations from these networks are rendered by our modified volume rendering equation to synthesize images.
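The modified volume rendering can be sketched as standard NeRF-style compositing along each ray, with one extra, fully opaque sample where the ray hits the background sphere. The function below is an illustrative sketch under these assumptions, not the paper's actual code.

```python
import numpy as np

def render_ray(fg_sigma, fg_rgb, deltas, bg_rgb):
    """Composite foreground samples along one ray, then an opaque
    spherical-background sample that absorbs all remaining transmittance.

    fg_sigma : (N,) foreground densities at the ray samples
    fg_rgb   : (N, 3) foreground colors at the ray samples
    deltas   : (N,) distances between consecutive samples
    bg_rgb   : (3,) background color where the ray hits the sphere
    """
    alpha = 1.0 - np.exp(-fg_sigma * deltas)                       # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha]))[:-1]  # transmittance before each sample
    weights = trans * alpha                                        # foreground compositing weights
    T_bg = np.prod(1.0 - alpha)                                    # light reaching the sphere
    color = (weights[:, None] * fg_rgb).sum(axis=0) + T_bg * bg_rgb
    fg_alpha = 1.0 - T_bg                                          # foreground alpha for this ray
    return color, fg_alpha
```

Because the sphere sample is fully opaque, the compositing weights (foreground weights plus `T_bg`) always sum to one, so every ray terminates and no density is wasted modeling an unbounded background.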

Comparison with Baselines

Generation results (b) according to each model's 3D modeling method (a). Previous methods exhibit degenerate solutions: although the rendered images are realistic, the image synthesis process conceals the degenerate underlying 3D geometries, e.g., broken shapes and faces attached to walls. To address these degenerate solutions, we separate the foreground and background in 3D and approximate the background as a spherical surface, allowing more focus on foreground details.

Renderings and marching-cubes meshes of the same samples. Although all methods reconstruct the target real image similarly well via inversion, the underlying 3D geometries of the baselines are not realistic. In contrast, BallGAN produces realistic shapes, including the hair boundary.

Application

Compositing the foreground onto arbitrary backgrounds in different viewpoints

By applying PTI (pivotal tuning inversion) to a single image with BallGAN, the foreground and background can be separately projected into their corresponding latent spaces. Thanks to BallGAN's accurate foreground separation and its understanding of 3D structure, the original foreground can be composited onto different backgrounds and viewed from various angles while preserving its fine details.

Real image
Foreground inversion image
Alpha-mask of foreground
3D geometry of foreground
Novel view synthesis with different backgrounds
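The background-swapping step above amounts to standard alpha compositing of the rendered foreground over a new background image, using the foreground alpha mask. A minimal sketch with hypothetical helper names (not the released code):

```python
import numpy as np

def composite(fg_rgb, fg_alpha, new_bg_rgb):
    """Alpha-composite the rendered foreground over an arbitrary background.

    fg_rgb     : (H, W, 3) foreground rendering
    fg_alpha   : (H, W) foreground alpha mask in [0, 1]
    new_bg_rgb : (H, W, 3) replacement background image
    """
    a = fg_alpha[..., None]            # broadcast alpha over the RGB channels
    return a * fg_rgb + (1.0 - a) * new_bg_rgb
```

Because the alpha mask comes directly from volume rendering rather than a post-hoc segmentation, fractional alpha at thin structures (hair strands, whiskers) blends smoothly with the new background.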

Foreground editing using CLIP

A given single RGB image is inverted using PTI and then manipulated through CLIP-guided optimization based on a given text description. With BallGAN's ability to capture detailed foreground features in 3D, both the foreground image and its 3D geometry can be modified as desired, including fine details such as curly hair. The edited object can be composited into various views.

Real image
Foreground inversion image
Target description: "Curly hair"
Novel view synthesis

Generating foreground objects

Randomly generated images with their corresponding foreground image and alpha masks. The foreground alpha-mask is computed from the background transmittance calculated during the volume rendering step. BallGAN's effective separation of foreground and background can be observed through alpha masks that capture detailed features such as thin hair strands, cat whiskers, and fur.

First row: full rendering (foreground + background) | Second row: foreground rendering | Third row: foreground alpha-mask

FFHQ

AFHQv2-Cat
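The alpha-mask computation described above, one minus the background transmittance accumulated during volume rendering, can be sketched per ray as follows (the function name is illustrative):

```python
import numpy as np

def foreground_alpha(fg_sigma, deltas):
    """Foreground alpha for one ray: 1 minus the transmittance that
    survives all foreground samples and reaches the background sphere.

    fg_sigma : (N,) foreground densities along the ray
    deltas   : (N,) distances between consecutive samples
    """
    T_bg = np.exp(-np.sum(fg_sigma * deltas))  # fraction of light reaching the sphere
    return 1.0 - T_bg
```

Rays passing through empty space give alpha near zero, rays through dense foreground give alpha near one, and rays grazing thin structures give fractional values, which is why the masks capture hair strands and whiskers.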

3D-aware Synthesis

Randomly generated images and the corresponding foreground 3D geometry, extracted using marching cubes. The 3D geometry includes fine details, such as wavy hairstyles, beards, or a cat's fur and whiskers.

FFHQ

AFHQv2-Cat

Background analysis

Effectiveness of sphere background

Our approach uses a spherical surface as the background representation, which simplifies the background and enhances focus on the foreground. The figure above shows a single-scene overfitting comparison against NeRF++. In this experiment, our method differs from NeRF++ only in its choice of background representation, yet it yields superior separation of foreground and background and more accurate depth estimation for foreground objects.

Why sphere?

While other simplified representations, such as a plane or a cube, could serve as the background, the chosen representation must cover all possible camera viewpoints and be trainable stably. A planar background is insufficient because it no longer covers the background once the camera rotates beyond 90 degrees. A cube background can produce sudden changes in gradient or inconsistencies in distance, as shown in figure (a) with the red circle, leading to unstable training and convergence issues. A spherical background is therefore the suitable and preferred choice.
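The coverage argument can be checked with a small ray-sphere intersection sketch: for any camera origin strictly inside the sphere, every viewing direction has exactly one forward intersection, and the hit distance varies smoothly with direction (unlike a cube's edges). The helper below is hypothetical, for illustration only.

```python
import numpy as np

def sphere_hit_distance(origin, direction, radius):
    """Distance along a ray to a sphere of the given radius centered at the
    world origin, assuming the ray origin lies strictly inside the sphere.

    Solves |o + t*d|^2 = r^2 for the positive root; with the origin inside,
    the discriminant is always positive and exactly one root is forward.
    """
    o = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)           # normalize the viewing direction
    b = np.dot(o, d)
    c = np.dot(o, o) - radius ** 2      # negative whenever origin is inside
    disc = b * b - c                    # discriminant, always > 0 here
    return -b + np.sqrt(disc)           # forward root (the other is behind)
```

A plane, by contrast, has no intersection for the half-space of directions parallel to or facing away from it, so some camera poses would see no background at all.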

BibTeX

@article{shin2023ballgan,
  title={BallGAN: 3D-aware Image Synthesis with a Spherical Background},
  author={Shin, Minjung and Seo, Yunji and Bae, Jeongmin and Choi, Young Sun and Kim, Hyunsu and Byun, Hyeran and Uh, Youngjung},
  journal={arXiv preprint arXiv:2301.09091},
  year={2023}
}