Posted by Zhengqi Li and Noah Snavely, Research Scientists, Google Research
A mobile phone’s camera is a powerful tool for capturing everyday moments. However, a single camera has its limits when it comes to dynamic scenes: achieving effects like changing the camera’s motion or the timing of events in an already-recorded video typically requires an expensive Hollywood setup with synchronized camera rigs. But what if we could achieve similar effects using just a mobile phone’s camera, without the Hollywood budget?
In our paper, “DynIBaR: Neural Dynamic Image-Based Rendering,” which received an honorable mention at CVPR 2023, we introduce a new method that generates photorealistic free-viewpoint renderings from a single video of a complex, dynamic scene. Neural Dynamic Image-Based Rendering (DynIBaR) enables a variety of video effects, such as “bullet time,” video stabilization, depth of field, and slow motion, all from a single video captured with a phone’s camera. DynIBaR significantly advances video rendering of complex moving scenes, opening up new possibilities for video editing applications.
We have also made the code for DynIBaR available on the project page, so you can try it out for yourself. Given an in-the-wild video of a complex, dynamic scene, DynIBaR lets you freeze time while the camera continues to move freely through the scene.
Background:
In recent years, there has been great progress in computer vision techniques that use neural radiance fields (NeRFs) to reconstruct and render static 3D scenes. However, most videos captured on mobile devices contain moving objects, which presents a more challenging 4D scene reconstruction problem that standard view synthesis methods cannot solve. Existing methods for view synthesis in dynamic scenes often produce blurry, inaccurate renderings, especially for longer videos with complex camera and object motion.
The key limitation of these methods is that they store the entire moving scene, including its appearance, geometry, and motion, in a single data structure, which becomes computationally intractable for such videos and leads to blurry renderings. DynIBaR overcomes this limitation by adopting a different rendering paradigm.
Image-based rendering (IBR):
DynIBaR builds on an image-based rendering (IBR) method called IBRNet, which was designed for view synthesis in static scenes. IBR methods are based on the observation that a new target view of a scene should be very similar to nearby source images. Rather than reconstructing the entire scene in advance, they synthesize the target view by dynamically selecting and warping pixels from those nearby source frames. IBRNet, in particular, learns to blend nearby images together within a volumetric rendering framework to recreate new views of a scene.
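To make the core IBR idea concrete, here is a minimal sketch: a 3D sample point is projected into a few nearby source views, the colors observed there are gathered, and the result is a weighted blend. This is not IBRNet itself (IBRNet predicts the blending weights with a learned network and operates on features, not raw colors); the pinhole camera model, nearest-neighbor lookup, and fixed weights below are simplifying assumptions for illustration.

```python
# Minimal IBR sketch: project a 3D point into nearby source views and blend
# the observed colors. Weights are provided by hand here; in IBRNet they are
# predicted by a learned network.
import numpy as np

def project(point_3d, K, R, t):
    """Project a world-space 3D point into a source camera (pinhole model)."""
    cam = R @ point_3d + t            # world -> camera coordinates
    uv = K @ cam                      # camera -> image plane
    return uv[:2] / uv[2]             # perspective divide -> pixel coordinates

def sample_color(image, uv):
    """Nearest-neighbor lookup of the color at pixel location uv."""
    h, w, _ = image.shape
    x = int(np.clip(round(uv[0]), 0, w - 1))
    y = int(np.clip(round(uv[1]), 0, h - 1))
    return image[y, x]

def blend_source_views(point_3d, source_images, cameras, weights):
    """Blend the colors gathered from the source views with the given weights."""
    colors = []
    for img, (K, R, t) in zip(source_images, cameras):
        uv = project(point_3d, K, R, t)
        colors.append(sample_color(img, uv))
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()
    return (weights[:, None] * np.stack(colors)).sum(axis=0)

# Toy usage: two 64x64 source views of a point one meter in front of the cameras.
K = np.array([[50.0, 0, 32], [0, 50.0, 32], [0, 0, 1]])
cams = [(K, np.eye(3), np.zeros(3)), (K, np.eye(3), np.array([0.1, 0, 0]))]
imgs = [np.random.rand(64, 64, 3) for _ in cams]
print(blend_source_views(np.array([0.0, 0.0, 1.0]), imgs, cams, [0.6, 0.4]))
```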
Extending IBR to complex, dynamic videos:
To extend IBR to dynamic scenes, we need to account for scene motion during rendering. In DynIBaR, as part of reconstructing an input video, we solve for the motion of every 3D point using a motion trajectory field encoded by an MLP. Unlike prior dynamic-scene methods, which store the entire scene’s appearance and geometry in an MLP, DynIBaR stores only motion, which is smoother and sparser; the input video frames supply everything else needed to render new views.
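The sketch below illustrates the idea of a motion trajectory field: a small MLP maps a 3D point and its time to coefficients of a temporal motion basis, which give that point’s displacement at any other frame time so it can be projected into neighboring source views. DynIBaR’s actual parameterization (positional encodings, learned trajectory basis, network size) is more involved; everything here, including the DCT-like basis and the tiny two-layer network, is a simplified assumption.

```python
# Sketch of a motion trajectory field: (x, y, z, t) -> basis coefficients
# describing that point's 3D trajectory over time. Simplified for illustration.
import numpy as np

rng = np.random.default_rng(0)
NUM_BASIS = 4                      # number of temporal basis functions (assumed)

# Randomly initialized 2-layer MLP: (x, y, z, t) -> 3 * NUM_BASIS coefficients.
W1, b1 = rng.normal(0, 0.1, (64, 4)), np.zeros(64)
W2, b2 = rng.normal(0, 0.1, (3 * NUM_BASIS, 64)), np.zeros(3 * NUM_BASIS)

def trajectory_coefficients(point_3d, t):
    h = np.maximum(W1 @ np.append(point_3d, t) + b1, 0.0)    # ReLU hidden layer
    return (W2 @ h + b2).reshape(3, NUM_BASIS)                # per-axis coefficients

def temporal_basis(t):
    """Simple DCT-like temporal basis evaluated at time t in [0, 1]."""
    k = np.arange(NUM_BASIS)
    return np.cos(np.pi * k * t)

def displace(point_3d, t_src, t_dst):
    """Move a point observed at time t_src to its position at time t_dst."""
    coeffs = trajectory_coefficients(point_3d, t_src)         # (3, NUM_BASIS)
    delta = coeffs @ (temporal_basis(t_dst) - temporal_basis(t_src))
    return point_3d + delta

# A point sampled on a ray at frame time 0.5, advected to frame time 0.6.
print(displace(np.array([0.0, 0.0, 1.0]), 0.5, 0.6))
```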
We optimize DynIBaR by taking each input video frame, rendering rays through its pixels to form a 2D image via volume rendering, and comparing the result to that input frame; the optimized representation should be able to perfectly reconstruct the input video. To obtain high-quality results, we also introduce techniques such as cross-time rendering and factorizing the scene into static and dynamic components.
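The reconstruction objective can be sketched as follows: per-sample colors and densities along each ray are composited with standard volume rendering, and the rendered pixel is penalized against the observed one. In DynIBaR the per-sample colors come from blending warped source views as described above, and the loss is minimized by gradient descent via automatic differentiation; the plain-NumPy code below only illustrates the forward computation.

```python
# Sketch of the per-frame reconstruction objective: alpha-composite samples
# along a ray, then measure the photometric error against the observed pixel.
import numpy as np

def volume_render(colors, densities, deltas):
    """Composite samples along a ray (colors: (S, 3), densities/deltas: (S,))."""
    alphas = 1.0 - np.exp(-densities * deltas)                      # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]  # transmittance
    weights = alphas * trans
    return (weights[:, None] * colors).sum(axis=0)

def photometric_loss(rendered_pixel, observed_pixel):
    """Squared error between the rendered and the observed pixel color."""
    return float(np.sum((rendered_pixel - observed_pixel) ** 2))

# Toy ray with 8 samples: random blended colors and densities.
rng = np.random.default_rng(1)
colors = rng.random((8, 3))
densities = rng.random(8)
deltas = np.full(8, 0.1)                                            # sample spacing
pixel = volume_render(colors, densities, deltas)
print(photometric_loss(pixel, np.array([0.5, 0.5, 0.5])))
```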
Creating video effects:
DynIBaR enables various video effects. For example, it can stabilize shaky input videos, perform simultaneous view synthesis and slow motion, and generate high-quality video bokeh with dynamically changing depth of field.
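As one example of how such an effect could be assembled, the sketch below smooths an estimated camera path and would then re-render each frame from the smoothed pose to stabilize the video. Only camera positions are smoothed here (a full pipeline would also smooth rotations, e.g., with quaternion interpolation), and the moving-average window and the commented render call are illustrative assumptions rather than DynIBaR’s actual interface.

```python
# Sketch of stabilization: smooth the camera path, then re-render each frame
# from the smoothed pose. Positions only; window size is an assumption.
import numpy as np

def smooth_camera_positions(positions, window=9):
    """Moving-average filter over an (N, 3) array of camera positions."""
    kernel = np.ones(window) / window
    padded = np.pad(positions, ((window // 2, window // 2), (0, 0)), mode="edge")
    return np.stack([np.convolve(padded[:, d], kernel, mode="valid")
                     for d in range(3)], axis=1)

# Jittery handheld path: forward motion plus noise.
rng = np.random.default_rng(2)
shaky = np.stack([np.zeros(60), np.zeros(60), np.linspace(0, 2, 60)], axis=1)
shaky += rng.normal(0, 0.02, shaky.shape)
smooth = smooth_camera_positions(shaky)

# Each frame would then be re-rendered from the smoothed pose, e.g.:
# for t, pos in enumerate(smooth):
#     frame = render_view(model, position=pos, time=t)   # hypothetical call
print(np.abs(np.diff(smooth, axis=0)).mean(), "<", np.abs(np.diff(shaky, axis=0)).mean())
```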
Conclusion:
DynIBaR represents a significant advancement in rendering complex moving scenes from new camera paths. While it currently involves per-video optimization, we envision faster versions that can be deployed on in-the-wild videos, enabling new effects for consumer video editing using mobile devices.
Acknowledgements:
DynIBaR is the result of a collaboration between researchers at Google Research and Cornell University. The key contributors to this work include Zhengqi Li, Qianqian Wang, Forrester Cole, Richard Tucker, and…