Mark Matthews, a Senior Software Engineer, and Dmitry Lagun, a Research Scientist from Google Research, discuss the challenges of reconstructing objects in 3D from a few images and the development of a new technique called MELON that addresses this problem.
Reconstructing an object in 3D from only a few images has been a difficult algorithmic problem for years. The task matters for applications ranging from e-commerce 3D models to autonomous vehicle navigation. A key part of the problem is determining the exact positions from which the images were taken, a step known as pose inference.
Pseudo-symmetries, where an object looks similar from different viewing angles, make pose inference harder still, because several camera poses can explain the same image. Techniques such as neural radiance fields (NeRF) and 3D Gaussian Splatting can reconstruct an object in 3D when the camera poses are known. Without that information, the poses and the 3D shape have to be recovered together, which is a much harder problem.
MELON is a new technique that determines object-centric camera poses and reconstructs objects in 3D without requiring initial pose estimates or complex training schemes. It combines a lightweight CNN encoder, which predicts camera poses from the images, with a modulo loss that accounts for an object's pseudo-symmetries. With this combination, MELON reaches state-of-the-art accuracy from as few as 4 to 6 images of an object.
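The idea behind a modulo loss can be illustrated with a small sketch. It is not the authors' implementation: it assumes the replicated viewpoints are evenly spaced in azimuth around the predicted pose, and the names replicate_azimuths, photometric_loss, modulo_loss, and toy_render are hypothetical. The toy renderer stands in for a NeRF and returns three "pixels" instead of an image. Each candidate view is compared with the target image and only the best-fitting one is kept; in an autodiff framework, taking the minimum this way would mean gradients flow only through that view.

```python
import math


def replicate_azimuths(azimuth, n_replicas):
    """Return n_replicas azimuths evenly spaced around the object, starting at `azimuth`."""
    step = 2.0 * math.pi / n_replicas
    return [(azimuth + k * step) % (2.0 * math.pi) for k in range(n_replicas)]


def photometric_loss(rendered, target):
    """Mean squared error between two equal-length lists of pixel values."""
    return sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(target)


def modulo_loss(render_fn, target, predicted_azimuth, n_replicas=4):
    """Score every replicated view against the target and keep only the best one."""
    candidates = replicate_azimuths(predicted_azimuth, n_replicas)
    losses = [photometric_loss(render_fn(a), target) for a in candidates]
    best = min(range(n_replicas), key=losses.__getitem__)
    return losses[best], candidates[best]


def toy_render(azimuth):
    """Hypothetical stand-in for a renderer: a 3-'pixel' image that depends on azimuth.

    The cos(2 * azimuth) term is identical for views half a turn apart,
    which mimics a pseudo-symmetry.
    """
    return [math.cos(azimuth), math.sin(azimuth), math.cos(2.0 * azimuth)]


# The target image was taken at true_azimuth, but the (hypothetical) encoder
# is a quarter turn off. One of the four replicated views still matches.
true_azimuth = 3.0
target_image = toy_render(true_azimuth)
predicted_azimuth = true_azimuth - math.pi / 2.0

best_loss, best_azimuth = modulo_loss(toy_render, target_image, predicted_azimuth)
```

In this toy setup the encoder's error happens to be an exact multiple of the replica spacing, so one candidate matches the target almost perfectly. In general the minimum only picks the closest replica, and the encoder's remaining error would still have to be reduced by training.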
Results show that MELON quickly converges to accurate camera poses and achieves competitive rendering quality on the NeRF Synthetic dataset. The technique also works well with noisy, unposed images, demonstrating its robustness in challenging conditions.
MELON represents a significant advancement in the field of 3D object reconstruction and has the potential to be integrated into existing NeRF methods. Further research is being conducted to adapt MELON for real-world applications.