In various computer vision applications, such as augmented reality and self-driving cars, determining the distance between objects and the camera is crucial. One technique used for this purpose is depth from focus/defocus, which infers distance from the amount of blur in images. Typically, depth from focus/defocus involves capturing a series of images of the same scene at different focus distances, a set of images known as a focal stack.
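To make the role of blur concrete, here is a minimal sketch of the thin-lens defocus model that such techniques build on; the focal length and f-number below are illustrative values, not parameters from the study:

```python
def coc_diameter(d_obj, d_focus, f=0.05, n=2.0):
    """Thin-lens circle-of-confusion (blur spot) diameter, in metres.

    d_obj:   distance from the lens to the object
    d_focus: distance at which the lens is focused
    f:       focal length (50 mm here, an illustrative value)
    n:       f-number (aperture diameter = f / n)
    """
    aperture = f / n
    # An object off the focused plane images to a blur circle whose size
    # grows with the focus error; this is the standard thin-lens relation.
    return aperture * (f / (d_focus - f)) * abs(d_obj - d_focus) / d_obj

# Example: with focus at 2 m, an object at 4 m is blurred more than one at 2.2 m.
print(coc_diameter(4.0, 2.0))   # larger blur circle
print(coc_diameter(2.2, 2.0))   # smaller blur circle
```

The blur shrinks to zero at the focused distance and grows as an object moves away from it, which is exactly the cue that depth from focus/defocus exploits.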
Over the past decade, scientists have proposed numerous methods for depth from focus/defocus, which fall into two main groups. The first group consists of model-based methods, which use mathematical and optical models to estimate scene depth from sharpness or blur. However, these methods tend to fail on texture-less surfaces, which appear equally uniform in every slice of the focal stack.
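To illustrate the model-based idea, the sketch below estimates depth by selecting, for each pixel, the slice where a simple sharpness measure peaks. The Laplacian-energy measure and the 9-pixel window are common but arbitrary choices, not any particular paper's formulation:

```python
import numpy as np
from scipy import ndimage

def depth_from_focus(stack, focus_dists):
    """stack: (S, H, W) grayscale focal stack; focus_dists: the S focus distances.
    Returns an (H, W) depth map: the focus distance of the sharpest slice per pixel."""
    sharpness = []
    for img in stack:
        lap = ndimage.laplace(img.astype(np.float64))              # strong response at in-focus edges
        sharpness.append(ndimage.uniform_filter(lap**2, size=9))   # windowed focus measure
    best = np.argmax(np.stack(sharpness), axis=0)                  # index of sharpest slice per pixel
    return np.asarray(focus_dists)[best]
```

On a texture-less surface the Laplacian response is near zero in every slice, so the argmax becomes essentially arbitrary; this is precisely the failure mode described above.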
The second group includes learning-based methods, which can be trained to efficiently perform depth from focus/defocus, even on texture-less surfaces. However, these approaches may struggle when the camera settings in the input focal stack differ from those in the training dataset.
To address these limitations, a team of researchers from Japan has developed an innovative method called deep depth from focal stack (DDFS), which combines model-based depth estimation with a learning framework to draw on the strengths of both. From the input focal stack, the camera settings, and a lens defocus model, DDFS builds a “cost volume” that represents a set of depth hypotheses for each pixel, which allows depth to be estimated even when the camera settings at test time differ from those used during training.
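The paper's exact construction is not reproduced here, but the sketch below conveys the idea under simple assumptions: for each depth hypothesis, the defocus model predicts how sharpness should vary across the stack, and the cost records the mismatch with what was actually observed. It reuses `coc_diameter` from the thin-lens sketch above, and the sharpness-versus-blur falloff is a hypothetical choice:

```python
import numpy as np

def build_cost_volume(sharpness, focus_dists, depth_hyps, f=0.05, n=2.0):
    """sharpness: (S, H, W) per-slice focus measure (e.g. from depth_from_focus above).
    Returns a (D, H, W) cost volume; low cost = hypothesis consistent with observed blur."""
    obs = sharpness / (sharpness.sum(axis=0, keepdims=True) + 1e-8)  # observed sharpness profile
    cost = np.empty((len(depth_hyps),) + sharpness.shape[1:])
    for k, d in enumerate(depth_hyps):
        # Blur that the thin-lens model predicts in each slice for an object at depth d.
        c = np.array([coc_diameter(d, s, f, n) for s in focus_dists])
        pred = 1.0 / (1.0 + (c / (c.min() + 1e-8))**2)   # hypothetical sharpness falloff with blur
        pred /= pred.sum()
        # Cost: squared mismatch between predicted and observed sharpness profiles.
        cost[k] = ((obs - pred[:, None, None])**2).sum(axis=0)
    return cost
```

Because the camera settings enter only through the defocus model used to build the volume, the downstream network can in principle consume cost volumes built for different lenses, which is the intuition behind DDFS's camera-setting invariance.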
DDFS also utilizes an encoder-decoder network to progressively estimate scene depth in a coarse-to-fine manner, employing cost aggregation at each stage to adaptively learn localized structures in the images.
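As a rough illustration of this kind of architecture, here is a minimal two-stage PyTorch sketch; the channel widths, layer counts, and aggregation blocks are assumptions for illustration, not the authors' actual network:

```python
import torch
import torch.nn as nn

class CoarseToFineDepthNet(nn.Module):
    """Illustrative encoder-decoder over a cost volume (D depth hypotheses as channels)."""
    def __init__(self, d_hyps=32, ch=64):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(d_hyps, ch, 3, stride=2, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU())
        # Per-stage "cost aggregation": convolutions that mix costs across neighbouring pixels.
        self.agg_coarse = nn.Sequential(nn.Conv2d(ch * 2, ch * 2, 3, padding=1), nn.ReLU())
        self.head_coarse = nn.Conv2d(ch * 2, 1, 3, padding=1)    # coarse depth map
        self.up = nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1)
        self.agg_fine = nn.Sequential(nn.Conv2d(ch + 1, ch, 3, padding=1), nn.ReLU())
        self.head_fine = nn.Conv2d(ch, 1, 3, padding=1)          # refined depth map

    def forward(self, cost_volume):                              # (B, D, H, W)
        e1 = self.enc1(cost_volume)
        e2 = self.enc2(e1)
        coarse = self.head_coarse(self.agg_coarse(e2))           # depth at 1/4 resolution
        coarse_up = nn.functional.interpolate(coarse, scale_factor=2,
                                              mode='bilinear', align_corners=False)
        # The upsampled coarse prediction conditions the finer stage.
        fine = self.head_fine(self.agg_fine(torch.cat([self.up(e2), coarse_up], dim=1)))
        return coarse, fine                                      # coarse-to-fine predictions

# Usage with the cost volume from the earlier sketch:
# cv = torch.from_numpy(cost).float().unsqueeze(0)               # (1, D, H, W)
# coarse, fine = CoarseToFineDepthNet(d_hyps=cv.shape[1])(cv)
```

Feeding each stage's prediction into the next mirrors the coarse-to-fine refinement described above, with the aggregation convolutions adapting the cost volume to local image structure before each depth head reads it out.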
In comparative experiments against other depth from focus/defocus methods, DDFS delivered superior performance across various metrics and image datasets. Unlike competing techniques, it also remained promising when the input focal stack contained only a few images.
Overall, DDFS shows potential for applications requiring depth estimation, such as robotics, autonomous vehicles, 3D image reconstruction, and virtual/augmented reality. The method’s camera-setting invariance can expand the applicability of learning-based depth estimation techniques, offering new possibilities for computer vision systems.
This study could pave the way for more advanced computer vision systems.