Feature Contribution in Monocular Depth Estimation

Abstract

Monocular Depth Estimation (MDE) is an inherently ill-posed problem due to the lack of binocular depth cues; despite this, significant progress has been made in the field in recent years. To help bridge the understanding between human and machine perception, this paper investigates the concepts learned by the general-purpose model Depth Anything, focusing on features that are known to be present in the human visual system. We perform interventions on different image features within the KITTI and NYUv2 datasets and evaluate performance on these intervened inputs. This yields insights into how, and to what extent, each of these features influences depth perception. These insights contribute to bridging the understanding of how humans and machines respectively perform MDE, and we also hope they open new avenues for future work on more robust methods of training neural networks for MDE.

Publication
Proceedings Track