This paper introduces the Embedding Pose Graph (EPG), an approach that combines foundation models with a simple 3D representation tailored to robotics applications. EPG addresses the need for efficient spatial understanding in robotics by attaching foundation-model features to the nodes of a pose graph. Unlike conventional methods that rely on memory-intensive data formats such as voxel grids or point clouds, EPG is lightweight and adaptable. It supports a range of robotic tasks, including open-vocabulary querying, disambiguation, image-based querying, language-directed navigation, and re-localization in 3D environments, and thereby improves how robots interact with and navigate through complex spaces. Qualitative and quantitative evaluations demonstrate EPG’s strong performance and show that it outperforms existing methods on re-localization. This work takes a step toward enabling robots to efficiently understand and operate within large-scale 3D environments.