Posted by Dave Hawkey, Software Engineer, Google Research
Two years ago, we introduced Project Guideline, a collaboration between Google Research and Guiding Eyes for the Blind. This project aimed to empower people who are blind or have low vision to walk, jog, and run independently. With just a Google Pixel phone and headphones, Project Guideline used on-device machine learning (ML) to guide users along outdoor paths marked with a painted line. The technology has been tested extensively around the world and was even showcased at the opening ceremony of the Tokyo 2020 Paralympic Games.
Since the initial announcement, our team has been working to enhance Project Guideline with new features such as obstacle detection and advanced path planning, which help users navigate safely and reliably through more complex scenarios, including sharp turns and nearby pedestrians. The earlier version of the project used a simple frame-by-frame image segmentation technique to detect the position of the guideline in the camera frame. While this was effective for orienting the user to the line, it provided limited information about the surrounding environment. Improving the navigation signals therefore required a better understanding and mapping of the user’s surroundings.
To address these challenges, we created a versatile platform that can be utilized for various spatially-aware applications within the accessibility domain and beyond. Today, we are excited to announce the open-source release of Project Guideline, making it accessible to everyone for further improvement and the development of new accessibility experiences. The release includes the source code for the core platform, an Android application, pre-trained ML models, and a 3D simulation framework.
In terms of system design, the primary use case is an Android application; however, we wanted the core logic to be runnable, testable, and debuggable in a variety of environments in a reproducible way. We therefore designed and built the system in C++, which integrates closely with MediaPipe and other core libraries while still connecting to Android through the Android NDK.
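As a rough sketch of that split, a thin JNI bridge can expose a platform-independent C++ engine to the Android app layer via the NDK. The package, class, and method names below are hypothetical and greatly simplified; they are not the actual Project Guideline bindings.

```cpp
#include <jni.h>

#include <memory>

// Hypothetical platform-independent engine: all navigation logic lives here,
// so it can also be driven from tests or a simulator without Android.
class GuidanceEngine {
 public:
  // Consumes one frame's worth of state and returns a steering signal.
  float OnFrame(float lateral_error_m) { return -lateral_error_m; }
};

namespace {
std::unique_ptr<GuidanceEngine> g_engine;
}  // namespace

// JNI entry points called from a hypothetical com.example.guideline.NativeBridge
// class in the Android app layer.
extern "C" JNIEXPORT void JNICALL
Java_com_example_guideline_NativeBridge_nativeInit(JNIEnv*, jclass) {
  g_engine = std::make_unique<GuidanceEngine>();
}

extern "C" JNIEXPORT jfloat JNICALL
Java_com_example_guideline_NativeBridge_nativeOnFrame(JNIEnv*, jclass,
                                                      jfloat lateral_error_m) {
  return g_engine ? g_engine->OnFrame(lateral_error_m) : 0.0f;
}
```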
Behind the scenes, Project Guideline employs ARCore to estimate the user’s position and orientation as they navigate the course. A segmentation model, built on the DeepLabV3+ framework, processes each camera frame to generate a binary mask of the guideline. These segmented guideline points are then projected onto a world-space ground plane using the camera pose and lens parameters provided by ARCore. By aggregating the world-space points from multiple frames, we create a virtual mapping of the real-world guideline, allowing for the refinement of the estimated line as the user progresses along the path.
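A minimal sketch of that projection step is shown below, assuming a pinhole camera model with z pointing forward and a world frame whose ground plane is y = 0; the structs, conventions, and function names are illustrative, not the actual Project Guideline or ARCore types.

```cpp
#include <array>
#include <cmath>
#include <optional>

struct Vec3 { double x, y, z; };

struct CameraPose {
  Vec3 position;                   // camera origin in world space
  std::array<double, 9> rotation;  // row-major camera-to-world rotation
};

Vec3 Rotate(const std::array<double, 9>& r, const Vec3& v) {
  return {r[0] * v.x + r[1] * v.y + r[2] * v.z,
          r[3] * v.x + r[4] * v.y + r[5] * v.z,
          r[6] * v.x + r[7] * v.y + r[8] * v.z};
}

// Back-projects a segmented guideline pixel (u, v) through the camera and
// intersects the ray with the ground plane y = 0. (fx, fy, cx, cy) are the
// pinhole intrinsics; returns nullopt if the ray never reaches the ground.
std::optional<Vec3> PixelToGround(double u, double v, double fx, double fy,
                                  double cx, double cy,
                                  const CameraPose& cam) {
  Vec3 dir_cam{(u - cx) / fx, (v - cy) / fy, 1.0};  // ray in camera space
  Vec3 dir = Rotate(cam.rotation, dir_cam);         // ray in world space
  if (std::abs(dir.y) < 1e-9) return std::nullopt;  // parallel to the ground
  double t = -cam.position.y / dir.y;               // position + t * dir hits y = 0
  if (t <= 0.0) return std::nullopt;                // intersection behind the camera
  return Vec3{cam.position.x + t * dir.x, 0.0, cam.position.z + t * dir.z};
}
```

Aggregating the returned points over many frames, keyed to the ARCore world frame, yields the stateful line map described above.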
To guide the user, the system utilizes a control system that dynamically selects a target point on the line based on the user’s current position, velocity, and direction. An audio feedback signal is then provided to the user to adjust their heading and align with the upcoming line segment. By utilizing the runner’s velocity vector instead of the camera orientation, we eliminate noise caused by irregular camera movements during running. This approach enables us to navigate the user back to the line even when it is out of the camera’s view, such as when they have overshot a turn. This is possible because ARCore continues to track the camera’s pose, which can be compared to the stateful line map inferred from previous camera images.
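The steering signal can be sketched as a heading error computed against the runner's ground-plane velocity rather than the camera orientation; the types, names, and angle wrapping below are illustrative only.

```cpp
#include <cmath>

struct Vec2 { double x, z; };  // coordinates on the ground plane

constexpr double kPi = 3.14159265358979323846;

// Signed heading error in radians between the runner's velocity vector and
// the bearing to the target point on the mapped line. The sign tells the
// audio system which ear to cue; the magnitude scales the correction.
double HeadingError(const Vec2& position, const Vec2& velocity,
                    const Vec2& target) {
  Vec2 to_target{target.x - position.x, target.z - position.z};
  double desired = std::atan2(to_target.z, to_target.x);
  double actual = std::atan2(velocity.z, velocity.x);
  double error = desired - actual;
  // Wrap into [-pi, pi] so a small deviation never looks like a full turn.
  while (error > kPi) error -= 2.0 * kPi;
  while (error < -kPi) error += 2.0 * kPi;
  return error;
}
```

Because the error is computed against the stateful line map rather than the current frame, it remains defined even when the line has left the camera's field of view.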
Project Guideline also incorporates obstacle detection and avoidance features. We employ an ML model to estimate depth from single images, which is trained using the SANPO dataset consisting of outdoor imagery. The depth maps are converted into 3D point clouds and used to detect obstacles along the user’s path, alerting them through an audio signal.
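A simplified sketch of that check follows: the depth map is unprojected into camera-space points and anything inside a corridor ahead of the runner is flagged. The corridor width, range, and depth-map layout are assumptions for illustration, not the shipped logic.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point3 { float x, y, z; };

// Unprojects a row-major depth map (meters per pixel) into camera-space
// points using pinhole intrinsics (fx, fy, cx, cy); z points forward.
std::vector<Point3> DepthToPointCloud(const std::vector<float>& depth,
                                      int width, int height, float fx,
                                      float fy, float cx, float cy) {
  std::vector<Point3> cloud;
  cloud.reserve(depth.size());
  for (int v = 0; v < height; ++v) {
    for (int u = 0; u < width; ++u) {
      float z = depth[static_cast<std::size_t>(v) * width + u];
      if (z <= 0.0f) continue;  // skip invalid or missing depth
      cloud.push_back({(u - cx) * z / fx, (v - cy) * z / fy, z});
    }
  }
  return cloud;
}

// Flags any point inside a corridor roughly 1 m wide and `max_range_m` deep
// in front of the camera. A real implementation would also filter by height
// above the estimated ground plane to avoid triggering on the path itself.
bool ObstacleAhead(const std::vector<Point3>& cloud, float max_range_m = 4.0f) {
  for (const Point3& p : cloud) {
    if (p.z > 0.3f && p.z < max_range_m && std::abs(p.x) < 0.5f) {
      return true;  // something is in the runner's way -> raise the audio alert
    }
  }
  return false;
}
```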
To provide navigational sounds and cues, we implemented a low-latency audio system based on the AAudio API. Project Guideline offers several sound packs, including a spatial sound implementation using the Resonance Audio API. These sound packs were developed by a team of sound researchers and engineers at Google, utilizing panning, pitch, and spatialization techniques to guide the user along the line. For instance, if a user veers to the right, they may hear a beeping sound in their left ear to indicate that the line is on the left, with the frequency increasing for larger course corrections. If the user veers further, a high-pitched warning sound may indicate that the edge of the path is approaching. Additionally, a clear “stop” audio cue is always available in case the user deviates too far from the line or if any anomalies occur.
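The mapping from deviation to sound can be sketched as a small pure function. The thresholds, frequencies, and pan curve below are illustrative stand-ins rather than the shipped sound packs, and actual playback goes through AAudio and Resonance Audio rather than this struct.

```cpp
#include <algorithm>
#include <cmath>

struct AudioCue {
  float pan;        // -1.0 = fully left ear, +1.0 = fully right ear
  float frequency;  // beep frequency in Hz
  bool stop;        // true when the runner has drifted too far off the line
};

// Maps the runner's signed lateral error (meters; positive = right of the
// line) to a stereo cue: drift right -> beep in the left ear, and larger
// deviations raise the pitch to prompt a sharper correction.
AudioCue CueForLateralError(float error_m) {
  constexpr float kStopThresholdM = 4.0f;  // assumed cutoff for the stop cue
  constexpr float kFullPanErrorM = 2.0f;   // error at which the pan saturates
  if (std::abs(error_m) > kStopThresholdM) {
    return {0.0f, 0.0f, true};  // clear "stop" cue overrides the guidance beeps
  }
  float pan = std::clamp(-error_m / kFullPanErrorM, -1.0f, 1.0f);
  float frequency =
      440.0f + 440.0f * std::min(std::abs(error_m) / kFullPanErrorM, 1.0f);
  return {pan, frequency, false};
}
```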
Project Guideline has been specifically built for Google Pixel phones with the Google Tensor chip. This chip enables optimized ML models to run on-device with higher performance and lower power consumption, ensuring real-time navigation instructions are provided to the user with minimal delay. On a Pixel 8, running the depth model on the Tensor Processing Unit (TPU) instead of CPU results in a 28x latency improvement, and a 9x improvement compared to GPU.
To facilitate testing and prototyping, Project Guideline includes a simulator that allows for rapid evaluation of the system in a virtual environment. This simulator replicates the full Project Guideline experience, from the ML models to the audio feedback system, without requiring physical hardware or a real-world setup.
Looking to the future, we are excited to collaborate with WearWorks, an early adopter of Project Guideline. WearWorks will integrate their patented haptic navigation experience, providing haptic feedback in addition to sound to guide runners. Their expertise in haptics has already empowered the first blind marathon runner to complete the NYC Marathon without sighted assistance. We believe that such integrations will lead to new innovations and contribute to a more accessible world.
Furthermore, our team is actively working on eliminating the need for a painted line altogether. We aim to leverage the latest advancements in mobile ML technology, such as the ARCore Scene Semantics API, which can identify sidewalks, buildings, and other objects in outdoor scenes. By expanding the capabilities of Project Guideline, we hope to encourage the accessibility community to explore new use cases and continue improving this technology.
We would like to express our gratitude to the many individuals involved in the development of Project Guideline and its underlying technologies. Special thanks to our partners at Guiding Eyes for the Blind and Achilles International, as well as all the team members, contributors, and leaders who have played a role in making this project a reality.