Peripheral vision lets humans see shapes outside their direct line of sight, though with less detail. It widens the field of view and is useful in many situations, such as spotting a car approaching from the side.
AI, by contrast, has no peripheral vision. Giving computer vision models this capability could help them detect approaching hazards or predict whether a human driver would notice an oncoming object.
To address this, MIT researchers created an image dataset to simulate peripheral vision in machine learning models. Training models with this dataset improved their object detection in the visual periphery, although they still fell short of human performance.
The study also found that, unlike in humans, neither object size nor the amount of visual clutter significantly affected the models' performance.
Researchers are now exploring what is missing in AI models to make them more human-like in their vision. This understanding could lead to improved driver safety and user interfaces that are easier to navigate.
Moreover, modeling peripheral vision in AI could help predict human behavior more accurately, which has implications beyond driver safety.
The study, with co-authors from various institutions, will be presented at the International Conference on Learning Representations.
By simulating peripheral vision, the researchers aim to replicate how humans perceive the world beyond their central focus. Using a modified version of a technique called the texture tiling model, they generated a dataset of transformed images that mimic the information loss in human peripheral vision.
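To make the idea of eccentricity-dependent information loss concrete, here is a minimal Python sketch. It is not the texture tiling model and not the researchers' code: it swaps in a much simpler stand-in, a blur that grows with distance from a fixation point. The function names (`box_blur`, `simulate_periphery`) and the `max_radius` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np

def box_blur(img, radius):
    """Mean filter over a (2*radius+1) square window, with edge padding."""
    if radius == 0:
        return img.astype(float)
    k = 2 * radius + 1
    padded = np.pad(img.astype(float), radius, mode="edge")
    # Integral image with a leading row and column of zeros.
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    total = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
             - integral[k:k + h, :w] + integral[:h, :w])
    return total / (k * k)

def simulate_periphery(img, fixation, max_radius=4):
    """Blur a grayscale image more strongly the farther a pixel is from fixation.

    ASSUMPTION: a crude stand-in for peripheral information loss,
    not the texture tiling model described in the article.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Eccentricity: pixel distance from the fixation point (row, col).
    ecc = np.hypot(ys - fixation[0], xs - fixation[1])
    # Map eccentricity to an integer blur radius in 0..max_radius.
    level = np.rint(ecc / max(ecc.max(), 1e-9) * max_radius).astype(int)
    out = np.empty((h, w), dtype=float)
    for r in range(max_radius + 1):
        mask = level == r
        if mask.any():
            out[mask] = box_blur(img, r)[mask]
    return out

# Usage: a checkerboard stays sharp at fixation and washes out toward the edges.
ys, xs = np.mgrid[0:32, 0:32]
checker = ((ys + xs) % 2).astype(float)
blurred = simulate_periphery(checker, fixation=(16, 16))
```

A plain blur only removes fine detail. The article describes the researchers' transformation in terms of mimicking the information loss of human peripheral vision, so this sketch should be read only as an illustration of detail falling off with distance from fixation.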
Because models trained on this dataset improved at object detection but still lagged behind humans, the researchers plan to keep exploring these differences, with the goal of developing AI systems that can predict human performance in the visual periphery.
This research, supported by the Toyota Research Institute and the MIT CSAIL METEOR Fellowship, aims to bridge the gap between human and AI vision capabilities.