Srujan Ponnapalli
Computer vision is a growing field of artificial intelligence that trains computers to interpret and understand the visual world by decoding data and inferring properties from images. From facial recognition to self-driving cars, computer vision has enabled machines to perform tasks that were once only possible for humans.
Computer vision has a long history of research and development dating back to the 1960s at MIT, when pioneers of artificial intelligence sought to mimic the human visual system. The goal of these projects was to enable computers to “describe what they saw” in digital images or videos. However, the problem proved much harder than anticipated: it required significant advances in mathematics, physics, statistics, and learning theory, and interest in the field slowed. Computer vision has since experienced a resurgence. In 2012, AlexNet, a convolutional neural network (CNN) architecture, revolutionized the field and spurred thousands of papers that employed CNNs to accelerate deep learning. Three years later, YOLO (You Only Look Once) was introduced, providing a real-time approach to object detection that locates and classifies the objects in an image in a single evaluation. Since then, computer vision has grown into one of the most promising fields in artificial intelligence, with continuous developments that further the capabilities of image processing models.
A research group from UCLA and the United States Army Research Laboratory has proposed a new approach to enhance computer vision technologies by adding physics-based awareness to data-driven techniques. Their paper, “Incorporating physics into data-driven computer vision”, published in the journal Nature Machine Intelligence, offers an overview of a hybrid methodology that aims to improve how AI-based machines sense, interact with, and respond to their environment in real time, for example in how autonomous vehicles move and maneuver or how robots carry out precision actions.
Traditional computer vision techniques allow models to process their visual surroundings through data-driven machine learning. While such deep learning-based techniques provide performance advantages over physics-based vision, they neglect the physical aspects of image perception that are critical to understanding an image more fully. Physics-based vision research, for its part, has explored the physical principles behind computer vision challenges that go beyond simple visual attributes, such as motion estimation, depth reconstruction, and illumination modeling. The researchers suggest that uniting the two approaches can lead to more robust and accurate computer vision systems that pair the performance of a data-driven method with the practicality of a physics-based method. They argue that physics can provide prior knowledge, constraints, and regularization for data-driven models, while data can provide empirical evidence, variability, and scalability for physics-based models.
The researchers focused on three primary ways in which physics can be integrated into computer vision. First, by tagging objects in datasets with additional information, such as how fast they can move or how much they weigh, AI can better learn the physical properties and dynamics of objects from data. Second, by running data through a network filter that encodes physical properties into what cameras pick up, AI can infer physical attributes and relations from images on a deeper level. Third, by leveraging physics-based knowledge to help AI interpret training data on what it observes, AI can optimize its performance based on physical criteria and metrics. This integration of physics into computer vision can offer tangible benefits in various real-world applications. In the field of autonomous drones, AI equipped with physics-aware computer vision can more accurately predict and navigate through complex aerodynamic conditions, allowing drones to perform critical tasks efficiently, such as search and rescue missions in challenging terrain or the delivery of medical supplies to remote areas. Additionally, in industrial automation, physics-based computer vision can enable machines to assess physical properties like material hardness or structural integrity, optimizing quality control processes and supporting the production of reliable, high-quality products.
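To make the third route more concrete, here is a minimal sketch of how physical knowledge might shape training: an ordinary data-fitting loss is combined with a penalty for predictions that violate a known physical law. This is an illustration under stated assumptions, not the method from the paper. The free-fall scenario, the function names (data_loss, physics_loss, total_loss), and the weight parameter are all hypothetical choices made for this example.

```python
# Illustrative sketch only: the scenario, names, and loss form are assumptions
# made for this example and are not taken from the paper.

G = 9.81  # gravitational acceleration in m/s^2


def data_loss(predicted, observed):
    """Mean squared error between predicted and observed heights (metres)."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


def physics_loss(predicted, dt):
    """Penalise trajectories whose acceleration differs from free fall.

    The second finite difference of height approximates acceleration,
    which should equal -G for an object in free fall.
    """
    residuals = []
    for i in range(1, len(predicted) - 1):
        accel = (predicted[i + 1] - 2 * predicted[i] + predicted[i - 1]) / dt ** 2
        residuals.append((accel + G) ** 2)
    return sum(residuals) / len(residuals)


def total_loss(predicted, observed, dt, weight=0.1):
    """Data term plus a weighted physics term (weight is a hypothetical hyperparameter)."""
    return data_loss(predicted, observed) + weight * physics_loss(predicted, dt)


if __name__ == "__main__":
    dt = 0.1
    times = [i * dt for i in range(6)]
    free_fall = [10.0 - 0.5 * G * t ** 2 for t in times]  # obeys the physics
    hovering = [10.0 for _ in times]                      # ignores gravity
    print("physically consistent prediction:", total_loss(free_fall, free_fall, dt))
    print("physically implausible prediction:", total_loss(hovering, free_fall, dt))
```

In this toy setting, a predicted trajectory that ignores gravity receives a larger loss than one that follows free fall, even before the data term is considered. A real system would apply the same idea to a neural network's outputs and would pick the physical constraint to match the task.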
The team found that these three ways of integrating physics into computer vision yielded positive results and improved metrics, pushing the field in a promising direction that can enhance the performance, robustness, and interpretability of AI systems. By combining the strengths of both approaches, physics-based learning can overcome some of the limitations and challenges of traditional computer vision techniques, such as data scarcity, domain adaptation, and generalization. Moreover, physics-based learning can enable new applications and discoveries that are beyond the reach of purely data-driven or purely physics-based methods. As computer vision continues to evolve and expand its scope and impact, physics-based learning is likely to play a vital role in advancing the field and bridging the gap between artificial and natural intelligence.
References:
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012) 'ImageNet Classification with Deep Convolutional Neural Networks.' NIPS 2012, 25. https://proceedings.neurips.cc/paper_files/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html [accessed July 20, 2023].
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2015) 'You Only Look Once: Unified, Real-Time Object Detection.' arXiv. https://doi.org/10.48550/ARXIV.1506.02640 [accessed July 20, 2023].
Kadambi, A., de Melo, C., Hsieh, CJ. et al. (2023) 'Incorporating physics into data-driven computer vision.' Nature Machine Intelligence, 5, 572–580. https://doi.org/10.1038/s42256-023-00662-0 [accessed July 20, 2023].