Spatiotemporal Object Detection and Activity Recognition

Vimal Kumar, Shobhit Jain and David Lillis

In J. A, S. Abimannan, E.-S. M. El-Alfy, and Y.-S. Chang, editors, Spatiotemporal Data Analytics and Modeling: Techniques and Applications, pages 115–132. Springer Nature, Singapore, 2024.

Abstract

Spatiotemporal object detection and activity recognition are essential components of computer vision, with applications spanning surveillance, autonomous driving, and smart stores. This chapter offers a comprehensive overview of the techniques and applications associated with these concepts. Beginning with an introduction to the fundamental principles of object detection and activity recognition, we discuss the challenges and limitations of existing methods. The chapter then explores spatiotemporal object detection and activity recognition, which entail capturing the spatial and temporal information of moving objects in video data. A hierarchical model for spatiotemporal object detection and activity recognition is proposed, designed to maintain spatial and temporal connectivity across frames. Additionally, the chapter outlines various metrics for evaluating the performance of object detection and activity recognition models, so that their accuracy and effectiveness in real-world applications can be assessed. Finally, we underscore the significance of spatiotemporal object detection and activity recognition in these fields, emphasizing the potential for further research and development. In summary, the chapter covers spatiotemporal object detection and activity recognition from foundational concepts to the latest techniques and applications. By presenting a hierarchical model and performance evaluation metrics, it serves as a resource for researchers and practitioners applying computer vision in a variety of domains.