TechRxiv

Temporal Early Exits for Efficient Video Object Detection

Version 2 2023-12-04, 03:41
Version 1 2023-11-29, 19:14
preprint
posted on 2023-12-04, 03:41 authored by Amin Sabet, Lei Xun, Jonathon Hare, Bashir M. Al-Hashimi, Geoff V. Merrett

Efficiently transferring image-based object detectors to the domain of video remains challenging under resource constraints. Previous efforts used feature propagation to avoid recomputing unchanged features. However, the overhead of feature propagation is significant when working with very slowly changing scenes, such as in surveillance applications. In this paper, we propose temporal early exits to reduce the computational complexity of video object detection. Multiple temporal early exit modules with low computational overhead are inserted at early layers of the backbone network to identify the semantic differences between consecutive frames. Full computation is only required if a frame is identified as having a semantic change relative to previous frames; otherwise, detection results from previous frames are reused. Experiments on ImageNet VID and TVnet show that the approach can accelerate video object detection by 1.7x compared to the state of the art, with a reduction of less than 1% in mAP.
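
The gating idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single exit point rather than several, and it replaces the learned temporal early exit module with a hypothetical hand-written change score (`mean_abs_diff`) compared against a hypothetical `threshold`. The names `detect_video`, `early_features`, and `full_detector` are likewise placeholders chosen for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Frame = Sequence[float]   # toy stand-in for an image
Detections = List[str]    # toy stand-in for boxes and labels


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical change score: mean absolute difference of two feature vectors.

    The paper uses learned exit modules; this fixed metric is only a stand-in.
    """
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_video(
    frames: Sequence[Frame],
    early_features: Callable[[Frame], Sequence[float]],
    full_detector: Callable[[Frame], Detections],
    threshold: float,
) -> Tuple[List[Detections], int]:
    """Run the full detector only when the early features changed enough.

    Returns the per-frame detections and the number of full detector runs.
    """
    results: List[Detections] = []
    ref_features: Optional[Sequence[float]] = None  # features of last fully processed frame
    cached: Detections = []
    full_runs = 0

    for frame in frames:
        feats = early_features(frame)  # cheap: early backbone layers only
        changed = (
            ref_features is None
            or mean_abs_diff(feats, ref_features) > threshold
        )
        if changed:
            cached = full_detector(frame)  # expensive: the whole network
            ref_features = feats
            full_runs += 1
        # Otherwise exit early and reuse the cached detections.
        results.append(cached)
    return results, full_runs
```

With four toy frames in which only the third differs noticeably from its predecessor, the full detector would run twice (on the first and third frames) and the remaining frames would reuse cached results. In a slowly changing scene, the saving grows with the fraction of frames that take the early exit.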

Funding

Centre for Spatial Computational Learning

Engineering and Physical Sciences Research Council

Email Address of Submitting Author

l.xun@soton.ac.uk

ORCID of Submitting Author

0000-0003-0765-6726

Submitting Author's Institution

University of Southampton

Submitting Author's Country

  • United Kingdom
