Event cameras offer unparalleled advantages for real-time perception in dynamic environments, thanks to their microsecond-level temporal resolution and asynchronous operation. Existing event-based object detection methods, however, are limited by fixed-frequency paradigms and fail to fully exploit the high temporal resolution and adaptability of event cameras.
To address these limitations, we propose FlexEvent, a novel event camera object detection framework that enables detection at arbitrary frequencies. FlexEvent consists of two key components: FlexFuser, an adaptive event-frame fusion module that integrates high-frequency event data with rich semantic information from RGB frames, and FAL, a frequency-adaptive learning mechanism that generates frequency-adjusted labels to enhance model generalization across varying operational frequencies. This combination allows FlexEvent to detect objects with high accuracy in both fast-moving and static scenarios, while adapting to dynamic environments.
Extensive experiments on large-scale event camera datasets demonstrate that our approach surpasses state-of-the-art methods, achieving significant improvements in both standard and high-frequency settings. Notably, FlexEvent maintains robust performance when scaling from 20Hz to 90Hz and delivers accurate detection up to 180Hz, proving its effectiveness in extreme conditions. Our framework sets a new benchmark for event-based object detection and paves the way for more adaptable, real-time vision systems.
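To make the frequency figures concrete: detecting at a frequency f means grouping the asynchronous event stream into windows of 1/f seconds, so 20Hz corresponds to 50 ms windows, 90Hz to about 11.1 ms, and 180Hz to about 5.6 ms. The snippet below is a minimal sketch of that slicing step only. The function name `slice_events`, its parameters, and the synthetic timestamps are assumptions made for illustration and are not taken from the FlexEvent code.

```python
import numpy as np

def slice_events(timestamps_us, frequency_hz, t_start_us, t_end_us):
    """Split sorted event timestamps (in microseconds) into consecutive
    windows of 1 / frequency_hz seconds covering [t_start_us, t_end_us).

    Hypothetical helper for illustration. Returns a list of
    (start_index, end_index) pairs into `timestamps_us`; each pair selects
    the events that fall inside one detection window.
    """
    timestamps_us = np.asarray(timestamps_us, dtype=np.int64)
    window_us = 1_000_000 / frequency_hz
    # Number of complete windows that fit in the requested time span.
    num_windows = int((t_end_us - t_start_us) * frequency_hz // 1_000_000)
    edges = t_start_us + np.arange(num_windows + 1) * window_us
    # For each window edge, find the index of the first event at or after it.
    idx = np.searchsorted(timestamps_us, edges, side="left")
    return list(zip(idx[:-1].tolist(), idx[1:].tolist()))

# Synthetic stream (assumption): one event every 100 microseconds for 1 second.
timestamps = np.arange(0, 1_000_000, 100)

for freq in (20, 90, 180):
    windows = slice_events(timestamps, freq, 0, 1_000_000)
    sizes = [end - start for start, end in windows]
    print(f"{freq} Hz: {len(windows)} windows, "
          f"{min(sizes)} to {max(sizes)} events per window")
```

Raising the frequency shrinks every window, so each detection step sees fewer events. That sparsity is one reason high-frequency detection is harder than detection at the frame rate.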
Our FlexEvent framework is designed to tackle the challenging problem of event camera object detection at arbitrary frequencies. The framework consists of two branches: an event branch and a frame branch. The event branch captures data at high temporal resolution, while the frame branch leverages the rich semantic information in RGB frames. FlexFuser fuses the two branches dynamically, allowing adaptive integration of event and frame data. In addition, the frequency-adaptive learning (FAL) mechanism ensures robust detection performance across varying operational frequencies. Together, these components enable the model to handle diverse motion dynamics and maintain high detection accuracy across operating frequencies.
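As a rough illustration of what "adaptive integration" of two feature branches can look like, the sketch below uses a generic learned gate: a 1x1 projection followed by a sigmoid, which blends event and frame features per channel and per location. This is a common fusion pattern and is not the FlexFuser design, which this page does not specify. The function `gated_fusion`, the weight shapes, and the random inputs are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(event_feat, frame_feat, w_gate, b_gate):
    """Blend two feature maps with a learned per-position gate.

    Generic gated fusion (an assumption, not the FlexFuser module).
    event_feat, frame_feat: arrays of shape (C, H, W)
    w_gate: array of shape (C, 2C), acting as a 1x1 convolution
    b_gate: array of shape (C,)
    """
    stacked = np.concatenate([event_feat, frame_feat], axis=0)   # (2C, H, W)
    logits = np.einsum("oc,chw->ohw", w_gate, stacked) + b_gate[:, None, None]
    gate = sigmoid(logits)                                       # values in (0, 1)
    # gate near 1 favors event features, gate near 0 favors frame features.
    return gate * event_feat + (1.0 - gate) * frame_feat

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
event_feat = rng.standard_normal((C, H, W))
frame_feat = rng.standard_normal((C, H, W))
w_gate = 0.1 * rng.standard_normal((C, 2 * C))
b_gate = np.zeros(C)

fused = gated_fusion(event_feat, frame_feat, w_gate, b_gate)
print(fused.shape)
```

Because the gate is computed from both inputs, the blend can shift toward event features where motion is fast and toward frame features where the scene is static, which is the behavior the overview describes at a high level.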
We demonstrate that FlexEvent achieves state-of-the-art performance in event-based object detection across large-scale datasets, particularly in high-frequency scenarios, validating its effectiveness and potential to handle safety-critical problems in the real world. The following video demonstrates the consistently superior performance of our approach at different operating frequencies in dynamic environments.
We compare FlexEvent with state-of-the-art event camera detectors on the DSEC-Det dataset. The results demonstrate significant improvements over existing methods in event-based object detection.
Modality | Method | Venue | mAP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|---|---|
E | RVT | CVPR'23 | 38.4% | 58.7% | 41.3% | 29.5% | 50.3% | 81.7% |
E | SAST | CVPR'24 | 38.1% | 60.1% | 40.0% | 29.8% | 48.9% | 79.7% |
E | SSM | CVPR'24 | 38.0% | 55.2% | 40.6% | 28.8% | 52.2% | 77.8% |
E | LEOD | CVPR'24 | 41.1% | 65.2% | 43.6% | 35.1% | 47.3% | 73.3% |
E + F | DAGr-18 | Nature'24 | 37.6% | - | - | - | - | - |
E + F | DAGr-34 | Nature'24 | 39.0% | - | - | - | - | - |
E + F | DAGr-50 | Nature'24 | 41.9% | 66.0% | 44.3% | 36.3% | 56.2% | 77.8% |
E + F | FlexEvent | Ours | 57.4% | 78.2% | 66.6% | 51.7% | 64.9% | 83.7% |
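The size of the improvement can be read directly from the table. The short script below copies the reported numbers and computes, for each metric, FlexEvent's margin in percentage points over the strongest baseline. DAGr-18 and DAGr-34 are left out because they report only mAP, and both are below DAGr-50 on that metric. The helper name `margins_over_best_baseline` is an assumption introduced for this example.

```python
# Values copied from the DSEC-Det table above, in percent.
results = {
    "RVT":       {"mAP": 38.4, "AP50": 58.7, "AP75": 41.3, "APS": 29.5, "APM": 50.3, "APL": 81.7},
    "SAST":      {"mAP": 38.1, "AP50": 60.1, "AP75": 40.0, "APS": 29.8, "APM": 48.9, "APL": 79.7},
    "SSM":       {"mAP": 38.0, "AP50": 55.2, "AP75": 40.6, "APS": 28.8, "APM": 52.2, "APL": 77.8},
    "LEOD":      {"mAP": 41.1, "AP50": 65.2, "AP75": 43.6, "APS": 35.1, "APM": 47.3, "APL": 73.3},
    "DAGr-50":   {"mAP": 41.9, "AP50": 66.0, "AP75": 44.3, "APS": 36.3, "APM": 56.2, "APL": 77.8},
    "FlexEvent": {"mAP": 57.4, "AP50": 78.2, "AP75": 66.6, "APS": 51.7, "APM": 64.9, "APL": 83.7},
}

def margins_over_best_baseline(results, ours="FlexEvent"):
    """For each metric, return (best baseline name, margin in points)."""
    margins = {}
    for metric, value in results[ours].items():
        best_name, best_value = max(
            ((name, row[metric]) for name, row in results.items() if name != ours),
            key=lambda pair: pair[1],
        )
        margins[metric] = (best_name, round(value - best_value, 1))
    return margins

for metric, (name, gain) in margins_over_best_baseline(results).items():
    print(f"{metric}: +{gain} points over {name}")
```

On these numbers the margin is 15.5 mAP points over DAGr-50. The largest gain is on AP75 (22.3 points) and the smallest is on APL (2.0 points over RVT), so the improvement is concentrated in tight localization and in small and medium objects.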
The figure below illustrates the comparisons with state-of-the-art event camera detectors on the DSEC-Det dataset.
@misc{lu2024flexeventeventcameraobject,
title={FlexEvent: Event Camera Object Detection at Arbitrary Frequencies},
author={Dongyue Lu and Lingdong Kong and Gim Hee Lee and Camille Simon Chane and Wei Tsang Ooi},
year={2024},
eprint={2412.06708},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.06708},
}