DSERT-RoLL: Robust Multi-Modal Perception for Diverse Driving Conditions with Stereo Event-RGB-Thermal Cameras, 4D Radar, and Dual-LiDAR

Hoonhee Cho, Jae-Young Kang, Yuhwan Jeong, Yunseo Yang, Wonyoung Lee, Youngho Kim, Kuk-Jin Yoon
VILAB, KAIST
CVPR 2026
*Indicates Equal Contribution

Abstract

In this paper, we present DSERT-RoLL, a driving dataset that combines stereo event, RGB, and thermal cameras with 4D radar and dual LiDAR, collected across diverse weather and illumination conditions. The dataset provides precise 2D and 3D bounding boxes with track IDs, together with ego-vehicle odometry, enabling fair comparisons both within and across sensor combinations. It is designed to alleviate the data scarcity for emerging sensors such as event cameras and 4D radar and to support systematic studies of their behavior. We establish unified 2D and 3D benchmarks that allow the characteristics and strengths of sensors to be compared directly, both across sensor families and within each family. We report baselines for representative single-modality and multi-modal methods and provide protocols that encourage research on different fusion strategies and sensor combinations. In addition, we propose a fusion framework that integrates sensor-specific cues into a unified feature space and improves 3D detection robustness under varied weather and lighting conditions.
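Track IDs and ego odometry are what let a labeled object be followed in a fixed world frame rather than only relative to the moving vehicle. The sketch below shows one hypothetical way to represent a 3D box label and to move it into the world frame with an odometry pose. The names `Box3D`, `EgoPose`, `wrap_angle`, and `box_to_world` are illustrative assumptions, not the dataset's release format or API, and the pose is assumed to be yaw-only (pitch and roll ignored) as a simplification.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box3D:
    """Hypothetical 3D box label; field names are illustrative, not the release format."""
    track_id: int   # identity kept across frames
    category: str
    x: float        # box center (m) in the frame the box is expressed in
    y: float
    z: float
    length: float
    width: float
    height: float
    yaw: float      # heading about the vertical axis (rad)


@dataclass(frozen=True)
class EgoPose:
    """Hypothetical ego pose from odometry; assumes a yaw-only rotation."""
    x: float
    y: float
    z: float
    yaw: float


def wrap_angle(a: float) -> float:
    """Wrap an angle into [-pi, pi] using atan2."""
    return math.atan2(math.sin(a), math.cos(a))


def box_to_world(box: Box3D, pose: EgoPose) -> Box3D:
    """Move an ego-frame box into the world frame: rotate by the ego yaw, then translate."""
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    return Box3D(
        track_id=box.track_id,
        category=box.category,
        x=pose.x + c * box.x - s * box.y,
        y=pose.y + s * box.x + c * box.y,
        z=pose.z + box.z,
        length=box.length,
        width=box.width,
        height=box.height,
        yaw=wrap_angle(box.yaw + pose.yaw),
    )


# Usage: one track seen in two frames while the ego moves 1 m forward.
dt = 0.1  # hypothetical time between the two labeled frames (s)
pose_t0 = EgoPose(0.0, 0.0, 0.0, 0.0)
pose_t1 = EgoPose(1.0, 0.0, 0.0, 0.0)
car_t0 = Box3D(7, "car", 10.0, 0.0, 0.0, 4.5, 1.8, 1.5, 0.0)
car_t1 = Box3D(7, "car", 10.0, 0.0, 0.0, 4.5, 1.8, 1.5, 0.0)

w0, w1 = box_to_world(car_t0, pose_t0), box_to_world(car_t1, pose_t1)
speed = math.hypot(w1.x - w0.x, w1.y - w0.y) / dt
print(f"track {w0.track_id}: {speed:.1f} m/s in the world frame")
```

In the ego frame the car sits 10 m ahead in both frames and looks stationary; only after applying odometry does the shared track ID reveal that it moves at 10 m/s.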

Sensor Suite

Sensor setup

2D sensors         Model                    Resolution (px)   FoV (H × V)     FPS
RGB                2 × BFS-U3-51S5C         2448 × 2048       82.2° × 66.5°   10
Event              2 × Prophesee EVK4       1280 × 720        76.7° × 65.5°   >10k
Thermal            2 × FLIR A65             640 × 512         90° × 69°       30

3D sensors         Model                    Range             FoV (H × V)     FPS
4D radar           RETINA-4FN               100 m             100° × 24°      20
Long-range LiDAR   Livox HAP                150 m             120° × 25°      10
Short-range LiDAR  Ouster OS0-128           100 m             360° × 90°      20
GPS/IMU            Microstrain 3DM-GX5-45   N/A               N/A             10/100
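Because the three camera types differ in both resolution and field of view, one quick way to compare them is the average angular extent of a single pixel, i.e. the FoV divided by the pixel count along each axis. The sketch below computes this from the table values. `CameraSpec` is a hypothetical helper, and spreading the FoV evenly across pixels is a linear approximation that ignores lens projection and distortion.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CameraSpec:
    """Hypothetical container for one row of the 2D-sensor table."""
    name: str
    width_px: int
    height_px: int
    hfov_deg: float
    vfov_deg: float

    def deg_per_px(self) -> tuple[float, float]:
        """Average degrees per pixel (horizontal, vertical): FoV / pixel count."""
        return self.hfov_deg / self.width_px, self.vfov_deg / self.height_px


# Resolution and FoV copied from the sensor setup table.
CAMERAS = [
    CameraSpec("RGB (BFS-U3-51S5C)", 2448, 2048, 82.2, 66.5),
    CameraSpec("Event (Prophesee EVK4)", 1280, 720, 76.7, 65.5),
    CameraSpec("Thermal (FLIR A65)", 640, 512, 90.0, 69.0),
]

for cam in CAMERAS:
    h, v = cam.deg_per_px()
    print(f"{cam.name:<24} H {h:.4f} deg/px   V {v:.4f} deg/px")
```

With these numbers, one thermal pixel spans roughly four RGB pixels horizontally (about 0.141 vs 0.034 deg/px), which matters when projecting labels or aligning features across modalities.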

Dataset Comparison

Dataset              Num. data   Radar   RGB          Event    Thermal
KITTI                15k                 Stereo
Waymo                230k                Multi-view
nuScenes             40k         3D      Multi-view
Argoverse 2          150k                Multi-view
K-Radar              35k         4D      Multi-view
TJ4DRadSet           7.8k        4D      Mono
DSEC                 5.4k                Stereo       Stereo
1Mpx                 32M                 Mono         Mono
SeeingThroughFog     13.5k       3D      Stereo                Mono
KAIST                8.9k                Stereo                Mono
DSERT-RoLL (Ours)    22k         4D      Stereo       Stereo   Stereo

[Check-mark columns not reproduced: adverse weather (Clear, Rain, Fog, Snow), LiDAR, and ground truth (3D Bbox., Tr. ID, Odom).]

BibTeX


@inproceedings{cho2026dsertroll,
  title={DSERT-RoLL: Robust Multi-Modal Perception for Diverse Driving Conditions with Stereo Event-RGB-Thermal Cameras, 4D Radar, and Dual-LiDAR},
  author={Cho, Hoonhee and Kang, Jae-Young and Jeong, Yuhwan and Yang, Yunseo and Lee, Wonyoung and Kim, Youngho and Yoon, Kuk-Jin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}