Publications
* denotes equal contribution and joint lead authorship.
2025
Polar Hierarchical Mamba: Towards Streaming LiDAR Object Detection with Point Clouds as Egocentric Sequences
Accepted to 4DV @ CVPR 2025; in submission to NeurIPS 2025.
Accurate and efficient object detection is essential for autonomous vehicles, where real-time perception requires low latency and high throughput. LiDAR sensors provide robust depth information, but conventional methods process full 360° scans in a single pass, introducing significant delay. Streaming approaches address this by sequentially processing partial scans in the native polar coordinate system, yet they rely on translation-invariant convolutions that are misaligned with polar geometry, resulting in degraded performance or requiring complex distortion mitigation. Recent Mamba-based state space models (SSMs) have shown promise for LiDAR perception, but only in the full-scan setting, relying on geometric serialization and positional embeddings that are memory-intensive and ill-suited to streaming. We propose Polar Hierarchical Mamba (PHiM), a novel SSM architecture designed for polar-coordinate streaming LiDAR. PHiM uses local bidirectional Mamba blocks for intra-sector spatial encoding and a global forward Mamba for inter-sector temporal modeling, replacing convolutions and positional encodings with distortion-aware, dimensionally-decomposed operations. PHiM sets a new state-of-the-art among streaming detectors on the Waymo Open Dataset, outperforming the previous best by 10% and matching full-scan baselines at twice the throughput.
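For readers curious about the dataflow, here is a minimal, hypothetical PyTorch sketch of the hierarchy the abstract describes: bidirectional scans within each polar sector, then a single forward scan across sector summaries. The `SSMBlock` below is a GRU stand-in for a real Mamba block, and the module names, the mean-pooled sector summary, and all shapes are illustrative assumptions rather than the paper's implementation.

```python
# Hypothetical sketch of the PHiM-style streaming hierarchy (not the paper's code).
import torch
import torch.nn as nn


class SSMBlock(nn.Module):
    """Placeholder sequence mixer over (B, L, D); a GRU stands in for a Mamba block."""
    def __init__(self, dim: int):
        super().__init__()
        self.mixer = nn.GRU(dim, dim, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.mixer(x)
        return self.norm(x + out)


class LocalBiSSM(nn.Module):
    """Intra-sector spatial encoder: scan the sector's tokens in both directions."""
    def __init__(self, dim: int):
        super().__init__()
        self.fwd = SSMBlock(dim)
        self.bwd = SSMBlock(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, L, D) tokens of one sector
        return self.fwd(x) + torch.flip(self.bwd(torch.flip(x, dims=[1])), dims=[1])


class PHiMSketch(nn.Module):
    """Local bidirectional blocks per sector, then one forward scan over sector summaries."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.local = LocalBiSSM(dim)
        self.global_fwd = SSMBlock(dim)

    def forward(self, sectors: list[torch.Tensor]) -> torch.Tensor:
        # sectors: list of (B, L_i, D) token tensors, one per streamed polar sector.
        summaries = [self.local(s).mean(dim=1) for s in sectors]   # (B, D) per sector
        seq = torch.stack(summaries, dim=1)                        # (B, S, D)
        return self.global_fwd(seq)                                # inter-sector context


if __name__ == "__main__":
    model = PHiMSketch(dim=128)
    sectors = [torch.randn(2, n, 128) for n in (64, 80, 72)]  # three partial scans
    print(model(sectors).shape)  # torch.Size([2, 3, 128])
```

The point of the split is that the local blocks only ever see one sector, so processing can begin before the full 360° sweep arrives, while the global forward scan carries context across sectors as they stream in.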
2024
DFDNet: Directional Feature Diffusion for Efficient Fully-Sparse LiDAR Object Detection
In submission to TMLR.
LiDAR-based object detection is essential for autonomous driving but remains computationally demanding. Conventional methods use dense feature map representations, leading to significant computational overhead and underutilizing the inherent sparsity of LiDAR data. Recent fully sparse detectors show promise but suffer from missing central object features due to the surface-dominant distribution of LiDAR points. Sparse feature diffusion methods attempt to address this by expanding features within object bounding boxes to cover neighboring regions before the detection head. However, these approaches incur excessive computational cost because larger objects require a correspondingly large diffusion range. In this paper, we propose DFDNet, a fully sparse directional feature diffusion network with a novel adaptive sparse feature realignment module that dynamically projects sparse features along object centerlines prior to feature diffusion. This realignment enables efficient, directional feature diffusion along each object's centerline. The resulting diffused features are then aggregated via max-pooling to construct a refined feature representation for each object. Our method reduces redundant sparse feature computations, achieving a two-fold reduction in computational load while improving performance over state-of-the-art detectors on the Waymo and nuScenes benchmarks.
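For illustration, a minimal, self-contained sketch of the centerline realignment and directional diffusion idea described above, assuming per-feature centerline directions and object assignments are already available: features are projected onto the centerline, diffused at a handful of 1D offsets, and max-pooled per object. All names, shapes, and the grouping scheme are hypothetical, not the paper's code.

```python
# Hypothetical sketch of centerline realignment + directional diffusion (not the paper's code).
import torch
import torch.nn as nn


class DirectionalDiffusionSketch(nn.Module):
    def __init__(self, dim: int = 64, num_steps: int = 3, step_size: float = 0.5):
        super().__init__()
        # 1D offsets along the centerline at which diffused copies are placed.
        self.register_buffer("offsets", torch.arange(-num_steps, num_steps + 1).float() * step_size)
        # Offset-conditioned projection so each diffused copy differs per step.
        self.step_proj = nn.Linear(dim + 1, dim)

    def forward(
        self,
        coords: torch.Tensor,      # (N, 3) coordinates of non-empty sparse features
        feats: torch.Tensor,       # (N, D) per-feature embeddings
        directions: torch.Tensor,  # (N, 3) unit centerline direction per feature
        obj_ids: torch.Tensor,     # (N,) long tensor assigning each feature to an object
    ) -> torch.Tensor:
        n, d = feats.shape
        k = self.offsets.numel()

        # 1) Realignment: scalar position of each feature along its object's centerline.
        t = (coords * directions).sum(dim=-1, keepdim=True)             # (N, 1)
        centerline_pos = t + self.offsets.view(1, k)                    # (N, K); in a real
        # detector these positions would index back into the sparse grid.

        # 2) Directional diffusion: one projected copy per 1D offset, instead of
        #    densely expanding features over a 2D/3D neighborhood.
        rep = feats.unsqueeze(1).expand(n, k, d)                        # (N, K, D)
        off = self.offsets.view(1, k, 1).expand(n, k, 1)                # (N, K, 1)
        diffused = torch.relu(self.step_proj(torch.cat([rep, off], dim=-1)))

        # 3) Aggregation: max-pool over diffusion steps, then over each object.
        per_feature = diffused.amax(dim=1)                              # (N, D)
        num_objs = int(obj_ids.max().item()) + 1
        pooled = per_feature.new_full((num_objs, d), float("-inf"))
        pooled.scatter_reduce_(0, obj_ids.unsqueeze(-1).expand(n, d), per_feature, reduce="amax")
        return pooled                                                   # (num_objs, D)


if __name__ == "__main__":
    n, d = 200, 64
    coords = torch.randn(n, 3)
    feats = torch.randn(n, d)
    dirs = torch.nn.functional.normalize(torch.randn(n, 3), dim=-1)
    obj_ids = torch.randint(0, 10, (n,))
    print(DirectionalDiffusionSketch(dim=d)(coords, feats, dirs, obj_ids).shape)
```

The 1D diffusion along the centerline is what keeps the cost roughly flat as objects grow, in contrast to expanding features over an ever larger 2D/3D neighborhood.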