Publications
* denotes equal contribution and joint lead authorship.
2025
Polar Hierarchical Mamba: Streaming LiDAR Object Detection with Point Clouds as Egocentric Sequences
In submission to NeurIPS 2025.
Accurate and efficient object detection is essential for autonomous vehicles, where real-time perception requires low latency and high throughput. LiDAR sensors provide robust depth information, but conventional methods process full 360° scans in a single pass, introducing significant delay. Streaming approaches address this by sequentially processing partial scans in the native polar coordinate system, yet they rely on translation-invariant convolutions that are misaligned with polar geometry -- resulting in degraded performance or requiring complex distortion mitigation. Recent Mamba-based state space models (SSMs) have shown promise for LiDAR perception, but only in the full-scan setting, relying on geometric serialization and positional embeddings that are memory-intensive and ill-suited to streaming. We propose Polar Hierarchical Mamba (PHiM), a novel SSM architecture designed for polar-coordinate streaming LiDAR. PHiM uses local bidirectional Mamba blocks for intra-sector spatial encoding and a global forward Mamba for inter-sector temporal modeling, replacing convolutions and positional encodings with distortion-aware, dimensionally-decomposed operations. PHiM sets a new state-of-the-art among streaming detectors on the Waymo Open Dataset, outperforming the previous best by 10% and matching full-scan baselines at twice the throughput.
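The two-level design described above can be sketched with a toy linear state-space scan standing in for the Mamba blocks. Everything here (the scan, the mean-pooling of sectors, all function names) is an illustrative assumption for exposition, not the paper's implementation:

```python
import numpy as np

def ssm_scan(x, decay=0.9):
    # Toy linear state-space recurrence h_t = decay*h_{t-1} + x_t,
    # a stand-in for a (selective) Mamba scan.
    h = np.zeros(x.shape[-1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = decay * h + x[t]
        out[t] = h
    return out

def bidirectional_ssm(x):
    # Local bidirectional block: forward scan plus a reversed backward scan,
    # so every point in a sector sees the whole sector.
    return ssm_scan(x) + ssm_scan(x[::-1])[::-1]

def phim_forward(sectors):
    # sectors: list of (n_points, d) arrays, one per angular sector,
    # in streaming (egocentric) order.
    # 1) intra-sector spatial encoding with the local bidirectional scan
    local = [bidirectional_ssm(s) for s in sectors]
    # 2) pool each sector to one token (illustrative choice: mean),
    #    then run a forward-only scan across sectors -- causal, so each
    #    sector's output is available as soon as that sector arrives.
    tokens = np.stack([f.mean(axis=0) for f in local])  # (n_sectors, d)
    return ssm_scan(tokens)

rng = np.random.default_rng(0)
sectors = [rng.normal(size=(5, 4)) for _ in range(3)]
out = phim_forward(sectors)
print(out.shape)  # (3, 4)
```

The forward-only global scan is what makes the sketch streamable: the feature for sector *k* depends only on sectors 0..k, so detection need not wait for the full 360° sweep.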
Towards Streaming LiDAR Object Detection with Point Clouds as Egocentric Sequences
Accepted to 4DV @ CVPR 2025.
Accurate and efficient object detection is a crucial component of fully autonomous driving. LiDAR sensors are employed to augment or replace cameras for greater robustness across diverse driving situations, making object detection on LiDAR point clouds a critical area of research. Traditional approaches to LiDAR object detection wait for a full 360° turn of the scanning sensor before processing the entire point cloud in one pass, introducing significant latency and lowering throughput. Previous streaming approaches use the raw LiDAR polar coordinate system to process egocentric partial scans of point clouds, but rely on translation-invariant convolutions, which are incompatible with polar coordinates and degrade performance. In this paper, we show that the reliance on convolutions is unnecessary and propose a Mamba-only backbone built from Polar Hierarchical Mamba (PHiM) blocks, which aggregate per-point features within each partial scan with a local bidirectional state space model and capture higher-level global features in a streaming fashion with a global forward state space model. On the Waymo Open Dataset, our model improves on the previous leading polar-based detector by 10%, achieving state-of-the-art performance among polar-based methods while remaining competitive with Cartesian-based detectors at twice the processing throughput, measured in predictions per second.
2024
DFDNet: Directional Feature Diffusion for Efficient Fully-Sparse LiDAR Object Detection
In submission to TMLR.
LiDAR-based object detection is essential for autonomous driving but remains computationally demanding. Conventional methods use dense feature map representations, incurring significant computational overhead and underutilizing the inherent sparsity of LiDAR data. Recent fully sparse detectors show promise but suffer from missing central object features due to the surface-dominant distribution of LiDAR points. Sparse feature diffusion methods attempt to address this by expanding features within object bounding boxes to cover neighboring regions before the detection head. However, these approaches incur excessive computational cost because larger objects require a larger diffusion range. In this paper, we propose DFDNet, a fully sparse directional feature diffusion network that introduces a novel adaptive sparse feature realignment module, dynamically projecting sparse features onto object centerlines prior to feature diffusion. This realignment enables efficient, directional feature diffusion along each object's centerline. The resulting diffused features are then aggregated via max-pooling to construct a refined feature representation for each object. Our method reduces redundant sparse feature computations, achieving a two-fold reduction in computational load while improving performance over state-of-the-art detectors on the Waymo and nuScenes benchmarks.
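The realign-diffuse-pool pipeline described above can be sketched in a few lines of NumPy. This is a toy 2-D (bird's-eye-view) illustration under assumed names and a fixed given centerline direction -- not the authors' implementation, which learns the realignment adaptively:

```python
import numpy as np

def directional_diffusion(coords, feats, direction, n_bins=8):
    # coords: (n, 2) BEV positions of sparse features; feats: (n, d).
    # 1) Realign: project each sparse feature's position onto the
    #    (assumed known) object centerline, giving a 1-D coordinate.
    direction = direction / np.linalg.norm(direction)
    proj = coords @ direction
    # 2) Diffuse along the line: bin the 1-D positions and spread each
    #    feature to its neighboring bins with a small 1-D kernel --
    #    cheap compared to diffusing over a 2-D neighborhood.
    bins = ((proj - proj.min()) / (np.ptp(proj) + 1e-9) * n_bins).astype(int)
    bins = np.clip(bins, 0, n_bins - 1)
    line = np.full((n_bins, feats.shape[1]), -np.inf)
    for b, f in zip(bins, feats):
        for nb in (b - 1, b, b + 1):
            if 0 <= nb < n_bins:
                line[nb] = np.maximum(line[nb], f)
    # 3) Aggregate via max-pooling over occupied bins into one
    #    object-level feature vector.
    valid = line[np.isfinite(line).all(axis=1)]
    return valid.max(axis=0)

rng = np.random.default_rng(1)
coords = rng.normal(size=(20, 2))
feats = rng.normal(size=(20, 16))
pooled = directional_diffusion(coords, feats, direction=np.array([1.0, 0.5]))
print(pooled.shape)  # (16,)
```

The efficiency argument is visible in the sketch: diffusion cost grows with the number of 1-D bins along the centerline rather than with a 2-D diffusion radius, which is what lets long objects avoid a quadratically larger diffusion range.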