Spatial perception algorithm for open-pit mining areas based on lightweight binocular stereo matching

Abstract: Spatial perception is a core capability of autonomous driving in mines, enabling a vehicle to navigate and avoid obstacles in complex mining environments. Existing LiDAR-based ranging is costly, yields sparse data features, and has poor real-time performance. The key problem for perception systems in open-pit mine autonomous driving is therefore how to reduce equipment cost and improve response speed while maintaining perception accuracy. To address this, a lightweight stereo matching framework is proposed for efficient and accurate real-time stereo matching. A lightweight convolutional module serves as the backbone network, and a multi-scale convolutional attention module extracts feature maps from the stereo image pair at three different scales. An upsampling module with skip connections is then used to construct a 3D cost volume. On this basis, depthwise separable convolutions replace conventional convolutions to reduce the parameter count, and a funnel-shaped module consisting of downsampling and upsampling stages processes the 3D cost volume to capture and aggregate multi-scale contextual semantic features and to decode high-resolution geometric details. The downsampling stage aggregates the spatial features obtained from matching, enlarging the receptive field and reducing computation. The upsampling stage restores high-resolution texture details in the feature maps. Finally, disparity regression estimates a continuous disparity map. To obtain training data, a large-scale training dataset with dense disparity labels was constructed from the AutoMine open-pit mining dataset, using sparse LiDAR point cloud ground truth as the supervision signal.
In benchmark experiments against existing state-of-the-art methods, the disparity estimation error was reduced by 45.6%, the inference time reached 17 ms, and the computational cost was lowered to 42.5% of that of other methods, demonstrating advantages in speed, accuracy, and resource utilization and showing the effectiveness of the proposed convolution block channel optimization method. On the AutoMine dataset, the network achieved the highest accuracy and the lowest latency among lightweight stereo matching methods, indicating progress in real-time stereo matching for complex mining environments. These results make it feasible to deploy the algorithm on embedded in-vehicle devices and promote the application of binocular stereo matching to spatial perception, navigation, and obstacle avoidance in open-pit mine autonomous driving.
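
The abstract names two standard building blocks without giving detail: depthwise separable convolution for reducing parameters, and disparity regression for producing continuous disparities. The following minimal Python sketch illustrates both under assumptions that are not taken from the paper. The 2D kernel, the channel widths (32), the example cost values, and the soft-argmin form of the regression (a common choice in learned stereo matching) are illustrative, and all function names are hypothetical.

```python
import math


def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k 2D convolution (bias omitted)."""
    return c_in * c_out * k * k


def separable_conv_params(c_in, c_out, k):
    """Weight count of a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def soft_argmin_disparity(costs):
    """Continuous disparity for one pixel.

    `costs[d]` is the matching cost of candidate disparity d (lower is better).
    A softmax over the negated costs gives one weight per candidate, and the
    weighted mean of the candidate indices is the sub-pixel estimate.
    """
    m = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - m)) for c in costs]
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total


if __name__ == "__main__":
    std = conv_params(32, 32, 3)
    sep = separable_conv_params(32, 32, 3)
    print(f"standard: {std}  separable: {sep}  ratio: {sep / std:.3f}")

    # Assumed per-pixel costs over five candidate disparities (0..4).
    costs = [5.0, 2.0, 0.5, 0.6, 3.0]
    print(f"estimated disparity: {soft_argmin_disparity(costs):.3f}")
```

Because the regression averages over all candidates instead of picking the single cheapest one, the estimate can fall between integer disparities, which is what makes the output map continuous.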

     
