Coal gangue recognition technology based on SegFormer-CG

王妍玮; 陶文彬; 陈凯云; 孟祥林; 张玉

doi:10.13225/j.cnki.jccs.2025.0220

Abstract: Coal-gangue sorting robots are important to the development of intelligent coal mining, and coal-gangue recognition is their core technology. Traditional recognition methods suffer from low efficiency and insufficient accuracy under complex conditions such as high noise and motion blur. To address this, we propose a coal-gangue recognition method based on SegFormer-CG that aims to improve both real-time performance and accuracy. The encoder follows the Transformer-based SegFormer architecture and uses the lightweight MiT-B0 backbone to extract multi-scale features, while the decoder adds fusion modules to strengthen semantic segmentation. Bottleneck modules are introduced after the C1, C2, and C3 feature maps in the decoder to improve feature extraction; these bottlenecks are modified with Depthwise Separable Convolution (DSConv) and Omni-Dimensional Dynamic Convolution (ODConv) to reduce the parameter count and computational cost. An Atrous Spatial Pyramid Pooling (ASPP) module, modified with depthwise separable convolution and 5×5 convolution, is applied to the C4 feature map to improve multi-scale feature fusion. Criss-Cross Attention (CCA) is added after the C3 and C4 feature maps so that the model focuses on key information. Training uses a two-stage transfer learning strategy: the backbone is first frozen for 50 epochs of feature adaptation, and then all parameters are unfrozen and optimized, which improves generalization to coal-gangue images. Experimental results show that SegFormer-CG achieves 96.39% precision, 96.29% recall, and 93.03% mean intersection over union (mIoU), improvements of 1.32%, 0.59%, and 1.73%, respectively, over the original model. The model has 5.14×10⁶ parameters, requires 5.90×10⁹ FLOPs, and runs at 50.92 frames/s. Compared with common models such as PSPNet, DeepLabV3+, and UNet, SegFormer-CG achieves better recognition results with clear advantages in parameter count and FLOPs. It maintains stable recognition under added noise, motion blur, and low illumination, and it generalizes to samples from mining areas in Xinjiang and Shaanxi, providing reliable technical support for efficient recognition in gangue-sorting robots.
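
The precision, recall, and mIoU figures reported above are standard semantic segmentation metrics. As a minimal sketch of how such values are commonly derived from a pixel-level confusion matrix, the following assumes macro averaging over classes and a hypothetical three-class layout (background, coal, gangue); neither the averaging scheme nor the class layout is stated in the abstract, and the function name `segmentation_metrics` is introduced here purely for illustration.

```python
def segmentation_metrics(conf):
    """Return (mean precision, mean recall, mIoU) from a confusion matrix.

    conf[i][j] is the number of pixels whose true class is i and whose
    predicted class is j. Macro averaging over classes is an assumption.
    """
    n = len(conf)
    precisions, recalls, ious = [], [], []
    for c in range(n):
        tp = conf[c][c]                                # correctly labeled pixels
        fp = sum(conf[r][c] for r in range(n)) - tp    # predicted c, truly another class
        fn = sum(conf[c]) - tp                         # truly c, predicted another class
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        ious.append(tp / (tp + fp + fn) if tp + fp + fn else 0.0)

    def mean(values):
        return sum(values) / len(values)

    return mean(precisions), mean(recalls), mean(ious)


# Hypothetical pixel counts: rows are true classes, columns are predictions
# (background, coal, gangue). These numbers are illustrative only.
example = [
    [50, 0, 0],
    [0, 40, 10],
    [0, 10, 40],
]
precision, recall, miou = segmentation_metrics(example)
print(f"precision={precision:.4f} recall={recall:.4f} mIoU={miou:.4f}")
```

Because IoU penalizes both false positives and false negatives in a single ratio, mIoU is typically lower than precision or recall on the same predictions, which is consistent with the ordering of the three reported values.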
