Abstract:
The low level of automation and intelligence of roadway support equipment in coal mines restricts the efficiency of roadway formation and is a key reason for “mining imbalance”. To address the low automation and poor support efficiency of this equipment, a path planning method based on deep reinforcement learning is proposed for the arm of a drilling and anchoring robot that integrates a cantilever roadheader with a multi-degree-of-freedom manipulator. The coal mine roadway is modeled in a virtual environment, and a collision detection model is established between the manipulator and the fuselage, the coal wall, and the supporting steel belt. Collision detection is performed in the virtual environment using the hierarchical bounding box method, yielding an obstacle avoidance strategy for the confined boundaries of the coal mine roadway. The PPO (Proximal Policy Optimization) algorithm is then improved in two respects. First, because the length of the state-space input of the multi-degree-of-freedom manipulator is not fixed, an LSTM (Long Short-Term Memory) neural network is introduced to process the environmental state input, which improves the adaptability of the algorithm to the environment. Second, because rewards and penalties are sparse, an ICM (Intrinsic Curiosity Module) is introduced, which gives the agent intrinsic rewards to encourage it to explore the environment more widely. The agent is built on this reward and penalty mechanism, and its state space and action space are defined according to the motion characteristics of the drilling and anchoring robot. In the same scene, two algorithms are used to train the agent separately.
The comprehensive reward value, number of steps per episode, Actor network loss, Critic network loss, and other indicators are compared and analyzed. Simulation ablation experiments show that, in a task the original PPO algorithm cannot complete, the improved algorithm produces a path 3.98% shorter than that of the PPO-ICM algorithm, which can also complete the task, and takes 25.6% less time. To further verify the robustness of the improved algorithm, multiple sets of experiments are designed. In these experiments the improved PPO algorithm completes the path planning task: the distance error between the path end point and the target position is within 3.88 cm, and the angle error between the bolt and the vertical direction is within 3°. The method can therefore effectively complete the path planning task and improve the degree of automation of the coal mine roadway support system. The results verify the feasibility and effectiveness of the proposed method for path planning of the multi-degree-of-freedom manipulator of the drilling and anchoring robot when the position of the anchor hole varies in coal mine roadway support.
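
The two accuracy figures reported above (end-point distance error and bolt angle error relative to the vertical) can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. The function names, the choice of the z axis as the vertical direction, the upward-pointing bolt axis, and the use of the reported bounds as default tolerances are all assumptions made for illustration.

```python
import math


def position_error_cm(end_point, target):
    """Euclidean distance between two 3D points given in centimetres."""
    return math.dist(end_point, target)


def angle_from_vertical_deg(bolt_axis, vertical=(0.0, 0.0, 1.0)):
    """Angle in degrees between the bolt axis and the vertical direction.

    Assumption: z is the vertical axis and the bolt axis points upward.
    """
    dot = sum(a * b for a, b in zip(bolt_axis, vertical))
    norm = math.hypot(*bolt_axis) * math.hypot(*vertical)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def within_tolerance(end_point, target, bolt_axis,
                     max_dist_cm=3.88, max_angle_deg=3.0):
    """True if both errors are inside the given bounds.

    The defaults reuse the bounds quoted in the abstract as tolerances,
    which is an illustrative choice, not a criterion stated by the authors.
    """
    return (position_error_cm(end_point, target) <= max_dist_cm
            and angle_from_vertical_deg(bolt_axis) <= max_angle_deg)


if __name__ == "__main__":
    # Illustrative values only, not data from the experiments.
    end_point = (100.0, 50.0, 203.0)
    target = (100.0, 50.0, 200.0)
    bolt_axis = (0.0, math.tan(math.radians(2.0)), 1.0)  # tilted 2 degrees
    print(position_error_cm(end_point, target))
    print(angle_from_vertical_deg(bolt_axis))
    print(within_tolerance(end_point, target, bolt_axis))
```

Clamping the cosine before calling `math.acos` keeps the angle computation safe when the bolt axis is almost exactly vertical, which is the case of interest here.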