Research on path planning methods for underground roadheader robots

张旭辉; 郑西利; 杨文娟; 李语阳; 麻兵; 董征; 陈鑫

doi:10.12363/issn.1001-1986.23.11.0748

Abstract: Relocating a roadheader robot in non-full-section coal mine roadways is difficult and inefficient. To address this, the characteristics of the unstructured underground environment and the motion characteristics of the roadheader robot were analyzed, and a path planning method for the robot body based on deep reinforcement learning was proposed. The roadway environment was reconstructed in real time using depth cameras, a collision detection model between the roadheader robot and the roadway was built in a virtual environment, and collision detection was performed with the hierarchical bounding box method, yielding an obstacle avoidance strategy constrained by the roadway boundary. Considering the large body size of the roadheader robot and the single goal of the path planning task, hindsight experience replay (HER) was introduced into the traditional soft actor-critic (SAC) algorithm to form the HER-SAC algorithm. Starting from the trajectory obtained for the initial goal in the environment, the algorithm expands the goal subset, which enlarges the set of training samples and speeds up training. On this basis, an agent was established using a reward and punishment mechanism, and its state space and action space were defined according to the motion characteristics of the roadheader robot. The agent was trained with three algorithms in the same scenario, and the algorithms were compared on four performance indicators: average reward, maximum reward, number of steps needed to reach the maximum reward, and robustness. To further verify the reliability of the proposed method, a combined virtual-real approach was adopted: two experimental scenarios were set up by adjusting the target position, path planning was carried out for the roadheader robot in each, and the paths produced by the traditional SAC algorithm and the HER-SAC algorithm were compared.
The results show that, compared with the proximal policy optimization (PPO) and SAC algorithms, the HER-SAC algorithm converges faster and achieves the best overall performance. In both experimental scenarios, the paths planned by the HER-SAC algorithm are smoother and shorter than those planned by the traditional SAC algorithm, and the error between the path end point and the target position is within 3.53 cm, indicating that the algorithm can effectively complete the relocation path planning task. The method lays a theoretical foundation for autonomous relocation control of coal mine roadheader robots and provides a new approach to the automation of coal mine tunneling equipment.
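To make the idea of expanding the goal subset from an already collected trajectory more concrete, the following is a minimal sketch of hindsight goal relabeling, not the authors' implementation. It assumes a planar (x, y) body position, a sparse reward (0 within a tolerance of the goal, -1 otherwise), and the common "future" relabeling strategy; the names `Transition`, `sparse_reward`, `her_relabel`, the tolerance `tol`, and the factor `k` are all hypothetical and chosen only for illustration.

```python
import math
import random
from typing import List, NamedTuple, Optional, Tuple

Pose = Tuple[float, float]  # assumed planar (x, y) position of the machine body, in metres


class Transition(NamedTuple):
    state: Pose
    action: Pose
    next_state: Pose
    goal: Pose
    reward: float
    done: bool


def sparse_reward(achieved: Pose, goal: Pose, tol: float = 0.05) -> float:
    """Return 0.0 if the achieved position is within tol of the goal, else -1.0 (assumed reward form)."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def her_relabel(episode: List[Transition], k: int = 4, tol: float = 0.05,
                rng: Optional[random.Random] = None) -> List[Transition]:
    """Return the original transitions plus k relabeled copies of each one ("future" strategy)."""
    rng = rng or random.Random()
    out = list(episode)
    for t, tr in enumerate(episode):
        for _ in range(k):
            # Pick a position actually reached at or after step t and treat it as the goal.
            future = episode[rng.randint(t, len(episode) - 1)]
            new_goal = future.next_state
            r = sparse_reward(tr.next_state, new_goal, tol)
            out.append(tr._replace(goal=new_goal, reward=r, done=(r == 0.0)))
    return out


# Illustrative episode: the body moves toward the goal but never reaches it.
goal = (1.0, 0.0)
path = [(0.0, 0.0), (0.2, 0.1), (0.4, 0.1), (0.6, 0.2)]
episode = [
    Transition(s, (s2[0] - s[0], s2[1] - s[1]), s2, goal, sparse_reward(s2, goal), False)
    for s, s2 in zip(path, path[1:])
]
augmented = her_relabel(episode, k=4, rng=random.Random(0))
```

In this sketch every original transition carries a reward of -1 because the single goal is never reached, while the relabeled copies include transitions that do reach their substituted goal. Because SAC is an off-policy algorithm, such relabeled transitions can be stored in the same replay buffer as the original ones, which is one plausible reading of how the additional training samples arise.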
