An adaptive adjustment strategy for bolt posture errors based on an improved reinforcement learning algorithm

Bibliographic Details
Published in:	Applied intelligence (Dordrecht, Netherlands) Vol. 51; no. 6; pp. 3405 - 3420
Main Authors:	Luo, Wentao, Zhang, Jianfu, Feng, Pingfa, Liu, Haochen, Yu, Dingwen, Wu, Zhijun
Format:	Journal Article
Language:	English
Published:	New York: Springer US, 01-06-2021; Springer Nature B.V.
Subjects:	Accuracy; Algorithms; Artificial Intelligence; Assembly; Computer Science; Efficiency; Machine learning; Machines; Manufacturing; Mechanical Engineering; Model accuracy; Optimization; Probability theory; Processes; Adaptive reward mechanism; Probabilistic tree; Intelligent assembly; Model-driven method; Reinforcement learning; Physical simulation engine

Description
Summary:	Designing an intelligent and autonomous system remains a great challenge in the assembly field. Most reinforcement learning (RL) methods are applied to experiments with relatively small state spaces. However, the complexity and high-dimensional state and action spaces of the assembly environment cause traditional RL methods to perform poorly in terms of efficiency and accuracy. In this paper, a model-driven adaptive proximal policy optimization (MAPPO) method is presented that lets the assembly system autonomously rectify bolt posture errors. In the MAPPO method, a probabilistic tree and an adaptive reward mechanism are used to improve the computational efficiency and accuracy of the traditional PPO method. The size of the action space is reduced by establishing a hierarchical logical relationship among the parameters with a probabilistic tree. The adaptive reward mechanism mitigates the algorithm's tendency to fall into local minima. Finally, the proposed method is verified with the Unity simulation engine. The effectiveness and robustness of the proposed model are also validated by comparing different cases in simulations and experiments. The results show that MAPPO achieves better efficiency and accuracy than other state-of-the-art algorithms.
ISSN:	0924-669X 1573-7497
DOI:	10.1007/s10489-020-01906-x
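
Illustrative note: the Summary states that a probabilistic tree reduces the size of the action space by ordering the parameters hierarchically. The sketch below is one possible reading of that idea, not the authors' implementation. The level names (axis, direction, step), their options, and the assumption that each level has its own independent distribution are hypothetical and chosen only for illustration. It compares the number of outputs a flat policy would need with the number needed when the action is chosen level by level, and it returns the probability of the composite action as the product of the per-level probabilities.

```python
import random
from math import prod

# Hypothetical decision levels (assumption, not taken from the paper):
# which axis to correct, in which direction, and by how large a step.
LEVELS = [
    ("axis", ["x", "y", "z"]),
    ("direction", ["-", "+"]),
    ("step", ["small", "medium", "large"]),
]


def flat_action_count(levels):
    """Number of outputs a flat policy needs: one per joint combination."""
    return prod(len(options) for _, options in levels)


def tree_output_count(levels):
    """Number of outputs when each level has its own small distribution."""
    return sum(len(options) for _, options in levels)


def sample_action(level_probs, rng):
    """Walk the levels top to bottom, sampling one option per level.

    level_probs holds one probability list per level, in the same order
    as LEVELS. Returns the composite action and the probability of the
    sampled path (the product of the per-level probabilities).
    """
    action = {}
    path_prob = 1.0
    for (name, options), probs in zip(LEVELS, level_probs):
        idx = rng.choices(range(len(options)), weights=probs, k=1)[0]
        action[name] = options[idx]
        path_prob *= probs[idx]
    return action, path_prob


if __name__ == "__main__":
    rng = random.Random(0)
    # Uniform distributions stand in for what a trained policy would output.
    uniform = [[1.0 / len(opts)] * len(opts) for _, opts in LEVELS]
    action, p = sample_action(uniform, rng)
    print(flat_action_count(LEVELS), tree_output_count(LEVELS))
    print(action, p)
```

With these three hypothetical levels, a flat policy needs 3 x 2 x 3 = 18 outputs, while the level-by-level version needs 3 + 2 + 3 = 8. A tree whose lower-level distributions depend on the branch chosen above would need more outputs than this simplified version.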