KMS Chongqing Institute of Green and Intelligent Technology, CAS
Adaptive moving average Q-learning | |
Tan, Tao^1; Xie, Hong^1; Xia, Yunni^2; Shi, Xiaoyu^3; Shang, Mingsheng^3 | |
2024-08-12 | |
Abstract | A variety of algorithms have been proposed to address the long-standing overestimation bias of Q-learning. Reducing this overestimation bias may instead introduce an underestimation bias, as in double Q-learning. However, it remains unclear how to strike a good balance between overestimation and underestimation. We present a simple yet effective algorithm to fill this gap, which we call Moving Average Q-learning. Specifically, we maintain two dependent Q-estimators. The first is used to estimate the maximum expected Q-value. The second is used to select the optimal action. In particular, the second estimator is the moving average of the historical Q-values generated by the first estimator. The second estimator has only one hyperparameter, namely the moving average parameter. This parameter controls the dependence between the second estimator and the first, ranging from independent to identical. Building on Moving Average Q-learning, we design an adaptive strategy for selecting the moving average parameter, resulting in AdaMA (Adaptive Moving Average) Q-learning. This adaptive strategy is a simple function in which the moving average parameter increases monotonically with the number of state-action pairs visited. Moreover, we extend AdaMA Q-learning to AdaMA DQN for high-dimensional environments. Extensive experimental results reveal why Moving Average Q-learning and AdaMA Q-learning can mitigate the overestimation bias, and also show that AdaMA Q-learning and AdaMA DQN substantially outperform state-of-the-art baselines. In particular, compared with Q-learning's overestimation of 1.66, AdaMA Q-learning underestimates by only 0.196, an improvement of 88.19%. |
Keywords | Moving Average Q-learning; AdaMA Q-learning; AdaMA DQN |
DOI | 10.1007/s10115-024-02190-8 |
Journal | KNOWLEDGE AND INFORMATION SYSTEMS |
ISSN | 0219-1377 |
Pages | 29 |
Corresponding author | Xie, Hong (xiehong2018@foxmail.com) |
Indexed in | SCI |
WOS accession number | WOS:001289423300002 |
Language | English |
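
The two-estimator scheme described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the abstract does not give the update equations, so the exponential moving average form, the convention that `beta = 1` makes the two estimators identical, and the schedule in `adaptive_beta` are all assumptions, as are the names `ma_q_update`, `q`, `q_avg`, and `beta`.

```python
import numpy as np


def ma_q_update(q, q_avg, s, a, r, s_next, done,
                alpha=0.1, gamma=0.99, beta=0.5):
    """One tabular update with two dependent estimators (illustrative only).

    q      first estimator, evaluates the value of the selected action
    q_avg  second estimator, selects the action; assumed here to be an
           exponential moving average of q
    beta   moving average parameter (assumed convention: beta=0 leaves
           q_avg untouched, beta=1 makes q_avg[s, a] equal to q[s, a])
    """
    # The second estimator selects the greedy action in the next state.
    a_star = int(np.argmax(q_avg[s_next]))
    # The first estimator supplies the value of that action.
    target = r if done else r + gamma * q[s_next, a_star]
    q[s, a] += alpha * (target - q[s, a])
    # The second estimator tracks the history of the first (assumed EMA form).
    q_avg[s, a] = (1.0 - beta) * q_avg[s, a] + beta * q[s, a]
    return target


def adaptive_beta(n_visited, n_total):
    """Hypothetical schedule: grows monotonically with the number of
    distinct state-action pairs visited; the paper's function may differ."""
    return n_visited / n_total
```

Under these assumptions, `beta = 1` reduces the update to ordinary Q-learning, while smaller values decouple action selection from action evaluation, which is the trade-off the abstract attributes to the moving average parameter.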