Adaptive moving average Q-learning
Tan, Tao1; Xie, Hong1; Xia, Yunni2; Shi, Xiaoyu3; Shang, Mingsheng3
2024-08-12
Abstract: A variety of algorithms have been proposed to address the long-standing overestimation bias problem of Q-learning. Reducing this overestimation bias may introduce an underestimation bias, as in double Q-learning. However, it remains unclear how to strike a good balance between overestimation and underestimation. We present a simple yet effective algorithm, called Moving Average Q-learning, to fill this gap. Specifically, we maintain two dependent Q-estimators. The first is used to estimate the maximum expected Q-value; the second is used to select the optimal action. In particular, the second estimator is the moving average of the historical Q-values generated by the first estimator. The second estimator has only one hyperparameter, namely the moving average parameter, which controls the dependence between the two estimators, ranging from independent to identical. Based on Moving Average Q-learning, we design an adaptive strategy for selecting the moving average parameter, resulting in AdaMA (Adaptive Moving Average) Q-learning. This adaptive strategy is a simple function under which the moving average parameter increases monotonically with the number of visits to a state-action pair. Moreover, we extend AdaMA Q-learning to AdaMA DQN for high-dimensional environments. Extensive experimental results reveal why Moving Average Q-learning and AdaMA Q-learning mitigate the overestimation bias, and show that AdaMA Q-learning and AdaMA DQN substantially outperform SOTA baselines. In particular, compared with the overestimation of 1.66 in Q-learning, AdaMA Q-learning underestimates by only 0.196, an improvement of 88.19%.
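Since the abstract specifies the update scheme only at a high level, the following is a minimal tabular sketch of one Moving Average Q-learning step with the adaptive parameter. The direction of the moving average, the schedule beta = n / (n + c), and all names introduced here (adama_q_update, visits, c) are illustrative assumptions drawn from the abstract's description, not the paper's exact formulation.

```python
import numpy as np

def adama_q_update(Q, Q_ma, visits, s, a, r, s_next,
                   alpha=0.1, gamma=0.99, c=10.0):
    """One sketched AdaMA Q-learning step on tabular estimators.

    Q      -- first estimator (evaluates the maximum expected Q-value)
    Q_ma   -- second estimator (moving average of Q; selects actions)
    visits -- per-(state, action) visit counts driving the adaptive schedule
    """
    visits[s, a] += 1
    # Adaptive moving-average parameter: increases monotonically with the
    # visit count of (s, a); the specific form n / (n + c) is an assumption.
    beta = visits[s, a] / (visits[s, a] + c)

    # The second estimator selects the greedy action,
    # the first estimator evaluates it (cf. double Q-learning).
    a_star = np.argmax(Q_ma[s_next])
    target = r + gamma * Q[s_next, a_star]

    # Standard TD update on the first estimator.
    Q[s, a] += alpha * (target - Q[s, a])

    # The second estimator tracks a moving average of the first:
    # beta -> 1 makes the two estimators identical (plain Q-learning),
    # beta -> 0 keeps them maximally decoupled.
    Q_ma[s, a] = beta * Q[s, a] + (1.0 - beta) * Q_ma[s, a]


# Usage on a toy 16-state, 4-action problem:
Q = np.zeros((16, 4))
Q_ma = np.zeros((16, 4))
visits = np.zeros((16, 4))
adama_q_update(Q, Q_ma, visits, s=0, a=1, r=1.0, s_next=3)
```

Under these assumptions the sketch matches the abstract's intuition: early on, low visit counts keep beta small and the action-selecting estimator loosely coupled to the value estimator, damping overestimation; as counts grow, beta approaches 1 and the update behaves like standard Q-learning.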
Keywords: Moving Average Q-learning; AdaMA Q-learning; AdaMA DQN
DOI: 10.1007/s10115-024-02190-8
Journal: KNOWLEDGE AND INFORMATION SYSTEMS
ISSN: 0219-1377
Pages: 29
Corresponding Author: Xie, Hong (xiehong2018@foxmail.com)
Indexed By: SCI
WOS ID: WOS:001289423300002
Language: English