CM-DQN: A Value-Based Deep Reinforcement Learning Model to Simulate Confirmation Bias

Jiacheng Shen; Lihan Feng

doi:10.47191/etj/v9i07.31

Authors

Jiacheng Shen, New York University Shanghai, Shanghai, China
Lihan Feng, New York University Shanghai, Shanghai, China

Vol. 9 No. 7 (2024): VOLUME 09 ISSUE 07

Articles

Accepted July 18, 2024

Published July 31, 2024

Abstract

In human decision-making tasks, individuals learn through trials and prediction errors. While learning a task, some individuals are more influenced by good outcomes, whereas others weigh bad outcomes more heavily. Such confirmation bias can lead to different learning effects. In this study, we propose a new Deep Reinforcement Learning algorithm, CM-DQN, which applies different update strategies to positive and negative prediction errors in order to simulate human decision-making in tasks where the states are continuous and the actions are discrete. We test the algorithm in the Lunar Lander environment under confirmatory bias, disconfirmatory bias, and no bias to observe the learning effects. As a contrast experiment, we also apply the confirmation model, which uses the same idea as our proposed algorithm, to a multi-armed bandit problem (an environment with discrete states and discrete actions) to algorithmically simulate the impact of different confirmation biases on the decision-making process. In both experiments, confirmatory bias shows a better learning effect.
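The core idea of the abstract, a different update strategy for positive and negative prediction errors, can be illustrated in the multi-armed bandit setting with a minimal sketch. It is not the authors' implementation. The function names, the epsilon-greedy policy, the initial values, and all parameter values are assumptions made for illustration, and "confirmatory" is taken here to mean a larger learning rate for positive prediction errors on the chosen arm (`alpha_pos > alpha_neg`).

```python
import random


def asymmetric_update(q, reward, alpha_pos, alpha_neg):
    """Return the value estimate after one outcome.

    The prediction error is scaled by alpha_pos when the outcome is
    better than expected and by alpha_neg otherwise (hypothetical helper).
    """
    delta = reward - q  # prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta


def run_bandit(reward_probs, alpha_pos, alpha_neg,
               n_trials=500, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Bernoulli bandit with asymmetric updates.

    Returns the final value estimates and the total reward collected.
    """
    rng = random.Random(seed)
    q = [0.5] * len(reward_probs)  # assumed neutral initial estimates
    total_reward = 0.0
    for _ in range(n_trials):
        if rng.random() < epsilon:
            arm = rng.randrange(len(q))  # explore
        else:
            arm = max(range(len(q)), key=q.__getitem__)  # exploit
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        total_reward += reward
        q[arm] = asymmetric_update(q[arm], reward, alpha_pos, alpha_neg)
    return q, total_reward


if __name__ == "__main__":
    probs = [0.4, 0.6]  # assumed reward probabilities of the two arms
    settings = {
        "confirmatory": (0.3, 0.1),
        "unbiased": (0.2, 0.2),
        "disconfirmatory": (0.1, 0.3),
    }
    for name, (a_pos, a_neg) in settings.items():
        q, total = run_bandit(probs, a_pos, a_neg)
        print(name, [round(v, 3) for v in q], total)
```

The three settings differ only in the pair of learning rates, so any difference in collected reward comes from the asymmetry itself. A deep variant along the same lines would scale the temporal-difference error by its sign before the gradient step, but that extension is likewise an assumption and is not specified in the text above.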

