The Missing Link Between Memory and Reinforcement Learning
Research output: Contribution to journal › Article
Standard
The Missing Link Between Memory and Reinforcement Learning. / Balkenius, Christian; Tjøstheim, Trond A.; Johansson, Birger; Wallin, Annika; Gärdenfors, Peter.
In: Frontiers in Psychology, Vol. 11, 560080, 2020.
RIS
TY - JOUR
T1 - The Missing Link Between Memory and Reinforcement Learning
AU - Balkenius, Christian
AU - Tjøstheim, Trond A.
AU - Johansson, Birger
AU - Wallin, Annika
AU - Gärdenfors, Peter
PY - 2020
Y1 - 2020
N2 - Reinforcement learning systems usually assume that a value function is defined over all states (or state-action pairs) that can immediately give the value of a particular state or action. These values are used by a selection mechanism to decide which action to take. In contrast, when humans and animals make decisions, they collect evidence for different alternatives over time and take action only when sufficient evidence has been accumulated. We have previously developed a model of memory processing that includes semantic, episodic, and working memory in a comprehensive architecture. Here, we describe how this memory mechanism can support decision making when the alternatives cannot be evaluated based on immediate sensory information alone. Instead, we first imagine, and then evaluate, a possible future that will result from choosing one of the alternatives. We present an extended model of decision making that depends on accumulating evidence over time, whether that information comes from sequential attention to different sensory properties or from internal simulation of the consequences of making a particular choice. We show how the new model explains simple immediate choices, choices that depend on multiple sensory factors, and complicated selections between alternatives that require forward-looking simulations based on episodic and semantic memory structures. In this framework, vicarious trial and error is explained as an internal simulation that accumulates evidence for a particular choice. We argue that a system like this forms the “missing link” between more traditional ideas of semantic and episodic memory and the associative nature of reinforcement learning.
KW - accumulator model
KW - decision making
KW - episodic memory
KW - memory model
KW - semantic memory
DO - 10.3389/fpsyg.2020.560080
M3 - Article
C2 - 33362625
AN - SCOPUS:85098165382
VL - 11
JO - Frontiers in Psychology
JF - Frontiers in Psychology
SN - 1664-1078
M1 - 560080
ER -