Q-Learning
Q-Learning is a model-free, value-based, off-policy RL algorithm that learns the value of taking each action in each state, from which the best policy can be derived.
The agent uses a Q-table, indexed by state and action, to pick the action with the highest expected return in its current state. The Q-values are updated with the Bellman equation: Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]
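A minimal sketch of the tabular update above, on a toy 5-state chain MDP (the environment and hyperparameters here are illustrative assumptions, not part of the original notes):

```python
import numpy as np

n_states, n_actions = 5, 2  # actions: 0 = left, 1 = right
gamma, alpha, eps = 0.9, 0.5, 0.1
rng = np.random.default_rng(0)

Q = np.zeros((n_states, n_actions))  # the Q-table: one row per state

def step(s, a):
    """Deterministic chain: move left/right; reaching the last state gives reward 1."""
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r, s2 == n_states - 1

for _ in range(500):  # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy behaviour policy; the update below bootstraps
        # from the greedy (max) action, which is what makes this off-policy
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s2, r, done = step(s, a)
        # Bellman update toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * Q[s2].max() * (not done) - Q[s, a])
        s = s2

print(Q[0].argmax())  # greedy action in the start state
```

After training, the greedy policy walks right toward the rewarding terminal state.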
Limitations
- Traditional Q-learning suffers from overestimation bias (the max operator propagates upward noise) ← addressed by Double Q-Learning, which keeps two separate Q-estimates: one selects the greedy action while the other evaluates it. (Taking the minimum of the two estimates is the related "clipped" variant popularized by TD3.)
Randomized Ensemble Double Q-learning (REDQ) is another improvement. It combines:
- A Double Q-learning-style minimum over Q-estimates to counter overestimation bias
- A randomized ensemble of N Q-functions, with a random subset of size M used for each target, which keeps the bias correction in check and enables a high update-to-data ratio for better sample efficiency
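The two mechanisms above can be sketched as tabular update rules (the tables, sizes, and hyperparameters below are illustrative assumptions; REDQ is normally used with neural Q-functions, shown here with tables for clarity):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
gamma, alpha = 0.9, 0.5

# Double Q-learning: two tables; one selects the action, the other evaluates it.
QA = np.zeros((n_states, n_actions))
QB = np.zeros((n_states, n_actions))

def double_q_update(s, a, r, s2, done):
    # Randomly pick which table to update; decoupling action selection
    # from evaluation removes the max operator's upward bias.
    if rng.random() < 0.5:
        a_star = int(QA[s2].argmax())                      # QA selects
        target = r + gamma * QB[s2, a_star] * (not done)   # QB evaluates
        QA[s, a] += alpha * (target - QA[s, a])
    else:
        a_star = int(QB[s2].argmax())
        target = r + gamma * QA[s2, a_star] * (not done)
        QB[s, a] += alpha * (target - QB[s, a])

# REDQ-style target: minimum over a random subset of M out of N Q-estimates.
N, M = 10, 2
ensemble = [np.zeros((n_states, n_actions)) for _ in range(N)]

def redq_target(r, s2, done):
    idx = rng.choice(N, size=M, replace=False)  # fresh random subset per target
    min_q = np.minimum.reduce([ensemble[i][s2] for i in idx])
    return r + gamma * min_q.max() * (not done)
```

The random subset lets REDQ tune how pessimistic the target is (larger M → more pessimism) while the full ensemble keeps the estimates diverse across many gradient updates per environment step.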