Specification Gaming

Specification Gaming is the behaviour of satisfying the specification of an objective without actually completing the intended task

Cheating or finding shortcuts is a good and simple example of this in real life

Why does this happen?

Misspecification - This isn’t a flaw in the RL algorithm. Rather, it’s because the objective itself is not specified correctly
Poorly designed reward shaping - Shaping rewards can change the optimal policy, sometimes for the worse. This happens when they’re not potential-based
Incorrect assumptions - A common form of incorrect assumptions is that agents cannot exploit bugs that are exposed during simulations in the real world. This is a problem especially when designers consider the specified tasks to be immune to tampering → Reward tampering problem

Note

Specification gaming, while usually thought as unintended or bad, can actually be a good sign in some cases. Such behaviours may encourage the system to learn novel and innovate methods to achieve the objective. After all, there can be multiple solutions to a given problem

Task Specification - Reward function design, training environments, auxiliary rewards etc. More of a catch-all term

Correctness of task specification ⇐> Intended outcome

Challenges summarized

How do we avoid reward tampering?
How do we faithfully capture human concepts?
How do we avoid making incorrect assumptions?

Important Takeaways

Always try to properly specify the intent
A combination of RL Algorithm Design and Reward Design is crucial
Don’t try to cover every possible case while specifying - Learn the reward function from human feedback

✳️ Zero Space

Explorer

Specification Gaming

Why does this happen?

Challenges summarized

Important Takeaways

Graph View

Table of Contents

Backlinks