On the Expressivity of Markov Reward

      Our main results prove that while reward can express many tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a reward function which allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. Read More DeepMind Blog 

​  


Posted

in

by

Tags:

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *