Active offline policy selection

      To make RL more applicable to real-world applications like robotics, we propose using an intelligent evaluation procedure to select the policy for deployment, called active offline policy selection (A-OPS). In A-OPS, we make use of the prerecorded dataset and allow limited interactions with the real environment to boost the selection quality. Read More DeepMind Blog 

​  


Posted

in

by

Tags:

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *