Tag
1 article
This paper shows how to start RL from a working baseline policy and gradually hand control to a learned policy.
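
The summary does not say how control is handed over, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes a linear schedule over training steps and a random choice, at every step, of which policy acts. The names `handoff_probability`, `mixed_action`, `baseline_policy`, and `learned_policy` are hypothetical and used only for illustration.

```python
import random


def handoff_probability(step, total_steps):
    """Probability that the learned policy acts at this training step.

    Assumption: a linear ramp from 0 (baseline only) to 1 (learned only).
    """
    if total_steps <= 0:
        return 1.0
    return min(1.0, max(0.0, step / total_steps))


def mixed_action(state, baseline_policy, learned_policy, step, total_steps, rng=random):
    """Pick the action from the learned policy with the scheduled probability,
    otherwise fall back to the working baseline policy."""
    p = handoff_probability(step, total_steps)
    if rng.random() < p:
        return learned_policy(state)
    return baseline_policy(state)


if __name__ == "__main__":
    # Hypothetical stand-in policies: each maps a state to an action label.
    baseline = lambda state: "baseline_action"
    learned = lambda state: "learned_action"
    for step in (0, 500, 1000):
        print(step, mixed_action(None, baseline, learned, step, total_steps=1000))
```

Early in training the baseline acts on almost every step, so the agent collects useful experience from the start. By the end of the schedule the learned policy acts alone. Other schedules, such as handing over control per episode or per state region, would fit the same description.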