Here we present GIFs of agents trained using the λR across several navigation domains. A consistent theme is that accurate policy evaluation with the "correct" λ leads to stronger performance: a λ that is too high typically causes the agent to overstay at reward locations, while a λ that is too low causes it to abandon rewarded locations too early.
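To make that intuition concrete, below is a minimal sketch (not taken from our codebase; the reward scale, discount γ, and horizon are illustrative) of how the predicted return from lingering at a single reward location depends on the assumed λ when the reward actually diminishes by a factor of 0.5 per visit.

```python
# Minimal illustration (assumed setup, not the paper's implementation):
# a single reward location pays r * lam**k on its (k+1)-th visit,
# and the agent discounts future rewards by gamma.

def return_of_staying(steps, lam, r=1.0, gamma=0.95):
    """Discounted return for staying at the reward location for `steps`
    consecutive timesteps, with the reward diminishing by `lam` per visit."""
    return sum((gamma ** t) * r * (lam ** t) for t in range(steps))

lam_true = 0.5  # the environment's true diminishing factor
for lam_assumed in (0.0, 0.5, 1.0):
    predicted = return_of_staying(10, lam_assumed)
    actual = return_of_staying(10, lam_true)
    print(f"assumed λ={lam_assumed}: predicted {predicted:.2f}, actual {actual:.2f}")

# λ = 1.0 overestimates the value of staying (the agent overstays),
# while λ = 0.0 treats the reward as gone after one visit (the agent leaves too early).
```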
In this setting, the true λ is 0.5.
Agent λ = 0.5:
Agent λ = 0.0:
Agent λ = 1.0:
Agent λ = 0.5:
Agent λ = 0.0:
Agent λ = 1.0:
Pixel-based agent: