Active Vision Reinforcement Learning with Limited Visual Observability

Jinghuan Shang and Michael Ryoo

Stony Brook University

Learned policies on Atari

Left: the partial observation used by the agent; Right: the full game view
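The agent acts from a partial observation cropped from the full frame. As an illustrative sketch only (the function name, window size, and clamping behavior here are our assumptions, not the paper's implementation), extracting such a glimpse might look like:

```python
import numpy as np

def crop_glimpse(frame, center, size):
    """Crop a square partial observation ("glimpse") from a full frame.

    The window is clamped so it always stays inside the image bounds.
    This is a hypothetical helper for illustration, not the authors' code.
    """
    h, w = frame.shape[:2]
    half = size // 2
    cy = min(max(center[0], half), h - half)  # clamp row center
    cx = min(max(center[1], half), w - half)  # clamp column center
    return frame[cy - half:cy + half, cx - half:cx + half]

full = np.zeros((84, 84, 3), dtype=np.uint8)  # Atari-style RGB frame
obs = crop_glimpse(full, center=(10, 80), size=20)
assert obs.shape == (20, 20, 3)
```

A sensory policy would then choose `center` at each step, while the motor policy acts from the resulting crop.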

Learned policies on Robosuite

Wiping the table

For wiping the table, we observe two types of active camera motion:

Top: the observation used by the agent, from the active camera view; Bottom: the same recording from a separate static view, for visualization

Door opening

We also observe that roughly two kinds of sensory policies emerge for completing this task:

Top: the observation used by the agent, from the active camera view; Bottom: the same recording from a separate static view, for visualization

Failure cases

We observe that the sensory policy still seeks out the robot end-effector even when the task itself fails.

Top: the observation used by the agent, from the active camera view; Bottom: the same recording from a separate static view, for visualization