Active Vision Reinforcement Learning with Limited Visual Observability

Jinghuan Shang and Michael Ryoo

Stony Brook University

Learned policies on Atari

Left: the partial observation used by the agent; Right: the full game view
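The agent acts from a partial observation cropped from the full frame. As an illustrative sketch only (the function name, window size, and clamping behavior here are our assumptions, not the paper's implementation), extracting such a glimpse might look like:

```python
import numpy as np

def crop_glimpse(frame, center, size):
    """Crop a square partial observation ("glimpse") from a full frame.

    The window is clamped so it always stays inside the image bounds.
    This is a hypothetical helper for illustration, not the authors' code.
    """
    h, w = frame.shape[:2]
    half = size // 2
    cy = min(max(center[0], half), h - half)  # clamp row center
    cx = min(max(center[1], half), w - half)  # clamp column center
    return frame[cy - half:cy + half, cx - half:cx + half]

full = np.zeros((84, 84, 3), dtype=np.uint8)  # Atari-style RGB frame
obs = crop_glimpse(full, center=(10, 80), size=20)
assert obs.shape == (20, 20, 3)
```

A sensory policy would then choose `center` at each step, while the motor policy acts from the resulting crop.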

Learned policies on Robosuite

Wiping the table

For wiping the table, we observe two types of active camera motion:

Top: the observation used by the agent, from the active camera view; Bottom: the same recording from a separate static view, for visualization

Door opening

We also observe that roughly two kinds of sensory policies emerge for completing this task:

Top: the observation used by the agent, from the active camera view; Bottom: the same recording from a separate static view, for visualization

Failure cases

We observe that the sensory policy still seeks out the robot end-effector even when the task itself fails.

Top: the observation used by the agent, from the active camera view; Bottom: the same recording from a separate static view, for visualization