Learning View and Target Invariant Visual Servoing for Navigation


Yimeng Li*
Jana Kosecka
George Mason University

In ICRA 2020

[Paper]
[Video]


In this paper, we study the task of training an agent for short-range navigation to a desired location. The target is represented either as a view image or as an image of the target object. In this example, the target view is shown in the first row, first column.


Recent advances in deep reinforcement learning have revived interest in data-driven, learning-based approaches to navigation. In this paper we propose to learn viewpoint-invariant and target-invariant visual servoing for local mobile robot navigation: given an initial view and either the goal view or an image of a target, we train a deep convolutional network controller to reach the goal. We present a new architecture for this task that rests on the ability to establish correspondences between the initial and goal views and on a novel reward structure motivated by the traditional feedback control error. The proposed model requires neither calibration nor depth information and achieves robust visual servoing across a variety of environments and targets without any parameter fine-tuning. We present a comprehensive evaluation of the approach and compare it with other deep learning architectures as well as classical visual servoing methods in a visually realistic simulation environment. The presented model overcomes the brittleness of classical visual servoing methods and achieves significantly higher generalization capability than previous learning approaches.


Overview


Approach: train a policy for visual servoing

We compute a correspondence map between the current view and the target view and feed it into a DQN, which predicts a Q-value for each action.
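The data flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the page does not say how correspondences are computed or which actions are available, so `toy_correspondence_map` (a coarse grid matcher on mean intensities), the linear `q_values` stand-in for the DQN, and the `ACTIONS` list are all hypothetical.

```python
import numpy as np

# Hypothetical discrete action set; the page does not list the actual actions.
ACTIONS = ["forward", "turn_left", "turn_right", "stop"]


def toy_correspondence_map(current, target, grid=4):
    """Stand-in correspondence map (an assumption, not the paper's method).

    Both grayscale views are split into a grid x grid layout of cells. For each
    cell of the current view we store the (d_row, d_col) offset to the target
    cell whose mean intensity is closest.
    """
    h, w = current.shape
    ch, cw = h // grid, w // grid

    def cell_means(img):
        blocks = img[: ch * grid, : cw * grid].reshape(grid, ch, grid, cw)
        return blocks.mean(axis=(1, 3))

    cur, tgt = cell_means(current), cell_means(target)
    corr = np.zeros((grid, grid, 2), dtype=np.float32)
    for i in range(grid):
        for j in range(grid):
            diff = (tgt - cur[i, j]) ** 2
            ti, tj = np.unravel_index(np.argmin(diff), diff.shape)
            corr[i, j] = (ti - i, tj - j)
    return corr


def q_values(corr, weights):
    """Linear stand-in for the DQN: maps the flattened map to one Q-value per action."""
    return corr.reshape(-1) @ weights


def greedy_action(q):
    """Pick the action with the highest predicted Q-value."""
    return ACTIONS[int(np.argmax(q))]


# Usage: identical views give an all-zero correspondence map.
view = np.arange(64, dtype=np.float32).reshape(8, 8)
corr = toy_correspondence_map(view, view)
weights = np.random.default_rng(0).normal(size=(corr.size, len(ACTIONS)))
action = greedy_action(q_values(corr, weights))
```

At test time the agent would repeat this loop at every step: observe, compute the map against the fixed target, and execute the highest-valued action.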

Visual Servoing Policy Architecture

We model the DQN as a deep neural network whose layers split functionally into a perception part and an action part.
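One way to picture the perception/action split is the NumPy forward pass below. It is only a sketch: the layer counts, kernel size, hidden width, number of actions, and the `init_params`, `perception`, and `action_head` names are assumptions made for illustration, since the page does not give the actual layer configuration.

```python
import numpy as np


def conv2d_valid(x, kernels):
    """Valid cross-correlation. x: (C_in, H, W); kernels: (C_out, C_in, k, k)."""
    c_out, _, k, _ = kernels.shape
    h, w = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((c_out, h, w))
    for i in range(h):
        for j in range(w):
            patch = x[:, i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def init_params(rng, in_ch=2, h=8, w=8, n_filters=4, hidden=16, n_actions=4):
    """Random weights for a tiny network; all sizes here are hypothetical."""
    feat_dim = n_filters * (h - 2) * (w - 2)
    return {
        "conv": rng.normal(0.0, 0.1, (n_filters, in_ch, 3, 3)),
        "fc1": rng.normal(0.0, 0.1, (feat_dim, hidden)),
        "fc2": rng.normal(0.0, 0.1, (hidden, n_actions)),
    }


def perception(corr_map, params):
    """Perception part: convolution + ReLU over the correspondence map."""
    feat = np.maximum(conv2d_valid(corr_map, params["conv"]), 0.0)
    return feat.reshape(-1)


def action_head(features, params):
    """Action part: fully connected layers that output one Q-value per action."""
    hidden = np.maximum(features @ params["fc1"], 0.0)
    return hidden @ params["fc2"]


# Usage: a 2-channel 8x8 correspondence map goes in, four Q-values come out.
rng = np.random.default_rng(0)
params = init_params(rng)
corr_map = rng.normal(size=(2, 8, 8))
q = action_head(perception(corr_map, params), params)
```

Keeping the two parts as separate functions mirrors the functional split: the perception part only sees the correspondence map, and the action part only sees the features it produces.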

ImageGoal Navigation: driving to a target object

Even though the policy is trained to move to a target view, our agent can also drive to a target object without additional training. For example, suppose we want the agent to go to the couch. We cut out the couch patch from the initial view and then compute the correspondence between this target-object patch and the current observation.
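The patch-based target can be illustrated with the sketch below. Cropping follows the description above. The matching step is a swapped-in technique: `locate_patch` uses plain exhaustive sum-of-squared-differences template matching as a stand-in for the correspondence computation, which the page does not detail. The function names and box coordinates are hypothetical.

```python
import numpy as np


def crop_patch(image, top, left, height, width):
    """Cut the target object (e.g. the couch) out of the initial view."""
    return image[top:top + height, left:left + width].copy()


def locate_patch(observation, patch):
    """Exhaustive SSD template matching (a stand-in, not the paper's matcher).

    Returns the (row, col) of the top-left corner of the best-matching window.
    """
    ph, pw = patch.shape[:2]
    oh, ow = observation.shape[:2]
    best, best_pos = np.inf, (0, 0)
    for r in range(oh - ph + 1):
        for c in range(ow - pw + 1):
            window = observation[r:r + ph, c:c + pw]
            ssd = float(np.sum((window - patch) ** 2))
            if ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos


# Usage: the "current observation" is the initial view shifted by (2, 1) pixels.
rng = np.random.default_rng(0)
initial_view = rng.random((12, 12))
couch_patch = crop_patch(initial_view, top=3, left=4, height=4, width=5)
current_obs = np.roll(initial_view, shift=(2, 1), axis=(0, 1))
found_at = locate_patch(current_obs, couch_patch)  # (5, 5): original corner (3, 4) plus the shift
```

Because the patch simply replaces the target view as the second input, the trained policy itself does not have to change.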


Citation

@inproceedings{li2020learning,
  title={Learning view and target invariant visual servoing for navigation},
  author={Li, Yimeng and Ko{\v{s}}ecka, Jana},
  booktitle={2020 IEEE International Conference on Robotics and Automation (ICRA)},
  pages={658--664},
  year={2020},
  organization={IEEE}
}


Acknowledgements

We thank members of the GMU Vision and Robotics Lab.