Robotic manipulation in clutter with object-level semantic mapping
File(s)
Author(s)
Wada, Kentaro
Type
Thesis or dissertation
Abstract
To interact intelligently with environments and accomplish useful tasks, robots need some level of scene understanding in order to plan sensible actions.
Semantic world models have been widely used in robotic manipulation, providing the geometric and semantic information about objects that is vital for generating motions to complete tasks.
Using these models, traditional robotic systems typically generate motions with analysis-based motion planning, which often applies collision checks to produce a safe trajectory to execute.
It is crucial that robots build such world models autonomously, ideally with flexible and low-cost sensors such as on-board cameras, and generate motions with the subsequent planning pipeline.
With recent progress in deep neural networks, a growing body of research has explored end-to-end approaches to manipulation.
A typical end-to-end approach does not explicitly build world models; instead, it generates motions by mapping directly from raw observations such as images, introducing the flexibility to handle novel objects and manipulation capabilities beyond analysis-based motion planning.
However, this approach struggles with long-horizon tasks that involve several steps of grasping and placement, for which learned models must infer many action steps to generate a trajectory.
This difficulty motivated us to adopt a hybrid of learned and traditional approaches to take advantage of both, since previous studies on robotic manipulation have demonstrated long-horizon task achievement with explicit world models.
This thesis develops a robotic system that manipulates objects to change their states as requested, with high success rates and efficient, safe maneuvers.
In particular, we build an object-level semantic mapping pipeline that constructs world models of various objects in clutter, and integrate it with learned components to acquire manipulation skills.
This tight integration of explicit semantic mapping and learned motion generation enables the robot to accomplish long-horizon tasks with the extra manipulation capabilities introduced by learning.
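For illustration only, the sketch below suggests how an object-level world model of the kind described in the abstract might pair per-object semantics and geometry with a simple collision check used by an analysis-based planner. The class and function names are hypothetical and are not drawn from the thesis itself.

```python
# Minimal sketch, assuming a hypothetical object-level map representation:
# each mapped object carries a semantic label, a pose, and surface geometry,
# and an analysis-based planner rejects trajectories that collide with them.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class MappedObject:
    """One object instance in the semantic map (hypothetical structure)."""
    class_label: str    # semantic category, e.g. "mug"
    pose: np.ndarray    # 4x4 homogeneous transform, object frame -> world frame
    points: np.ndarray  # Nx3 surface points in the object frame


@dataclass
class ObjectLevelMap:
    """World model accumulated from on-board camera observations."""
    objects: list = field(default_factory=list)

    def add(self, obj: MappedObject) -> None:
        self.objects.append(obj)

    def collides(self, gripper_point: np.ndarray, clearance: float = 0.02) -> bool:
        """Return True if a gripper point comes within `clearance` metres of any object."""
        for obj in self.objects:
            # Transform the object's surface points into the world frame.
            world_pts = obj.points @ obj.pose[:3, :3].T + obj.pose[:3, 3]
            if np.min(np.linalg.norm(world_pts - gripper_point, axis=1)) < clearance:
                return True
        return False


def trajectory_is_safe(world: ObjectLevelMap, waypoints: np.ndarray) -> bool:
    """Analysis-based check: every waypoint of the trajectory must be collision-free."""
    return not any(world.collides(p) for p in waypoints)
```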
Version
Open Access
Date Issued
2022-02
Date Awarded
2022-06
Copyright Statement
Creative Commons Attribution NonCommercial NoDerivatives Licence
Advisor
Davison, Andrew J.
Sponsor
Dyson Technology Limited
Imperial College London
Publisher Department
Computing
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)