Learning Task Styles and Structure from Direct Demonstrations

  • A critical challenge in robot learning from demonstration is mapping the trainer's behavior onto the robot's existing repertoire of basic (primitive) capabilities. The robot must process a continuous stream of sensor data and cast this information onto its knowledge and control repertoire. In most cases, this means segmenting the data stream into meaningful units and then mapping those units to appropriate skills or tasks. The policy transfer problem is made harder by the current gap between the complexity of robot control architectures and their ability to support the automatic construction of controllers through learning. Part of the difficulty is that the teacher's observed behavior may be a combination (or superposition) of several of the robot's individual primitives. The problem becomes more complex still when the task involves temporal sequences of goals.
  • We developed an autonomous control architecture that supports learning hierarchical task representations in which 1) each goal is achieved through a linear superposition (or fusion) of robot primitives, and 2) sequencing across goals is achieved through arbitration. We treat learning the appropriate superposition as a state estimation problem over the space of possible linear fusion weights, which we infer with a particle filter. The proposed architecture makes two contributions: it enables 1) the use of both command arbitration and fusion within a single control representation, and 2) the automated construction of such representations from demonstration. Historically, these two main action selection mechanisms have mostly been used separately in robot control, which limits the range of tasks that robots can execute. Because arbitration can encode temporal sequences and fusion can combine concurrently running behaviors, we merge the strengths of both within a single task representation.
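To make the idea of estimating fusion weights with a particle filter more concrete, here is a minimal Python sketch. It rests on stated assumptions and is not the architecture's actual implementation. It assumes that each primitive outputs a 2-D command vector (for example, linear and angular velocity) and that the teacher's command is a noisy weighted sum of the primitives' outputs. It further assumes the weights drift slowly (random-walk dynamics) and the observation noise is Gaussian. The names `fuse` and `FusionWeightFilter`, and all parameter values (`n_particles`, `walk_std`, `obs_std`), are hypothetical. The weights are left unconstrained, and sequencing through arbitration is not shown.

```python
import math
import random


def fuse(weights, primitive_cmds):
    """Linear superposition: weighted sum of the primitives' command vectors."""
    dim = len(primitive_cmds[0])
    return [sum(w * cmd[d] for w, cmd in zip(weights, primitive_cmds))
            for d in range(dim)]


class FusionWeightFilter:
    """Hypothetical particle filter over the space of linear fusion weights."""

    def __init__(self, n_primitives, n_particles=500, walk_std=0.05,
                 obs_std=0.1, seed=0):
        self.rng = random.Random(seed)
        self.walk_std = walk_std  # assumed random-walk drift of the weights per step
        self.obs_std = obs_std    # assumed std of the teacher's command noise
        # Each particle is one hypothesis of the weight vector, initialized in [0, 1].
        self.particles = [[self.rng.random() for _ in range(n_primitives)]
                          for _ in range(n_particles)]
        self.weights = [1.0 / n_particles] * n_particles

    def update(self, primitive_cmds, demo_cmd):
        # Predict: let every weight hypothesis drift by Gaussian noise.
        for p in self.particles:
            for i in range(len(p)):
                p[i] += self.rng.gauss(0.0, self.walk_std)
        # Correct: score each hypothesis by how well its fused command
        # explains the demonstrated command (Gaussian likelihood).
        for k, p in enumerate(self.particles):
            err2 = sum((a - b) ** 2
                       for a, b in zip(fuse(p, primitive_cmds), demo_cmd))
            self.weights[k] *= math.exp(-err2 / (2.0 * self.obs_std ** 2))
        total = sum(self.weights)
        n = len(self.particles)
        if total == 0.0:  # numerical underflow: fall back to uniform weights
            self.weights = [1.0 / n] * n
        else:
            self.weights = [w / total for w in self.weights]
        self._resample()

    def _resample(self):
        # Systematic resampling: duplicate likely hypotheses, drop unlikely ones.
        n = len(self.particles)
        step = 1.0 / n
        u = self.rng.random() * step
        j, cumulative, new = 0, self.weights[0], []
        for k in range(n):
            target = u + k * step
            while target > cumulative and j < n - 1:
                j += 1
                cumulative += self.weights[j]
            new.append(list(self.particles[j]))
        self.particles = new
        self.weights = [step] * n

    def estimate(self):
        # Posterior mean; particles are equally weighted after resampling.
        n = len(self.particles)
        return [sum(p[i] for p in self.particles) / n
                for i in range(len(self.particles[0]))]


# Usage: a simulated teacher fuses two primitives with weights 0.7 and 0.3.
rng = random.Random(1)
true_w = [0.7, 0.3]
pf = FusionWeightFilter(n_primitives=2)
for _ in range(200):
    # Stand-ins for two primitives' (v, omega) outputs at this time step.
    cmds = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(2)]
    demo = [c + rng.gauss(0.0, 0.05) for c in fuse(true_w, cmds)]
    pf.update(cmds, demo)
print([round(w, 2) for w in pf.estimate()])  # posterior-mean weight estimate
```

The random-walk prediction step lets the estimate track weights that change slowly during a demonstration. Abrupt switches between goals, which the architecture handles through arbitration, fall outside this sketch.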

Learning a left wall-following behavior

Learning a circling behavior

Learning a sequence of fused behaviors

Learning a composition of fused behaviors

  • Design and Evaluation of Methods for Robot Learning by Demonstration, National Science Foundation, Early Career Development Award (CAREER), PI, Amount: $410,000, January 15, 2006 - January 14, 2011.