While carrying out their jobs, humanoids and legged robots should be able to go where humans go and endure large changes in the environment. The ideal locomotion controller for a legged robot must produce behaviors that are robust to unforeseen disturbances in ground height, slope, and ground friction. Engineering such a controller by hand requires an intensive process of hand-tuning gains, refining the robot models used for motion planning, and explicitly handling disturbances such as early or late contact, among other edge cases.
Reinforcement learning presents an alternative to this paradigm: instead of describing an explicit solution to a problem, it may be easier to describe the problem and learn the solution. Describing the problem of legged locomotion, however, is not an easy task. A common approach is to use precise reference trajectories, which are effective at guiding learning but describe only a small space of behaviors. At the other extreme, simply defining the problem as moving in some direction is too generic and leads to sub-optimal controllers. From the perspective of an end user, the ideal control system for a legged robot can be commanded with both abstract tasks and specific ones; the control hierarchy takes care of the details and uses learning to find optimal control policies at each level of the hierarchy.
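The two extremes of problem specification can be made concrete with a small sketch. The reward functions below are illustrative assumptions, not the rewards actually used in this project: one tracks a precise reference trajectory, the other only asks for some forward velocity.

```python
import numpy as np

def tracking_reward(joint_pos, ref_joint_pos):
    """Reference-trajectory extreme: reward closeness to a pre-computed gait.
    This guides learning well but only describes behaviors present in the
    reference. (Hypothetical weighting of 5.0 on the squared error.)"""
    return float(np.exp(-5.0 * np.sum((joint_pos - ref_joint_pos) ** 2)))

def generic_velocity_reward(base_velocity, target_velocity):
    """Under-specified extreme: reward moving at some commanded speed.
    This admits many behaviors, including undesirable ones, and in practice
    tends to produce sub-optimal controllers."""
    return float(np.exp(-np.abs(base_velocity - target_velocity)))
```

A practical specification sits between these extremes, which is what the gait-commanding framework described below aims for.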
In this project, we look at the progression of research on this ideal control hierarchy for the blind (no perception) bipedal robot Cassie, starting with the integration of a model-based planner with a deep reinforcement learning locomotion controller. Next, we replace the planner with a principled framework for commanding and computing cost functions for all bipedal gaits. Using this approach as a solid foundation for low-level locomotion skills, we explore how wrapping the locomotion controller with another reinforcement learning process can be used to find optimal control hierarchies for generic tasks like moving to a waypoint, stepping in a particular sequence, or reaching a goal state. This method of wrapping learning processes in increasingly abstract objectives is modular, and we repeat the process to demonstrate a control hierarchy for parsing the commands of an action grammar.
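The wrapping idea can be sketched in a few lines. This is a minimal, hypothetical illustration of a two-level hierarchy (all class and function names are assumptions, and the network policies are replaced with placeholders): a high-level policy maps an abstract task, such as a waypoint, to a gait command, and a trained low-level locomotion policy turns that command plus the robot state into joint targets.

```python
import numpy as np

class LowLevelPolicy:
    """Stands in for a trained locomotion policy: maps (robot state,
    gait command) to joint-level targets."""
    def act(self, state, gait_command):
        # A real policy would be a neural network; a placeholder here.
        return np.tanh(state[:10] + gait_command.mean())

class HighLevelPolicy:
    """Stands in for a higher-level learned policy: maps an abstract task
    (a 2-D waypoint) plus robot state to a gait command."""
    def act(self, state, waypoint):
        # Placeholder logic: command a speed and heading toward the waypoint.
        direction = waypoint - state[:2]
        speed = min(1.0, float(np.linalg.norm(direction)))
        heading = float(np.arctan2(direction[1], direction[0]))
        return np.array([speed, heading])

def hierarchical_step(high, low, state, waypoint):
    """One control step of the hierarchy: the high-level policy picks the
    gait command, the low-level policy executes it."""
    gait_command = high.act(state, waypoint)
    return low.act(state, gait_command)
```

The modularity in the text corresponds to the fact that the same wrapping pattern applies again: the waypoint here could itself be emitted by yet another policy, such as one parsing an action grammar.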
The methods used in this project for both low-level locomotion and higher-level skills are promising steps toward the intelligent and versatile robot control systems of the future.