DOI: 10.1145/1538788.1538812
Apprenticeship Learning for Helicopter Control
By Adam Coates, Pieter Abbeel, and Andrew Y. Ng
Abstract
Autonomous helicopter flight is widely regarded to be a
highly challenging control problem. As helicopters are highly
unstable and exhibit complicated dynamical behavior, it is
particularly difficult to design controllers that achieve high
performance over a broad flight regime.
While these aircraft are notoriously difficult to control,
there are expert human pilots who are nonetheless capable
of demonstrating a wide variety of maneuvers, including
aerobatic maneuvers at the edge of the helicopter’s performance envelope. In this paper, we present algorithms for
modeling and control that leverage these demonstrations
to build high-performance control systems for autonomous
helicopters. More specifically, we detail our experiences with
the Stanford Autonomous Helicopter, which is now capable
of extreme aerobatic flight meeting or exceeding the performance of our own expert pilot.
1. Introduction
Autonomous helicopter flight represents a challenging control problem with high-dimensional, asymmetric, noisy, nonlinear, nonminimum phase dynamics. Helicopters are widely
regarded to be significantly harder to control than fixed-wing
aircraft (see, e.g., Leishman [18] and Seddon [31]). At the same time,
helicopters provide unique capabilities, such as in-place hover
and low-speed flight, important for many applications. The
control of autonomous helicopters thus provides a challenging
and important test bed for learning and control algorithms.
There is a considerable body of research concerning control of autonomous (RC) helicopters in the typical “upright
flight regime.” This has allowed autonomous helicopters
to reliably perform many practical maneuvers, such as sustained hover, low-speed horizontal flight, and autonomous
landing [9, 16, 17, 24, 28, 30].
In contrast, autonomous flight achievements in other
flight regimes have been limited. Gavrilets et al. [14] performed
some of the first autonomous aerobatic maneuvers: a stall-turn, a split-S, and an axial roll. Ng et al. [23] achieved sustained
autonomous inverted hover. While these results significantly
expanded the potential capabilities of autonomous helicopters, it has remained difficult to design control systems
capable of performing arbitrary aerobatic maneuvers at a performance level comparable to human experts.
In this paper, we describe our line of autonomous helicopter research. Our work covers a broad approach to autonomous helicopter control based on “apprenticeship learning”
that achieves expert-level performance on a vast array of
maneuvers, including extreme aerobatics and autonomous
autorotation landings [1, 2, 12, 23]. (See footnote a.)
In apprenticeship learning, we assume that an expert is
available who is capable of performing the desired maneuvers. We then leverage the expert's demonstrations to learn all of the
necessary components for our control system. In particular,
the demonstrations allow us to learn a model of the helicopter dynamics, as well as appropriate choices of target trajectories and reward parameters for input into a reinforcement
learning or optimal control algorithm.
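As a rough illustration of this pipeline, the sketch below fits a dynamics model to a single demonstration and then hands the fitted model to an optimal control routine. It is a toy under stated assumptions, not the method used on the helicopter: the system is a hypothetical scalar linear system, the "pilot" is a simulated stabilizing feedback law with a small excitation term, the model is fit by ordinary least squares, and the reward weights q and r are hand-picked rather than learned from demonstrations. All names (fit_linear_model, lqr_gains, true_a, true_b) are illustrative.

```python
import math

# Toy sketch only: a hypothetical scalar system, not the helicopter model.

def fit_linear_model(states, controls):
    """Least-squares fit of x[t+1] = a*x[t] + b*u[t] from one demonstration."""
    sxx = sxu = suu = sxy = suy = 0.0
    for t in range(len(controls)):
        x, u, y = states[t], controls[t], states[t + 1]
        sxx += x * x
        sxu += x * u
        suu += u * u
        sxy += x * y
        suy += u * y
    # Solve the 2x2 normal equations by Cramer's rule.
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b

def lqr_gains(a, b, q, r, horizon):
    """Backward Riccati recursion for the cost sum(q*x^2 + r*u^2).

    Returns time-ordered gains k[t] for the control law u[t] = -k[t]*x[t].
    """
    p = q  # terminal cost-to-go coefficient
    gains = []
    for _ in range(horizon):
        k = (a * b * p) / (r + b * b * p)
        p = q + a * a * p - a * b * p * k
        gains.append(k)
    gains.reverse()
    return gains

# Assumed "true" dynamics, unknown to the learner (open-loop unstable: a > 1).
true_a, true_b = 1.2, 0.5

# Simulated expert demonstration: stabilizing feedback plus a small
# excitation term so that state and control are not collinear.
x = 1.0
demo_states, demo_controls = [x], []
for t in range(20):
    u = -1.0 * x + 0.1 * math.sin(t)
    x = true_a * x + true_b * u
    demo_controls.append(u)
    demo_states.append(x)

# Step 1: learn a dynamics model from the demonstration.
a_hat, b_hat = fit_linear_model(demo_states, demo_controls)

# Step 2: feed the learned model and hand-picked reward weights to the controller.
gains = lqr_gains(a_hat, b_hat, q=1.0, r=0.1, horizon=30)

# Step 3: run the resulting controller on the true system.
x_closed = 1.0
for k in gains:
    x_closed = true_a * x_closed + true_b * (-k * x_closed)
```

In this toy setting the noise-free demonstration lets least squares recover the true parameters almost exactly, and the controller designed on the learned model drives the true, open-loop unstable system toward zero. The excitation term matters: if the simulated pilot's control were an exact linear function of the state, the two regressors would be collinear and the model could not be identified.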
The remainder of this paper is organized as follows: Section 2 briefly overviews related work in the robotics literature
that is similar in spirit to our approach. Section 3 describes
our basic modeling approach, where we develop a model of
the helicopter dynamics from data collected under human
control, and subsequently improve this model using data
from autonomous flights. Section 4 presents an apprenticeship-based trajectory learning algorithm that learns idealized
trajectories of the maneuvers we wish to fly. This algorithm
also provides a mechanism for improving our model of the
helicopter dynamics along the desired trajectory. Section 5
describes our control algorithm, which is based on differential dynamic programming (DDP) [15]. Section 6 describes our
helicopter platform and presents our experimental results.
2. Related Work
Although no prior works span our entire setting of apprenticeship learning for control, there are separate pieces of
work that relate to various components of our approach.
Atkeson and Schaal [8], for instance, use multiple demonstrations to learn a model for a robot arm, and then find an
optimal controller in their simulator, initializing their optimal control algorithm with one of the demonstrations.
Calinon et al. [11] considered learning trajectories
and constraints from demonstrations for robotic tasks. That work,
however, does not consider the system’s dynamics or provide a clear mechanism for the inclusion of prior knowledge,
which will be a key component of our approach as detailed in
Section 4. Our formulation will present a principled, joint optimization that takes into account the multiple demonstrations, as well as the (complex) system dynamics.
Among others, An et al. [6] and Abbeel et al. [5] have exploited
the idea of trajectory-specific model learning for control.
a. Autorotation is an emergency maneuver that allows a trained pilot to descend and land the helicopter without engine power.
A previous version of this paper, entitled “Learning for
Control from Multiple Demonstrations,” was published
in Proceedings of the 26th International Conference on
Machine Learning (ICML 2008), 144–151.