DOI: 10.1145/1538788.1538812
By Adam Coates, Pieter Abbeel, and Andrew Y. Ng
Autonomous helicopter flight is widely regarded to be a highly challenging control problem. As helicopters are highly unstable and exhibit complicated dynamical behavior, it is particularly difficult to design controllers that achieve high performance over a broad flight regime.
While these aircraft are notoriously difficult to control, there are expert human pilots who are nonetheless capable of demonstrating a wide variety of maneuvers, including aerobatic maneuvers at the edge of the helicopter’s performance envelope. In this paper, we present algorithms for modeling and control that leverage these demonstrations to build high-performance control systems for autonomous helicopters. More specifically, we detail our experiences with the Stanford Autonomous Helicopter, which is now capable of extreme aerobatic flight meeting or exceeding the performance of our own expert pilot.
Autonomous helicopter flight represents a challenging control problem with high-dimensional, asymmetric, noisy, nonlinear, nonminimum-phase dynamics. Helicopters are widely regarded to be significantly harder to control than fixed-wing aircraft. (See, e.g., Leishman 18 and Seddon. 31) At the same time, helicopters provide unique capabilities, such as in-place hover and low-speed flight, important for many applications. The control of autonomous helicopters thus provides a challenging and important test bed for learning and control algorithms.
There is a considerable body of research concerning control of autonomous (RC) helicopters in the typical “upright flight regime.” This has allowed autonomous helicopters to reliably perform many practical maneuvers, such as sustained hover, low-speed horizontal flight, and autonomous landing. 9, 16, 17, 24, 28, 30
In contrast, autonomous flight achievements in other flight regimes have been limited. Gavrilets et al. 14 performed some of the first autonomous aerobatic maneuvers: a stall-turn, a split-S, and an axial roll. Ng et al. 23 achieved sustained autonomous inverted hover. While these results significantly expanded the potential capabilities of autonomous helicopters, it has remained difficult to design control systems capable of performing arbitrary aerobatic maneuvers at a performance level comparable to human experts.
In this paper, we describe our line of autonomous helicopter research. Our work covers a broad approach to autonomous helicopter control based on “apprenticeship learning” that achieves expert-level performance on a vast array of maneuvers, including extreme aerobatics and autonomous autorotation landings. 1, 2, 12, 23 (See footnote a.)
In apprenticeship learning, we assume that an expert is available who is capable of performing the desired maneuvers. We then leverage these demonstrations to learn all of the necessary components for our control system. In particular, the demonstrations allow us to learn a model of the helicopter dynamics, as well as appropriate choices of target trajectories and reward parameters for input into a reinforcement learning or optimal control algorithm.
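The paragraph above says the demonstrations are used to learn a model of the dynamics. As a minimal illustration only, assuming a toy scalar linear system x[t+1] = a*x[t] + b*u[t] (not the helicopter model of Section 3), the parameters a and b can be estimated from one logged demonstration by ordinary least squares. The function names `fit_linear_dynamics` and `simulate`, and the "pilot" inputs, are hypothetical.

```python
from typing import List, Tuple


def fit_linear_dynamics(states: List[float], controls: List[float]) -> Tuple[float, float]:
    """Least-squares fit of x[t+1] approx a * x[t] + b * u[t] from one logged run.

    states has one more entry than controls: states[t+1] is the state
    observed after applying controls[t] in states[t].
    """
    if len(states) != len(controls) + 1:
        raise ValueError("need len(states) == len(controls) + 1")
    sxx = sxu = suu = sxy = suy = 0.0
    for x, u, y in zip(states[:-1], controls, states[1:]):
        sxx += x * x
        sxu += x * u
        suu += u * u
        sxy += x * y
        suy += u * y
    det = sxx * suu - sxu * sxu
    if abs(det) < 1e-12:
        raise ValueError("data not informative enough: regressors are collinear")
    # Solve the 2x2 normal equations [[sxx, sxu], [sxu, suu]] @ [a, b] = [sxy, suy].
    a = (suu * sxy - sxu * suy) / det
    b = (sxx * suy - sxu * sxy) / det
    return a, b


def simulate(a: float, b: float, x0: float, controls: List[float]) -> List[float]:
    """Roll the toy linear system forward under a given control sequence."""
    xs = [x0]
    for u in controls:
        xs.append(a * xs[-1] + b * u)
    return xs


if __name__ == "__main__":
    # Hypothetical "demonstration": control inputs applied to a toy 1-D system.
    pilot_controls = [1.0, 0.5, -0.3, 0.0, 0.8, -1.0, 0.2, 0.4]
    log = simulate(0.9, 0.5, 0.0, pilot_controls)
    print(fit_linear_dynamics(log, pilot_controls))
```

With noise-free data the fit recovers the generating parameters; with real flight logs, noise, unmodeled effects, and insufficiently varied inputs all degrade the estimate, which is one reason the demonstrations need to excite the relevant dynamics.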
The remainder of this paper is organized as follows: Section 2 briefly reviews related work in the robotics literature that is similar in spirit to our approach. Section 3 describes our basic modeling approach, where we develop a model of the helicopter dynamics from data collected under human control, and subsequently improve this model using data from autonomous flights. Section 4 presents an apprenticeship-based trajectory learning algorithm that learns idealized trajectories of the maneuvers we wish to fly. This algorithm also provides a mechanism for improving our model of the helicopter dynamics along the desired trajectory. Section 5 describes our control algorithm, which is based on differential dynamic programming (DDP). 15 Section 6 describes our helicopter platform and presents our experimental results.
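DDP itself is covered in Section 5 and is not reproduced here. To make its core step concrete, the sketch below implements the backward Riccati recursion of finite-horizon LQR for a toy scalar linear system with quadratic cost. This is what DDP reduces to in the linear-quadratic case; iterative variants repeat such a backward pass around a nominal trajectory after linearizing the dynamics and approximating the cost quadratically. Here x is read as the deviation from a target trajectory. The system, weights, and the names `lqr_gains` and `rollout` are hypothetical.

```python
from typing import List


def lqr_gains(a: float, b: float, q: float, r: float, q_final: float, horizon: int) -> List[float]:
    """Backward Riccati recursion for x[t+1] = a*x[t] + b*u[t] with cost
    sum_t (q*x[t]**2 + r*u[t]**2) + q_final*x[T]**2.

    Returns time-varying gains K[0..T-1]; the optimal control is u[t] = -K[t]*x[t].
    """
    p = q_final  # curvature of the cost-to-go at the final time step
    gains = [0.0] * horizon
    for t in reversed(range(horizon)):
        k = (b * p * a) / (r + b * b * p)
        gains[t] = k
        # Scalar Riccati update: P = q + a^2 P - (a b P)^2 / (r + b^2 P)
        p = q + a * a * p - a * b * p * k
    return gains


def rollout(a: float, b: float, gains: List[float], x0: float) -> List[float]:
    """Simulate the closed loop u[t] = -K[t] * x[t] from initial deviation x0."""
    xs = [x0]
    for k in gains:
        u = -k * xs[-1]
        xs.append(a * xs[-1] + b * u)
    return xs


if __name__ == "__main__":
    # Toy open-loop-unstable system (a > 1).
    K = lqr_gains(a=1.2, b=0.5, q=1.0, r=0.1, q_final=1.0, horizon=50)
    print(abs(rollout(1.2, 0.5, K, x0=1.0)[-1]))
```

Although the open-loop system diverges (a = 1.2), the time-varying feedback drives the deviation toward zero, since every closed-loop factor a - b*K[t] has magnitude below one for these weights.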
Although no prior works span our entire setting of apprenticeship learning for control, there are separate pieces of work that relate to various components of our approach.
Atkeson and Schaal, 8 for instance, use multiple demonstrations to learn a model for a robot arm, and then find an optimal controller in their simulator, initializing their optimal control algorithm with one of the demonstrations.
The work of Calinon et al. 11 considered learning trajectories and constraints from demonstrations for robotic tasks. However, that work does not consider the system’s dynamics or provide a clear mechanism for incorporating prior knowledge, which will be a key component of our approach as detailed in Section 4. Our formulation instead poses a principled joint optimization that takes into account the multiple demonstrations as well as the (complex) system dynamics.
Among others, An et al. 6 and Abbeel et al. 5 have exploited the idea of trajectory-specific model learning for control.
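Trajectory-specific model learning can take several forms. One simple form, assumed here purely for illustration and not claimed to be the exact method of the works cited above, keeps a crude one-step model f and adds a time-indexed correction b[t] so that the model reproduces an observed trajectory: x[t+1] = f(x[t], u[t]) + b[t]. The correction is only trustworthy near that trajectory. The names `fit_time_indexed_bias` and `corrected_model`, and both toy systems, are hypothetical.

```python
from typing import Callable, List

# A one-step model maps (state, control) to the predicted next state.
Model = Callable[[float, float], float]


def fit_time_indexed_bias(model: Model, states: List[float], controls: List[float]) -> List[float]:
    """Return b[t] = observed x[t+1] - model(x[t], u[t]) along one recorded trajectory."""
    if len(states) != len(controls) + 1:
        raise ValueError("need len(states) == len(controls) + 1")
    return [x_next - model(x, u) for x, u, x_next in zip(states[:-1], controls, states[1:])]


def corrected_model(model: Model, biases: List[float]) -> Callable[[int, float, float], float]:
    """Time-indexed model x[t+1] = model(x[t], u[t]) + b[t], valid near the recorded trajectory."""
    def step(t: int, x: float, u: float) -> float:
        return model(x, u) + biases[t]
    return step


if __name__ == "__main__":
    def nominal(x: float, u: float) -> float:
        # Hypothetical hand-built model that omits some effects.
        return 0.9 * x + 0.5 * u

    def true_system(x: float, u: float) -> float:
        # Hypothetical "real" system with an unmodeled drag-like term and offset.
        return 0.9 * x + 0.5 * u - 0.1 * x * abs(x) + 0.02

    controls = [1.0, -0.5, 0.3, 0.0, 0.7]
    observed = [0.2]
    for u in controls:
        observed.append(true_system(observed[-1], u))

    step = corrected_model(nominal, fit_time_indexed_bias(nominal, observed, controls))
    replay = [observed[0]]
    for t, u in enumerate(controls):
        replay.append(step(t, replay[-1], u))
    # Largest mismatch between the corrected model's replay and the observed run.
    print(max(abs(p - q) for p, q in zip(replay, observed)))
```

Replaying the recorded controls through the corrected model reproduces the observed run up to floating-point rounding, while the uncorrected nominal model drifts away from it. Away from the recorded trajectory, the bias terms carry no guarantee.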
a. Autorotation is an emergency maneuver that allows a trained pilot to descend and land the helicopter without engine power.
A previous version of this paper, entitled “Learning for Control from Multiple Demonstrations,” was published in Proceedings of the 25th International Conference on Machine Learning (ICML 2008), 144–151.
References: