Figure 5: Flight results. (a, b) Solid black: results with the trajectory learning algorithm. Dashed red: results with the hand-coded trajectory from Abbeel et al.² (c) Dotted black: autonomous tic-toc. Solid colored: expert demonstrations.
[Figure 5 plot area. Axes include altitude (m), north position (m), east position (m), collective input, and time (s), across panels (a), (b), and (c).]
Besides flips and rolls, we also performed autonomous “tic-tocs,” widely considered to be an even more challenging aerobatic maneuver. During the (tail-down) tic-toc maneuver, the helicopter pitches quickly backward and forward in place with the tail pointed toward the ground (resembling an inverted clock pendulum). The complex relationship between pitch angle, horizontal motion, vertical motion, and thrust makes it extremely difficult to create a feasible tic-toc trajectory by hand. Our attempts to use such a hand-coded trajectory, following the previous approach in Abbeel et al.,² failed repeatedly. By contrast, the trajectory learning algorithm readily yields an excellent feasible trajectory that was successfully flown on the first attempt. Figure 5(c) shows the expert trajectories (in color) and the autonomously flown tic-toc (black dotted). Our controller significantly outperforms the expert’s demonstrations.
We also applied our algorithm to successfully fly a complete aerobatic airshow, as described in Section 6.1.
The trajectory-specific models typically capture the dynamics well enough to fly all the aforementioned maneuvers reliably. However, because our computer controller flies the trajectory very consistently, we can repeatedly acquire data from the same vicinity of the target trajectory on the real helicopter. We can then incorporate this flight data into our model, improving flight accuracy even further. For example, during the first autonomous airshow our controller achieved an RMS position error of 3.29 m; this procedure improved performance to an RMS position error of 1.75 m.
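The refinement loop described above can be illustrated with a deliberately tiny sketch. It assumes a hypothetical scalar, linear, time-invariant model x[t+1] ≈ a·x[t] + b·u[t] fitted by ordinary least squares, and it treats RMS position error as the root mean square of the Euclidean distance between flown and target positions at matched time steps. The function names `fit_linear_model` and `rms_position_error` and the toy numbers are hypothetical; the trajectory-specific models used for the actual flights are considerably richer than this.

```python
import math


def fit_linear_model(xs, us, xs_next):
    """Least-squares fit of the hypothetical scalar model x_next ~= a*x + b*u."""
    # Normal equations: [[Sxx, Sxu], [Sxu, Suu]] @ [a, b] = [Sxy, Suy]
    sxx = sum(x * x for x in xs)
    sxu = sum(x * u for x, u in zip(xs, us))
    suu = sum(u * u for u in us)
    sxy = sum(x * y for x, y in zip(xs, xs_next))
    suy = sum(u * y for u, y in zip(us, xs_next))
    det = sxx * suu - sxu * sxu
    if abs(det) < 1e-12:
        raise ValueError("data must excite both the state and the input")
    # Solve the 2x2 system with Cramer's rule.
    a = (sxy * suu - sxu * suy) / det
    b = (sxx * suy - sxu * sxy) / det
    return a, b


def rms_position_error(flown, target):
    """Root mean square of per-step Euclidean distances between 3-D positions."""
    if not flown or len(flown) != len(target):
        raise ValueError("trajectories must be non-empty and of equal length")
    sq = [sum((p - q) ** 2 for p, q in zip(f, t)) for f, t in zip(flown, target)]
    return math.sqrt(sum(sq) / len(sq))


# Toy data generated from a = 0.9, b = 0.5 (hypothetical values).
old_x, old_u, old_next = [1.0, 2.0], [0.0, 1.0], [0.9, 2.3]
# New samples "acquired in flight" are appended, and the model is refit on all data.
new_x, new_u, new_next = [0.5], [2.0], [1.45]
a, b = fit_linear_model(old_x + new_x, old_u + new_u, old_next + new_next)
print(a, b)  # approximately 0.9 and 0.5

err = rms_position_error([(0, 0, 0), (3, 4, 0)], [(0, 0, 0), (0, 0, 0)])
print(err)  # sqrt(12.5), about 3.536
```

Refitting on the union of old and new data, rather than on the latest flight alone, is one simple way to keep the estimate well conditioned when a single flight covers only part of the state space.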
Videos of all our flights are available at: http://heli.stanford.edu
7. Conclusion
We have presented learning algorithms that take advantage of expert demonstrations to successfully fly autonomous helicopters at the level of an expert human pilot.
In particular, we have shown how to (i) build a rough
global model from demonstration data, (ii) approximately infer the expert’s ideal desired trajectory, (iii) learn
accurate, trajectory-specific local models suitable for high-performance control, and (iv) build control systems using
the outputs of our trajectory learning algorithm. Our experiments demonstrated that this design pipeline enables
our controllers to fly extreme aerobatic maneuvers. Our
results have shown that our system not only significantly
outperforms the previous state of the art, but even outperforms our own expert pilot on a wide variety of difficult
maneuvers.
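Step (ii) can be illustrated, in much simplified form, by aligning demonstrations in time and averaging them. The sketch below is a stand-in, not the inference procedure presented in this paper: it assumes 1-D signals, uses classic dynamic time warping (DTW) with an absolute-difference cost to warp each demonstration onto the time axis of the first one, and then averages the aligned samples. The names `dtw_path` and `average_demonstrations` are hypothetical.

```python
def dtw_path(ref, seq):
    """Classic DTW between two 1-D sequences; returns matched (ref_idx, seq_idx) pairs."""
    if not ref or not seq:
        raise ValueError("sequences must be non-empty")
    n, m = len(ref), len(seq)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of ref[:i] with seq[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - seq[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from (n, m) to (1, 1); on equal cost, the diagonal step wins.
    path, i, j = [], n, m
    while True:
        path.append((i - 1, j - 1))
        if i == 1 and j == 1:
            break
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path


def average_demonstrations(demos):
    """Warp every demonstration onto demos[0]'s time axis, then average per step."""
    ref = demos[0]
    sums = [0.0] * len(ref)
    counts = [0] * len(ref)
    for demo in demos:
        # A DTW path visits every reference index, so no count stays at zero.
        for i, j in dtw_path(ref, demo):
            sums[i] += demo[j]
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


# The second toy demonstration is a time-stretched copy of the first.
demos = [[0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0]]
print(dtw_path(demos[0], demos[1]))  # [(0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
print(average_demonstrations(demos))  # [0.0, 1.0, 2.0, 1.0, 0.0]
```

Fixing the first demonstration as the reference keeps the sketch short, but it biases the result toward that demonstration; a method that also re-estimates the reference trajectory would avoid this.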
Acknowledgments
We thank Garett Oku for piloting and building our helicopters. Adam Coates is supported by a Stanford Graduate
Fellowship. This work was also supported in part by the
DARPA Learning Locomotion program under contract
number FA8650-05-C-7261.
References
1. Abbeel, P., Coates, A., Hunter, T., Ng, A.Y. Autonomous autorotation of an RC helicopter. ISER 11 (2008).
2. Abbeel, P., Coates, A., Quigley, M., Ng, A.Y. An application of reinforcement learning to aerobatic helicopter flight. NIPS 19 (2007), 1–8.
3. Abbeel, P., Ganapathi, V., Ng, A. Learning vehicular dynamics, with application to modeling helicopters. NIPS 18 (2006), 1–8.
4. Abbeel, P., Ng, A.Y. Apprenticeship learning via inverse reinforcement learning. In Proceedings of ICML (2004).
5. Abbeel, P., Quigley, M., Ng, A.Y. Using inaccurate models in reinforcement learning. In Proceedings of ICML (2006), ACM, NY, 1–8.
6. An, C.H., Atkeson, C.G., Hollerbach, J.M. Model-Based Control of a Robot Manipulator. MIT Press, 1988.
7. Anderson, B., Moore, J. Optimal Control: Linear Quadratic Methods. Prentice-Hall, 1989.
8. Atkeson, C., Schaal, S. Robot learning from demonstration. In Proceedings of ICML (1997).
9. Bagnell, J., Schneider, J. Autonomous helicopter control using reinforcement learning policy search methods. In IEEE International Conference on Robotics and Automation (2001).
10. Boutilier, C., Friedman, N., Goldszmidt, M., Koller, D. Context-specific independence in Bayesian networks. In Proceedings of UAI (1996).
11. Calinon, S., Guenter, F., Billard, A. On learning, representing and generalizing a task in a humanoid robot. In IEEE Transactions on Systems, Man and Cybernetics, Part B, volume 37, 2007.
12. Coates, A., Abbeel, P., Ng, A.Y. Learning for control from multiple demonstrations. In Proceedings of ICML (2008), 144–151.
13. Dempster, A.P., Laird, N.M., Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. (1977).
14. Gavrilets, V., Martinos, I., Mettler, B., Feron, E. Control logic for automated aerobatic flight of miniature helicopter. In AIAA Guidance,