Figure 3: Our XCell Tempest autonomous helicopter.
learning algorithm includes bias terms, b*, for each of the predicted accelerations, and hence will learn a time-dependent acceleration that is added to the crude base model. We also include terms to model position drift in the pilot demonstrations, and incorporate our prior knowledge that flips and rolls should remain roughly in place, and that maneuvers like loops should be flown in a plane (i.e., they should look flat when viewed from the top).
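The role of the time-indexed bias terms can be sketched as follows. This is a minimal illustration under our own assumptions; the function and variable names are ours, not the paper's, and the bias values are purely illustrative:

```python
import numpy as np

# Minimal sketch (our own names, not the paper's code): the crude base
# model's predicted acceleration is corrected by a learned bias b[t]
# that varies along the trajectory.
def corrected_accel(crude_model, state, control, b, t):
    """Crude-model acceleration plus the learned time-indexed bias."""
    return crude_model(state, control) + b[t]

# Toy crude model: gravity only, axes ordered [x, y, z].
crude = lambda state, control: np.array([0.0, 0.0, -9.81])

# Illustrative learned biases for a 3-step trajectory (m/s^2).
b = np.array([[0.0, 0.0, 1.2],
              [0.0, 0.0, 2.5],
              [0.0, 0.0, 1.9]])

a1 = corrected_accel(crude, None, None, b, 1)  # array([0., 0., -7.31])
```

The point of the sketch is that the correction depends only on the time index, so a single crude physics model can be specialized to each trajectory without re-identifying the full dynamics.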
6.2. Trajectory learning results
Figure 4(a) shows the horizontal and vertical position of the
helicopter during the two loops flown during the airshow
performed by our pilot. The colored lines show the expert
pilot’s demonstrations. The black dotted line shows the
inferred ideal path produced by our algorithm. The loops
are more rounded and more consistent in the inferred ideal
path. We did not incorporate any prior knowledge to this
effect. Figure 4(b) shows a top-down view of the same demonstrations and inferred trajectory. This view shows that the
algorithm successfully inferred a trajectory that lies in a vertical plane, while obeying the system dynamics, as a result of
the included prior knowledge.
Figure 4(c) shows one of the bias terms, namely the prediction errors made by our crude model for the z-axis acceleration of the helicopter for each of the demonstrations
(plotted as a function of time). Figure 4(d) shows the result
after alignment (in color) as well as the inferred acceleration
error (black dotted). We see that the bias measurements indicate errors of roughly −1G to −2G during
the first 40s of the airshow (a period that involves high-G
maneuvering that is not predicted accurately by the “crude”
model). However, only the aligned biases precisely show the
magnitudes and locations of these errors along the trajectory. The alignment allows us to build our ideal trajectory
based upon a much more accurate model that is tailored to
match the dynamics observed in the demonstrations.
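The effect of alignment described above can be illustrated with a small sketch. This is a simplification under our own assumptions (the paper infers the ideal trajectory with a full probabilistic model, not a plain average); the names are ours:

```python
import numpy as np

# Sketch under our assumptions: once time alignment has warped the K
# demonstrations' z-acceleration errors onto a shared time base, the
# error at each step can be estimated by averaging across rows.
def inferred_error(aligned_errors):
    """aligned_errors: (K, T) array, one row per aligned demonstration."""
    return aligned_errors.mean(axis=0)

# Two toy demonstrations, already aligned (errors in m/s^2).
aligned = np.array([[0.0, -1.1, -1.9, -0.5],
                    [0.0, -0.9, -2.1, -0.7]])

err = inferred_error(aligned)  # array([ 0. , -1. , -2. , -0.6])
```

Without alignment, averaging would blur errors across neighboring time steps; with it, the error estimate stays sharp at the point in the maneuver where it actually occurs.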
6.3. Flight results
After constructing the idealized trajectories and models
using our algorithms, we attempted to fly the trajectories
on the actual helicopter. As described in Section 5, we use
a receding-horizon DDP controller. Our trajectory learn-
ing algorithm provides us with desired state and control
trajectories, as well as an accurate, time-varying dynamics
model tailored to the trajectory. These are provided to our
DDP implementation along with quadratic reward weights
chosen previously using the method described in Section
5.2. The quadratic reward function penalizes deviation from the target trajectory, s*_t, as well as deviation from the desired controls, u*_t, and the desired control velocities, u*_{t+1} − u*_t.
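A hedged sketch of such a quadratic penalty follows; the weight vectors q, r, and r_dot stand in for the tuned reward weights of Section 5.2, and the function is our illustration, not the paper's implementation:

```python
import numpy as np

# Illustrative quadratic cost; q, r, r_dot stand in for the tuned
# reward weights. Deviations are measured from the target state s*,
# the desired control u*, and the desired control velocity
# (u*_{t+1} - u*_t).
def quadratic_cost(s, u, u_prev, s_star, u_star, u_star_prev, q, r, r_dot):
    ds = s - s_star                               # state deviation
    du = u - u_star                               # control deviation
    dv = (u - u_prev) - (u_star - u_star_prev)    # control-velocity deviation
    return ds @ (q * ds) + du @ (r * du) + dv @ (r_dot * dv)

cost = quadratic_cost(
    s=np.array([1.0, 0.0]), u=np.array([0.5]), u_prev=np.array([0.4]),
    s_star=np.array([0.0, 0.0]), u_star=np.array([0.3]),
    u_star_prev=np.array([0.3]),
    q=np.array([2.0, 2.0]), r=np.array([1.0]), r_dot=np.array([10.0]))
```

Penalizing control velocities as well as controls discourages the optimizer from commanding abrupt actuator changes even when the position error alone would tolerate them.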
We compare the result of this procedure first with the
former state of the art in aerobatic helicopter flight, namely
the in-place rolls and flips of Abbeel.2 That work used a single crude model, developed using the method of Section 3, along with hand-specified target trajectories, and reward weights tuned using the methodology in Section 5.2.
Figure 5(a) shows the Y–Z positionj and the collective
(thrust) control inputs for the in-place rolls performed by
the controller in Abbeel2 and our controller using receding-horizon DDP and the outputs of our trajectory learning
algorithm. Our new controller achieves (i) better position
performance and (ii) lower overall collective control values
(which roughly represents the amount of energy being used
to fly the maneuver).
Similarly, Figure 5(b) shows the X–Z position and the collective control inputs for the in-place flips for both controllers. As with the rolls, we see that our controller significantly
outperforms the previous approach, both in position accuracy and in control energy expended.
(These are the position coordinates projected into a plane orthogonal to the axis of rotation.)
Figure 4: Colored lines: demonstrations. Black dotted line: trajectory inferred by our algorithm. (See text for details.)
[Figure 4(c), (d) axes: Z acceleration error (m/s²) vs. time (s).]