solo notes culminating in a point of coincidence with the
orchestra. As each solo note is detected we refine our estimate of the desired point of coincidence, thus gradually
“homing in” on this point of arrival. It is worth noting that
very little harm is done when Listen fails to detect a solo
note. We simply predict the pending orchestra note conditioning on the variables we have observed.
The web page given earlier contains a video demonstrating
this process. The video shows the estimated solo times from
our score follower appearing as green marks on a spectrogram. Predictions of our accompaniment system are shown
as analogous red marks. One can see the pending orchestra
time “jiggling” as new solo notes are estimated, until finally
the currently predicted time passes. In the video, one can see
occasional solo notes that are never marked with green lines.
These are notes for which the posterior onset time was not sufficiently peaked to merit a note detection. This happens most
often with repeated pitches, for which our data model is less
informative, and notes following longer notes, where our prior
model is less opinionated. We simply treat such notes as unobserved and base our predictions only on the observed events.
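The idea of predicting from only the observed events can be illustrated with a deliberately simplified sketch. It does not reproduce the model described in this article: as a stand-in, it fits an ordinary least-squares line through the detected solo onsets (score position in beats against time in seconds) and extrapolates to the pending orchestra note. The function name `predict_onset` and its arguments are hypothetical; an undetected note is represented by `None` and is simply left out of the fit.

```python
from typing import Optional, Sequence


def predict_onset(
    positions: Sequence[float],
    onsets: Sequence[Optional[float]],
    target_position: float,
) -> float:
    """Predict when (in seconds) the note at `target_position` (in beats) falls.

    Assumption: a straight-line tempo model stands in for the real predictor.
    Notes whose onset is None were never detected and are skipped.
    """
    # Keep only the notes that were actually detected.
    observed = [(p, t) for p, t in zip(positions, onsets) if t is not None]
    if len(observed) < 2:
        raise ValueError("need at least two detected notes to estimate tempo")

    n = len(observed)
    mean_p = sum(p for p, _ in observed) / n
    mean_t = sum(t for _, t in observed) / n
    var_p = sum((p - mean_p) ** 2 for p, _ in observed)
    if var_p == 0:
        raise ValueError("detected notes must span more than one score position")

    # Least-squares slope: seconds per beat implied by the detected notes.
    cov = sum((p - mean_p) * (t - mean_t) for p, t in observed)
    seconds_per_beat = cov / var_p

    # Extrapolate the fitted line to the pending orchestra note.
    return mean_t + seconds_per_beat * (target_position - mean_p)
```

Calling `predict_onset([0, 1, 2, 3], [0.0, 0.5, None, 1.5], 4)` leaves out the third note and still produces a prediction from the remaining three, which is the behavior the paragraph above describes: a missed detection removes one piece of evidence but does not block the prediction.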
The role of Predict is to “schedule” accompaniment notes,
but what does this really mean in practice? Recall that our
program plays audio by phase-vocoding (time-stretching) an
orchestra-only recording. A time-frequency representation
of such an audio file for the first movement of the Dvořák
Cello Concerto is shown in Figure 4. If you know the piece,
you will likely be able to follow this spectrogram. In preparing
this audio for our accompaniment system, we perform
an off-line score alignment to determine where the various
orchestra notes occur, as marked with vertical lines in the
figure. Scheduling a note simply means that we change the
phase-vocoder’s play rate so that it arrives at the appropriate
audio file position (vertical line) at the scheduled time.
Thus the play rate is continually modified as the performance
evolves. This is our only “control” over the orchestra performance.
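The arithmetic behind this kind of scheduling can be sketched as follows. This is a minimal illustration rather than the system's actual code: the function name `play_rate`, its arguments, and the clamping bounds `min_rate` and `max_rate` are all assumptions introduced here. The rate is the remaining distance in the audio file divided by the remaining real time until the scheduled arrival.

```python
def play_rate(
    audio_pos: float,
    target_audio_pos: float,
    now: float,
    scheduled_time: float,
    min_rate: float = 0.25,  # assumed lower bound, not taken from the article
    max_rate: float = 4.0,   # assumed upper bound, not taken from the article
) -> float:
    """Return the rate at which to play so the target position is reached on time.

    audio_pos / target_audio_pos: positions in the recording, in seconds.
    now / scheduled_time: real (wall-clock) times, in seconds.
    A rate of 1.0 plays the recording at its original speed.
    """
    remaining_audio = target_audio_pos - audio_pos
    remaining_time = scheduled_time - now

    # The scheduled time has already arrived or passed: catch up as fast as allowed.
    if remaining_time <= 0:
        return max_rate

    rate = remaining_audio / remaining_time

    # Keep the rate within the assumed bounds.
    return max(min_rate, min(max_rate, rate))
```

For example, with 2 seconds of audio left before the next marked note and 1 second of real time until its scheduled arrival, the sketch returns a rate of 2.0. Each time a new solo note moves the scheduled time, the rate would be recomputed from the current position.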
5. Musical Expression and Machine Learning
Our system learns its musicality through “osmosis.” If the
soloist plays in a musical way, and the orchestra manages
to closely follow the soloist, then we hope the orchestra will
inherit this musicality. This manner of learning by imitation works well in the concerto setting, since the division
of authority between the players is rather extreme, mostly
granting the “right of way” to the soloist.
In contrast, the pure following approach is less reasonable
when the accompaniment needs a sense of musicality that
acts independently, or perhaps even in opposition, to what
Figure 4. A “spectrogram” of the opening of the first movement of the Dvořák Cello Concerto. The horizontal axis of the figure represents
time while the vertical axis represents frequency. The vertical lines show the note times for the orchestra.