Table 1. Average per-sample log-loss (negative log likelihood) for each venue distribution model. The "Wins" column shows the number of stock-venue pairs where a given model beats the other four on the test data.

Model               Train loss   Test loss   Wins
Nonparametric          0.454       0.872       3
ZB + Uniform           0.499       0.508      12
ZB + Power Law         0.467       0.484      28
ZB + Poisson           0.576       0.661       0
ZB + Exponential       0.883       0.953       5
distribution when b > 0. Nonparametric models trained
with Kaplan–Meier are best on the training data but overfit badly due to their complexity relative to the sparse
data, while the other parametric forms cannot accommodate the heavy tails of the data. This is summarized
in Table 1. Based on this comparison, for our dark pool
study we investigate a variant of our main algorithm, in
which the estimate–allocate loop has an estimation step
using maximum likelihood estimation within the ZB +
Power Law model, and allocations are done greedily on
these same models.
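As a rough illustration of this estimate-allocate variant, the sketch below builds a Zero Bin + Power Law pmf and allocates shares greedily by marginal fill probability. The function names and the exact parameterization (zero-bin mass plus a normalized power-law tail over volumes 1..V) are our own assumptions, not the paper's specification:

```python
import numpy as np

def zb_power_law_pmf(zero_mass, b, max_volume):
    """Zero Bin + Power Law pmf (our sketch of the model family):
    probability zero_mass at s = 0, with the remaining mass spread
    over s = 1..max_volume proportional to s**(-b)."""
    tail = np.arange(1, max_volume + 1, dtype=float) ** (-b)
    tail *= (1.0 - zero_mass) / tail.sum()
    return np.concatenate(([zero_mass], tail))

def greedy_allocation(venue_pmfs, total_shares):
    """Greedy allocation on fitted models: give each successive share
    to the venue with the highest probability of filling it, i.e. the
    largest tail probability P(s_i >= a + 1) at its current allocation
    a. Since these marginal fill probabilities are non-increasing in a,
    the greedy rule maximizes total expected shares filled."""
    tails = [1.0 - np.cumsum(p) for p in venue_pmfs]  # tails[i][a] = P(s_i > a)
    alloc = [0] * len(venue_pmfs)
    for _ in range(total_shares):
        # marginal expected fill of one more share at each venue
        gains = [t[a] if a < len(t) else 0.0 for t, a in zip(tails, alloc)]
        alloc[int(np.argmax(gains))] += 1
    return alloc
```

For example, a venue whose fitted model puts all its mass at zero liquidity receives no shares under this rule, while venues with heavier tails absorb correspondingly more of the order.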
In terms of the estimated ZB + Power Law parameters
themselves, we note that for all 48 stock–pool pairs the
Zero Bin parameter accounted for most of the distribution (between a fraction 0.67 and 0.96), which is not surprising considering the aforementioned preponderance
of entirely unfilled orders in the data. The vast majority
of the 48 exponents b fell between b = 0.25 and b = 1.3—so
rather long tails indeed—but it is noteworthy that for one
of the four dark pools, 7 of the 12 estimated exponents
were actually negative, yielding a model that predicts
higher probabilities for larger volumes. This is likely an
artifact of our size- and time-limited data set, but is not
entirely unrealistic and results in some interesting behavior in the simulations.
5.3. Data-based simulation results
As in any control problem, the dark pool data in our possession is unfortunately insufficient to evaluate and compare
different allocation algorithms. This is because of the aforementioned fact that the volumes submitted to each venue
were fixed by the specific policy that generated the data,
and we cannot explore alternative choices—if our algorithm chooses to submit 1000 shares to some venue, but in
the data only 500 shares were submitted, we simply cannot
infer the outcome of our desired submission.
We thus instead use the raw data to derive a simulator
with which we can evaluate different approaches. In light of
the modeling results of Section 5.2, the simulator for stock S
was constructed as follows. For each dark pool i, we used all
of the data for i and stock S to estimate the maximum likeli-
hood Zero Bin + Power Law distribution. (Note that there is
no need for a training-test split here, as we have already sep-
arately validated the choice of distributional model.) This
results in a set of four venue distribution models Pi that form
the simulator for stock S. This simulator accepts allocation
vectors (v1, v2, v3, v4) indicating how many shares some algo-
rithm wishes to submit to each venue, draws a “true liquid-
ity” value si from Pi for each i, and returns the vector (r1, r2, r3,
r4), where ri = min(vi, si) is the possibly censored number of
shares filled in venue i.
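The simulator step described above can be sketched as follows. This is a minimal illustration; `venue_pmfs` stands in for the four fitted distributions Pi, and the function name is our own:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_step(venue_pmfs, allocation):
    """One simulator step: for each venue i, draw a 'true liquidity'
    value s_i from that venue's fitted distribution P_i, and return
    the possibly censored fills r_i = min(v_i, s_i).
    venue_pmfs: one pmf per venue over volumes 0..max_volume.
    allocation: shares submitted to each venue, (v_1, ..., v_4)."""
    fills = []
    for pmf, v in zip(venue_pmfs, allocation):
        s = rng.choice(len(pmf), p=pmf)  # true liquidity s_i ~ P_i
        fills.append(min(v, s))          # censored observation r_i
    return fills
```

Note that the algorithm being simulated never observes s_i directly, only the censored fill r_i, which is exactly the observability constraint of the original data.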
Figure 4. Sample learning curves (fraction executed vs. episodes; panels labeled stock:AIG and stock:NRG). For the stock AIG (left panel), the naive bandits algorithm (labeled blue curve) beats uniform allocation (dashed horizontal line) but appears to asymptote short of ideal (solid horizontal line). For the stock NRG (right panel), the bandits algorithm actually deteriorates with more episodes, underperforming both the uniform and ideal allocations. For both stocks (and the other 10 in our data set), our algorithm (labeled red curve, Cens-Exp) performs nearly optimally.