and (3) occlusion by multiple tiny black rectangles for simulating the effect of dirt on the camera lens.
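One way to read these occlusion constraints is as a mask on the computed gradient: only pixels inside the occluding rectangle(s) are allowed to change, so gradient ascent simulates a lens occlusion rather than an arbitrary perturbation. The NumPy sketch below illustrates that idea; the function name and rectangle encoding are ours for illustration, not DeepXplore's actual API.

```python
import numpy as np

def constrain_occlusion(grad, rects):
    """Zero the gradient outside the given rectangles so that gradient
    ascent only perturbs the occluded regions of the image.

    grad  : (H, W, C) gradient of the objective w.r.t. the input image
    rects : list of (row, col, height, width) tuples
    """
    mask = np.zeros_like(grad)
    for r, c, h, w in rects:
        mask[r:r + h, c:c + w, :] = 1.0
    return grad * mask

# A single occluding rectangle vs. many tiny 2x2 "dirt" spots on the lens.
rng = np.random.default_rng(0)
grad = rng.normal(size=(100, 100, 3))
single_rect = constrain_occlusion(grad, [(40, 40, 10, 10)])
dirt_spots = constrain_occlusion(
    grad, [(int(r), int(c), 2, 2) for r, c in rng.integers(0, 98, size=(20, 2))])
```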
Other constraints (Drebin and Contagio/VirusTotal). For the Drebin dataset, DeepXplore enforces a constraint that only allows modifying features related to the Android manifest file, which ensures that the application code is unaffected. Moreover, DeepXplore only allows adding features (changing them from zero to one) but does not allow deleting features (changing them from one to zero) from the manifest files, to ensure that no application functionality is changed due to insufficient permissions. Thus, after computing the gradient, DeepXplore only modifies the manifest features whose corresponding gradients are greater than zero. For the Contagio/VirusTotal dataset, we follow the restrictions on each feature as described by Šrndić and Laskov [14].
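Put differently, the Drebin constraint turns the gradient into a feature-selection signal: only manifest features that are currently off and have a positive gradient may be switched on. The sketch below illustrates this rule on a binary feature vector; the function and variable names are illustrative rather than DeepXplore's actual implementation.

```python
import numpy as np

def constrain_drebin(x, grad, manifest_idx):
    """Apply the Drebin-style constraints to one binary feature vector.

    Only manifest-related features may change, features may only be added
    (0 -> 1, never 1 -> 0), and a feature is added only if its gradient is
    positive, i.e. adding it increases the optimization objective.
    """
    x_new = x.copy()
    for i in manifest_idx:
        if x[i] == 0 and grad[i] > 0:
            x_new[i] = 1
    return x_new

# Toy example: features 0-3 are manifest features, 4-5 are code features.
x = np.array([0, 1, 0, 0, 0, 1])
grad = np.array([0.7, -0.2, -0.5, 0.1, 0.9, 0.3])
print(constrain_drebin(x, grad, manifest_idx=[0, 1, 2, 3]))
# -> [1 1 0 1 0 1]: only manifest features with positive gradient are added.
```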
6.1. Summary
Table 2 summarizes the number of erroneous behaviors found by DeepXplore for each tested DNN while using 2,000 randomly selected seed inputs from the corresponding test sets. Note that as the testing set has a similar number of samples for each class, these randomly chosen 2,000 samples also follow that distribution. The hyperparameters for these experiments, as shown in Table 2, are empirically chosen to maximize both the rate of finding difference-inducing inputs and the achieved neuron coverage.
For the experimental results shown in Figure 7, we apply three domain-specific constraints (lighting effects, occlusion by a single rectangle, and occlusion by multiple rectangles) as described in Section 5.2. For all other experiments involving vision-related tasks, we use only lighting effects as the domain-specific constraint. For all malware-related experiments, we apply all the relevant domain-specific constraints described in Section 5.2. We use the hyperparameters listed in Table 2 in all the experiments unless otherwise specified.
Figure 7 shows some difference-inducing inputs generated by DeepXplore for the MNIST, ImageNet, and Driving datasets, along with the corresponding erroneous behaviors. Table 3 (Drebin) and Table 4 (Contagio/VirusTotal) show two sample difference-inducing inputs generated by DeepXplore that caused erroneous behaviors in the tested DNNs. We highlight the differences between the seed input features and the features modified by DeepXplore. Note that we only list the top three modified features due to space limitations.
6.2. Benefits of neuron coverage
In this subsection, we evaluate how effective neuron coverage is in measuring the comprehensiveness of DNN testing.
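As a reminder of the metric under evaluation: neuron coverage is the fraction of neurons whose (scaled) activation exceeds a threshold t for at least one test input. Below is a minimal NumPy sketch under that definition; in practice, the activations would be read from the DNN under test rather than supplied by hand, and the names here are illustrative.

```python
import numpy as np

def neuron_coverage(activations_per_input, t=0.0):
    """Fraction of neurons activated above threshold t by at least one input.

    activations_per_input : iterable of dicts mapping layer name -> 1-D array
                            of that layer's neuron activations for one input
                            (assumed already scaled to [0, 1] per layer).
    """
    covered, total = set(), set()
    for acts in activations_per_input:
        for layer, values in acts.items():
            for i, v in enumerate(values):
                total.add((layer, i))
                if v > t:
                    covered.add((layer, i))
    return len(covered) / len(total)

# Toy example with two inputs and two layers.
inputs = [
    {"fc1": np.array([0.9, 0.0, 0.2]), "fc2": np.array([0.0, 0.4])},
    {"fc1": np.array([0.0, 0.6, 0.0]), "fc2": np.array([0.0, 0.0])},
]
print(neuron_coverage(inputs, t=0.25))  # 3 of 5 neurons exceed 0.25 -> 0.6
```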
Table 1. Details of the DNNs and datasets used to evaluate DeepXplore. For each dataset, the table lists the dataset description, the DNN description, the DNN name, the number of neurons, the architecture, the previously reported accuracy, and the accuracy we obtained. The evaluated models are:
- MNIST: LeNet variations (MNI_C1, …): LeNet-1, LeNet-4, and LeNet-5, LeCun et al. [8].
- ImageNet: VGG-16 and VGG-19, Simonyan et al. [12], and ResNet50, He et al. [5].
- Driving (driving video): Dave-orig [1].
- Contagio/VirusTotal (PDFs): PDF malware detectors <200, 200, 200>+ and <200, 200, 200, 200>+.
- Drebin (Android apps): Android app malware detectors <200, 200>+, <50, 50>+, and <200, 10>+, Grosse et al. [4].
** Top-5 test accuracy; we exactly match the reported performance as we use the pretrained networks.
We report 1 − MSE (mean squared error) as the accuracy because the steering angle is a continuous value.
+ <x, y, …> denotes a network with x neurons in the first hidden layer, y neurons in the second hidden layer, etc.
− Accuracy using SVM as reported by Šrndić et al. [14].
Table 2. Number of difference-inducing inputs found by DeepXplore for each tested DNN, obtained by randomly selecting 2,000 seeds from the corresponding test set for each run.

λ1   λ2   s    t   # of diff-inducing inputs found
1    0.1  10   0   1073
1    0.1  10   0   1969
1    0.1  10   0   1720
2    0.1  0.1  0   1103
1    0.5  N/A  0   2000
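For reference, the hyperparameters in Table 2 enter DeepXplore's gradient ascent roughly as follows: λ1 balances the two terms of the differential-behavior objective, λ2 weights the neuron-coverage term in the joint objective, s is the step size, and t is the neuron-activation threshold used when computing coverage. The sketch below shows one unconstrained update step under these assumptions; the gradient arguments stand in for quantities that would be computed by backpropagation through the DNNs, and the function name is ours.

```python
import numpy as np

def deepxplore_step(x, others_grad, target_grad, neuron_grad,
                    lambda1, lambda2, s, constraint=None):
    """One gradient-ascent step on a DeepXplore-style joint objective:
        obj(x) = sum_{i != j} F_i(x)[c] - lambda1 * F_j(x)[c]
                 + lambda2 * (activation of a currently uncovered neuron),
    written directly in terms of the three gradients w.r.t. the input x.
    s is the step size; `constraint` optionally applies a domain-specific
    rule (such as the occlusion or Drebin rules sketched earlier) to the
    gradient before the input is updated.
    """
    grad = others_grad - lambda1 * target_grad + lambda2 * neuron_grad
    if constraint is not None:
        grad = constraint(grad)
    return x + s * grad

# Toy usage with random stand-in gradients for a 28x28 grayscale input.
rng = np.random.default_rng(1)
x = rng.random((28, 28, 1))
g_others, g_target, g_neuron = (rng.normal(size=x.shape) for _ in range(3))
x_new = deepxplore_step(x, g_others, g_target, g_neuron,
                        lambda1=1.0, lambda2=0.1, s=10.0)
```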