I am trying to attack an ensemble of Keras models following the method proposed in this paper. In Section 5, they note that the attack optimizes, roughly, an adversarial loss averaged over the ensemble's members:
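$J_{\text{ens}}(x, y) = \frac{1}{N}\sum_{i=1}^{N} J_i(x, y)$

where $J_i$ is the adversarial loss of the $i$-th of the $N$ models (my notation; see the paper for the exact form).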
So, I moved on to create an ensemble of pretrained Keras MNIST models as follows:
import keras
import tensorflow as tf
from keras.layers import Average, Input
from keras.models import Model
from cleverhans.attacks import FastGradientMethod
from cleverhans.utils_keras import KerasModelWrapper

sess = tf.Session()
keras.backend.set_session(sess)  # make Keras and CleverHans share the session

def ensemble(models, model_input):
    # average the (softmax) outputs of the individual models
    outputs = [model(model_input) for model in models]
    y = Average()(outputs)
    model = Model(model_input, y, name='ensemble')
    return model

img_rows, img_cols, nchannels = 28, 28, 1  # MNIST input shape
models = [...]  # list of pretrained Keras MNIST models
model_input = Input(shape=(img_rows, img_cols, nchannels))
model = ensemble(models, model_input)

model_wrapper = KerasModelWrapper(model)
attack_par = {'eps': 0.3, 'clip_min': 0., 'clip_max': 1.}
attack = FastGradientMethod(model_wrapper, sess=sess)
x = tf.placeholder(tf.float32, shape=(None, img_rows, img_cols, nchannels))
attack.generate(x, **attack_par)  # ERROR!
At the final line, I get the following error:
----------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-23-1d2e22ceb2ed> in <module>
----> 1 attack.generate(x, **attack_par)
~/ri/safechecks/venv/lib/python3.6/site-packages/cleverhans/attacks/fast_gradient_method.py in generate(self, x, **kwargs)
48 assert self.parse_params(**kwargs)
49
---> 50 labels, _nb_classes = self.get_or_guess_labels(x, kwargs)
51
52 return fgm(
~/ri/safechecks/venv/lib/python3.6/site-packages/cleverhans/attacks/attack.py in get_or_guess_labels(self, x, kwargs)
276 labels = kwargs['y_target']
277 else:
--> 278 preds = self.model.get_probs(x)
279 preds_max = reduce_max(preds, 1, keepdims=True)
280 original_predictions = tf.to_float(tf.equal(preds, preds_max))
~/ri/safechecks/venv/lib/python3.6/site-packages/cleverhans/utils_keras.py in get_probs(self, x)
188 :return: A symbolic representation of the probs
189 """
--> 190 name = self._get_softmax_name()
191
192 return self.get_layer(x, name)
~/ri/safechecks/venv/lib/python3.6/site-packages/cleverhans/utils_keras.py in _get_softmax_name(self)
126 return layer.name
127
--> 128 raise Exception("No softmax layers found")
129
130 def _get_abstract_layer_name(self):
Exception: No softmax layers found
It seems that CleverHans requires the target model's final layer to be a softmax. However, the Fast Gradient Method doesn't technically need that requirement. Is this something CleverHans enforces for ease of library implementation? Are there ways to get around this problem and use CleverHans to attack models without a final softmax layer?
The reason CleverHans requires one to pass logits to the attacks is numerical stability (e.g., so we don't take logs of exponentials).
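As a toy illustration (this snippet is hypothetical, not from your code): with extreme logits, composing log with softmax blows up, while the fused cross-entropy op that works directly on the logits stays finite:

import tensorflow as tf

logits = tf.constant([[200., 0.]])  # extreme but plausible logit values
labels = tf.constant([[0., 1.]])    # one-hot label for the unlikely class

# naive: softmax underflows to [1, 0], then log(0) = -inf -> loss = inf
naive = -tf.reduce_sum(labels * tf.log(tf.nn.softmax(logits)), axis=1)

# stable: cross-entropy computed directly from the logits
stable = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

with tf.Session() as s:
    print(s.run([naive, stable]))  # [inf] vs. [200.]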
That said, attacking an ensemble is a legitimate use case. I can think of two options:
1. If all of your models have comparable logit distributions, you could average the logits and provide those to the attack object (see the first sketch below).
2. You could compute the adversary's loss on each of the N models within the ensemble, average these N adversarial losses, and have the attack optimize the averaged loss (see the second sketch further down).
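For the first option, a minimal sketch might look like the following, assuming you can obtain variants of your models that output pre-softmax logits (logit_models and logits_ensemble are names I made up, not CleverHans API):

from keras.layers import Activation, Average, Input
from keras.models import Model
from cleverhans.utils_keras import KerasModelWrapper

def logits_ensemble(logit_models, model_input):
    # average the pre-softmax logits of the individual models
    avg_logits = Average()([m(model_input) for m in logit_models])
    # re-attach a softmax so KerasModelWrapper can find it (and take
    # the layer just before it as the logits)
    probs = Activation('softmax')(avg_logits)
    return Model(model_input, probs, name='logits_ensemble')

logit_models = [...]  # your pretrained models with their final softmax removed
model_input = Input(shape=(img_rows, img_cols, nchannels))
model_wrapper = KerasModelWrapper(logits_ensemble(logit_models, model_input))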
The second option would require modifying the existing CleverHans API, but if you would like to make a PR to the GitHub repo, I would be happy to help review it.
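To show the idea behind the second option without touching the CleverHans API, here is a rough single-step FGSM sketch in plain TensorFlow (again assuming logit_models from above; this is not the library's implementation):

import tensorflow as tf

# x is the input placeholder from above; y holds one-hot labels (10 MNIST classes)
y = tf.placeholder(tf.float32, shape=(None, 10))

# one adversarial loss per model (m(x) are that model's logits)
losses = [tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=m(x))
          for m in logit_models]
avg_loss = tf.add_n(losses) / len(losses)  # average the N losses

# a single FGSM step on the averaged loss (eps=0.3, clipping to [0, 1])
grad, = tf.gradients(avg_loss, x)
x_adv = tf.clip_by_value(x + 0.3 * tf.sign(grad), 0., 1.)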
Hope this helps.