How to apply a Cleverhans attack when the final layer is not `softmax` (e.g. ensemble models)?

I am trying to attack an ensemble of Keras models following the method proposed in this paper. In Section 5, the authors note that the attack takes the following form: [image: attack equation from the paper, not reproduced here]

So, I moved on to create an ensemble of pretrained Keras MNIST models as follows:

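from keras.layers import Average
from keras.models import Model

def ensemble(models, model_input):
    # Feed the same input to every member model and average their outputs.
    # Note that the final layer of the result is Average, not softmax.
    outputs = [model(model_input) for model in models]
    y = Average()(outputs)

    return Model(model_input, y, name='ensemble')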

models = [...] # list of pretrained Keras MNIST models

model = ensemble(models, model_input)
model_wrapper = KerasModelWrapper(model)
attack_par = {'eps': 0.3, 'clip_min': 0., 'clip_max': 1.}
attack = FastGradientMethod(model_wrapper, sess=sess)

x = tf.placeholder(tf.float32, shape=(None, img_rows, img_cols,
attack.generate(x, **attack_par) # ERROR!

At the final line, I get the following error:

Exception                Traceback (most recent call last)
<ipython-input-23-1d2e22ceb2ed> in <module>
----> 1 attack.generate(x, **attack_par)

~/ri/safechecks/venv/lib/python3.6/site-packages/cleverhans/attacks/ in generate(self, x, **kwargs)
     48     assert self.parse_params(**kwargs)
---> 50     labels, _nb_classes = self.get_or_guess_labels(x, kwargs)
     52     return fgm(

~/ri/safechecks/venv/lib/python3.6/site-packages/cleverhans/attacks/ in get_or_guess_labels(self, x, kwargs)
    276       labels = kwargs['y_target']
    277     else:
--> 278       preds = self.model.get_probs(x)
    279       preds_max = reduce_max(preds, 1, keepdims=True)
    280       original_predictions = tf.to_float(tf.equal(preds, preds_max))

~/ri/safechecks/venv/lib/python3.6/site-packages/cleverhans/ in get_probs(self, x)
    188     :return: A symbolic representation of the probs
    189     """
--> 190     name = self._get_softmax_name()
    192     return self.get_layer(x, name)

~/ri/safechecks/venv/lib/python3.6/site-packages/cleverhans/ in _get_softmax_name(self)
    126         return
--> 128     raise Exception("No softmax layers found")
    130   def _get_abstract_layer_name(self):

Exception: No softmax layers found

It seems that CleverHans requires the final layer of the target model to be a softmax layer. However, the Fast Gradient Method itself does not need this. Is it something CleverHans enforces for ease of implementation? Is there a way to work around it and use CleverHans to attack a model whose final layer is not a softmax?


  • CleverHans requires logits to be passed to the attacks for numerical stability (e.g., so that we don't take the log of an exponential).

    That said, attacking an ensemble is a legitimate use case. I can think of two options:

    • If all of your models have comparable logit distributions, you could average the logits and provide the averaged logits to the attack object.

    • You could compute the adversary's loss on each of the N models in the ensemble and average these N losses; the attack would then optimize the averaged loss.

    The second option would require modifying the existing CleverHans API, but if you would like to open a PR against the GitHub repo, I would be happy to help review it.

    Hope this helps.
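
The two options above can be sketched in plain Python, independent of Keras and CleverHans. This is only an illustration under stated assumptions: each ensemble member is a toy linear classifier given as a `(W, b)` pair, the loss is softmax cross-entropy, and every function name below is hypothetical rather than part of the CleverHans API. Option 1 averages the logits and takes one loss; option 2 takes one loss per model and averages the losses. Both then apply a single fast-gradient-sign step.

```python
import math

def logits_of(model, x):
    """Logits of a toy linear model given as (W, b): z = W x + b."""
    W, b = model
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Softmax cross-entropy computed from logits via log-sum-exp."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[label]

def dloss_dlogits(logits, label):
    """Gradient of the cross-entropy w.r.t. the logits: softmax(z) - onehot."""
    return [p - (1.0 if k == label else 0.0) for k, p in enumerate(softmax(logits))]

def backprop_to_input(model, dlogits):
    """Chain rule through z = W x + b: dL/dx = W^T dL/dz."""
    W, _ = model
    return [sum(W[k][j] * dlogits[k] for k in range(len(W))) for j in range(len(W[0]))]

def mean_columns(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def grad_avg_logits(models, x, label):
    """Option 1: average the logits, then take one loss on the average."""
    avg_logits = mean_columns([logits_of(m, x) for m in models])
    d_avg = dloss_dlogits(avg_logits, label)
    # The averaged logits depend on x through every model with weight 1/N.
    return mean_columns([backprop_to_input(m, d_avg) for m in models])

def grad_avg_losses(models, x, label):
    """Option 2: one loss per model, then average the N losses."""
    return mean_columns(
        [backprop_to_input(m, dloss_dlogits(logits_of(m, x), label)) for m in models]
    )

def fgsm_step(x, grad, eps, clip_min=0.0, clip_max=1.0):
    """One fast-gradient-sign step, clipped to the valid input range."""
    sign = lambda g: (g > 0) - (g < 0)
    return [min(clip_max, max(clip_min, xj + eps * sign(g))) for xj, g in zip(x, grad)]

def mean_loss(models, x, label):
    return sum(cross_entropy(logits_of(m, x), label) for m in models) / len(models)

# Toy ensemble: two linear models with 2 inputs and 2 classes (made-up weights).
models = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
    ([[2.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
]
x, label, eps = [0.6, 0.4], 0, 0.3

x_adv_logits = fgsm_step(x, grad_avg_logits(models, x, label), eps)
x_adv_losses = fgsm_step(x, grad_avg_losses(models, x, label), eps)
```

In this toy case both options happen to give gradients with the same signs, so the perturbed inputs coincide. In general they can differ, because averaging logits lets a model with large-magnitude logits dominate, which is why the first option assumes comparable logit distributions.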