
How to apply a Cleverhans attack when the final layer is not `softmax` (e.g. ensemble models)?


I am trying to attack an ensemble of Keras models following the method proposed in this paper. In section 5, they note the form of the attack (the equation is given as an image in the original post).

So, I moved on to create an ensemble of pretrained Keras MNIST models as follows:

import tensorflow as tf
from keras.layers import Average
from keras.models import Model
from cleverhans.attacks import FastGradientMethod
from cleverhans.utils_keras import KerasModelWrapper

def ensemble(models, model_input):
    # average the softmax outputs of the individual models
    outputs = [model(model_input) for model in models]
    y = Average()(outputs)

    model = Model(model_input, y, name='ensemble')

    return model

models = [...]  # list of pretrained Keras MNIST models

model = ensemble(models, model_input)
model_wrapper = KerasModelWrapper(model)
attack_par = {'eps': 0.3, 'clip_min': 0., 'clip_max': 1.}
attack = FastGradientMethod(model_wrapper, sess=sess)

x = tf.placeholder(tf.float32, shape=(None, img_rows, img_cols,
                                      nchannels))
attack.generate(x, **attack_par)  # ERROR!

At the final line, I get the following error:

----------------------------------------------------------
Exception                Traceback (most recent call last)
<ipython-input-23-1d2e22ceb2ed> in <module>
----> 1 attack.generate(x, **attack_par)

~/ri/safechecks/venv/lib/python3.6/site-packages/cleverhans/attacks/fast_gradient_method.py in generate(self, x, **kwargs)
     48     assert self.parse_params(**kwargs)
     49 
---> 50     labels, _nb_classes = self.get_or_guess_labels(x, kwargs)
     51 
     52     return fgm(

~/ri/safechecks/venv/lib/python3.6/site-packages/cleverhans/attacks/attack.py in get_or_guess_labels(self, x, kwargs)
    276       labels = kwargs['y_target']
    277     else:
--> 278       preds = self.model.get_probs(x)
    279       preds_max = reduce_max(preds, 1, keepdims=True)
    280       original_predictions = tf.to_float(tf.equal(preds, preds_max))

~/ri/safechecks/venv/lib/python3.6/site-packages/cleverhans/utils_keras.py in get_probs(self, x)
    188     :return: A symbolic representation of the probs
    189     """
--> 190     name = self._get_softmax_name()
    191 
    192     return self.get_layer(x, name)

~/ri/safechecks/venv/lib/python3.6/site-packages/cleverhans/utils_keras.py in _get_softmax_name(self)
    126         return layer.name
    127 
--> 128     raise Exception("No softmax layers found")
    129 
    130   def _get_abstract_layer_name(self):

Exception: No softmax layers found

It seems that Cleverhans requires the final layer of the target model to be a softmax layer. However, the Fast Gradient Method doesn't technically require this: it only needs a differentiable loss to take gradients of. Is this something that Cleverhans enforces for ease of library implementation? Are there ways to work around this and use Cleverhans to attack models without a final softmax layer?


Solution

  • The reason why CleverHans requires one to pass logits to the attacks is for numerical stability (e.g., so we don't take logs of exponents).

    That said, attacking an ensemble is a legitimate use case. I can think of two options:

    • if all of your models have comparable logit distributions, you could average the logits and provide those to the attack object.

    • you could compute the adversary's loss on each of the N models within the ensemble, average all of these N adversarial losses, and then the attack would optimize this averaged loss.
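The first option can be sketched in a few lines: rebuild the ensemble so that it averages each member's pre-softmax logits and then applies one shared softmax layer, which also gives `KerasModelWrapper` a softmax layer to find. The models below are toy stand-ins (hypothetical names like `toy_logit_model`), not the actual pretrained MNIST models:

```python
import numpy as np
from tensorflow.keras.layers import Activation, Average, Dense, Input
from tensorflow.keras.models import Model

def logit_ensemble(logit_models, model_input):
    # Average the pre-softmax logits of each member, then attach a single
    # softmax layer so the wrapper can locate a softmax in the graph.
    logits = [m(model_input) for m in logit_models]
    avg_logits = Average()(logits)
    probs = Activation('softmax', name='ensemble_softmax')(avg_logits)
    return Model(model_input, probs, name='logit_ensemble')

def toy_logit_model():
    # stand-in for a pretrained model whose head outputs logits (no softmax)
    x = Input(shape=(784,))
    return Model(x, Dense(10)(x))

members = [toy_logit_model() for _ in range(3)]
ensemble_input = Input(shape=(784,))
ens = logit_ensemble(members, ensemble_input)

probs = ens(np.random.rand(2, 784).astype('float32')).numpy()
```

Note the caveat above: averaging logits is only meaningful if the members' logit scales are comparable; otherwise one model's large logits dominate the average.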

    The second option would require modifying the existing CleverHans API but if you would like to make a PR to the GitHub repo, I would be happy to help review it.
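For the second option, here is a rough sketch of what the averaged-loss attack would compute. It is written with TF2's `GradientTape` for brevity (the CleverHans code in the question is TF1/graph-mode, so an actual PR would look different), and `fgsm_averaged_loss` and the toy models are hypothetical names, not part of the CleverHans API:

```python
import numpy as np
import tensorflow as tf

def fgsm_averaged_loss(models, x, y, eps=0.3, clip_min=0.0, clip_max=1.0):
    # One FGSM step on the mean of the per-model cross-entropy losses,
    # rather than on the loss of a single merged model.
    x = tf.convert_to_tensor(x)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    with tf.GradientTape() as tape:
        tape.watch(x)
        losses = [loss_fn(y, m(x)) for m in models]
        avg_loss = tf.add_n(losses) / float(len(losses))
    grad = tape.gradient(avg_loss, x)
    x_adv = x + eps * tf.sign(grad)
    return tf.clip_by_value(x_adv, clip_min, clip_max)

# toy usage with randomly initialized stand-in models
members = [tf.keras.Sequential([tf.keras.layers.Dense(10)]) for _ in range(3)]
x = np.random.rand(4, 784).astype('float32')
y = np.array([0, 1, 2, 3])
x_adv = fgsm_averaged_loss(members, x, y).numpy()
```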

    Hope this helps.