python tensorflow keras model cleverhans

Attacking a TensorFlow model with cleverhans' CarliniWagnerL2 results in NotImplementedError

I'm trying to get familiar with TensorFlow and cleverhans, but it seems I'm mixing up functionality from the two.

I set up a simple model with TensorFlow, train it, and then want to craft an adversarial image with cleverhans' CarliniWagnerL2 attack. I read through the TensorFlow and cleverhans code and documentation and tried to understand what's happening, but I just don't get which function from which library I have to use.

This is my simplified example code. As far as I understand, I have to turn a callable into a valid cleverhans model using the CallableModelWrapper. Is that right? Or is my model not a callable? Is it actually possible to train a model with TensorFlow and then attack it with cleverhans? The error occurs when I try to generate the adversarial image.

# TensorFlow and tf.keras
import tensorflow as tf

# Cleverhans
import cleverhans as ch
from cleverhans import attacks
from cleverhans import model

# Others
import numpy as np

sess = tf.Session()

# load data set
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

class_names = ['0', '1', '2', '3', '4',
               '5', '6', '7', '8', '9']

train_images = train_images / 255.0
test_images = test_images / 255.0

#set up model
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

model.compile(optimizer='SGD',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# train
model.fit(train_images, train_labels, epochs=3)

# wrap 
wrap = ch.model.CallableModelWrapper(model, 'probs')

cw = ch.attacks.CarliniWagnerL2(wrap, sess=sess)

#set params and targeted image
cw_params = {'batch_size': 1,
             'confidence': 10,
             'learning_rate': 0.1,
             'binary_search_steps': 5,
             'max_iterations': 1000,
             'abort_early': True,
             'initial_const': 0.01,
             'clip_min': 0,
             'clip_max': 1}

image = np.array([test_images[0]])

# and here i get the error!!!
adv_cw = cw.generate_np(image, **cw_params)

I want to get the adversarial image, but whatever I try, it seems I'm mixing the two libraries in a way that doesn't work. I get:

NotImplementedError: must implement get_logits or must define a logits output in fprop

Can anyone help?

Basically I just want to understand which models I can use with cleverhans.attacks! :)

Thanks in advance.

Rolle

Edit

This is my Traceback:

Traceback (most recent call last):
  File "/usr/lib/python3.6/code.py", line 91, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/home/<me>/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/191.6605.12/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/home/<me>/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/191.6605.12/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/<path_to_project>/tensorflow/untitled/minexample.py", line 57, in <module>
    adv_cw = cw.generate_np(image, **cw_params)
  File "/home/<me>/.local/lib/python3.6/site-packages/cleverhans/attacks/__init__.py", line 189, in generate_np
    self.construct_graph(fixed, feedable, x_val, hash_key)
  File "/home/<me>/.local/lib/python3.6/site-packages/cleverhans/attacks/__init__.py", line 161, in construct_graph
    x_adv = self.generate(x, **new_kwargs)
  File "/home/<me>/.local/lib/python3.6/site-packages/cleverhans/attacks/__init__.py", line 1196, in generate
    x.get_shape().as_list()[1:])
  File "/home/<me>/.local/lib/python3.6/site-packages/cleverhans/attacks_tf.py", line 628, in __init__
    self.output = model.get_logits(self.newimg)
  File "/home/<me>/.local/lib/python3.6/site-packages/cleverhans/model.py", line 70, in get_logits
    " output in `fprop`")
NotImplementedError: <class 'cleverhans.model.CallableModelWrapper'>must implement `get_logits` or must define a logits output in `fprop`

I replaced my internal directory structure with <path_to_project> and <me>, respectively.

Solution

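The traceback shows that CarliniWagnerL2 calls get_logits on the model, but CallableModelWrapper(model, 'probs') only exposes a 'probs' output, hence the NotImplementedError. The code snippet you shared defines and trains the model using Keras, so it would be easier to use the specific KerasModelWrapper we have for Keras models. The cleverhans tutorials include an example of doing that.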
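As a rough sketch of what that could look like for the model in the question: the helper name craft_cw_adversarial is made up for illustration, and the code assumes TensorFlow 1.x in graph mode plus a cleverhans release that provides cleverhans.utils_keras.KerasModelWrapper (the 2.x/3.x line). It reuses the session Keras trained in, on the assumption that this is where the trained weights live. The float32 cast is only a precaution, not something the question shows to be required.

```python
def craft_cw_adversarial(keras_model, images, cw_params):
    """Run CarliniWagnerL2 against an already trained tf.keras model.

    Hypothetical helper; assumes TF 1.x graph mode and a cleverhans
    version that ships cleverhans.utils_keras.
    """
    # Imported lazily so the heavy dependencies are only needed on call.
    import numpy as np
    import tensorflow as tf
    from cleverhans.attacks import CarliniWagnerL2
    from cleverhans.utils_keras import KerasModelWrapper

    # Reuse the session Keras used for model.fit(); a separately created
    # tf.Session() would not contain the trained variable values.
    sess = tf.keras.backend.get_session()

    # KerasModelWrapper can expose the pre-softmax tensor through
    # get_logits(), which CallableModelWrapper(model, 'probs') could not.
    wrapped = KerasModelWrapper(keras_model)

    attack = CarliniWagnerL2(wrapped, sess=sess)

    # Precautionary cast: MNIST scaled with "/ 255.0" is float64 in NumPy.
    images = np.asarray(images, dtype=np.float32)
    return attack.generate_np(images, **cw_params)
```

With the variables from the question, this would be called as adv_cw = craft_cw_adversarial(model, image, cw_params), replacing the CallableModelWrapper lines and the manually created sess.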