Get gradient value necessary to break an image

I've been experimenting with adversarial images and I read up on the fast gradient sign method from the following link

The instructions explain that the necessary gradient can be calculated using backpropagation...

I have successfully generated adversarial images, but I have not managed to extract the gradient needed to create one. I will demonstrate what I mean.

Let us assume that I have already trained my algorithm using logistic regression. I restore the model and I extract the number I wish to change into an adversarial image. In this case it is the number 2...

# construct model
logits = tf.matmul(x, W) + b
pred = tf.nn.softmax(logits)
# assign the images of number 2 to the variable
sess.run(tf.assign(x, labels_of_2))
# setup softmax
sess.run(pred)

# placeholder for target label
fake_label = tf.placeholder(tf.int32, shape=[1])
# setup the fake loss
fake_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,labels=fake_label)

# minimize fake loss using gradient descent,
# calculating the derivatives of the weight of the fake image will give the direction of weights necessary to change the prediction
adversarial_step = tf.train.GradientDescentOptimizer(learning_rate=FLAGS.learning_rate).minimize(fake_loss, var_list=[x])

# continue calculating the derivative until the prediction changes for all 10 images
for i in range(FLAGS.training_epochs):
    # fake label tells the training algorithm to use the weights calculated for number 6
    sess.run(adversarial_step, feed_dict={fake_label:np.array([6])})

This is my approach, and it works perfectly. It takes my image of number 2 and changes it only slightly so that when I run the following...

x_in = np.expand_dims(x[0], axis=0)
classification = sess.run(tf.argmax(pred, 1))

it will predict the number 2 as a number 6.

The issue is, I need to extract the gradient necessary to trick the neural network into thinking number 2 is 6. I need to use this gradient to create the nematode mentioned above.

I am not sure how I can extract the gradient value. I tried looking at tf.gradients but I was unable to figure out how to produce an adversarial image using this function. I implemented the following after the fake_loss variable above...

gradient = tf.gradients(fake_loss, x)

for i in range(FLAGS.training_epochs):
    # calculate gradient with weight of number 6
    gradient_value = sess.run(gradient, feed_dict={fake_label:np.array([6])})
    # update the image of number 2
    gradient_update = x+0.007*gradient_value[0]
    sess.run(tf.assign(x, gradient_update))

Unfortunately the prediction did not change in the way I wanted, and moreover this logic resulted in a rather blurry image.

I would appreciate an explanation of what I need to do in order to calculate and extract the gradient that will trick the neural network, so that if I were to take this gradient and apply it to my image as a nematode, it will result in a different prediction.
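For reference, the sign-gradient step can be sketched in plain NumPy for a softmax-regression model of the same shape as the one above. This is only a minimal sketch under stated assumptions, not the code from the question: the helper names (`softmax`, `target_loss`, `loss_gradient_wrt_input`, `targeted_fgsm_step`), the random stand-in weights and the stand-in image are all hypothetical. It relies on the fact that, for logits = x·W + b, the gradient of the cross-entropy loss with respect to the input is (softmax(logits) − one_hot(label))·Wᵀ. For a targeted attack the step subtracts ε·sign(gradient) of the fake-label loss, so that the loss toward the fake label goes down.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def target_loss(x, W, b, target):
    """Mean cross-entropy between the prediction and the (fake) target label."""
    probs = softmax(x @ W + b)
    return float(-np.log(probs[np.arange(x.shape[0]), target]).mean())

def loss_gradient_wrt_input(x, W, b, target):
    """Per-example gradient of the target-label cross-entropy with respect to x."""
    probs = softmax(x @ W + b)
    one_hot = np.zeros_like(probs)
    one_hot[np.arange(x.shape[0]), target] = 1.0
    return (probs - one_hot) @ W.T            # chain rule through logits = x @ W + b

def targeted_fgsm_step(x, W, b, target, epsilon=0.007):
    grad = loss_gradient_wrt_input(x, W, b, target)
    perturbation = -epsilon * np.sign(grad)   # step down the target-label loss
    x_adv = np.clip(x + perturbation, 0.0, 1.0)
    return x_adv, perturbation

# Hypothetical stand-ins for the trained weights and for one image of a 2.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(784, 10))
b = np.zeros(10)
x = rng.uniform(0.2, 0.8, size=(1, 784))
target = np.array([6])                        # the fake label

x_adv, perturbation = targeted_fgsm_step(x, W, b, target)
print(target_loss(x, W, b, target), target_loss(x_adv, W, b, target))
```

Here `perturbation` is the quantity that plays the role of the nematode: it depends only on the sign of the gradient, and a single step with a small ε lowers the fake-label loss without necessarily flipping the prediction.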


  • Why not let the Tensorflow optimizer add the gradients to your image? You can still evaluate the nematode to get the resulting gradients that were added.

    I created a bit of sample code to demonstrate this with a panda image. It uses the VGG16 neural network to transform your own panda image into a "goldfish" image. Every 100 iterations it saves the image as a PDF so you can print it losslessly to check if your image is still a goldfish.

    import tensorflow as tf
    import numpy as np
    import matplotlib.pyplot as plt
    import IPython.display as ipyd
    from libs import vgg16 # Download here!
    pandaimage = plt.imread('panda.jpg')
    pandaimage = vgg16.preprocess(pandaimage)
    img_4d = np.array([pandaimage])
    g = tf.get_default_graph()
    input_placeholder = tf.Variable(img_4d,trainable=False)
    to_add_image = tf.Variable(tf.random_normal([224,224,3], mean=0.0, stddev=0.1, dtype=tf.float32))
    combined_images_not_clamped = input_placeholder+to_add_image
    filledmax = tf.fill(tf.shape(combined_images_not_clamped), 1.0)
    filledmin = tf.fill(tf.shape(combined_images_not_clamped), 0.0)
    greater_than_one = tf.greater(combined_images_not_clamped, filledmax)
    combined_images_with_max = tf.where(greater_than_one, filledmax, combined_images_not_clamped)
    lower_than_zero =tf.less(combined_images_with_max, filledmin)
    combined_images = tf.where(lower_than_zero, filledmin, combined_images_with_max)
    net = vgg16.get_vgg_model()
    tf.import_graph_def(net['graph_def'], name='vgg')
    names = [op.name for op in g.get_operations()]
    style_layer = 'prob:0'
    the_prediction = tf.import_graph_def(
        net['graph_def'], name='vgg',
        input_map={'images:0': combined_images},return_elements=[style_layer])
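    goldfish_expected_np = np.zeros(1000)
    goldfish_expected_np[1] = 1.0  # one-hot target; assumes index 1 is "goldfish" (standard ImageNet-1k order)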
    goldfish_expected_tf = tf.Variable(goldfish_expected_np,dtype=tf.float32,trainable=False)
    loss = tf.reduce_sum(tf.square(the_prediction[0]-goldfish_expected_tf))
    optimizer = tf.train.AdamOptimizer().minimize(loss)
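    sess = tf.InteractiveSession()
    sess.run(tf.global_variables_initializer())  # variables must be initialized before use
    def show_many_images(*images):
        fig = plt.figure()
        for i in range(len(images)):
            subplot_number = 100+10*len(images)+(i+1)
            plt.subplot(subplot_number)
            plt.imshow(images[i])
        plt.show()
    for i in range(1000):
        _, loss_val = sess.run([optimizer,loss])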
        if i%100==1:
            print("Loss at iteration %d: %f" % (i,loss_val))
            _, loss_val,adversarial_image,pred,nematode = sess.run([optimizer,loss,combined_images,the_prediction,to_add_image])
            res = np.squeeze(pred)
            average = np.mean(res, 0)
            res = res / np.sum(average)
            print([(res[idx], net['labels'][idx]) for idx in res.argsort()[-5:][::-1]])
            plt.imsave('adversarial_goldfish.pdf',adversarial_image[0],format='pdf') # save for printing

    Let me know if this helps you!
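The same idea (keep the perturbation as its own variable, clamp the sum, then read the perturbation back out) can be sketched without TensorFlow. This is a rough NumPy illustration under assumptions, not a port of the VGG16 code: the tiny linear softmax classifier, its random weights, the target class and the names `delta` and `effective_perturbation` are all hypothetical. `delta` plays the role of `to_add_image`, and `np.clip` stands in for the `tf.where` clamping (in TensorFlow, `tf.clip_by_value` would do the same job).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(image, W, one_hot):
    """Cross-entropy between the classifier output and the target class."""
    return float(-np.log(softmax(image @ W) @ one_hot))

rng = np.random.default_rng(1)
W = rng.normal(scale=0.05, size=(64, 5))   # hypothetical classifier weights
x = rng.uniform(0.0, 1.0, size=64)         # hypothetical flattened image in [0, 1]
one_hot = np.eye(5)[3]                     # hypothetical target class 3

initial_loss = loss(x, W, one_hot)

delta = np.zeros_like(x)                   # the "nematode": a separate additive variable
for _ in range(500):
    raw = x + delta
    adversarial = np.clip(raw, 0.0, 1.0)   # keep pixel values valid
    probs = softmax(adversarial @ W)
    grad_adv = W @ (probs - one_hot)       # d(loss) / d(adversarial)
    inside = (raw >= 0.0) & (raw <= 1.0)   # clamped pixels pass no gradient
    delta -= 0.1 * grad_adv * inside       # plain gradient descent on delta only

adversarial = np.clip(x + delta, 0.0, 1.0)
effective_perturbation = adversarial - x   # what was actually added to the image
final_loss = loss(adversarial, W, one_hot)
print(initial_loss, final_loss)
```

Note that `delta` and `effective_perturbation` can differ wherever the clamp was active, so if you want the perturbation that actually reaches the network, subtract the original image from the clamped adversarial image rather than reading the variable directly.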