python tensorflow machine-learning deep-learning cleverhans

Generating adversarial data from cleverhans attack models


I would like a code example showing how to generate training data from CleverHans adversarial attacks.

adv_x = fgsm.generate_np(X_test, **fgsm_params)

This generates the adversarial x data, but how can I get y?

adv_pred = model.predict_classes(adv_x)

And this will give the "fooled" results, right?

What I want is to correctly show the generated x, the y, and the fooled y (by which I mean the model's predictions, which may be wrong because of the attack). I'm using MNIST, by the way, if that helps.


Solution

  • Based on the code snippets you shared, I would make two suggestions:

    • It is generally not a good idea to train the model on test data (if you are going to use that test data to evaluate its performance afterwards), so I would replace X_test with X_train in your first line.

    • To get the labels for your adversarial examples, you can reuse the original labels of the training data, or use the model's predictions on the original training data, model.predict_classes(X_train). Either choice assumes that the adversarial example is not perturbed enough to change the true label of the input.
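
The bookkeeping described above can be sketched as a small helper that returns the adversarial x, the labels y, and the "fooled" y together. This is only a sketch under stated assumptions: build_adversarial_set is a hypothetical name, and the toy linear model and toy FGSM step below are NumPy stand-ins (not CleverHans or Keras code) so that the example runs on its own.

```python
import numpy as np


def build_adversarial_set(generate_adv, predict_classes, X, y_true):
    """Return (adv_x, y, fooled_y, fooled_mask) for a batch of inputs.

    generate_adv:    callable, clean inputs -> adversarial inputs
    predict_classes: callable, inputs -> integer class predictions
    y_true:          integer labels or one-hot rows for X
    """
    y = np.asarray(y_true)
    if y.ndim > 1:                      # one-hot rows -> integer labels
        y = y.argmax(axis=1)
    adv_x = generate_adv(X)
    fooled_y = np.asarray(predict_classes(adv_x))
    fooled_mask = fooled_y != y         # True where the adversarial prediction differs from the label
    return adv_x, y, fooled_y, fooled_mask


# --- Toy stand-ins (assumptions) so the sketch runs without TensorFlow/CleverHans ---
rng = np.random.default_rng(0)
W = rng.normal(size=(784, 10))          # hypothetical linear classifier weights
EPS = 0.3                               # hypothetical perturbation size


def toy_predict_classes(X):
    return (X @ W).argmax(axis=1)


def toy_fgsm(X, y, eps=EPS):
    # One fast-gradient-sign step for a linear softmax model with logits = X @ W.
    logits = X @ W
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    grad = (probs - np.eye(10)[y]) @ W.T                  # d(cross-entropy)/dX
    return np.clip(X + eps * np.sign(grad), 0.0, 1.0)


X_train = rng.random(size=(32, 784))     # stand-in for flattened MNIST images in [0, 1]
y_train = toy_predict_classes(X_train)   # labels taken from the model's clean predictions

adv_x, y, fooled_y, fooled = build_adversarial_set(
    lambda X: toy_fgsm(X, y_train), toy_predict_classes, X_train, y_train
)
print("prediction changed on", int(fooled.sum()), "of", len(y), "examples")
```

With the objects from the question, the two callables would presumably be lambda X: fgsm.generate_np(X, **fgsm_params) and model.predict_classes. If your Keras version no longer provides predict_classes, np.argmax(model.predict(X), axis=1) should give the same integer labels for a softmax classifier.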