This repository investigates the impact of adversarial attacks on the explainability of Deep Residual Neural Networks (ResNets), specifically focusing on Saliency Map explanations.
The project systematically generates adversarial examples using various methods, calculates the corresponding Saliency Maps for both the original and adversarial inputs, and quantifies the change in these explanations using a set of comparative metrics. The goal is to assess the robustness of Saliency Maps as a trustworthy explainability method in the presence of minor input perturbations.
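The actual saliency method and comparison metrics are implemented in the repository's pipelines and notebooks; purely as an illustration of the idea, a vanilla gradient saliency map and one possible comparison metric (cosine similarity between the flattened maps) could be sketched as follows. Both helper functions here are hypothetical and not part of the repository:

```python
import torch
import torch.nn.functional as F

def saliency_map(model, image):
    """Vanilla gradient saliency: |d top-class logit / d input|, max over channels."""
    image = image.clone().detach().requires_grad_(True)
    logits = model(image.unsqueeze(0))
    logits[0, logits.argmax()].backward()
    return image.grad.abs().max(dim=0).values  # shape (H, W)

def saliency_change(sal_original, sal_adversarial):
    """One possible comparison metric: cosine similarity of the flattened maps."""
    return F.cosine_similarity(
        sal_original.flatten(), sal_adversarial.flatten(), dim=0
    ).item()
```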
You need Python 3.13 and a working environment manager (like conda or venv) to run the experiments.
The necessary dependencies are listed in the `requirements.txt` file.
If you use Conda, you can set up the required environment using the following commands:
- Create a Conda environment: `conda create -n adv_saliency python=3.13`
- Activate the environment: `conda activate adv_saliency`
- Install the dependencies from the provided `requirements.txt` file: `pip install -r requirements.txt`
- Deactivate the environment when finished: `conda deactivate`
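If you prefer `venv` over Conda, an equivalent setup could look like the following, assuming a Python 3.13 interpreter is available on your system as `python3.13`:

```bash
python3.13 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
deactivate
```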
The core logic is divided into two Python scripts that handle adversarial example generation and metric calculation. The results are stored in CSV files.
The `batch_wise_pipeline.py` script generates untargeted adversarial examples using various attacks available in the Foolbox library and computes saliency maps and comparison metrics for each image. It takes a command-line argument indicating the index of the attack group to use; all attacks are given in a list of dictionaries, each containing the Foolbox attack and the epsilon values to use for the adversarial attack.
The `targeted_batch_wise_pipeline.py` script works the same way but generates targeted adversarial examples instead.
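The exact attack list is defined inside the scripts; purely as an illustration of the structure described above (the attack classes and epsilon values shown here are hypothetical and may differ from those in the repository), an attack group could look like this:

```python
import foolbox as fb

# Hypothetical attack groups: each dictionary pairs a Foolbox attack with the
# epsilon values used when generating the adversarial examples.
ATTACK_GROUPS = [
    {"attack": fb.attacks.FGSM(), "epsilons": [0.001, 0.01, 0.03]},
    {"attack": fb.attacks.LinfPGD(), "epsilons": [0.001, 0.01, 0.03]},
    {"attack": fb.attacks.LinfDeepFoolAttack(), "epsilons": [0.001, 0.01, 0.03]},
]
```

A script would then be started with the index of the group to process, e.g. something like `python batch_wise_pipeline.py 0` (the exact argument parsing is defined in the scripts).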
The analysis of the calculated metrics is carried out in various Jupyter notebooks, ranging from exploratory data analysis (EDA) to clustering of the methods/attacks.
Different datasets can be used by editing the `load_and_transform_images()` function in `batch_wise_pipeline.py`. The function should return a tensor of preprocessed images and a tensor of labels (class ids).
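A minimal sketch of such a replacement, assuming an ImageFolder-style directory layout and reusing the model's own preprocessing transforms (the dataset path is a placeholder, and note that `ImageFolder` labels are folder indices, which must correspond to the model's class ids):

```python
import torch
from torchvision import datasets
from torchvision.models import ResNet50_Weights

def load_and_transform_images():
    """Return a tensor of preprocessed images and a tensor of labels (class ids)."""
    preprocess = ResNet50_Weights.DEFAULT.transforms()
    dataset = datasets.ImageFolder("path/to/your/dataset", transform=preprocess)
    images = torch.stack([image for image, _ in dataset])
    labels = torch.tensor([label for _, label in dataset])
    return images, labels
```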
Other models can be used by modifying the following code (using any PyTorch model):
import foolbox as fb
import torch
from torchvision.models import ResNet50_Weights, resnet50

# Pretrained ResNet-50 in evaluation mode, plus its preprocessing transforms
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval().to(device)
preprocess = weights.transforms()
# Bounds must cover the value range of the (normalized) input tensors
bounds = (-2.4, 2.8)
fmodel = fb.PyTorchModel(model, bounds=bounds, preprocessing=None)
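For example, swapping in a different torchvision classifier (ResNet-18 here, purely as an illustration) only requires changing the weights/constructor pair; `device`, `bounds`, and `fb` are reused from the snippet above, and `bounds` may need adjusting if the preprocessing differs:

```python
from torchvision.models import ResNet18_Weights, resnet18

weights = ResNet18_Weights.DEFAULT
model = resnet18(weights=weights).eval().to(device)
preprocess = weights.transforms()
fmodel = fb.PyTorchModel(model, bounds=bounds, preprocessing=None)
```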