Strictly follow the folder naming and structure shown below:
Wang_CVPR2020/
├── testing/
│   ├── crn/
│   │   ├── 1_fake/
│   │   └── 0_real/
│   └── ...
├── training/
│   ├── car/
│   │   ├── 1_fake/
│   │   └── 0_real/
│   └── ...
└── validation/
    ├── car/
    │   ├── 1_fake/
    │   └── 0_real/
    └── ...
- download: link
Ojha_CVPR2023/
├── guided/
│   ├── 1_fake/
│   └── 0_real/
├── ldm_200_cfg/
│   ├── 1_fake/
│   └── 0_real/
├── ldm_100/
│   ├── 1_fake/
│   └── 0_real/
└── ...
Paths to datasets are currently hardcoded in train.py and test.py under /mnt/SCRATCH/chadolor/Datasets/.... Please update these paths to match your environment before running.
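The `0_real`/`1_fake` layout is the standard `torchvision` ImageFolder convention: class folders are sorted alphabetically, so `0_real` maps to label 0 and `1_fake` to label 1. A minimal loading sketch (the root path is a placeholder; this is not the repo's actual data pipeline):

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

root = "/path/to/Wang_CVPR2020/training/car"  # placeholder path

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Class folders are sorted alphabetically: "0_real" -> 0, "1_fake" -> 1.
dataset = datasets.ImageFolder(root, transform=transform)
loader = DataLoader(dataset, batch_size=64, shuffle=True)
```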
You can download sample weights here.
By default, both training and testing use an absolute checkpoints directory under:
/mnt/SCRATCH/chadolor/Datasets/Projects/FakeImageDetector/checkpoints/
Within that directory, checkpoints are grouped by mask ratio (percent):
/mnt/SCRATCH/chadolor/Datasets/Projects/FakeImageDetector/checkpoints/
├── mask_0/
│   ├── rn50ft.pth
│   ├── rn50_modft.pth
│   ├── clipft.pth
│   └── ...
├── mask_15/
│   ├── rn50ft_lowfouriermask.pth
│   ├── rn50ft_midfouriermask.pth
│   ├── rn50ft_highfouriermask.pth
│   ├── rn50ft_pixelmask.pth
│   ├── rn50ft_patchmask.pth
│   ├── rn50ft_fouriermask.pth
│   └── ...
└── ...
Notes:
- The filename pattern encodes model, band, mask_type, optional combine_aug (e.g., `_rotate`, `_translate`, `_rotate_translate`), and an optional channel suffix (e.g., `_chr`).
- If you prefer to store checkpoints under the repo (e.g., `FakeImageDetection/checkpoints/`), adjust the save/load paths in `train.py` and `test.py` accordingly.
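To illustrate the naming convention, here is a hypothetical helper that reproduces the filenames in the tree above; the actual logic lives in `train.py`/`test.py` and may differ:

```python
def checkpoint_name(model, band="all", mask_type="nomask",
                    combine_aug="none", channel="all"):
    """Hypothetical sketch of the filename pattern described above."""
    parts = [f"{model}ft"]
    if mask_type != "nomask":
        prefix = "" if band == "all" else band
        parts.append(f"{prefix}{mask_type}mask")
    if combine_aug != "none":
        parts.append(combine_aug)
    name = "_".join(parts)
    if channel != "all":
        name += f"_ch{channel}"  # e.g. "_chr" for the red channel
    return name + ".pth"

print(checkpoint_name("rn50"))                       # rn50ft.pth
print(checkpoint_name("rn50", mask_type="fourier"))  # rn50ft_fouriermask.pth
print(checkpoint_name("rn50", "low", "fourier"))     # rn50ft_lowfouriermask.pth
```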
test.py evaluates trained models on the Wang_CVPR2020 and Ojha_CVPR2023 datasets. Metrics reported per sub-dataset are Average Precision (AP), Accuracy (Acc), and AUC (all as percentages). When --data_type both is used, results are saved into a single file under results/both, and an Overall Averages row is appended at the end.
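For reference, the three metrics can be computed from per-image scores like this (a sketch using scikit-learn, not the repo's exact evaluation code):

```python
import numpy as np
from sklearn.metrics import accuracy_score, average_precision_score, roc_auc_score

# y_true: 0 = real, 1 = fake; y_score: predicted probability of "fake"
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.10, 0.40, 0.35, 0.80])

ap = average_precision_score(y_true, y_score) * 100               # AP (%)
acc = accuracy_score(y_true, (y_score > 0.5).astype(int)) * 100   # Acc (%) at 0.5
auc = roc_auc_score(y_true, y_score) * 100                        # AUC (%)
print(f"AP={ap:.2f}  Acc={acc:.2f}  AUC={auc:.2f}")
```

Launch the evaluation with: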
python -m torch.distributed.launch --nproc_per_node=GPU_NUM test.py -- [options]
Key options:
--model_name : RN50, RN50_mod, RN50_npr, CLIP_vitl14, MNv2, SWIN_t, VGG11
--mask_type : fourier, cosine, wavelet, pixel, patch, translate, rotate, rotate_translate, nomask
--band : all, low, mid, high, low+mid, low+high, mid+high (optionally append +prog for progressive masking)
--mask_channel : all, r, g, b, 0, 1, 2 (applies to frequency masks only)
--combine_aug : none, rotate, translate, rotate_translate (applies in addition to frequency masks)
--ratio : integer percent (e.g., 15)
--batch_size : default 64
--data_type : Wang_CVPR20, Ojha_CVPR23, or both (default both)
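To make `--mask_type`, `--band`, and `--ratio` concrete, here is a hedged sketch of a Fourier band mask; the band boundaries and sampling below are assumptions for illustration, not the repo's exact implementation:

```python
import torch

def fourier_band_mask(img, band="all", ratio=0.15):
    """Zero out roughly `ratio` of the spectrum inside the chosen band (sketch)."""
    c, h, w = img.shape
    spec = torch.fft.fftshift(torch.fft.fft2(img))  # low frequencies at center
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    radius = torch.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    lo, hi = {"low": (0.0, 1/3), "mid": (1/3, 2/3),
              "high": (2/3, 1.0), "all": (0.0, 1.0)}[band]
    in_band = (radius >= lo * radius.max()) & (radius <= hi * radius.max())
    drop = (torch.rand(h, w) < ratio) & in_band  # sample frequencies to mask
    spec[:, drop] = 0
    return torch.fft.ifft2(torch.fft.ifftshift(spec)).real

masked = fourier_band_mask(torch.rand(3, 224, 224), band="low", ratio=0.15)
```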
Example testing launcher test.sh:
#!/bin/bash
# Define the arguments for your test script
GPUs="$1"
NUM_GPU=$(echo $GPUs | awk -F, '{print NF}')
DATA_TYPE="both" # both, or Wang_CVPR20, or Ojha_CVPR23
MODEL_NAME="RN50" # RN50, RN50_mod, RN50_npr, CLIP_vitl14, MNv2, SWIN_t, VGG11
MASK_TYPE="fourier" # nomask, fourier, pixel, patch, cosine, wavelet, translate, rotate, rotate_translate
BAND="all" # all, low, mid, high, low+mid, low+high, mid+high
RATIO=15 # automatically becomes RATIO=0 if MASK_TYPE="nomask"
BATCH_SIZE=64
MASK_CHANNEL="all" # all, r, g, b, 0, 1, 2 (applies to fourier/cosine/wavelet)
COMBINE_AUG="rotate" # none, rotate, translate, rotate_translate (combine with frequency masking)
# Set the CUDA_VISIBLE_DEVICES environment variable to use GPUs
export CUDA_VISIBLE_DEVICES=$GPUs
echo "Using $NUM_GPU GPUs with IDs: $GPUs"
# Randomize master port between 29000 and 29999 to avoid clashes
MASTER_PORT=$((29000 + RANDOM % 1000))
echo "Using master port: $MASTER_PORT"
# Run the test command
python -m torch.distributed.launch --nproc_per_node=$NUM_GPU --master_port=$MASTER_PORT test.py \
    -- \
    --data_type $DATA_TYPE \
    --pretrained \
    --model_name $MODEL_NAME \
    --mask_type $MASK_TYPE \
    --band $BAND \
    --ratio $RATIO \
    --mask_channel $MASK_CHANNEL \
    --combine_aug $COMBINE_AUG \
    --batch_size $BATCH_SIZE
Now, use this to run testing:
bash test.sh "0" # gpu id/s to use
Results are saved under:
results/
└── both/
    └── rn50ft_fouriermask.txt   # example filename
- Each file includes a header, per-dataset rows formatted as `Dataset, Avg.Prec.(%), Acc.(%), AUC(%)`, and an `AVERAGE` line summarizing all evaluated datasets in the run.
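If you need the numbers programmatically, the rows can be read with a few lines (a hypothetical parser; adjust if your files use a different layout):

```python
import csv

with open("results/both/rn50ft_fouriermask.txt") as f:
    for row in csv.reader(f):
        if len(row) == 4:  # header, per-dataset rows, and the AVERAGE line
            dataset, ap, acc, auc = (col.strip() for col in row)
            print(f"{dataset}: AP={ap}  Acc={acc}  AUC={auc}")
```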
This script (train.py) is designed for distributed training and evaluation of models. It is highly configurable through command-line arguments and provides features such as WandB integration, early stopping (sketched below), and various masking options for data augmentation.
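Early stopping lives in the repo's `earlystop.py`; as a rough mental model (not the repo's exact class, which also handles checkpoint saving), it tracks validation loss and stops after `patience` epochs without improvement:

```python
class EarlyStopper:
    """Minimal patience-based early-stopping sketch (illustrative only)."""
    def __init__(self, patience=5, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.bad_epochs = float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True -> stop training
```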
To run the script in a distributed environment:
python -m torch.distributed.launch --nproc_per_node=GPU_NUM train.py -- [options]
Command-line options:
--local_rank : Local rank for distributed training.
--num_epochs : Number of epochs for training.
--model_name : Type of the model. Choices include various ResNet and ViT variants.
--wandb_online : Run WandB in online mode. Default is offline.
--project_name : Name of the WandB project.
--wandb_run_id : WandB run ID.
--resume_train : Resume training from last or best epoch.
--pretrained : Use pretrained model.
--early_stop : Enable early stopping.
--mask_type : Type of mask generator for data augmentation. Choices match test.py: fourier, cosine, wavelet, pixel, patch, translate, rotate, rotate_translate, nomask.
--batch_size : Batch size for training. Default is 64.
--ratio : Masking ratio for data augmentation.
--band : Frequency band to randomly mask (optionally append +prog for progressive masking).
Edit the training bash script train.sh:
#!/bin/bash
# Get the current date
current_date=$(date)
echo "The current date is: $current_date"
# Define the arguments for your training script
GPUs="$1"
NUM_GPU=$(echo $GPUs | awk -F, '{print NF}')
NUM_EPOCHS=10000
PROJECT_NAME="Frequency-Masking"
MODEL_NAME="RN50" # RN50, RN50_mod, RN50_npr, CLIP_vitl14, MNv2, SWIN_t, VGG11
MASK_TYPE="rotate_translate" # nomask, fourier, pixel, patch, cosine, wavelet, translate, rotate, rotate_translate
BAND="all" # all, low, mid, high, low+mid, low+high, mid+high ##### add +prog if using progressive masking
RATIO=15
BATCH_SIZE=128
MASK_CHANNEL="all" # all, r, g, b, 0, 1, 2 (applies to fourier/cosine/wavelet)
COMBINE_AUG="none" # none, rotate, translate, rotate_translate (combine with frequency masking)
WANDB_ID="2w0btkas"
RESUME="from_last" # from_last or from_best
learning_rate=0.0001 # 0.0001 * NUM_GPU
# Set the CUDA_VISIBLE_DEVICES environment variable to use GPUs
export CUDA_VISIBLE_DEVICES=$GPUs
echo "Using $NUM_GPU GPUs with IDs: $GPUs"
# Randomize master port between 29000 and 29999 to avoid clashes
MASTER_PORT=$((29000 + RANDOM % 1000))
echo "Using master port: $MASTER_PORT"
# Run the distributed training command
python -m torch.distributed.launch --nproc_per_node=$NUM_GPU --master_port=$MASTER_PORT train.py \
    -- \
    --num_epochs $NUM_EPOCHS \
    --project_name $PROJECT_NAME \
    --model_name $MODEL_NAME \
    --mask_type $MASK_TYPE \
    --band $BAND \
    --ratio $RATIO \
    --mask_channel $MASK_CHANNEL \
    --combine_aug $COMBINE_AUG \
    --lr ${learning_rate} \
    --batch_size $BATCH_SIZE \
    --early_stop \
    --pretrained \
    # --resume_train $RESUME \
    # --clip_grad \
    # --debug
Now, use this to run training:
bash train.sh "0,1,2,4" # gpu ids to use
Important:
- When starting training from the first epoch, you can leave `--resume_train $RESUME`, `--clip_grad`, and `--debug` commented out. If you don't want Weights & Biases logging, keep `--wandb_online` commented out and avoid setting `--wandb_run_id`.
- If training stalls mid-epoch, interrupt with Ctrl+C and resume from the last or best epoch using `--resume_train`.
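  For example, to resume an interrupted run, set `RESUME="from_last"` (or `"from_best"`) in `train.sh`, uncomment the `--resume_train $RESUME` line, and rerun `bash train.sh "0,1,2,4"`.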
- Paths for datasets and checkpoints are hardcoded under `/mnt/SCRATCH/chadolor/...` in multiple scripts (`train.py`, `test.py`). Please update them for your environment.
- CLIP freezing bug in `networks/clip_models.py`: in `CLIPModel.__init__()`, `param.requires_grad = freeze` should use the `unfreeze` argument (e.g., `param.requires_grad = unfreeze`); a sketch of the fix appears after these notes.
- CLIP checkpoint saving/loading consistency: `earlystop.py` saves only `fc` when `model_name == 'clip'`, while other parts use names like `CLIP_vitl14`. Align the naming or the condition (e.g., check `'CLIP' in model_name`).
- `prune.py` is provided as a pruning utility and may require adjustments (paths, arguments) to match your environment.
- Distributed launch: the scripts currently use `torch.distributed.launch`, which is deprecated. Consider `torchrun` on newer PyTorch versions.
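A sketch of the corrected CLIP freezing loop mentioned above (abridged; the constructor signature and embedding dimension are assumptions, not the repo's exact code):

```python
import torch.nn as nn

class CLIPModel(nn.Module):
    """Abridged sketch of the freezing fix, not the repo's full class."""
    def __init__(self, backbone, unfreeze=False):
        super().__init__()
        self.model = backbone
        for param in self.model.parameters():
            # Bug: the original assigns `param.requires_grad = freeze`.
            # Using the unfreeze flag keeps the backbone frozen by default.
            param.requires_grad = unfreeze
        self.fc = nn.Linear(768, 1)  # 768 = ViT-L/14 embed dim (assumption)
```

For the launcher migration, the commands above would become something like the following (note that `torchrun` exposes the local rank via the `LOCAL_RANK` environment variable instead of passing a `--local_rank` argument, so `train.py`/`test.py` may need a small change):

torchrun --nproc_per_node=$NUM_GPU --master_port=$MASTER_PORT test.py -- [options]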
This project is licensed under the Apache License.
If you use this code in your research, please consider citing it with the BibTeX entry below:
@article{Doloriel2024FrequencyMF,
title={Frequency Masking for Universal Deepfake Detection},
author={Chandler Timm C. Doloriel and Ngai-Man Cheung},
journal={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year={2024},
pages={13466-13470},
}