This repository contains code for the paper: NetDPSyn: Synthesizing Network Traces under Differential Privacy. NetDPSyn is the first system to synthesize high-fidelity network traces under privacy guarantees.
- Download the raw datasets from here and save them in the ./temp_data/raw_data/ folder.
- Install all dependencies listed in requirements.txt:

    pip install -r requirements.txt

- Your directory structure should look like this:

    NetDPSyn
    └── temp_data
        └── raw_data
            └── caida.csv
            └── cidds.csv
            └── dc.csv
            └── ton.csv
            └── ugr16.csv
    └── exp
    └── ...

- Note: Please ensure all paths in config_dpsyn.py are updated to reflect your local directory structure. Additionally, you can adjust parameters in parameter_parser.py as needed.
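Before preprocessing, it can help to confirm that the raw datasets are where the scripts expect them. The following is a minimal sketch that reports which of the five CSV files named in the directory structure are missing. The helper `missing_raw_files` is hypothetical and not part of the repository; the default folder assumes you run it from the repository root.

```python
from pathlib import Path

# File names taken from the directory structure described in this README.
EXPECTED_RAW_FILES = ["caida.csv", "cidds.csv", "dc.csv", "ton.csv", "ugr16.csv"]


def missing_raw_files(raw_dir="./temp_data/raw_data"):
    """Return the expected dataset files that are absent from raw_dir.

    Hypothetical helper for illustration only; not part of NetDPSyn.
    """
    raw_path = Path(raw_dir)
    return [name for name in EXPECTED_RAW_FILES if not (raw_path / name).is_file()]


if __name__ == "__main__":
    missing = missing_raw_files()
    if missing:
        print("Missing raw datasets:", ", ".join(missing))
    else:
        print("All raw datasets found.")
```

If you only plan to work with a subset of the datasets, trim `EXPECTED_RAW_FILES` accordingly.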
- Preprocess Data. Run lib_preprocess/preprocess_network.py. This will generate a preprocessed pickle file in the temp_data/processed_data folder, along with a mapping for binning. Additionally, a trivially decoded CSV file (the data after binning and unbinning only) will be created in the temp_data/synthesized_records folder.

    python preprocess_network.py

- Synthesize Data. Next, run main.py to generate the synthesized data. The synthesized data will be saved in the temp_data/synthesized_records folder.

    python main.py

- Downstream Tasks. You can run code from lib_downstream (e.g., lib_downstream/ml_tasks.py). This will print out the evaluation results for both the raw dataset and the synthesized dataset.

    python ml_tasks.py
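The "trivially decoded" CSV produced during preprocessing is the data after a binning and unbinning round trip. The sketch below illustrates that idea in general terms, assuming equal-width bins and midpoint decoding. It is not NetDPSyn's actual binning scheme, which may differ per attribute; the function names and the example values are made up for illustration.

```python
def make_edges(low, high, num_bins):
    """Return equal-width bin edges covering [low, high]."""
    width = (high - low) / num_bins
    return [low + i * width for i in range(num_bins + 1)]


def bin_values(values, edges):
    """Map each value to the index of the bin that contains it."""
    num_bins = len(edges) - 1
    width = edges[1] - edges[0]
    indices = []
    for v in values:
        idx = int((v - edges[0]) // width)
        indices.append(min(max(idx, 0), num_bins - 1))  # clamp to a valid bin
    return indices


def unbin_values(indices, edges):
    """Decode each bin index to the midpoint of its bin."""
    return [(edges[i] + edges[i + 1]) / 2 for i in indices]


if __name__ == "__main__":
    packet_sizes = [40, 52, 576, 1400, 1500]  # made-up example values
    edges = make_edges(0, 1500, 5)            # 5 bins of width 300
    codes = bin_values(packet_sizes, edges)
    decoded = unbin_values(codes, edges)
    print(codes)    # [0, 0, 1, 4, 4]
    print(decoded)  # [150.0, 150.0, 450.0, 1350.0, 1350.0]
```

The decoded values differ from the inputs even though no synthesis has taken place. A round-trip file of this kind is therefore useful for separating the error introduced by discretization from the error introduced by the synthesis step.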
If you find our work useful for your research, please consider citing the paper:
@inproceedings{10.1145/3646547.3689011,
author = {Sun, Danyu and Chen, Joann Qiongna and Gong, Chen and Wang, Tianhao and Li, Zhou},
title = {NetDPSyn: Synthesizing Network Traces under Differential Privacy},
year = {2024},
isbn = {9798400705922},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3646547.3689011},
doi = {10.1145/3646547.3689011},
abstract = {As the utilization of network traces for the network measurement research becomes increasingly prevalent, concerns regarding privacy leakage from network traces have garnered the public's attention. To safeguard network traces, researchers have proposed the trace synthesis that retains the essential properties of the raw data. However, previous works also show that synthesis traces with generative models are vulnerable under linkage attacks. This paper introduces NetDPSyn, the first system to synthesize high-fidelity network traces under privacy guarantees. NetDPSyn is built with the Differential Privacy (DP) framework as its core, which is significantly different from prior works that apply DP when training the generative model. The experiments conducted on three flow and two packet datasets indicate that NetDPSyn achieves much better data utility in downstream tasks like anomaly detection. NetDPSyn is also 2.5 times faster than the other methods on average in data synthesis.},
booktitle = {Proceedings of the 2024 ACM on Internet Measurement Conference},
pages = {545–554},
numpages = {10},
keywords = {differential privacy, network flows, network packets, synthetic data generation},
location = {Madrid, Spain},
series = {IMC '24}
}
