Be Your Own Neighborhood: Detecting Adversarial Example by the Neighborhood Relations Built on Self-Supervised Learning


Deep Neural Networks (DNNs) have achieved excellent performance in various fields. However, DNNs' vulnerability to Adversarial Examples (AEs) hinders their deployment in safety-critical applications, e.g., autonomous driving. This paper presents a novel AE detection framework, named BEYOND, for trustworthy predictions. BEYOND performs detection by distinguishing an AE's abnormal relation with its augmented versions, i.e., its neighbors, from two perspectives: representation similarity and label consistency. An off-the-shelf Self-Supervised Learning (SSL) model is used to extract the representation and predict the label, because its representations are more informative than those of supervised learning models. For clean samples, the representations and predictions are closely consistent with those of their neighbors, whereas those of AEs differ greatly. Moreover, we explain this observation and show that, by leveraging this discrepancy, BEYOND can effectively detect AEs. Experiments show that BEYOND outperforms baselines by a large margin, especially under adaptive attacks.
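
The two checks described above (representation similarity and label consistency between an input and its augmented neighbors) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `encode`, `classify`, and `augment` are hypothetical callables standing in for the SSL feature extractor, its label predictor, and the data augmentation, and the thresholds and neighbor count `k` are arbitrary placeholder values.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def neighborhood_scores(x, encode, classify, augment, k=8, rng=None):
    """Compare an input with k augmented neighbors.

    Returns (mean representation similarity, fraction of neighbors whose
    predicted label matches the input's predicted label).
    encode/classify/augment are hypothetical stand-ins supplied by the caller.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    z = encode(x)
    y = classify(x)
    neighbors = [augment(x, rng) for _ in range(k)]
    sims = [cosine_similarity(z, encode(n)) for n in neighbors]
    agree = [classify(n) == y for n in neighbors]
    return float(np.mean(sims)), float(np.mean(agree))

def is_adversarial(x, encode, classify, augment,
                   sim_thresh=0.9, label_thresh=0.75, k=8, rng=None):
    """Flag x if either relation with its neighbors looks abnormal.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    mean_sim, label_agreement = neighborhood_scores(
        x, encode, classify, augment, k=k, rng=rng)
    return bool(mean_sim < sim_thresh or label_agreement < label_thresh)

if __name__ == "__main__":
    # Toy stand-ins: a fixed random projection as the "encoder", argmax of a
    # linear head as the "classifier", and Gaussian noise as the "augmentation".
    rng = np.random.default_rng(42)
    W = rng.normal(size=(16, 32))
    H = rng.normal(size=(10, 16))
    toy_encode = lambda v: np.tanh(W @ v)
    toy_classify = lambda v: int(np.argmax(H @ toy_encode(v)))
    toy_augment = lambda v, r: v + 0.01 * r.normal(size=v.shape)

    x = rng.normal(size=32)
    print(neighborhood_scores(x, toy_encode, toy_classify, toy_augment))
    print(is_adversarial(x, toy_encode, toy_classify, toy_augment))
```

In practice the stand-ins would be replaced by a pretrained SSL backbone with a classification head and by image augmentations such as cropping or color jitter; the decision rule here simply flags an input when either score falls below its threshold.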


1. Pipeline of the proposed BEYOND framework.


2. The AUC of Different Adversarial Detection Approaches on CIFAR-10.


3. TPR@FPR 5% of BEYOND against Gray-box Attack.


4. ROC Curve of Perturbation Budgets.


@article{he2022your,
  title={Be Your Own Neighborhood: Detecting Adversarial Example by the Neighborhood Relations Built on Self-Supervised Learning},
  author={He, Zhiyuan and Yang, Yijun and Chen, Pin-Yu and Xu, Qiang and Ho, Tsung-Yi},
  journal={arXiv preprint arXiv:2209.00005},
  year={2022}
}