AI Safety and Security


Despite the remarkable advancement of deep learning (DL) techniques over the past decade, these models retain inherent shortcomings and vulnerabilities that undermine their performance in critical situations. First, DL models are prone to systematic failures on subsets of data involving unusual cases, out-of-distribution samples, and complex scenarios. Second, they are vulnerable to carefully crafted malicious inputs, such as adversarial examples, in which minuscule perturbations mislead deep neural networks into making erroneous predictions. These flaws and susceptibilities raise serious concerns about deploying deep neural networks in domains where safety is paramount, e.g., autonomous driving and medical systems.
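To make the second threat concrete, the sketch below shows the fast gradient sign method (FGSM), a standard way of crafting such small adversarial perturbations. It is a generic illustration rather than a technique specific to this work; the model, the pixel range, and the epsilon value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, label, epsilon=8 / 255):
    """Craft an adversarial example by nudging x in the direction that increases the loss."""
    # Assumes inputs are images scaled to [0, 1]; epsilon bounds the per-pixel perturbation.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```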

We introduce a collection of defense frameworks designed to safeguard deep neural networks from the risks outlined above, applicable across a variety of tasks. Our strategies treat the outputs of deep neural networks as feedback and incorporate the principle of semantic input validation with generative models to detect adversarial examples (AEs) and out-of-distribution (OOD) samples. More specifically, we perform this input validation by supplying both the input and the prediction to a generative model, such as a Generative Adversarial Network or Stable Diffusion, which produces a synthetic output that we then compare to the original input. For legitimate inputs that are correctly classified, the synthetic output closely reconstructs the input. In contrast, for AEs or OOD samples, the synthetic output instead conforms to the incorrect prediction whenever possible and therefore deviates from the input. Consequently, by measuring the distance between the input and the synthetic output, we can distinguish AEs and OOD samples from legitimate inputs.
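A minimal sketch of this validation loop is given below. The `ConditionalGenerator`-style interface `generator(x, pred)`, the L2 distance, and the threshold `tau` are hypothetical placeholders chosen for illustration; in practice the generator could be a class-conditional GAN or a diffusion model conditioned on the classifier's prediction, and the distance metric and threshold would be tuned per task.

```python
import torch

@torch.no_grad()
def validate_input(classifier, generator, x, tau):
    """Flag x as adversarial/OOD when its prediction-conditioned reconstruction drifts from x."""
    pred = classifier(x).argmax(dim=1)                   # the model's (possibly wrong) prediction
    x_synth = generator(x, pred)                         # synthetic output conditioned on input + prediction
    dist = torch.norm((x - x_synth).flatten(1), dim=1)   # per-sample L2 distance to the original input
    is_suspicious = dist > tau                           # large distance => likely AE or OOD sample
    return is_suspicious, dist
```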

We also devote ourselves to enhancing model performance by developing new testing and debugging methods. First, we study test input prioritization, a technique designed to pinpoint “high-quality” test instances from a large volume of unlabeled data. Through this selection, we can uncover more model failures with less labeling effort, thereby streamlining the process of identifying and addressing a model’s performance shortfalls. Next, we investigate active learning strategies, which select a fixed budget of unlabeled data that, after human annotation and model retraining, most effectively improves the model’s performance. Finally, we pursue failure discovery and reasoning, which aims to identify human-interpretable visual attributes or specific patterns that contribute to model failures. Understanding these factors is vital for researchers aiming to improve their models.
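As one concrete (and deliberately simple) example of test input prioritization, the sketch below ranks unlabeled inputs by the entropy of the model's softmax output, a common uncertainty baseline; it is not necessarily the specific prioritization metric developed in this work, and the data loader and labeling budget are assumed for illustration.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def prioritize_by_entropy(model, unlabeled_loader, budget):
    """Rank unlabeled inputs by predictive entropy and return the indices to label first."""
    scores = []
    for x in unlabeled_loader:
        probs = F.softmax(model(x), dim=1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
        scores.append(entropy)
    scores = torch.cat(scores)
    # High-entropy (uncertain) inputs are more likely to expose model failures,
    # so they are labeled first under the given budget.
    return scores.topk(budget).indices
```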