This is a final project for the course CS 282R: “Robust Machine Learning” taught by Prof. Yaron Singer in the spring semester of 2018, done with Sharon Qian.
While machine learning models have shown impressive results, a large body of work has highlighted their lack of robustness. In particular, various adversarial algorithms apply small pixel perturbations to images in order to fool classifiers, which has sparked a cycle of new adversarial noise generators and new defenses against adversarial data. Through an experiment with multilayer perceptrons on the MNIST dataset, we show that a classifier can be trained to detect, with high accuracy, adversarially corrupted samples generated by three different attack algorithms. We also argue that, given an adversarial algorithm, we can learn interesting properties of the decision boundaries of the original classifier.
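To give a concrete sense of what "small pixel perturbations" means, here is a minimal sketch of one well-known attack, the Fast Gradient Sign Method (FGSM, Goodfellow et al.), applied to a simple logistic classifier rather than the multilayer perceptrons used in the project. The classifier weights and the toy two-pixel input below are made up for illustration; the attack is not necessarily one of the three algorithms studied in the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """FGSM for a logistic classifier p = sigmoid(w.x + b):
    nudge each pixel by eps in the direction that increases the loss.
    For the logistic loss, d(loss)/dx_i = (p - y) * w_i."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Hypothetical two-"pixel" classifier and a correctly classified sample.
w, b = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1

x_adv = fgsm_perturb(x, y, w, b, eps=0.6)

score     = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
score_adv = sigmoid(sum(wi * xi for wi, xi in zip(w, x_adv)) + b)
# Each pixel moved by only eps, yet the perturbed sample now
# falls on the wrong side of the decision boundary.
```

Each coordinate changes by at most `eps`, so the perturbed image stays visually close to the original while its predicted class flips; detecting such samples is exactly the task our classifier is trained for.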
Our paper is available here.