Navigating the Realm of Adversarial Machine Learning
Hi there!
This year, I took part in my first Capture The Flag (CTF) competition, hosted by AI Village @ DEFCON 31, and the experience was intriguing, to say the least. The challenges involving pixel attacks caught my attention in particular, and they are the main focus of this post. While I initially intended to simply share a version of a pixel attack I performed during the competition, this post also delves into strategies for strengthening ML models to better withstand such attacks.
Before we dive into the theory, let’s set the scene with a scenario that’ll grab your attention.
Picture this: our company, MM Vigilant, is on a mission to develop a cutting-edge object detection product. The concept is simple yet revolutionary — customers snap a picture of the desired item, and it is delivered to their doorstep a few days later. As the brilliant data scientist behind the scenes, you’ve crafted the ultimate image-based object classification model. The classification results are impeccable, the model evaluation metrics are top-notch, and stakeholders couldn’t be happier. The model hits production, and customers are delighted — until a wave of complaints rolls in.
Upon investigation, it turns out someone is meddling with the images before they reach the classifier. Specifically, every image of a clock is being mischievously classified as a mirror. The consequence? Anyone hoping for a clock is receiving an unexpected mirror at their door. Quite the twist, isn’t it?
Our stakeholders at MM Vigilant are both concerned and intrigued by how this mishap occurred and, more importantly, what measures can be taken to prevent it.
The scenario we just explored is hypothetical, though image tampering is a very real threat, especially if the model has vulnerabilities.
So let’s take a closer look at one such manipulation of images…