Trust Not What You Cannot Validate

Posted by

Microsoft recently released another article in the Artificial Intelligence and Machine Learning Security section. I believe it should be required reading. In general, it talks about the unique security challenge faced by AI based systems which will require some unique or fine-tuned approaches to system security. One of the aspects the article discusses is the idea of being able to trust inputs to AI based systems. NIST has a document on how these risks differ. 

The issue

AI Systems are by their nature software systems. Foundationally, developing an AI based system is very similar to building any other software based system. However, there are some areas that need to be handled differently than standard software projects. For example, input validation. 

Input validation is a core defensive technique in any software system. You validate your inputs to the system regardless of where they are coming from, users, databases, or external systems. This is harder to do in a GPT style system because the inputs will be anything a user wants to ask about. This has broken the clean separation between data and functionality in a software system. Now the data is directly integrated with the functionality. 

What about poisoning the training data? It is possible that open data sets that are used to train the AI models will come from open source projects, or publicly released data. These are data sets that attackers can potentially add data to. What you can end up with is a self-perpetuating confidentiality ramp where untrusted data, with enough positive reinforcement becomes trusted data. Just like social media and search engines, when users continue to select the same thing, the system will present that thing to other users as the thing the system assumes they are looking for. 

This is how misinformation campaigns work. A few untrusted sources plant tweets (or X’es whatever they are now) videos and Tik Tocks, which their bot accounts then share and reshare, and soon interested humans begin sharing them and resharing them. This results in everyone seeing this fake message come across their social media and assuming it’s true because everyone knows about it. 

Differentiating between malicious and benign inputs will be a challenge. The AI model itself can’t tell the difference. So, the natural reaction would be to validate the input when the prompt is constructed. But can you? It comes down to being able to discern intent. How can you create an input validation system that can determine the intent of the prompt? How can you tell the difference between a post-grad research student asking about historical atrocities for humanitarian research, and a new age extremist researching them as a playbook? 

What do we do?

So how do we approach that? Do we start implementing an Admiralty System for public data sets complete with hash validation? An approach to solving this is to validate that the original contributors to the data sets were known trustworthy sources. Internally generated data sets won’t require as much scrutiny because your employees have already been vetted and are acting in the best interest of the company. If they aren’t, you have an entirely different problem on your hands. 

Perhaps we need a peer review system for data sets much like the academic peer review system for research papers and experiment validation. This has the benefit of actively validating the data, but when you consider the size of most training data sets, it represents a massive undertaking by anyone taking on the job of reviewing it to prove its accuracy. Open source data sets are not guaranteed to be peer reviewed. In many cases, individuals who download open-source source code or data sets are not trained to know what to look for or how to properly, thoroughly validate a data set. This would have to be a more formalized process by validated experts in their fields. 

Any of the input data sets need to be validated in this manner. Retrieval Augmented Generative (RAG) systems would need to be able to validate the information presented to them that they are going to base their generated output on. This may seem like a non-issue because the only one at risk is the users that presented the information who will be the one presented with the output. However, there is another angle here. 

Presenting a RAG system with particularly crafted data and then crafting specific prompts on that data can expose proprietary information about the model, and even the data it was trained on. In some cases, this could be exposing date or trade secrets to malicious users. 

While input validation in software systems has existed for decades and it is a well understood practice, input validation on AI systems requires specific activities to handle the very contextual nature of AI interfaces. Emanuele et al. have several suggestions in their paper Wild Patterns Reloaded: A Survey of Machine Learning Security against Training Data Poisoning. It is also worth a read. 

Overall, these new attack flavours will mean a creative approach to protecting out AI based systems.

Leave a Reply

Your email address will not be published. Required fields are marked *

Verified by MonsterInsights