LLM — Security Challenges & Guardrails

Amit Singh Rathore
6 min readJan 16, 2024

LLM Best practices for security

Language Model (LM) applications play a crucial role in various fields, from natural language processing to chatbots and virtual assistants. As the adoption of these applications grows, so does the need for robust security measures to protect against potential threats. In this article, we will explore best practices for securing Language Model applications

Prompt Injection

Prompt Injection is a potential vulnerability where attackers manipulate the input prompts to trick the language model into generating unintended or harmful outputs. Attackers can also manipulate LLM’s through crafted inputs, causing it to execute the attacker’s intentions.

Data Poisoning

Data Poisoning involves manipulating the training data to introduce vulnerabilities, biases, and backdoors, or compromise the integrity or the ethical behavior of the language model.

Sensitive Information Disclosure

LLM applications can inadvertently disclose sensitive information, proprietary algorithms, or confidential data, leading to unauthorized access, intellectual property theft, and privacy breaches.

Model Denial of Service

Model Denial of Service occurs when an attacker overwhelms a Large Language Model (LLM) with a high volume of requests. This can result in a decline in the quality of service or model unavailability for other users, as well as potentially incurring high resource costs.

Agent Uprising

LLM agents take actions based on LLM output, attacker might trick LLM to take control of agents and cause disruption in the enterprise systems. Like it can overload mail servers by sending too many emails, or overload JIRA by creating too many issues.

Model Theft

LLMs are the result of extensive research and resources. They are valuable assets to companies. If these models are stolen, it can lead to economic losses, loss of competitive advantage, and potential misuse. Models can be stolen from repositories or they can be replicated by tricks like shadowing.

Shadow Model Creation Through API Queries — An attacker repeatedly queries this API with a wide range of inputs, collecting the outputs. Using this data, they can train a “shadow model” that mimics the proprietary LLM’s behavior.

Scalable Extraction of Training Data From Language Models.

Various Attacks: Image from https://github.com/greshake/llm-security

Guardrails:

Preventing the above-mentioned security risk requires a thorough understanding of the system. We can use the following as Guardrails to avoid these security issues:

  • Use Secure Prompts: Only use prompts created/vetted by your team. Create standard prompt templates.
  • Use Parameterized Queries: Pass parameters to the LLM as part of the prompt, making it harder for attackers to alter the prompt since the LLM only generates output based on these parameters.
  • Use Input Validation: Check user input for harmful content before passing it to the LLM to prevent harmful text from being inserted into the prompt.
  • Contextual Analysis: Incorporate contextual analysis mechanisms to understand the context of the prompts and prevent malicious manipulations.
  • Use a Secret Phrase: A secret phrase in the prompt required for the LLM to generate output can prevent attackers from altering the prompt.
  • Privilege Control: Ensure the LLM only has access to required backend systems and information.
  • Use a Secure LLM Provider: When using third-party LLM choose a provider that has measures against prompt injection, like input filtering, using sandboxes, and monitoring for odd activity.
  • Monitor Your Output: Keep an eye on what the LLM produces for any signs of harmful content, and report anything odd.
  • Human in the Loop: Always have a human check or oversee critical functions of the LLM.
  • Segregate External Content: Keep user prompts and external content (like web pages) separate.
  • Trust Boundaries: Make sure the LLM knows what sources to trust and what not to. For example, it shouldn’t blindly trust all web pages or all user inputs.
  • Verify Data Sources: During training and fine-tuning, it’s crucial to ensure that the data sources used are legitimate and reliable. Implement strict data vetting and data filtering processes for the data that is used in training.
  • Data Integrity Checks: Implement robust data integrity checks during the training phase to detect and mitigate any attempts at poisoning the training data.
  • Different Models for Different Use-Cases: Instead of a one-size-fits-all approach, create specific models trained on separate data for different purposes. This way, even if one set of data is compromised, it doesn’t affect all applications of the LLM.
  • API Rate Limits: Restrict how many requests a user or IP address can make within a specified timeframe. This prevents one user from flooding the system.
  • Limit Agent Actions: If the LLM’s responses trigger other actions in the system, ensure there’s a limit to how many of these can be queued up. This acts as a buffer against potential resource exhaustion.
  • Use Trusted Plugins: Only use plugins or third-party components that have been tested and are known to be safe.
  • Follow MLOps Best Practices: This includes practices like continuous integration and deployment, automated testing, and monitoring, all of which can help catch vulnerabilities early.
  • Monitoring and Patching: Regularly check for vulnerabilities in your system and apply patches or updates as necessary.
  • Review Supplier Security: Periodically review the security practices and access levels of any suppliers or third-party collaborators.
  • Limit External Data Access: If the LLM has the capability to fetch data from external sources, this should be tightly controlled to avoid unintentional data exposure.
  • Rule of Least Privilege: When training models, only provide them with the data they absolutely need. They shouldn’t have access to confidential or proprietary datasets unless necessary.
  • Strict Parameterized Input: Ensure that apps only accept strictly defined and expected inputs. Any unexpected data or format should be rejected. It’s like a bouncer at a club only letting in guests on the list.
  • Inspections and Tests: Regularly inspect and test the app using tools and techniques like SAST (Static Application Security Testing), DAST (Dynamic Application Security Testing), and IAST (Interactive Application Security Testing). These tests will identify potential vulnerabilities or weak points in the app.
  • Authentication and Authorization: Just because someone has a key (authentication) doesn’t mean they should access all rooms (authorization). Ensure that the app verifies not just who is making a request but also if they have the right permissions to do so.
  • Limit Plugin/Tool Access: Ensure that the LLM can only access a limited set of plugins or tools.
  • Granular Functionality: Instead of having open-ended functions, plugins should have very specific, granular functionality. We should be more specific for the functionality e.g. instead of send-email we should say “send an email with pre-defined templates”.
  • Human Approval: Just as you’d want to manually approve any major decision made by a robot butler, LLM actions, especially those with significant consequences, should require manual user confirmation or approval.
  • Logging, Monitoring, and Rate Limiting: Keep a close eye on what the LLM is doing. Log its actions, monitor for unusual behaviors, and implement rate limits. For instance, if the LLM is sending emails, there could be a limit on how many emails it can send in a given timeframe to prevent spamming.
  • User Authorization for Sensitive Actions: For actions that can have significant consequences, and require manual approval or confirmation from the user.
  • Watermarking: Embed unique identifiers or patterns in the LLM’s outputs.
  • Model and Code Signing: Ensure any external models or code you use is signed, verifying its authenticity and integrity.
  • Redaction: Integrate redaction mechanisms to automatically identify and remove sensitive information from model outputs.
  • Data Classification: Classify input and output data based on sensitivity, and apply appropriate security controls to restrict access.
  • Privacy Impact Assessments: Conduct regular privacy impact assessments to identify potential privacy risks and take corrective actions.

Building a secure LLM-based app application requires a comprehensive and multi-faceted approach. By addressing threats such as Prompt Injection, Data Poisoning, Agent Security, Model DoS, Sensitive Information Disclosure, Model Theft, and Shadow Models, we can build resilient systems that protect against potential risks and vulnerabilities. Continuous monitoring & assessment is key to building trustworthy LLM apps.

--

--

Amit Singh Rathore

Staff Data Engineer @ Visa — Writes about Cloud | Big Data | ML