AI systems introduce new security considerations that differ from traditional software. Understanding these threats and how to assess AI system security is essential for protecting your organization, users, and data.
AI security is not just about protecting AI systems—it’s also about protecting against AI-enabled attacks and ensuring AI systems themselves are secure. The threat landscape has expanded with AI, requiring new approaches to security assessment and mitigation.
| Attack Type | Description | Impact |
|---|
| Prompt Injection | Malicious inputs manipulate model behavior | Data exfiltration / unauthorized actions (especially when connected to tools, RAG, or agents) |
| Model Poisoning | Contaminate training data to insert backdoors | Compromised model behavior |
| Adversarial Examples | Small or carefully crafted perturbations (often imperceptible in vision) cause misclassification | System failure, bypassing safeguards |
| Model Extraction | Reverse-engineer proprietary models | IP theft, competitive advantage loss |
| Membership Inference | Determine if data was in training set | Privacy violations |
| Model Inversion | Reconstruct or infer sensitive attributes / memorized snippets from training data | Privacy violations |
| Supply Chain Attacks | Compromise models or datasets | Widespread impact |
| Incident | What Happened |
|---|
| Chatbot data extraction | Leaked sensitive context/system prompts/RAG content |
| Jailbreaking | Safety overrides bypassed through clever prompting |
| Deepfake fraud | Voice cloning used for financial theft |
| Bias exploitation | Adversaries find and exploit unfair behavior |
| Code generation attacks | Generated code contains vulnerabilities |
Understanding what you’re securing is the first step.
| Question | Why It Matters |
|---|
| What type of AI? | GenAI, predictive ML, computer vision? Different risks |
| What data is used? | Training data sensitivity, PII, proprietary information |
| How is it deployed? | API, edge device, cloud? Different exposure |
| Who are the users? | Employees, customers, public? Different threat models |
| What are the inputs/outputs? | Unstructured text, images, code? Different attack surfaces |
Map threats to your specific AI system.
| Threat Category | Questions to Ask |
|---|
| Data attacks | Can training data be inferred or extracted? |
| Model attacks | Can model be stolen, poisoned, or inverted? |
| Input attacks | Can prompts be manipulated? |
| Output attacks | Can outputs mislead users (hallucinations)? |
| System attacks | Can infrastructure be compromised? |
Evaluate your system’s susceptibility to identified threats.
| Vulnerability | Assessment |
|---|
| Unrestricted input | Can users input anything? Sanitization needed? |
| Excessive output | Does model reveal too much information? |
| Weak access control | Who can access the AI system? |
| No monitoring | Can attacks be detected? |
| Unvalidated outputs | Do users trust outputs blindly? |
| Third-party dependencies | Are models, APIs, data sources secure? |
Layer defenses to address vulnerabilities.
| Control Type | Examples |
|---|
| Input filtering | Sanitize prompts, detect malicious patterns. Treat model output as untrusted; use allowlisted tools, structured parameters, and server-side authorization. |
| Output validation | Warn users about potential errors |
| Rate limiting | Prevent automated attacks |
| Access control | Authentication, authorization for AI access |
| Monitoring | Log with redaction/tokenization, least-privilege access, retention limits, and audit trails. |
| Human review | Critical decisions require human oversight |
| Red-teaming | Test for vulnerabilities before deployment |
- LLM01:2025 Prompt Injection
- LLM02:2025 Sensitive Information Disclosure
- LLM03:2025 Supply Chain
- LLM04:2025 Data and Model Poisoning
- LLM05:2025 Improper Output Handling
- LLM06:2025 Excessive Agency
- LLM07:2025 System Prompt Leakage
- LLM08:2025 Vector and Embedding Weaknesses
- LLM09:2025 Misinformation
- LLM10:2025 Unbounded Consumption
| Concern | Mitigation |
|---|
| Training data privacy | Anonymization, differential privacy, federated learning |
| Model IP protection | Watermarking, access controls, monitoring |
| Inference data | Encryption, secure deletion policies |
| Data pipeline | Secure storage, access logging, audit trails |
| Concern | Mitigation |
|---|
| Model theft | API rate limiting, obfuscation, watermarking |
| Model poisoning | Supply chain vetting, data validation |
| Adversarial robustness | Adversarial training, input filtering |
| Model inversion | Differential privacy, output filtering |
| Concern | Mitigation |
|---|
| Infrastructure | Secure deployment, regular updates, monitoring |
| Access control | Authentication, authorization, least privilege |
| Monitoring | Log analysis, anomaly detection, incident response |
| Testing | Red-teaming, penetration testing, adversarial testing |
Use this matrix to assess AI system security:
| Aspect | Low Risk | Medium Risk | High Risk |
|---|
| Data sensitivity | Public data | Internal business data | PII, health data |
| User access | Internal, authenticated | Partners, customers | Public, unauthenticated |
| System autonomy | Human always involved | Human reviews critical decisions | Fully autonomous |
| Weights Exposure | API-only access | Internal weights | Public weights / Open Source |
| Network Exposure | Offline / Air-gapped | Private Network / VPN | Internet-exposed |
| Tool/RAG Exposure | None | Read-only RAG (internal docs) | R/W Tools, Shared RAG, 3rd-party Plugins |
| Use case | Internal productivity | Customer-facing | Safety-critical |
Based on your assessment, prioritize controls for high-risk areas.
| Practice | Description |
|---|
| Secure by design | Build security in from the start |
| Data governance | Know your training data sources |
| Model documentation | Understand model capabilities and limits |
| Testing | Test for adversarial inputs and edge cases |
| Practice | Description |
|---|
| Defense in depth | Multiple layers of security controls |
| Least privilege | Minimum necessary access to AI systems |
| Monitoring | Continuous security monitoring and logging |
| Incident response | Plan for how to respond to AI security incidents |
| Practice | Description |
|---|
| Regular updates | Keep models, dependencies updated |
| Security reviews | Periodic assessments of security posture |
| Training | Educate teams on AI-specific threats |
| Transparency | Document capabilities and limitations |
| Pitfall | Why It’s Dangerous | Prevention |
|---|
| ”AI is secure by default” | AI introduces new attack surfaces | Assume AI systems are vulnerable |
| Ignoring training data | Training data may contain secrets | Scrub data, understand sources |
| Trusting outputs blindly | Hallucinations can mislead | Validate important information |
| No monitoring | Can’t detect attacks | Comprehensive logging and analysis |
| Overlooking supply chain | Dependencies may be compromised | Vet models, APIs, tools |
- AI has unique threats: Prompt injection, model poisoning, extraction attacks, adversarial examples
- Assessment framework: Scope → Identify Threats → Assess Vulnerabilities → Implement Controls
- Use established standards: OWASP Top 10 for LLM Applications provides a comprehensive checklist
- Layer your defenses: Input filtering, output validation, access control, monitoring, human oversight
- Know your data: Understand what data your AI system uses and exposes
- Plan for incidents: Have response plans for security breaches
- Security is ongoing: Continuous monitoring, testing, and updating
AI security is an emerging field—stay informed about new threats and mitigation strategies.