Skip to content

AI Security Scoping Matrix

AI systems introduce new security considerations that differ from traditional software. Understanding these threats and how to assess AI system security is essential for protecting your organization, users, and data.

AI security is not just about protecting AI systems—it’s also about protecting against AI-enabled attacks and ensuring AI systems themselves are secure. The threat landscape has expanded with AI, requiring new approaches to security assessment and mitigation.


Attack TypeDescriptionImpact
Prompt InjectionMalicious inputs manipulate model behaviorData exfiltration / unauthorized actions (especially when connected to tools, RAG, or agents)
Model PoisoningContaminate training data to insert backdoorsCompromised model behavior
Adversarial ExamplesSmall or carefully crafted perturbations (often imperceptible in vision) cause misclassificationSystem failure, bypassing safeguards
Model ExtractionReverse-engineer proprietary modelsIP theft, competitive advantage loss
Membership InferenceDetermine if data was in training setPrivacy violations
Model InversionReconstruct or infer sensitive attributes / memorized snippets from training dataPrivacy violations
Supply Chain AttacksCompromise models or datasetsWidespread impact
IncidentWhat Happened
Chatbot data extractionLeaked sensitive context/system prompts/RAG content
JailbreakingSafety overrides bypassed through clever prompting
Deepfake fraudVoice cloning used for financial theft
Bias exploitationAdversaries find and exploit unfair behavior
Code generation attacksGenerated code contains vulnerabilities

Understanding what you’re securing is the first step.

QuestionWhy It Matters
What type of AI?GenAI, predictive ML, computer vision? Different risks
What data is used?Training data sensitivity, PII, proprietary information
How is it deployed?API, edge device, cloud? Different exposure
Who are the users?Employees, customers, public? Different threat models
What are the inputs/outputs?Unstructured text, images, code? Different attack surfaces

Map threats to your specific AI system.

Threat CategoryQuestions to Ask
Data attacksCan training data be inferred or extracted?
Model attacksCan model be stolen, poisoned, or inverted?
Input attacksCan prompts be manipulated?
Output attacksCan outputs mislead users (hallucinations)?
System attacksCan infrastructure be compromised?

Evaluate your system’s susceptibility to identified threats.

VulnerabilityAssessment
Unrestricted inputCan users input anything? Sanitization needed?
Excessive outputDoes model reveal too much information?
Weak access controlWho can access the AI system?
No monitoringCan attacks be detected?
Unvalidated outputsDo users trust outputs blindly?
Third-party dependenciesAre models, APIs, data sources secure?

Layer defenses to address vulnerabilities.

Control TypeExamples
Input filteringSanitize prompts, detect malicious patterns. Treat model output as untrusted; use allowlisted tools, structured parameters, and server-side authorization.
Output validationWarn users about potential errors
Rate limitingPrevent automated attacks
Access controlAuthentication, authorization for AI access
MonitoringLog with redaction/tokenization, least-privilege access, retention limits, and audit trails.
Human reviewCritical decisions require human oversight
Red-teamingTest for vulnerabilities before deployment

  1. LLM01:2025 Prompt Injection
  2. LLM02:2025 Sensitive Information Disclosure
  3. LLM03:2025 Supply Chain
  4. LLM04:2025 Data and Model Poisoning
  5. LLM05:2025 Improper Output Handling
  6. LLM06:2025 Excessive Agency
  7. LLM07:2025 System Prompt Leakage
  8. LLM08:2025 Vector and Embedding Weaknesses
  9. LLM09:2025 Misinformation
  10. LLM10:2025 Unbounded Consumption

ConcernMitigation
Training data privacyAnonymization, differential privacy, federated learning
Model IP protectionWatermarking, access controls, monitoring
Inference dataEncryption, secure deletion policies
Data pipelineSecure storage, access logging, audit trails
ConcernMitigation
Model theftAPI rate limiting, obfuscation, watermarking
Model poisoningSupply chain vetting, data validation
Adversarial robustnessAdversarial training, input filtering
Model inversionDifferential privacy, output filtering
ConcernMitigation
InfrastructureSecure deployment, regular updates, monitoring
Access controlAuthentication, authorization, least privilege
MonitoringLog analysis, anomaly detection, incident response
TestingRed-teaming, penetration testing, adversarial testing

Use this matrix to assess AI system security:

AspectLow RiskMedium RiskHigh Risk
Data sensitivityPublic dataInternal business dataPII, health data
User accessInternal, authenticatedPartners, customersPublic, unauthenticated
System autonomyHuman always involvedHuman reviews critical decisionsFully autonomous
Weights ExposureAPI-only accessInternal weightsPublic weights / Open Source
Network ExposureOffline / Air-gappedPrivate Network / VPNInternet-exposed
Tool/RAG ExposureNoneRead-only RAG (internal docs)R/W Tools, Shared RAG, 3rd-party Plugins
Use caseInternal productivityCustomer-facingSafety-critical

Based on your assessment, prioritize controls for high-risk areas.


PracticeDescription
Secure by designBuild security in from the start
Data governanceKnow your training data sources
Model documentationUnderstand model capabilities and limits
TestingTest for adversarial inputs and edge cases
PracticeDescription
Defense in depthMultiple layers of security controls
Least privilegeMinimum necessary access to AI systems
MonitoringContinuous security monitoring and logging
Incident responsePlan for how to respond to AI security incidents
PracticeDescription
Regular updatesKeep models, dependencies updated
Security reviewsPeriodic assessments of security posture
TrainingEducate teams on AI-specific threats
TransparencyDocument capabilities and limitations

PitfallWhy It’s DangerousPrevention
”AI is secure by default”AI introduces new attack surfacesAssume AI systems are vulnerable
Ignoring training dataTraining data may contain secretsScrub data, understand sources
Trusting outputs blindlyHallucinations can misleadValidate important information
No monitoringCan’t detect attacksComprehensive logging and analysis
Overlooking supply chainDependencies may be compromisedVet models, APIs, tools

  • AI has unique threats: Prompt injection, model poisoning, extraction attacks, adversarial examples
  • Assessment framework: Scope → Identify Threats → Assess Vulnerabilities → Implement Controls
  • Use established standards: OWASP Top 10 for LLM Applications provides a comprehensive checklist
  • Layer your defenses: Input filtering, output validation, access control, monitoring, human oversight
  • Know your data: Understand what data your AI system uses and exposes
  • Plan for incidents: Have response plans for security breaches
  • Security is ongoing: Continuous monitoring, testing, and updating

AI security is an emerging field—stay informed about new threats and mitigation strategies.