Latest

Recent Posts

Focused guidance on securing modern AI systems, from RAG to tool-using agents.

Prompt Injection in RAG Pipelines Is the New Supply Chain Risk

RAG systems introduce a second supply chain: the document corpus. If the corpus is untrusted or easily poisoned, an attacker can insert instructions that override your system prompt and trigger unsafe actions. This is not a theoretical edge case; we see it in customer deployments across internal wikis, PDFs, and shared knowledge bases.

What changes in RAG? The model now treats retrieved content as authoritative context. Without strict boundaries, it will follow malicious instructions hidden inside trusted documents.

We recommend treating every retrieved document as untrusted input, even if it came from “internal” sources. Attackers only need one weak link — a stale document, an unreviewed upload, or a compromised wiki page — to poison the context.

Defensive controls that matter

  • Assign trust levels to sources and keep low-trust data out of high-risk actions.
  • Run retrieval filtering to remove prompt-like strings and suspicious instruction patterns.
  • Use a separate safety layer that enforces hard rules, independent of model output.
  • Log and review model decisions that reference external documents.

Most teams already do some of this, but the key is consistency. Make guardrails part of the RAG pipeline, not an afterthought in the UI layer.

Agentic Tool Abuse: Hardening MCP and Plugin Workflows

Agentic systems amplify risk because a single prompt can trigger multiple tool calls. When tools are overly permissive, the model can be tricked into taking real-world actions such as reading sensitive files, making unintended API calls, or leaking data through indirect channels.

We’ve seen successful prompt injection attacks where the model was tricked into “helpfully” exporting data, granting extra permissions, or executing actions outside of the user’s intent.

Hardening checklist

  • Apply least-privilege scopes to each tool and require explicit user intent for high-risk actions.
  • Validate tool outputs before the model can act on them.
  • Limit tool chaining depth and enforce rate limits for sensitive operations.
  • Implement policy-based guards that reject ambiguous or risky instructions.

Well-designed guardrails reduce the attack surface without sacrificing the workflow benefits of agentic systems. The goal is not to block agents — it’s to ensure every action is accountable and intentional.

LLM Data Leakage: Logging, Redaction, and Secrets Hygiene

LLM systems often collect telemetry for observability. The problem: prompt and completion logs can accidentally store secrets, credentials, or sensitive business data. Once it hits a log or a trace, it’s very difficult to control downstream exposure.

We’ve seen incidents where secrets were leaked into monitoring dashboards, customer support tools, or long-term storage simply because logs were never scrubbed.

Practical steps

  • Redact secrets at ingestion using deterministic patterns and contextual scanning.
  • Separate security telemetry from developer analytics to reduce exposure.
  • Expire and rotate embedding stores that include user-generated text.
  • Define retention policies for prompts and completions, and enforce them.

Security and observability can coexist, but only with strict data boundaries and enforcement. Treat LLM logs like sensitive production data — because they usually are.

Contact Us