SecPrep

What vulnerability does this XML parsing code have and how do you fix it?

Vulnerable code

```python
from lxml import etree

def parse_upload(xml_bytes):
    # No parser argument: lxml falls back to its default parser settings.
    tree = etree.fromstring(xml_bytes)
    return tree
```
XML External Entity (XXE)

Fixed

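```python
from lxml import etree

def parse_upload(xml_bytes):
    parser = etree.XMLParser(
        resolve_entities=False,  # leave entity references unexpanded
        no_network=True,         # never fetch remote resources
        load_dtd=False,          # do not load an external DTD
    )
    tree = etree.fromstring(xml_bytes, parser=parser)
    return tree
```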

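The vulnerability is XML External Entity (XXE) injection. The parser is called with no security configuration, so it runs with lxml's defaults. In lxml versions before 5.0, those defaults resolve external entities (resolve_entities=True); lxml 5.0 changed the default to resolve_entities='internal', which expands only internally defined entities. On a vulnerable version, an attacker uploads XML whose DOCTYPE declaration defines an entity pointing at file:///etc/passwd. When lxml parses the document, it reads that file and splices its contents into the XML tree, which the application may then return or log. Entities pointing at internal service URLs are a related risk, although lxml's default of no_network=True already blocks network lookups.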
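To make the attack concrete, here is a minimal sketch of the kind of document an attacker might upload. The helper name `build_xxe_payload` and the element name `upload` are hypothetical, chosen only for illustration; the function builds bytes and parses nothing.

```python
def build_xxe_payload(target_uri: str = "file:///etc/passwd") -> bytes:
    """Return an XML document whose body references an external entity.

    Hypothetical helper for illustration: it only builds bytes and never
    parses them.
    """
    return (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE upload [\n"
        # The SYSTEM identifier tells the parser where to read the entity from.
        f'  <!ENTITY xxe SYSTEM "{target_uri}">\n'
        "]>\n"
        # The reference below is what a vulnerable parser replaces with the
        # contents of target_uri.
        "<upload>&xxe;</upload>\n"
    ).encode("utf-8")


payload = build_xxe_payload()
print(payload.decode("utf-8"))
```

The entity is declared in the document's inline DOCTYPE, so the attacker needs nothing beyond the ability to upload XML.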

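The fix passes an explicitly hardened parser. resolve_entities=False leaves entity references unexpanded, and it is the setting that actually closes the hole. no_network=True (don't fetch remote URLs) and load_dtd=False (don't load an external DTD) already match lxml's defaults, but stating them explicitly documents the intent and keeps the behavior independent of those defaults. Note that load_dtd=False does not stop lxml from parsing an inline DOCTYPE: the internal subset is still read, but its entities are not substituted into the tree. Together these options block the file-read and remote-fetch XXE vectors described above.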
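One way to check the fix is a small regression test. The sketch below assumes lxml is installed; `MARKER` and `leaks_marker` are hypothetical names, and a temporary file stands in for a sensitive file such as /etc/passwd so the check is portable.

```python
import tempfile
from pathlib import Path

from lxml import etree

MARKER = "TOP-SECRET-MARKER"  # stand-in for sensitive file contents


def parse_upload(xml_bytes: bytes):
    # Same hardened parser settings as in the fixed version above.
    parser = etree.XMLParser(
        resolve_entities=False,
        no_network=True,
        load_dtd=False,
    )
    return etree.fromstring(xml_bytes, parser=parser)


def leaks_marker() -> bool:
    """Return True if parse_upload splices a local file into the tree."""
    with tempfile.TemporaryDirectory() as tmp:
        secret = Path(tmp) / "secret.txt"
        secret.write_text(MARKER, encoding="utf-8")
        # Inline DOCTYPE declaring an external entity that points at the file.
        payload = (
            f'<!DOCTYPE upload [<!ENTITY xxe SYSTEM "{secret.as_uri()}">]>'
            "<upload>&xxe;</upload>"
        ).encode("utf-8")
        root = parse_upload(payload)
        # If the entity had been expanded, the marker would appear in the
        # serialized tree.
        return MARKER.encode("utf-8") in etree.tostring(root)


print("leaks file contents:", leaks_marker())
```

If the parser is hardened as intended, `leaks_marker()` returns False. Running the same check against a parser built with `resolve_entities=True` is a quick way to confirm that the test can detect the leak at all.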
