1. Introduction
XML stands for eXtensible Markup Language, a derivative of SGML (upon which HTML is also based) used to represent structured data objects as human-readable text. An XML parser extracts the data from complex structured XML files. Unless a program simply copies the whole XML file as a unit, every program that reads XML data must implement or call an XML parser. There are various types of XML injection attacks that can cause damage. Though many computer languages and libraries have improved their safety configurations and security features, there are still vulnerabilities in XML specifications and XML parsers that can be exploited. See the chapter on XML Injection Attacks for more details on XML bomb attacks and XML external entity attacks.
1.1 Exercise Description
In this exercise, we provide a Java XML parser built on standard libraries and two exercises that apply two different types of XML attacks. Java provides libraries to parse, modify, and query XML documents.
There are two exercises:
Coercive parsing: parsing a large junk data file to consume memory, slow down processing, and thereby affect the entire system.
Information disclosure: exploiting the external entity to force an information disclosure.
1.2 Vulnerability Mitigation
To mitigate the XML attacks we will follow two approaches. We will modify the parser settings, and we will modify the parser to include time-outs and support white lists. The chapter on XML Injection Attacks covers more of these mitigation practices.
2. Exercise Instructions
These two exercises will be done using the command line terminal (shell) of the virtual machine. To open the terminal, right-click on the "EXERCISES" directory and select "Open in Terminal". Enter the following command to change into the exercise directory:
$ cd 3.8.4_XML_Injections
You will need two shell windows for this exercise, one for executing the command line and one for inspecting the XML file or parser.
2.1 Exercise 1
A coercive attack in XML involves parsing deeply nested XML documents that contain opening tags but not the corresponding closing tags. The idea is to make the victim use up and eventually deplete the machine's resources, causing a denial of service on the target. Removing the closing tags simplifies the attack, since a document half the size of a well-formed one accomplishes the same result. In this virtual machine, the number of tags being processed eventually causes an error message. If run on another computer, such as a Linux system in the CS Department, a heap out-of-memory or garbage collector error might occur.
Enter the following command to change into the exercise one directory:
$ cd Exercise_One
Compile the Parser
To compile this parser, enter the following command in one of your shell windows:
$ make
The parser we are using for this exercise is xmlParser.java, which uses the Java XMLReader interface.
Run the Parser
To run this parser, enter the following command in one of your shell windows:
$ java xmlParser
The parser will throw an I/O exception in the shell window, as follows:
java.io.IOException: Need a valid XML file name.
The parser program requires one argument: the name of the XML file you want to parse. To parse a valid XML file called books.xml, enter:
$ java xmlParser books.xml
You should see output in your window showing the content of the books.xml file.
For this exercise, to parse a large malformed XML file, largeFile.xml (900 MB), enter:
$ java xmlParser largeFile.xml
Note: Be careful! Parsing the malformed largeFile.xml (900 MB) may crash your computer. Make sure you are ready before parsing it.
You should see some output in your window, printing out tag names from the largeFile.xml file. The printing in the console will then slow down, and eventually VirtualBox will report an error stating:
An error has occurred during virtual machine execution! The error details are shown below. You may try to correct the error and resume the virtual machine execution.
If you parse this file on one of the Linux systems in the CS computer lab, one of the following exceptions will be thrown instead:
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: GC overhead limit exceeded
Inspect the File and Parser
Now that you have seen the result of our coercive parsing attack, it is time to inspect largeFile.xml to understand the cause of the problem:
$ more largeFile.xml
This file contains 80,000,000 nested start tags and no end tags, so the parser keeps descending into ever-deeper nesting without ever closing an element. An attacker could easily craft such a long, malformed XML input file.
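To make the structure of such an input concrete, the following sketch writes a much smaller file of nested start tags with no end tags. It is only an illustration: the class name NestedTagFileSketch, the output name smallNested.xml, and the tag names (a0, a1, ...) are assumptions, and the actual tag names in largeFile.xml may differ.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the exercise files.
class NestedTagFileSketch {

    // Writes `depth` start tags (<a0>, <a1>, ...), one per line, and no end tags.
    static void writeNestedOpenTags(Path target, int depth) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (int i = 0; i < depth; i++) {
                out.write("<a" + i + ">");
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Keep the depth small; this is for inspection, not for stressing a machine.
        Path target = Paths.get("smallNested.xml");
        writeNestedOpenTags(target, 1000);
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```

A file like this is malformed in the same way as the attack input, but small enough to open in an editor and to parse without exhausting memory.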
Now let's look at the implementation of the XML parser to better understand the attack vector. A good place to start is in xmlParser.java. Use your favorite text editor to open this file. For example:
$ vim xmlParser.java
In xmlParser.java, the XMLReaderFactory class creates an XML reader, and the XMLReader interface is used to read and parse an XML file. In this parser, MyContentHandler implements the default content handler to print out content of the file while the file is being parsed.
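The reader-plus-handler pattern described above can be sketched as follows. This is a minimal illustration, not the exercise's xmlParser.java: the class names SaxReaderSketch and TagCollector are hypothetical, and the reader is obtained through SAXParserFactory rather than XMLReaderFactory, which has been deprecated since Java 9.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxReaderSketch {

    // Hypothetical stand-in for MyContentHandler: records the name of each start tag.
    static class TagCollector extends DefaultHandler {
        final List<String> tags = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            tags.add(qName);
        }
    }

    // Parses the given XML text and returns the start-tag names in document order.
    static List<String> collectTags(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        TagCollector handler = new TagCollector();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.tags;
    }

    public static void main(String[] args) throws Exception {
        // prints [books, book, title]
        System.out.println(collectTags("<books><book><title>XML</title></book></books>"));
    }
}
```

The handler is called while the document is still being read, which is why tag names from a malformed file are printed before the parser ever reports a problem.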
Mitigate the Vulnerability: Forcing a Time-Out
In this mitigation, you will set a timer that goes off if parsing a long, malformed XML file takes too long.
To mitigate this vulnerability, go to the Mitigation folder in the Exercise_One folder:
$ cd Mitigation
In the Mitigation folder, there are three files:
Filename | Purpose
xmlParser.java | This file has a copy of the xmlParser we use to parse the largeFile.xml. Use this copy to edit and write your own code to mitigate this vulnerability.
Makefile | This Makefile is for you to compile your own version of the xmlParser.
testFile.xml | As parsing the largeFile.xml might cause negative effects to your computer, we do not recommend testing your code with this file. This testFile.xml has fewer nested tags for testing purposes.
As largeFile.xml is over 900 MB, you probably shouldn't make another copy in this subdirectory. For a final test of your modified parser against this file, enter:
$ java xmlParser ../largeFile.xml
For this mitigation, start by familiarizing yourself with the XMLReader interface. The goal is for your code to create a new thread to do the XML parsing. The main thread will then wait for this thread to complete, with a time-out if the parsing thread takes too long.
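One possible shape for this worker-thread-plus-time-out arrangement is sketched below. It is an illustration under assumptions, not the expected solution: the class name TimedParseSketch, the method runWithTimeout, the thread name, and the 5-second limit are all hypothetical, and the sketch parses a tiny in-memory document rather than a file.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class TimedParseSketch {

    // Runs `task` on a worker thread and waits at most timeoutMillis for it.
    // Returns true if the task finished in time, false if it timed out.
    static boolean runWithTimeout(Runnable task, long timeoutMillis) throws InterruptedException {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeoutMillis must be positive"); // join(0) would wait forever
        }
        Thread worker = new Thread(task, "xml-parser");
        worker.setDaemon(true);      // a stuck worker must not keep the JVM alive
        worker.start();
        worker.join(timeoutMillis);  // block until the worker ends or the time-out elapses
        if (worker.isAlive()) {
            worker.interrupt();      // best effort only; the parser may not react to interrupts
            return false;
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        Runnable parse = () -> {
            try {
                XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
                reader.setContentHandler(new DefaultHandler());
                reader.parse(new InputSource(new StringReader("<a><b/></a>")));
            } catch (Exception e) {
                System.err.println("Parse failed: " + e.getMessage());
            }
        };
        boolean finished = runWithTimeout(parse, 5_000);
        System.out.println(finished ? "Parsing completed." : "Parsing timed out; giving up.");
    }
}
```

A Java thread cannot be forcibly stopped, so after a time-out the parsing thread may still be running. Marking it as a daemon thread lets the program exit once the main thread returns.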
2.2 Exercise 2
In this exercise, you will see how an XML parser can be made to read a local confidential file (/etc/passwd, referenced from passwd.xml), and learn two ways to prevent it.
Change to the Exercise 2 directory:
$ cd /home/user/Desktop/EXERCISES/3.8.4_XML_Injections/Exercise_Two
Compile the Parser
To compile this parser, enter the following command:
$ make
Run the Parser
To run this parser, enter the following command in one of your shell windows:
$ java xmlParser passwd.xml
You should see the contents of the /etc/passwd file as output.
Inspect the File and Parser
Now that you observed an information disclosure using an external entity, it is time to look at the passwd.xml to understand how this disclosure happened:
$ vim passwd.xml
This file contains an external entity referencing a local file, /etc/passwd. While processing this file, the parser replaces XXE with the contents of /etc/passwd.
Now that you understand the basic format of an external entity, it's time to look at the implementation of the parser to find the attack vector. Again, we will start in xmlParser.java.
$ vim xmlParser.java
Mitigate the Vulnerability
For this mitigation, we will follow two approaches: (1) disabling external entity expansions by changing the configuration settings and (2) comparing the external entity that is referenced with the contents of a white list.
To start these mitigations, go into the Mitigation folder:
$ cd Mitigation
Approach (1): Disable the External Entity
The first approach consists of turning off the option that allows the use of external entities. This option is turned on by default.
Go into the ApproachOne folder:
$ cd ApproachOne
In this folder, we have three files:
Filename | Purpose
xmlParser.java | This file is a copy of the xmlParser that we use to parse the largeFile.xml. Use this copy to edit and write your own code to mitigate this vulnerability.
Makefile | This Makefile is for you to compile your own version of the xmlParser.
passwd.xml | This file is used to test your mitigation.
The xmlReader object (an instance of the XMLReader interface) has a method, setFeature, for changing the settings of the parser configuration. To disable external entity processing, you will need to insert a couple of calls to this method:
xmlReader.setFeature(String featureName, boolean flag);
featureName is a URI that identifies a specific parser feature. To figure out which feature names and flags to use for disabling external entities, refer to the article on Setting Features from apache.org for detailed information.
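As a sketch of what such setFeature calls can look like, the example below turns off the two standard SAX2 features for external general and external parameter entities. Whether these are exactly the two features the exercise intends is an assumption, as are the class name DisableExternalEntitiesSketch, the helper parseText, and the temporary "secret" file that stands in for /etc/passwd.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class DisableExternalEntitiesSketch {

    // Standard SAX2 feature identifiers.
    static final String GENERAL = "http://xml.org/sax/features/external-general-entities";
    static final String PARAMETER = "http://xml.org/sax/features/external-parameter-entities";

    // Parses xml and returns all character data reported to the content handler.
    static String parseText(String xml, boolean disableExternalEntities) throws Exception {
        XMLReader xmlReader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        if (disableExternalEntities) {
            xmlReader.setFeature(GENERAL, false);
            xmlReader.setFeature(PARAMETER, false);
        }
        StringBuilder text = new StringBuilder();
        xmlReader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        xmlReader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a confidential file.
        Path secret = Files.createTempFile("secret", ".txt");
        Files.write(secret, "TOP-SECRET".getBytes(StandardCharsets.UTF_8));
        String xml = "<!DOCTYPE data [<!ENTITY xxe SYSTEM \"" + secret.toUri() + "\">]>"
                   + "<data>before &xxe; after</data>";
        // With a default JDK configuration the first call typically includes the file contents.
        System.out.println("default:  " + parseText(xml, false));
        System.out.println("hardened: " + parseText(xml, true));
        Files.delete(secret);
    }
}
```

With the features disabled, the parser skips the entity reference instead of reading the referenced file, so the surrounding text is still parsed but the file contents never reach the handler.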
Approach (2): Permit access only if reference is on a white list
Turning off the external entity processing is a safe solution to our problem but lacks flexibility. There are cases where it might be appropriate to use an external entity, such as wanting to include the output of a program (specified in a URL) in your XML file.
As a result, we will use a white list to keep track of which files or URLs are OK to parse.
To try this second approach, go into the ApproachTwo folder:
$ cd /home/user/Desktop/EXERCISES/3.8.4_XML_Injections/Exercise_Two/Mitigation/ApproachTwo
You will modify the XML parser to compare the external entity reference with the strings on your white list. If the external entity is on the white list, this entity will be parsed as normal. If it is not on the white list, then the parser will ignore that external entity.
There are seven files in the directory for this exercise:
Filename | Purpose
xmlParser.java | This file contains the same xmlParser that we used previously. Use this copy to edit and write your own code to mitigate the vulnerability.
Makefile | This Makefile is for you to compile your own version of the xmlParser.
passwd.xml | This file is used to test your mitigation. If it works, the confidential file /etc/passwd will not be parsed.
whiteListForXMLXXEAccess.txt | This file contains the list of strings that the xmlParser will allow to be used as external entities.
readable.xml | This file is used to test your mitigation. If it works, the local file readableFile.txt will be parsed.
normalHtml.xml | This file is used to test your mitigation. If it works, the remote file normalHtml.xml will be parsed.
readableFile.txt | This file is on the white list. If your mitigation works, this file will be parsed.
To implement the mitigation, add the line below to the main method of xmlParser.java. This allows your replacement resolveEntity method to be called by the parser. The resolveEntity method is called every time the parser finds an external entity.
xmlReader.setEntityResolver(new MyResolver());
Now you are ready to write your own resolver. Add the following to xmlParser.java:
class MyResolver implements EntityResolver {
    public InputSource resolveEntity(String publicId, String systemId) {
        //
        // Your white list checking code goes here
        //
    }
}
The parameter systemId is the system identifier (the file path or URL) of the external entity found by the XML parser. This is the string that you need to check.
Your resolveEntity method must return null if the external entity was found on the white list. Returning null tells the parser to resolve and expand the entity as usual.
Your resolveEntity method must return a non-null InputSource (for example, one that wraps the empty string, "") if the entity was not found on the white list. The parser then reads that replacement content instead of the referenced resource, so the entity expands to nothing and parsing continues.
Below is an example of returning an empty InputSource so that the parser does not expand the external entity.
return new InputSource(new StringReader(""));
The white lists you will be using for this exercise will contain only absolute paths after the protocol header.
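Putting the return-value rules above together, a white-list resolver could look roughly like the sketch below. It is one possible shape, not the required solution: the class name WhiteListResolverSketch, the fromFile helper, the exact-match comparison, and the assumed file format (one permitted system identifier per line) are all assumptions, as is the sample path used in main.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class WhiteListResolverSketch implements EntityResolver {

    private final Set<String> allowed;

    WhiteListResolverSketch(Set<String> allowed) {
        this.allowed = new HashSet<>(allowed);
    }

    // Assumed file format: one permitted system identifier per line, blank lines ignored.
    static WhiteListResolverSketch fromFile(String fileName) throws IOException {
        Set<String> entries = new HashSet<>();
        for (String line : Files.readAllLines(Paths.get(fileName))) {
            String entry = line.trim();
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return new WhiteListResolverSketch(entries);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId != null && allowed.contains(systemId)) {
            return null; // on the white list: let the parser resolve the entity itself
        }
        // Not on the white list: hand the parser empty replacement content.
        return new InputSource(new StringReader(""));
    }

    public static void main(String[] args) {
        // Hypothetical white-list entry, for illustration only.
        String listed = "file:///home/user/readableFile.txt";
        WhiteListResolverSketch resolver = new WhiteListResolverSketch(Collections.singleton(listed));
        System.out.println("listed expands:   " + (resolver.resolveEntity(null, listed) == null));
        System.out.println("unlisted expands: " + (resolver.resolveEntity(null, "file:///etc/passwd") == null));
    }
}
```

Comparing whole strings keeps the check strict: a reference that reaches a listed file through a differently spelled path is simply treated as not listed.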
3. Delivery Instructions
You'll need to deliver a short report containing:
The mitigation code (well commented) for mitigating the first XML attack (long malformed XML input file).
For the above mitigation, the output produced by the system when trying to run the attack.
The code for the first mitigation for the XXE attack.
For the above mitigation, the output produced by the system when trying to run the attack.
The code for the second mitigation (white list) for the XXE attack.
For the above mitigation, the output produced by the system when trying to run the attack.
Explanations of the attacks and mitigations, and your conclusions.