Split a large XML into pieces using Java (Part-1)

You may sometimes need to split a large XML document into smaller pieces using an XPath expression. The main idea is that smaller pieces are easier to process and easier to use. So today we'll talk about how we can split an XML document into pieces according to a given XPath using Java.

Assume we have the following XML string.

String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n"
        + "<persons>\n"
        + "    <person>\n"
        + "        <id>person0</id>\n"
        + "        <name>name0</name>\n"
        + "        <age>age0</age>\n"
        + "    </person>\n"
        + "    <person>\n"
        + "        <id>person1</id>\n"
        + "        <name>name1</name>\n"
        + "        <age>age1</age>\n"
        + "    </person>\n"
        + "</persons>";

We need to split this XML into two parts as follows.

Part-1 

<person>
    <id>person0</id>
    <name>name0</name>
    <age>age0</age>
</person>

Part-2

<person>
    <id>person1</id>
    <name>name1</name>
    <age>age1</age>
</person>

So how can we do this using Java? First we need an XML parser to parse this XML. Many libraries can do this, but if you want to stick to pure Java, the JDK gives us DocumentBuilderFactory and DocumentBuilder. A DocumentBuilder can parse from an InputStream, a File, an InputSource or a URI string (note that parse(String) treats its argument as a URI, not as XML content). For the sake of neatness we'll pass an InputStream to the parser, so we'll first convert our string to an InputStream.

InputStream inputStream = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
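As an aside, an XML string can also be handed to the parser without converting it to bytes, by wrapping it in an InputSource over a StringReader. The following is only a minimal sketch of that alternative; the class name XmlStringSource and its parse method are hypothetical and not part of this post's code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only.
class XmlStringSource {

    // Parses XML held in a String by wrapping it in a character-based InputSource.
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        InputSource source = new InputSource(new StringReader(xml));
        return factory.newDocumentBuilder().parse(source);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<persons><person><id>person0</id></person></persons>");
        System.out.println(doc.getDocumentElement().getNodeName()); // prints "persons"
    }
}
```

Because the parser receives characters rather than bytes here, no charset has to be chosen when the string is converted. The rest of this post sticks with the InputStream approach.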

Now that we have the inputStream, we can start parsing the XML. Hmm... we have a problem! What if the javax.xml.parsers.DocumentBuilderFactory system property already points to some other parser implementation? That's a problem, right? So we'll solve that issue first.

// Remove any system-wide override of the DocumentBuilderFactory implementation.
System.clearProperty("javax.xml.parsers.DocumentBuilderFactory");

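Wow... it's done. We just removed the system property that overrides the factory implementation, so newInstance() moves on to the rest of its normal lookup. Let's create a new factory.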

DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setIgnoringComments(true);
domFactory.setValidating(false);
domFactory.setNamespaceAware(true);
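One thing to keep in mind is that system properties are shared by the whole JVM, so removing the override also affects any other code that relies on it. A more cautious variation is to clear the property only while the factory is being created and then put it back. The sketch below illustrates that idea; the class FactoryPropertyGuard and the method newFactoryIgnoringOverride are hypothetical names, not part of this post's code.

```java
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper for illustration only.
class FactoryPropertyGuard {

    static final String KEY = "javax.xml.parsers.DocumentBuilderFactory";

    // Clears the override only while the factory is created, then restores it.
    // Not thread-safe: system properties are shared by the whole JVM.
    static DocumentBuilderFactory newFactoryIgnoringOverride() {
        String previous = System.clearProperty(KEY); // null if it was not set
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setIgnoringComments(true);
            factory.setValidating(false);
            factory.setNamespaceAware(true);
            return factory;
        } finally {
            if (previous != null) {
                System.setProperty(KEY, previous); // put the original override back
            }
        }
    }

    public static void main(String[] args) {
        DocumentBuilderFactory factory = newFactoryIgnoringOverride();
        System.out.println(factory.isNamespaceAware()); // prints "true"
    }
}
```

On Java 9 and later, DocumentBuilderFactory.newDefaultInstance() is another option, since it returns the JDK's built-in implementation without consulting the system property at all.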

What are these setIgnoringComments, setValidating and blah blah blah?

setIgnoringComments() - Specifies whether parsers created by this factory ignore comments.
setValidating() - Specifies whether parsers created by this factory validate the XML content as it is parsed.
setNamespaceAware() - Specifies whether parsers created by this factory are namespace aware.
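To see what two of these switches actually change, here is a small sketch that parses the same document with different settings. The class FactorySettingsDemo, the sample XML and the urn:example namespace are all made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical demo class for illustration only.
class FactorySettingsDemo {

    // Sample document with one comment and a made-up namespace.
    static final String XML =
            "<p:persons xmlns:p=\"urn:example\"><!-- note --><p:person/></p:persons>";

    // Parses the sample document with the given factory settings.
    static Document parse(boolean ignoreComments, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(ignoreComments);
        factory.setValidating(false);
        factory.setNamespaceAware(namespaceAware);
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
    }

    // Counts comment nodes directly under the root element.
    static int countComments(Document doc) {
        NodeList children = doc.getDocumentElement().getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.COMMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countComments(parse(false, true))); // prints "1"
        System.out.println(countComments(parse(true, true)));  // prints "0"
        // With namespace awareness on, the root element reports its namespace URI.
        System.out.println(parse(true, true).getDocumentElement().getNamespaceURI()); // prints "urn:example"
    }
}
```

With setIgnoringComments(true) the comment never makes it into the DOM tree, and with setNamespaceAware(true) each element knows which namespace it belongs to.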

Cool. We'll now instantiate a DocumentBuilder object to parse the inputStream.

DocumentBuilder builder = domFactory.newDocumentBuilder();
builder.setErrorHandler(new XmlErrorHandler());
Document doc = builder.parse(inputStream);

Alright, now we're all set up for XML parsing. I know you have a question now: "What's this new XmlErrorHandler()?" We'll talk about it in the next post.
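Until then, the steps from this post can be pulled together into one runnable sketch. Since XmlErrorHandler has not been shown yet, the sketch uses org.xml.sax.helpers.DefaultHandler as a stand-in error handler; that substitution, the class name ParsePersonsSketch and the compacted XML string are assumptions made only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class that strings the steps of this post together.
class ParsePersonsSketch {

    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
            + "<persons>"
            + "<person><id>person0</id><name>name0</name><age>age0</age></person>"
            + "<person><id>person1</id><name>name1</name><age>age1</age></person>"
            + "</persons>";

    static Document parse(String xml) throws Exception {
        // Step 1: convert the string to an InputStream.
        InputStream inputStream =
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));

        // Step 2: drop any system property that overrides the factory implementation.
        System.clearProperty("javax.xml.parsers.DocumentBuilderFactory");

        // Step 3: configure the factory.
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setIgnoringComments(true);
        domFactory.setValidating(false);
        domFactory.setNamespaceAware(true);

        // Step 4: build the parser and parse.
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        // Stand-in only: DefaultHandler implements ErrorHandler, ignores warnings
        // and errors, and rethrows fatal errors. It is not the XmlErrorHandler
        // used in this post.
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(inputStream);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(doc.getDocumentElement().getNodeName());          // prints "persons"
        System.out.println(doc.getElementsByTagName("person").getLength()); // prints "2"
    }
}
```

The resulting Document holds both person elements in memory, which is the starting point for splitting them apart with an XPath.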
