I am working with Hadoop MapReduce. I have to process data from an .xml
file, parse it, and store the output in a database.
While working on this, when I needed to pass my XML to the mapper, I found that an XmlInputFormat class
is not provided by Hadoop by default and we have to use Mahout's XmlInputFormat for it.
I wonder, since XML is so widely used, why hasn't Hadoop provided an XmlInputFormat
out of the box, rather than requiring a custom XmlInputFormat built by extending TextInputFormat?
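
For context, this is roughly how I am wiring Mahout's XmlInputFormat in the driver. It is only a sketch: the `<record>` tag, the mapper, and the input/output paths are placeholders, and the package of XmlInputFormat differs between Mahout versions.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
// Package path depends on the Mahout version; recent releases ship it as
// org.apache.mahout.text.wikipedia.XmlInputFormat.
import org.apache.mahout.text.wikipedia.XmlInputFormat;

public class XmlJobDriver {

    /** Minimal mapper: each value is one complete <record>...</record> fragment. */
    public static class RawXmlMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Parse the fragment here (DOM/StAX/JAXB) before writing it out.
            context.write(NullWritable.get(), value);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Tell the record reader which tags delimit one logical record.
        // "<record>"/"</record>" are placeholders for the real tag names.
        conf.set(XmlInputFormat.START_TAG_KEY, "<record>");
        conf.set(XmlInputFormat.END_TAG_KEY, "</record>");

        Job job = Job.getInstance(conf, "xml-parse");
        job.setJarByClass(XmlJobDriver.class);
        job.setInputFormatClass(XmlInputFormat.class);
        job.setMapperClass(RawXmlMapper.class);
        job.setNumReduceTasks(0); // map-only in this sketch
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```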
Well, even though XML is widely used, shipping the framework with special support for one particular technology might not be a good idea; it could look like an endorsement. At a high level, MapReduce is designed to accept arbitrary input formats. In fact, these days JSON is used heavily because it is more compact than XML. I ran into a similar issue myself.
It is up to the user to decide the input format of the MapReduce job: you can use different parsers (Jackson or Gson for JSON, JAXB for XML) when each record fits on a single line (see the sketch below), or, as in your case, an XmlInputFormat built on a custom RecordReader implementation.
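
For example, if each XML record sits on its own line, the plain TextInputFormat is enough and JAXB can do the parsing inside the mapper. A rough sketch, assuming records shaped like `<employee><id>1</id><name>Alice</name></employee>`; the Employee type and its fields are placeholders:

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlRootElement;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class JaxbLineMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Placeholder record type; replace with whatever your XML actually holds.
    @XmlRootElement(name = "employee")
    public static class Employee {
        public String id;
        public String name;
    }

    private Unmarshaller unmarshaller;

    @Override
    protected void setup(Context context) throws IOException {
        try {
            // JAXBContext creation is expensive, so build it once per task.
            unmarshaller = JAXBContext.newInstance(Employee.class).createUnmarshaller();
        } catch (JAXBException e) {
            throw new IOException("Could not initialise JAXB", e);
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            Employee emp = (Employee) unmarshaller.unmarshal(new StringReader(value.toString()));
            // Emit id -> name; the database write would happen in the reducer
            // or via an output format instead.
            context.write(new Text(emp.id), new Text(emp.name));
        } catch (JAXBException e) {
            // Count and skip malformed lines rather than failing the task.
            context.getCounter("xml", "malformed_records").increment(1);
        }
    }
}
```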