I'm searching for the right tool to transform text files into xml.
The text file looks like this:
2017-01-03-10.11.1201000B  H4_01DE33411121...
2017-01-01-09.12.1301000BHAX4_01DE34256137...
2017-01-01-10.12.1301000BMLH4_01DE63789221...
Each line holds the content of one entity, and I have the following information about its layout:
Characters 0-18: Attribute1
Characters 19-21: Attribute2
Characters 22-23: Attribute3
Character 24: Attribute4
Characters 25-31: Attribute5
and so on....
Now I'm looking for a tool that transforms this text file, following these rules, into the following XML:
<entities>
  <entity>
    <attribute1>2017-01-03-10.11.12</attribute1>
    <attribute2>010</attribute2>
    <attribute3>00</attribute3>
    <attribute4>B</attribute4>
    <attribute5>H4_01</attribute5>
    ... and so on
  </entity>
  <entity>
    <attribute1>2017-01-01-09.12.13</attribute1>
    <attribute2>010</attribute2>
    <attribute3>00</attribute3>
    <attribute4>B</attribute4>
    <attribute5>HAX4_01</attribute5>
    ... and so on
  </entity>
  <entity>
    <attribute1>2017-01-01-10.12.13</attribute1>
    <attribute2>010</attribute2>
    <attribute3>00</attribute3>
    <attribute4>B</attribute4>
    <attribute5>MLH4_01</attribute5>
    ... and so on
  </entity>
</entities>
The tool also needs to support some simple logic, for example trimming strings, if/else conditions, and date format conversion.
First, I thought of using XSLT, so that the owner of this weird text file could even produce the corresponding configuration file on his own (that would be best!). But I have often read that XSLT is only meant for converting XML to other formats, not for converting plain text files to XML.
It should also be maintainable, so a shell script using awk and sed would be too confusing.
Do you know a tool that is more suitable than XSLT?
A smart way to do this is to generate an XSLT stylesheet from a data description file that describes the input.
If the data description file has
<fields>
  <field name="attribute1" start="1" length="18"/>
  <field name="attribute2" start="19" length="2"/>
</fields>
then it's pretty easy to generate an XSLT 3.0 transformation which does
<!-- expand-text="yes" enables the {...} text value templates used below -->
<xsl:template name="main" expand-text="yes">
  <entities>
    <xsl:for-each select="unparsed-text-lines('input.txt')">
      <entity>
        <attribute1>{substring(., 1, 18)}</attribute1>
        <attribute2>{substring(., 19, 2)}</attribute2>
      </entity>
    </xsl:for-each>
  </entities>
</xsl:template>
(and generating XSLT 2.0 is only very slightly more complex, but doing XSLT 1.0 is harder because you can't read a plain text file directly).
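The generation step itself can also be written in XSLT. The following is only a minimal sketch, assuming the data description has a fields root element with field children carrying name, start and length attributes, as shown above. The out prefix, its alias URI and the file name input.txt are placeholders chosen for illustration.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:out="http://www.w3.org/1999/XSL/TransformAlias">

  <!-- Elements written with the out: prefix are emitted in the XSLT
       namespace, so the result is itself a stylesheet. -->
  <xsl:namespace-alias stylesheet-prefix="out" result-prefix="xsl"/>
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/fields">
    <out:stylesheet version="3.0" expand-text="yes">
      <out:output method="xml" indent="yes"/>
      <out:template name="main">
        <entities>
          <!-- input.txt is a placeholder for the fixed-width text file -->
          <out:for-each select="unparsed-text-lines('input.txt')">
            <entity>
              <!-- One result element per field in the description. The
                   braces are plain text here and only become a text value
                   template in the generated stylesheet. -->
              <xsl:for-each select="field">
                <xsl:element name="{@name}">
                  <xsl:value-of select="'{substring(., ' || @start || ', ' || @length || ')}'"/>
                </xsl:element>
              </xsl:for-each>
            </entity>
          </out:for-each>
        </entities>
      </out:template>
    </out:stylesheet>
  </xsl:template>

</xsl:stylesheet>
```

Running this against the data description with an XSLT 3.0 processor should produce a stylesheet whose main template has one substring() call per field. With Saxon's command line, for instance, that would be one run using the -s:, -xsl: and -o: options to write the generated stylesheet, followed by a second run of the generated stylesheet with -it:main. The relative file name passed to unparsed-text-lines() is resolved against the location of the generated stylesheet.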
Implementing your "simple logic" is a bit trickier, but it wouldn't be hard to add an extra field to the data description:
<field name="attribute1" start="1" length="18" action="checkDate"/>
which causes the generated XSLT to be
<attribute1>{f:checkDate(substring(., 1, 18))}</attribute1>
invoking a function in the stylesheet such as
<xsl:function name="f:checkDate" as="xs:string">
  <xsl:param name="in" as="xs:string"/>
  <xsl:sequence select="if ($in castable as xs:date) then $in else error(...)"/>
</xsl:function>
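The other kinds of logic mentioned in the question, trimming strings and converting date formats, could be covered by functions of the same shape. This is a sketch under assumptions: the timestamp layout 2017-01-03-10.11.12 is taken from the sample data, while the function names f:trim and f:toDateTime and the namespace URI bound to the f prefix are made up for illustration.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical helper: strip leading and trailing whitespace
       from a fixed-width field. -->
  <xsl:function name="f:trim" as="xs:string">
    <xsl:param name="in" as="xs:string"/>
    <xsl:sequence select="replace($in, '^\s+|\s+$', '')"/>
  </xsl:function>

  <!-- Hypothetical helper: turn 2017-01-03-10.11.12 into the xs:dateTime
       2017-01-03T10:11:12. Input that does not match the pattern is left
       unchanged by replace(), so the cast then fails with a dynamic error. -->
  <xsl:function name="f:toDateTime" as="xs:dateTime">
    <xsl:param name="in" as="xs:string"/>
    <xsl:sequence select="xs:dateTime(replace($in,
        '^(\d{4}-\d{2}-\d{2})-(\d{2})\.(\d{2})\.(\d{2})$',
        '$1T$2:$3:$4'))"/>
  </xsl:function>

</xsl:stylesheet>
```

A field declared with action="trim" or action="toDateTime" would then generate a call such as {f:trim(substring(...))} in the same way as checkDate. Note that f:toDateTime expects the complete 19-character timestamp, so the field it is applied to has to be declared with a matching length.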