
How to read XML files with missing initial tags in R


I have several XML files that are missing the initial (root) tag. For example, this is a properly formatted file:

<?xml version="1.0"?>
<UDI>
<Test_Equipment_Number>3300061-01</Test_Equipment_Number>
<Test_SW_Number>3300062</Test_SW_Number>
<Test_SW_Version>2.1</Test_SW_Version>
<GTIN>(01)00884838088597</GTIN>
<LOT></LOT>
<Date_of_Mfg>(11)20190322</Date_of_Mfg>
<Device_SN>(21)1160001242</Device_SN>
<Material_Number>(96)300001287651</Material_Number>
<PCBA_WO_and_SN>00190311-0001242</PCBA_WO_and_SN>
<FW_Version>06</FW_Version>
<Model>324PHB</Model>
</UDI>

And this is a file with the initial tag missing:

<Test_Equipment_Number>3300011-01</Test_Equipment_Number>
<Test_SW_Number>3300012</Test_SW_Number>
<Test_SW_Version>5.1</Test_SW_Version>
<GTIN>(01)00884838085497</GTIN>
<LOT></LOT>
<Date_of_Mfg>(11)20190411</Date_of_Mfg>
<Device_SN>(21)1120104548</Device_SN>
<Material_Number>(96)300000267981</Material_Number>
<PCBA_WO_and_SN>000143-00000793</PCBA_WO_and_SN>
<FW_Version>V01.0001</FW_Version>
<Model>7000PHW</Model>

How can I read a file with the missing initial tag in R?


Solution

  • One option is to parse the XML fragment and specify a top-level node to wrap it in:

    # install.packages('XML')
    library(XML)
    
    fragment <- 
    '<Test_Equipment_Number>3300011-01</Test_Equipment_Number>
    <Test_SW_Number>3300012</Test_SW_Number>
    <Test_SW_Version>5.1</Test_SW_Version>
    <GTIN>(01)00884838085497</GTIN>
    <LOT></LOT>
    <Date_of_Mfg>(11)20190411</Date_of_Mfg>
    <Device_SN>(21)1120104548</Device_SN>
    <Material_Number>(96)300000267981</Material_Number>
    <PCBA_WO_and_SN>000143-00000793</PCBA_WO_and_SN>
    <FW_Version>V01.0001</FW_Version>
    <Model>7000PHW</Model>'
    
    XML::parseXMLAndAdd(fragment, top = 'content')
    #> <content>
    #>   <Test_Equipment_Number>3300011-01</Test_Equipment_Number>
    #>   <Test_SW_Number>3300012</Test_SW_Number>
    #>   <Test_SW_Version>5.1</Test_SW_Version>
    #>   <GTIN>(01)00884838085497</GTIN>
    #>   <LOT/>
    #>   <Date_of_Mfg>(11)20190411</Date_of_Mfg>
    #>   <Device_SN>(21)1120104548</Device_SN>
    #>   <Material_Number>(96)300000267981</Material_Number>
    #>   <PCBA_WO_and_SN>000143-00000793</PCBA_WO_and_SN>
    #>   <FW_Version>V01.0001</FW_Version>
    #>   <Model>7000PHW</Model>
    #> </content>
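
The fragment above is an inline string. For files on disk, the same idea could be sketched roughly as follows. This is a minimal sketch and not part of the original answer: read_fragment_file is a hypothetical helper, the root name "UDI" is assumed from the well-formed example in the question, and the temporary file only stands in for one of the real files.

```r
library(XML)

# Hypothetical helper (an assumption, not from the original answer):
# read a file and wrap its content in a root element if that element is absent.
read_fragment_file <- function(path, root = "UDI") {
  txt <- paste(readLines(path, warn = FALSE), collapse = "\n")
  # Files that already contain the root element are parsed unchanged
  if (!grepl(paste0("<", root, "[ >]"), txt)) {
    txt <- sprintf("<%s>%s</%s>", root, txt, root)
  }
  xmlParse(txt, asText = TRUE)
}

# Stand-in for a real file that lacks the initial tag
tmp <- tempfile(fileext = ".xml")
writeLines(c("<Test_SW_Number>3300012</Test_SW_Number>",
             "<Model>7000PHW</Model>"), tmp)

doc <- read_fragment_file(tmp)

# Collect the child elements of the root as a named character vector
fields <- getNodeSet(doc, "/*/*")
values <- setNames(sapply(fields, xmlValue), sapply(fields, xmlName))
```

Because well-formed files pass through unchanged, the same helper could be applied to a mixed folder, for example with lapply(list.files(pattern = "\\.xml$"), read_fragment_file).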