
Parse XML using R having namespaces


Below is the XML response I received from SharePoint. I am trying to parse the data and extract the details in the following format:

Output Needed

title  port  space    datecreat            id
test   8080  100.000  2017-04-21 17:29:23  1
apple  8700  108.000  2017-04-21 17:29:23  2

Input Received

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <soap:Body>
        <GetListItemsResponse xmlns="http://schemas.microsoft.com/sharepoint/soap/">
            <GetListItemsResult>
                <listitems xmlns:s='uuid:SBDSHDSH-DSJHD' xmlns:dt='uuid:CSDSJHA-DGGD' xmlns:rs='urn:schemas-microsoft-com:rowset' xmlns:z='#RowsetSchema'>
                    <rs:data ItemCount="2">
                        <z:row title="test" port="8080" space='100.000' datecreat='2017-04-21 17:29:23' id='1' />
                        <z:row title="apple" port="8700" space='108.000' datecreat='2017-04-21 17:29:23' id='2' />
                    </rs:data>
                </listitems>
            </GetListItemsResult>
        </GetListItemsResponse>
    </soap:Body>
</soap:Envelope>

I am new to R and have tried a few approaches, but none of them worked. The namespaces and the z:row elements are not being detected.


Solution

  • Consider registering the z namespace prefix and using the XML package's internal (unexported) function xmlAttrsToDataFrame, accessed with the triple-colon operator:

    library(XML)
    
    txt='<?xml version="1.0" encoding="utf-8"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
     <soap:Body>
      <GetListItemsResponse xmlns="http://schemas.microsoft.com/sharepoint/soap/">
        <GetListItemsResult>
          <listitems xmlns:s=\'uuid:SBDSHDSH-DSJHD\' xmlns:dt=\'uuid:CSDSJHA-DGGD\' xmlns:rs=\'urn:schemas-microsoft-com:rowset\' xmlns:z=\'#RowsetSchema\'>
            <rs:data ItemCount="2">
              <z:row title="test" port="8080" space=\'100.000\' datecreat=\'2017-04-21 17:29:23\' id=\'1\' />
              <z:row title="apple" port="8700" space=\'108.000\' datecreat=\'2017-04-21 17:29:23\' id=\'2\' />
            </rs:data>
          </listitems>
        </GetListItemsResult>
      </GetListItemsResponse>
     </soap:Body>
    </soap:Envelope>'
    
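    # Parse the XML string into an internal document
    doc <- xmlParse(txt)
    
    # Map the z prefix to its namespace URI so XPath can resolve z:row
    namespaces <- c(z="#RowsetSchema")
    # Build a data frame with one row per z:row node and one column per attribute
    df <- XML:::xmlAttrsToDataFrame(getNodeSet(doc, path='//z:row', namespaces))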
    
    df
    #   title port   space           datecreat id
    # 1  test 8080 100.000 2017-04-21 17:29:23  1
    # 2 apple 8700 108.000 2017-04-21 17:29:23  2
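
Because xmlAttrsToDataFrame is not exported, it may change between releases of the XML package. As a hedged alternative, the sketch below reaches the same kind of data frame using only exported functions (getNodeSet and xmlAttrs). The compact txt string is an abbreviated copy of the response above, and the column type conversions at the end are an assumption about what is wanted rather than something the question asks for.

```r
library(XML)

# Abbreviated copy of the SharePoint response from the question.
txt <- '<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetListItemsResponse xmlns="http://schemas.microsoft.com/sharepoint/soap/">
      <GetListItemsResult>
        <listitems xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="#RowsetSchema">
          <rs:data ItemCount="2">
            <z:row title="test" port="8080" space="100.000" datecreat="2017-04-21 17:29:23" id="1" />
            <z:row title="apple" port="8700" space="108.000" datecreat="2017-04-21 17:29:23" id="2" />
          </rs:data>
        </listitems>
      </GetListItemsResult>
    </GetListItemsResponse>
  </soap:Body>
</soap:Envelope>'

doc <- xmlParse(txt, asText = TRUE)

# Register the z prefix explicitly; the prefix name used in the XPath only
# has to map to the same namespace URI as in the document.
rows <- getNodeSet(doc, "//z:row", namespaces = c(z = "#RowsetSchema"))

# xmlAttrs() returns a named character vector per node; bind them row-wise.
attrs <- lapply(rows, xmlAttrs)
df <- as.data.frame(do.call(rbind, attrs), stringsAsFactors = FALSE)

# Assumed conversions: every attribute arrives as a character string.
df$port      <- as.integer(df$port)
df$space     <- as.numeric(df$space)
df$id        <- as.integer(df$id)
df$datecreat <- as.POSIXct(df$datecreat, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

str(df)
```

This simple rbind assumes every z:row carries the same set of attributes; if some rows omit attributes, the vectors would need to be aligned by name before binding.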