Search code examples
excelxmlvbaremovechild

Remove (child) node from XML DOM object using VBA (Excel)


I am creating quite complex XML files using a template, replacing special search strings with values which can be entered in an Excel sheet, and then storing the xml-file.

 Dim strInpPath As String
 Dim strOutpPath As String
 
 Dim fso
 Dim f
 Dim oDomRd As Object, oNode As Object, i As Long, oAtt As Object, oGroup As Object, oDomWr As Object
 Dim oTest As Object
 
 
 strInpPath = ActiveWorkbook.ActiveSheet.Cells(3, 4).Value
 strOutputPath = ActiveWorkbook.ActiveSheet.Cells(4, 4).Value
 

 Set oDomRd = CreateObject("MSXML2.DOMDocument")
 oDomRd.Load strInpPath
 Set oDomWr = CreateObject("MSXML2.DOMDocument")
 
 Set fso = CreateObject("Scripting.FileSystemObject")
 Set f = fso.OpenTextFile(strOutputPath, 2, True)

 Set oGroup = oDomRd.SelectNodes("/")
 Set oNode = oGroup.NextNode
 If Not (oNode Is Nothing) Then
    strout = oNode.XML
    strout = ScanTable("_S_AND_R_TABLE_1", strout)
    oDomRd.LoadXML (strout)
    Set oGroup = oDomRd.SelectNodes("/")
    Set oNode = oGroup.NextNode
    
    If oNode.HasChildNodes() Then
        Set oLists = oNode.DocumentElement
        Run RemoveOptionalEmptyTags(oLists)
    End If
    strout = oNode.XML
    f.write (strout)
 Else
     strout = "001 error reading file"
 End If
 MsgBox strout
 
End Function

Some of the field values are not mandatory so they can be left empty. In this case, the first procedure (scantable) enters "##REMOVE##" as value. In the second step, I want to step through the entire DOMObject and remove the nodes having the value "##REMOVE##"

for this second step I created a procedure:

Public Function RemoveOptionalEmptyTags(ByRef oLists)

    For Each listnode In oLists.ChildNodes
        If listnode.HasChildNodes() Then
            Run RemoveOptionalEmptyTags(listnode) 
        Else
            lcBasename = listnode.ParentNode.BaseName
            lcText = listnode.Text
            If lcText = "##REMOVE##" Then
                listnode.ParentNode.RemoveChild listnode
                Exit For
            End If
        End If
    Next listnode

End Function

This works pretty fine, the only problem is, that the node is not removed, it only is empty ():

    <Cdtr>
        <Nm>Name Creditor</Nm>
        <PstlAdr>
            <Ctry>DE</Ctry>
            <AdrLine>Street</AdrLine>
            <AdrLine/>
        </PstlAdr>
    </Cdtr>

now the question: How can I completely REMOVE the node, so it would look like this (the second is gone):

    <Cdtr>
        <Nm>Name Creditor</Nm>
        <PstlAdr>
            <Ctry>DE</Ctry>
            <AdrLine>Street</AdrLine>
        </PstlAdr>
    </Cdtr>

Solution

  • Basically the RemoveChild syntax is correct:

    {NodeToDelete}.ParentNode.RemoveChild {NodeToDelete}

    But let's repeat the xml structure and note that each text node (if existant) is regarded as a ChildNode of its parent (i.e. one hierarchy level deeper).

    <Cdtr>                                   <!-- 0 documentElement                      -->
        <Nm>Name Creditor</Nm>               <!-- 1 ChildNode of Nm = 'Name Creditor'    -->
        <PstlAdr>                            <!-- 1 listNode.ParentNode.ParentNode       -->
            <Ctry>DE</Ctry>                  <!--   2 ChildNode of Ctry = 'DE'           -->
            <AdrLine>Street</AdrLine>        <!--   2 ChildNode of AdrLine[1] = 'Street' -->                   
            <AdrLine>                        <!--   2 listNode.ParentNode to be removed  -->
                <!-- NODETEXT ##REMOVE## --> <!--     3 ChildNode of AdrLine[2]          -->
            </AdrLine>
            </PstlAdr>
    </Cdtr>
    

    Diving down to bottom in xml hierarchy (assuming text values) via

        listnode.ParentNode.RemoveChild listnode
    

    you are deleting the textual ChildNode of AdrLine[2] (level 3) which is the string "##REMOVE##", but not it container node AdrLine[2] (level 2). Therefore you are deleting only the dummy text.

    Following your logic in function RemoveOptionalEmptyTags() as close as possible you'd have to code instead:

        listNode.ParentNode.ParentNode.RemoveChild listNode.ParentNode
    

    addressing PstlAdr (=level 1) executing a deletion of its ChildNode AdrLine[2] (i.e. at level 2) which automatically includes deletion of the dummy string "##REMOVE" at level 3.

    Related links:

    XML Parse via VBA

    Obtain atrribute names from xml using VBA