Search code examples
htmlvbainternet-explorerms-wordformatted-text

Rendering text with HTML tags to Formatted text in a Word table using VBA


I have a word document with html tags that I need converted to formatted text. For example I would like <strong>Hello</strong> to display as Hello instead.

I've never used VBA before, but I've been trying to piece something together that would allow me to copy the html text from a specific table cell in Word, use IE to display the formatted version of that text, copy the formatted text from IE, and then paste it back into that same Word table cell. I think I've been able to figure out some of the code, but I don't think I'm referring to the table cells correctly. Can anyone help? This is what I have so far:

Dim Ie As Object

Set Ie = CreateObject("InternetExplorer.Application")

With Ie
    .Visible = False

    .Navigate "about:blank"

    .Document.body.InnerHTML = ActiveDocument.Tables(1).Cell(2, 2)
    
    .Document.execCommand "SelectAll"
    
    .Document.execCommand "Copy"
    
    ActiveDocument.Paste Destination = ActiveDocument.Tables(1).Cell(2, 2)

    .Quit
End With
End Sub

Solution

  • You need two different appraoches for the two uses of .cell(2,2).

    To get text from the cell you need to amend the first line to read

    .Document.body.InnerHTML = ActiveDocument.Tables(1).Cell(2, 2).range.text  
    

    In the second case your terminology is incorrect. It should read

    ActiveDocument.Tables(1).Cell(2, 2).range.paste
    

    You can get help on individual keywords/properties quite easily. In the VBA IDE just place your cursor on a keyword/property and press F1. You will be taken to the MS help page for the keyword/property. Sometimes you will have an additional selection step when there is more than one alternative.

    You should also be aware that the property .cell(row,column) is prone to failure as it relies on their being no merged cells in the Table. A more robust approach is to use the .cells(index) property.

    It may be that you can take an alternative appraoch and use a wildcard search to find the tag and then replace the portion you need whilst applying a suitable linked style (You won't be able to use paragraph styles because you will be trying to format only part of a paragraph and character styles don't seem to work with find/replace).

    An example of such code that removes HTML tags and formats the remaining text is below

    Option Explicit
    
    Sub replaceHTML_WithFormattedText()
    
    ' a comma seperated list of HTML tags
    Const myTagsList                          As String = "strong,small,b,i,em"
    
    ' a list of linked styles chosen or designed for each tag
    ' Paragraph  styles cannot be used as we are replacing only part of a paragraph
    ' Character styles just don't seem to work
    ' The linked styles below were just chosen from the default Word styles as an example
    Const myStylesList                        As String = "Heading 1,Heading 9,Comment Subject,Intense Quote,Message Header"
    
    ' <, > and / are special characters therefore need escaping with '\' to get the actual character
    Const myFindTag                           As String = "(\<Tag\>)(*)(\<\/Tag\>)"
    Const myReplaceStr                        As String = "\2"
    
    Dim myTagsHTML()                        As String
    Dim myTagsStyles()                      As String
    Dim myIndex                             As Long
    
        myTagsHTML = Split(myTagsList, ",")
        myTagsStyles = Split(myStylesList, ",")
    
        If UBound(myTagsHTML) <> UBound(myTagsStyles) Then
            MsgBox "Different number of tags and Styles", vbOKOnly
            Exit Sub
    
        End If
    
        For myIndex = 0 To UBound(myTagsHTML)
    
            With ActiveDocument.StoryRanges(wdMainTextStory).Find
                .ClearFormatting
                .Format = True
                .Text = Replace(myFindTag, "Tag", Trim(myTagsHTML(myIndex)))
                .MatchWildcards = True
                .Replacement.Text = myReplaceStr
                .Replacement.Style = myTagsStyles(myIndex)
                .Execute Replace:=wdReplaceAll
    
            End With
    
        Next
    
    End Sub