vb6, rtf

How to split an RTF file into lines?


I am trying to split an RTF file into lines (in my code) and I am not quite getting it right, mostly because I am not really grokking the entirety of the RTF format. It seems that lines can be split by \par or \pard or \par\pard or any number of fun combinations.

I am looking for a piece of code, in any language really, that splits the file into lines.


Solution

  • I coded up a quick and dirty routine and it seems to work for pretty much anything I've been able to throw at it. It's in VB6, but easily translatable into anything else.

    Private Function ParseRTFIntoLines(ByVal strSource As String) As Collection
        Dim colReturn As Collection
        Dim lngPosStart As Long
        Dim strLine As String
        Dim sSplitters(1 To 4) As String
        Dim nIndex As Long
    
        ' return collection of lines '
    
        ' The lines can be split by any of the following '
        ' "\par \pard"                            '
        ' "\par\pard"                             '
        ' "\par "                                 '
        ' "\par"                                  '
    
        ' Add these splitters longest first so that we do not miss '
        ' any possible split combos, for instance, "\par\pard" is added before "\par" '
        ' because if we look for "\par" first, we will never match "\par\pard" '
        sSplitters(1) = "\par \pard"
        sSplitters(2) = "\par\pard"
        sSplitters(3) = "\par "
        sSplitters(4) = "\par"
    
        Set colReturn = New Collection
    
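        ' We have to find each variation '
        ' We will look for \par and then evaluate which type of separator is there '
        ' A hit followed by a letter is a longer control word (e.g. a lone "\pard"), '
        ' not a paragraph break, so we skip it and keep searching after it '
        Dim lngSearchFrom As Long
        Dim strNext As String
        lngSearchFrom = 1
    
        Do
            lngPosStart = InStr(lngSearchFrom, strSource, "\par", vbTextCompare)
            If lngPosStart > 0 Then
                ' character right after "\par" (empty string at end of source) '
                strNext = Mid$(strSource, lngPosStart + 4, 1)
                If strNext Like "[A-Za-z]" Then
                    ' part of a longer control word, keep it in the current line '
                    lngSearchFrom = lngPosStart + 4
                Else
                    strLine = Left$(strSource, lngPosStart - 1)
    
                    For nIndex = 1 To 4
                        If StrComp(sSplitters(nIndex), Mid$(strSource, lngPosStart, Len(sSplitters(nIndex))), vbTextCompare) = 0 Then
                            ' remove the 1st line from strSource '
                            strSource = Mid$(strSource, lngPosStart + Len(sSplitters(nIndex)))
    
                            ' add to collection '
                            colReturn.Add strLine
    
                            ' start the next search from the top of what is left '
                            lngSearchFrom = 1
    
                            ' get out of here '
                            Exit For
                        End If
                    Next
                End If
            End If
    
        Loop While lngPosStart > 0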
    
        ' check to see whether there is a last line '
        If Len(strSource) > 0 Then colReturn.Add strSource
    
        Set ParseRTFIntoLines = colReturn
    End Function
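
Since the question is open to any language, here is a rough Python sketch of the same idea, using one regular expression instead of an ordered splitter list. The names `split_rtf_into_lines` and `_PAR_BREAK` are made up for this example. It assumes that `\par` counts as a break only when no letter follows it (so a stand-alone `\pard` stays in the line text), that one optional space after a control word is just its delimiter, and that a `\pard` immediately after the break belongs to the separator. It is not a real RTF parser: it does not understand groups, escaped backslashes such as `\\par`, or other break-like control words such as `\line`.

```python
import re

# "\par" not followed by a letter, an optional delimiter space, then an
# optional "\pard" (again not followed by a letter) with its own optional
# delimiter space. Assumption: control words are lowercase, as RTF writes them.
_PAR_BREAK = re.compile(r"\\par(?![A-Za-z]) ?(?:\\pard(?![A-Za-z]) ?)?")


def split_rtf_into_lines(source: str) -> list[str]:
    """Split raw RTF text into pieces at paragraph breaks (hypothetical helper)."""
    parts = _PAR_BREAK.split(source)
    # Like the VB6 routine, only keep the last piece if something is left in it.
    if parts and parts[-1] == "":
        parts.pop()
    return parts


if __name__ == "__main__":
    sample = r"{\rtf1\ansi \pard First line\par Second line\par\pard Third line\par}"
    for line in split_rtf_into_lines(sample):
        print(repr(line))
```

The pieces still contain whatever other control words were in the source (`\rtf1`, `\pard`, font switches and so on), just as the VB6 collection does, so stripping those is a separate step.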