Tags: regex, replace, vbscript, substitution, metacharacters

How to Replace with Line Feed in VBScript RegEx


I am using VBScript and have a script that converts an XML file to a text file.

I am trying to replace every match of the pattern ###EntryEnd###\| with an LF character.

I tried \n and \x0a in the replacement string, but neither works. The only workaround I found is to use Chr(10) instead.

I looked for an explanation of this behavior but was not able to find one. Both \n and \x0a should work. Any advice?

Here is the code:

' Method to process the file
Private Function PrepFile(ByVal strInp)
    With New RegExp
        .Global = True
        .Pattern = "\|"
        strInp = .Replace(strInp, "")
        .Pattern = "<xmldoc .*?xml:lang=""([^""]+)"">"
        strInp = .Replace(strInp, "English|$1|Part Of Speech|Note|EngDef|Glossary Definition###EntryEnd###|")
        .Pattern = "<remove>.*?</remove>"
        strInp = .Replace(strInp, "")
        .Pattern = "(<tab/>|</para>)"
        strInp = .Replace(strInp, "|")
        .Pattern = "<[^>]*>"
        strInp = .Replace(strInp, "")
        .Pattern = "\n"
        strInp = .Replace(strInp, "")
        .Pattern = "###EntryEnd###\|"
        strInp = .Replace(strInp, chr(10))
    End With
    PrepFile = strInp
End Function

Sample file snippet:

<?xml version="1.0" encoding="UTF-8"?>
<xmldoc source="" type="TERMS" xml:lang="hu-HU">
<para id="13" name="Entry"><notrans><seg>School Administrator</seg><tab/></notrans><remove>___________</remove><seg>iskolavezető</seg></para>
<para id="14" name="Usage"><notrans><seg> </seg><tab/></notrans><remove>HASZNÁLAT:</remove><seg> </seg></para>
<para id="15" name="EntryText"><notrans><seg> </seg><tab/></notrans><remove>MEGHATÁROZÁS:</remove><seg> </seg></para>
<para id="16" name="Context"><remove>PÉLDA:</remove><remove><seg>Cathy Brown iskolavezető</seg></remove><notrans>###EntryEnd###</notrans></para>
<para id="17" name="Entry"><notrans><seg>School Resource Officer</seg><tab/></notrans><remove>___________</remove><seg>iskolarendőr</seg></para>
<para id="18" name="Usage"><notrans><seg> </seg><tab/></notrans><remove>HASZNÁLAT:</remove><seg> </seg></para>
<para id="19" name="EntryText"><notrans><seg>a law enforcement officer who is responsible for providing security and crime prevention services in schools in parts of the United States and Canada.|</seg><tab/></notrans><remove>MEGHATÁROZÁS:</remove><seg>rendőr, aki azért felelős, hogy az iskolákban biztonsági és bűnmegelőzési feladatokat lásson az Egyesült Államok és Kanada egyes területein.</seg></para>
<para id="20" name="Context"><remove>PÉLDA:</remove><remove><seg>Ocalai iskolarendőrök</seg></remove><notrans>###EntryEnd###</notrans></para>
</xmldoc>

Solution

  • The "problem" in your question is simply a wrong assumption:

    • Both \n and \x0a should work

    The documentation of the Replace method doesn't state that the replacement string supports escape sequences. The only special sequences it mentions are the $1, $2, ... references to the capture groups in the regular expression pattern.

    So, since the RegExp object does not provide this behaviour in the replacement string, and the VBScript parser does not handle any escape sequences in string literals except doubled double quotes, nothing converts \n into a line feed.

    You can use these escape sequences to represent nonprinting characters in the search pattern, where the regular expression engine interprets them, but they are not treated as escape sequences in the replacement string.

    If you don't like the Chr(10) function call, you can use the built-in vbLf constant to refer to the line feed character:

    strInp = .Replace(strInp, vbLf)
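
To see the difference side by side, here is a minimal sketch. The ReplaceAll helper, the sample string, and the variable names are made up for illustration and are not part of the original script. It assumes the standard VBScript RegExp object and compares "\n" in the replacement string, vbLf in the replacement string, and "\n" in the search pattern:

```vbscript
Option Explicit

' Hypothetical helper (not from the question): replaces every match of
' strPattern in strText with strReplacement, using a global RegExp.
Function ReplaceAll(ByVal strText, ByVal strPattern, ByVal strReplacement)
    With New RegExp
        .Global = True
        .Pattern = strPattern
        ReplaceAll = .Replace(strText, strReplacement)
    End With
End Function

Dim src, literalLen, withBackslashN, withLf, lfRemoved

' Made-up sample input containing the marker twice.
src = "first###EntryEnd###|second###EntryEnd###|"

' A VBScript string literal does not process backslash escapes:
' "\n" is two characters, a backslash followed by the letter n.
literalLen = Len("\n")

' In the replacement string, "\n" is not converted to a line feed.
withBackslashN = ReplaceAll(src, "###EntryEnd###\|", "\n")

' vbLf (equivalent to Chr(10)) puts a real line feed into the result.
withLf = ReplaceAll(src, "###EntryEnd###\|", vbLf)

' In the search pattern, "\n" is interpreted by the regex engine and
' matches the line feeds inserted in the previous step.
lfRemoved = ReplaceAll(withLf, "\n", "")
```

After this runs, withLf should contain real line feed characters while withBackslashN should contain none, and lfRemoved shows that "\n" does work on the pattern side.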