Search code examples
regexvb.netfilehelpers

Regular Expression to manipulate all single double quotes with in a field with two double quotes in FileHelpers BeforeReadRecords event


In Filehelpers BeforeReadRecords Event, we need Regular Expression to manipulate all single "double quotes" with in a field with two double quotes.

CSV Content:

"Mine"s Minesweeper", "Yours"s Minesweeper", "Uncle Sam"s Minesweeper"
"Mine"s Minesweeper2", "Yours"s Minesweeper2", "Unknown Minesweeper3"

Need helps to create a vb.net Regular Expression to replace the all inner double quotes. Currently we are using below approrach?

Dim engine As New FileHelperEngine(cb.CreateRecordClass())
                AddHandler engine.BeforeReadRecord, AddressOf BeforeReadRecordHandler

Event Code

    Sub BeforeReadRecordHandler(engine As EngineBase, e As FileHelpers.Events.BeforeReadEventArgs(Of Object))
        Try
            Dim newLine As String = ""
            Dim sep As String = ""
            Dim arr() As String = e.RecordLine.Split(",")
            arr.AsParallel().ForAll(Sub(x)
                                        If x.Length > 1 Then
                                            newLine = String.Format("{0}{1}{2}", newLine, sep, x.Substring(1, IIf(x.Length <= 2, 0, x.Length - 2)).Replace("""", """"""))


                                        Else
                                            newLine = String.Format("{0}{1}{2}", newLine, sep, x)
                                        End If

                                        sep = ","
                                    End Sub)
            e.RecordLine = newLine
        Catch ex As Exception

        End Try
    End Sub

Trying to generate regular expression for ;

String.Format("{0}{1}{2}", newLine, sep, x.Substring(1, IIf(x.Length <= 2, 0, x.Length - 2)).Replace("""", """""")).

Output should be

CSV Content:

"Mine""s Minesweeper", "Yours""s Minesweeper", "Uncle Sam""s Minesweeper"
"Mine""s Minesweeper2", "Yours""s Minesweeper2", "Unknown Minesweeper3"

Solution

  • .Net supports arbitrary length lookbehind, so you can use the following;

    (?<!(^|,)\s*)"(?!\s*($|,))
    

    Use with

    Regexp.replace(input,(?<!(^|,)\s*)"(?!\s*($|,)),"""""",RegexOptions.Multiline)
    

    This matches any " not preceded by the start of line or a comma, and not suceeded by end of string or comma, both conditions ignoring an arbitrary amount of whitespace.

    It will misbehave if an entry in the CSV is not bracketed by quotes, or if a comma occurs in the text of an entry.