So I have a set of pipe delimited inputs which are something like this:
"787291 | 3224325523" | 37826427 | 2482472 | "46284729|46246" | 24682 | 82524 | 6846419 | 68247
and I am converting them to comma delimited using the code given below:
Dim line As String
Dim fields As String()
Using sw As New StreamWriter("c:\test\output.txt")
Using tfp As New FileIO.TextFieldParser("c:\test\test.txt")
tfp.TextFieldType = FileIO.FieldType.Delimited
tfp.Delimiters = New String() {"|"}
tfp.HasFieldsEnclosedInQuotes = True
While Not tfp.EndOfData
fields = tfp.ReadFields
line = String.Join(",", fields)
sw.WriteLine(line)
End While
End Using
End Using
So far so good. It only considers the delimiters that are present outside the quotes and changes them to the comma delimiter. But trouble starts when I have input with a stray quotation like below:
"787291 | 3224325523" | 37826427 | 2482472 | "46284729|46246" | 24682 | "82524 | 6846419 | 68247
Here the code gives
MalformeLineExcpetion
Which I realize is due to the stray quotation in my input and since i am like a total noob in RegEx so i am not able to use it here(or I am incapable of). If anyone has any idea, it would be much appreciated.
Here is the coded procedure described in the comments:
TextFieldParser
to perform the parsing of the correct input Join()
the input parts created by TextFieldParser
using ,
as separator I'm using Wiktor Stribiżew Regex pattern: it looks like it should work given the description of the problem.
Note:
Of course I don't know whether a specific Encoding should be used.
Here, the Encoding is the default UTF-8 no-BOM
, in and out.
"FaultyInput.txt"
is the corrupted source file.
"FixedInput.txt"
is the file containing the input lines fixed (hopefully) by the Regex. You could also use a MemoryStream
.
"FixedOutput.txt"
is the final CSV
file, containing comma separated fields and the correct values.
These files are all read/written in the executable startup path.
Dim input As List(Of String) = File.ReadAllLines("FaultyInput.txt").ToList()
For line As Integer = 0 To input.Count - 1
input(line) = Regex.Replace(input(line), "(""\b.*?\b"")|""", "$1")
Next
File.WriteAllLines("FixedInput.txt", input)
Dim output As List(Of String) = New List(Of String)
Using tfp As New FileIO.TextFieldParser("FixedInput.txt")
tfp.TextFieldType = FileIO.FieldType.Delimited
tfp.Delimiters = New String() {"|"}
tfp.HasFieldsEnclosedInQuotes = True
While Not tfp.EndOfData
Dim fields As String() = tfp.ReadFields
output.Add(String.Join(",", fields))
End While
End Using
File.WriteAllLines("FixedOutput.txt", output)
'Eventually...
'File.Delete("FixedInput.txt")