I have several CSV files to combine in one table (files have the same structure), but the files structure is f**ed enough to be problematic.
The first row is ordinary, just headers split by a comma:
Account,Description,Entity,Risk,...
but then the rows with actual data are starting and ending with doublequote ", columns are separated by a comma, but people (full name) has two double-quotes at beginning and end. I understand that it's an escape character to keep the name in one column, but one would be enough.
"1625110,To be Invoiced,587,Normal,""Doe, John"",..."
So what I need to do and don't know how is to remove " from the beginning and end of every row with data and replace ""
with "
in every line with data.
I need to do it in Power Query because there will be more of similar CSV files over time and I don't want to clean them manually.
Any ideas?
I was trying with simple:
= Table.AddColumn(#"Removed Other Columns", "Custom", each Csv.Document(
[Content],
[
Delimiter = ",",
QuoteStyle = QuoteStyle.Csv
]
))
Try loading to a single column first, replace values to remove extra quotes, and then split by ","
.
Here's what that looks like for loading a single file:
let
Source = Csv.Document(File.Contents("filepath\file.csv"),[Delimiter="#(tab)"]),
ReplaceQuotes = Table.ReplaceValue(Source,"""""","""",Replacer.ReplaceText,{"Column1"}),
SplitIntoColumns = Table.SplitColumn(ReplaceQuotes, "Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv)),
#"Promoted Headers" = Table.PromoteHeaders(SplitIntoColumns, [PromoteAllScalars=true])
in
#"Promoted Headers"
I used the tab delimiter to keep it from splitting in the first step.