Q: I want to convert an Excel file to CSV, but the Excel file contains multiline data in a cell, along with ";" as a delimiter. The desired output is a CSV file with "|" as the column separator and "," as the separator for data within the same cell. I cannot use Excel, hence I am using the "ImportExcel" module.
Input Excel Format (attached file at end of Question) :-
Row1# c1# c2# c3# c4#
Row2#
Row3#
Row4# HEADER1 HEADER2 HEADER3
Row5# 04.05 test & t test TEST
Row5# 04.06.01 test test TEST
Row5# 04.07.01 test test
Row5# 04.10.02 test & t
Row# represents the row of the Excel sheet, c# represents the column, and HEADER# represents the title of each column. As can be seen, rows 1-3 and the first column of the sheet are empty. The text in the second and fourth columns is multiline but sits in a single cell (all in the same row, Row5).
I tried to import the Excel file, then, for each object, replace the newline character with ",", and finally export it to CSV using Export-Csv with "|" as the delimiter.
The script I made is as follows :-
$test = Import-excel -NoHeader -StartRow(4) test.xlsx
$test | foreach-object {
if($_.HEADER1) {
$_.HEADER1 = $_.HEADER1.replace("\n",", ")
$_.HEADER1 = $_.HEADER1.replace("&amp;","& ")
}
}
$test | Export-csv new-test.csv -Delimiter '|' -NoTypeInformation
Output CSV FILE I GET :-
Row1# c1# c2# c3#
Row2# HEADER1|HEADER2|HEADER3
Row3# 04.05 test & t
Row4# 04.06.01 test test
Row5# 04.07.01 test test
Row6# 04.10.02 test & t
Row7# test|TEST
Row8# TEST
As can be seen, the text after the delimiter ";" in the input goes to another column, and the multiline text is still parsed into separate rows. I want all of this multiline text merged into one line with "," as the separator, including the text after the ";" delimiter.
Desired CSV FILE :-
Row1# c1# c2#
Row2# HEADER1|HEADER2|HEADER3
Row3# 04.05 test & t,04.06.01 test test,04.07.01 test test,04.10.02 test & t|test|TEST,TEST
I have tried the answers to a lot of related questions on Stack Overflow, but for some reason none of the existing solutions seem to work for me.
Attached is the sample Excel file I am trying to convert. test.xlsx
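A: The problem with your code is that you specify the -NoHeader parameter: with it, Import-Excel generates property names (P1, P2, ...) instead of using the header row, so $_.HEADER1 is never set. You also have to use `n instead of \n as the newline character in the Replace() method, because PowerShell's escape character is the backtick, not the backslash.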
$test = Import-Excel test.xlsx -StartRow 4
$test | ForEach-Object {
if($_.HEADER1) {
$_.HEADER1 = $_.HEADER1.Replace("`n", ", ")
$_.HEADER1 = $_.HEADER1.Replace("&amp;", "&")
}
if($_.HEADER3) {
$_.HEADER3 = $_.HEADER3.Replace("`n", ", ")
}
}
$test | Export-Csv new-test.csv -Delimiter "|" -NoTypeInformation
The output:
"HEADER1"|"HEADER2"|"HEADER3"
"04.05 test & t, 04.06.01 test test, 04.10.02 test & t, 04.07.01 test test"|"test"|"TEST, TEST"
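To see why "\n" never matches, here is a minimal sketch that needs no Excel file. The variable names and the sample cell value are made up for illustration. It also shows the -replace operator as an alternative: that operator takes a regular expression, where \n does mean a line feed, and the optional \r is an assumption to cover cells that might hold CRLF line breaks.

```powershell
# A sample cell value with an embedded line feed (hypothetical data).
$cell = "04.05 test`n04.06.01 test"

# "\n" is a literal backslash followed by 'n' in PowerShell, so nothing matches.
$unchanged = $cell.Replace("\n", ", ")

# "`n" is a real line feed, so the line break is replaced.
$joined = $cell.Replace("`n", ", ")

# -replace uses a regex; \r? also tolerates CRLF line breaks.
$joinedRegex = $cell -replace '\r?\n', ', '

$unchanged -eq $cell   # True: the string was not modified
$joined                # 04.05 test, 04.06.01 test
$joinedRegex           # 04.05 test, 04.06.01 test
```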
It is useful to use the $obj.psobject.Properties property to replace the values of all properties.
# import
$data = Import-Excel test.xlsx -StartRow 4
# replace the values of all properties
$data | ForEach-Object { $_.psobject.Properties } | Where-Object Value -is string |
ForEach-Object { $_.Value = $_.Value.Replace("`n",", ").Replace("&amp;", "&") }
# export
$data | Export-Csv new-test.csv -Delimiter "|" -NoTypeInformation
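The same technique can be tried on in-memory objects, without the ImportExcel module. This is only a sketch: $rows and its values are hypothetical stand-ins for what Import-Excel returns, and the numeric Count property is there to show that the string filter skips non-string values.

```powershell
# Hypothetical stand-in for imported rows: two multiline cells and one number.
$rows = @(
    [pscustomobject]@{
        HEADER1 = "04.05 test`n04.06.01 test"
        HEADER2 = "test"
        HEADER3 = "TEST`nTEST"
        Count   = 3
    }
)

# Walk every property of every row and rewrite only the string values.
$rows | ForEach-Object { $_.psobject.Properties } |
    Where-Object { $_.Value -is [string] } |
    ForEach-Object { $_.Value = $_.Value.Replace("`n", ", ") }

# Convert in memory so the result can be inspected before writing a file.
$csv = $rows | ConvertTo-Csv -Delimiter '|' -NoTypeInformation
$csv
# "HEADER1"|"HEADER2"|"HEADER3"|"Count"
# "04.05 test, 04.06.01 test"|"test"|"TEST, TEST"|"3"
```

Assigning to the Value of a property object updates the original row, which is why the loop needs no explicit write-back.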
If you want to insert empty rows and columns, add them after converting with ConvertTo-Csv.
$data | ConvertTo-Csv -Delimiter "|" -NoTypeInformation | ForEach-Object { "" } { "|" + $_ } | Out-File new-test.csv
The output:
|"HEADER1"|"HEADER2"|"HEADER3"
|"04.05 test & t, 04.06.01 test test, 04.10.02 test & t, 04.07.01 test test"|"test"|"TEST, TEST"
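The two script blocks passed to ForEach-Object above are positional: the first runs once before any input (Begin) and the second runs for each line (Process). A sketch with the parameters spelled out, using made-up CSV lines in place of the real ConvertTo-Csv output:

```powershell
# Hypothetical CSV lines standing in for the ConvertTo-Csv output.
$csvLines = '"HEADER1"|"HEADER2"', '"a, b"|"test"'

# -Begin emits one empty line (the empty first row);
# -Process prefixes each line with "|" (the empty first column).
$shifted = $csvLines | ForEach-Object -Begin { "" } -Process { "|" + $_ }

$shifted.Count   # 3
$shifted[1]      # |"HEADER1"|"HEADER2"
$shifted[2]      # |"a, b"|"test"
```

Emitting more empty strings in the Begin block would add more empty rows at the top.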
If the headers are duplicated, the -NoHeader parameter is required. In that case, remove the automatically generated headers before saving.
$data = Import-Excel test.xlsx -StartRow 4 -StartColumn 2 -NoHeader
$data | foreach { $_.psobject.Properties } | where Value -is string | foreach { $_.Value = $_.Value.Replace("`n",", ").Replace("&amp;", "&") }
$data | ConvertTo-Csv -Delimiter "|" -NoTypeInformation | Select-Object -Skip 1 | Out-File new-test.csv
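As a rough illustration of why the first line is skipped: with -NoHeader the objects carry generated property names (P1, P2, ...), and the sheet's own header row arrives as ordinary data. The objects below are hypothetical stand-ins for such an import, with a deliberately duplicated header name.

```powershell
# Hypothetical rows as -NoHeader would shape them: generated names P1..P3,
# with the sheet's real (duplicated) headers in the first data row.
$noHeaderRows = @(
    [pscustomobject]@{ P1 = 'NAME'; P2 = 'NAME'; P3 = 'VALUE' }
    [pscustomobject]@{ P1 = 'a';    P2 = 'b';    P3 = 'c' }
)

# ConvertTo-Csv writes "P1"|"P2"|"P3" first; -Skip 1 drops that line,
# so the sheet's own header row becomes the first line of the file.
$lines = $noHeaderRows | ConvertTo-Csv -Delimiter '|' -NoTypeInformation |
    Select-Object -Skip 1

$lines
# "NAME"|"NAME"|"VALUE"
# "a"|"b"|"c"
```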