
PowerShell script to handle multi-line data in an Excel cell and convert it to CSV using Import-Excel or Import-Csv


Q: I want to convert an Excel file to CSV, but the workbook contains multi-line data in a cell, and the text also contains ";", which is being treated as a delimiter. The desired output is a CSV file with "|" as the column separator and "," as the separator for the lines within the same cell. I cannot use Excel itself, so I am using the "ImportExcel" module.

Input Excel format (file attached at the end of the question):

Row1# c1# c2#                    c3#         c4#
Row2#
Row3#
Row4#     HEADER1                HEADER2     HEADER3
Row5#     04.05 test &amp; t     test        TEST
Row5#     04.06.01 test test                 TEST
Row5#     04.07.01 test test 
Row5#     04.10.02 test &amp; t

Row# represents the row of the Excel sheet, c# represents the column, and HEADER# represents the column title. As can be seen, rows 1-3 and the first column of the sheet are empty. The text in the second and fourth columns is multi-line but sits in a single cell (all in the same row, Row5).

I tried importing the Excel file, replacing the newline character with "," in each object, and finally exporting the result to CSV with Export-Csv, using "|" as the delimiter.

The script I wrote is as follows:

$test = Import-excel -NoHeader -StartRow(4) test.xlsx

$test | foreach-object {

if($_.HEADER1) {
$_.HEADER1 = $_.HEADER1.replace("\n",", ")
$_.HEADER1 = $_.HEADER1.replace("&amp;","& ")
   }
}

$test | Export-csv new-test.csv -Delimiter '|' -NoTypeInformation

Output CSV file I get:

Row1# c1# c2#                       c3#
Row2#     HEADER1|HEADER2|HEADER3
Row3#     04.05 test &amp            t
Row4#     04.06.01 test test          
Row5#     04.07.01 test test 
Row6#     04.10.02 test &amp         t
Row7#     test|TEST
Row8#     TEST

As can be seen, the text after the ";" delimiter in the input goes to another column, and the multi-line text is still split into separate rows. I want all these lines merged into one, with "," as the separator, together with the text after the ";" delimiter.

Desired CSV file:

Row1# c1# c2#                
Row2#     HEADER1|HEADER2|HEADER3
Row3#     04.05 test &  t,04.06.01 test test,04.07.01 test test,04.10.02 test & t|test|TEST,TEST

I have tried the answers to many related questions on Stack Overflow, but for some reason none of the existing solutions seems to work for me.

Attached is the sample Excel file I am trying to convert. test.xlsx


Solution

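  • Your code has two problems. First, you specify the -NoHeader parameter, so the imported objects get generated property names (P1, P2, ...) instead of HEADER1, HEADER2 and HEADER3, and $_.HEADER1 is always empty.
    Second, PowerShell's escape character is the backtick, so you have to use `n instead of \n as the newline character in the Replace() method.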

    $test = Import-Excel test.xlsx -StartRow 4
    $test | ForEach-Object {
        if($_.HEADER1) {
            $_.HEADER1 = $_.HEADER1.Replace("`n", ", ")
            $_.HEADER1 = $_.HEADER1.Replace("&amp;", "&")
        }
        if($_.HEADER3) {
            $_.HEADER3 = $_.HEADER3.Replace("`n", ", ")
        }
    }
    
    $test | Export-Csv new-test.csv -Delimiter "|" -NoTypeInformation
    

    The output:

    "HEADER1"|"HEADER2"|"HEADER3"
    "04.05 test & t, 04.06.01 test test, 04.10.02 test & t, 04.07.01 test test"|"test"|"TEST, TEST"
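
    A minimal sketch of why the escape sequence matters, using a hypothetical in-memory string ($cell) in place of a value read from test.xlsx. It assumes the lines in the cell are separated by a line feed; if a workbook stores CRLF instead, the regex-based -replace operator in the last assignment should cover both cases.

    ```powershell
    # Hypothetical cell value: two lines separated by a line feed (LF).
    $cell = "04.05 test`n04.06.01 test test"

    # "\n" is a literal backslash plus "n" in PowerShell, so Replace() finds nothing.
    $unchanged = $cell.Replace("\n", ", ")

    # "`n" is the LF character, so the line break is replaced.
    $joined = $cell.Replace("`n", ", ")

    # -replace takes a regex, where \r?\n matches both LF and CRLF line breaks.
    $joinedRegex = $cell -replace "\r?\n", ", "

    $unchanged -eq $cell        # True
    $joined                     # 04.05 test, 04.06.01 test test
    $joinedRegex -eq $joined    # True
    ```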
    

    To replace the values of every property without naming each column, iterate over $obj.psobject.Properties.

    # import
    $data = Import-Excel test.xlsx -StartRow 4
    
    # replace the values of all properties
    $data | ForEach-Object { $_.psobject.Properties } | Where-Object Value -is string |
    ForEach-Object { $_.Value = $_.Value.Replace("`n",", ").Replace("&amp;", "&") }
    
    # export
    $data | Export-Csv new-test.csv -Delimiter "|" -NoTypeInformation
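
    The same idea can be tried without the workbook. This is a small sketch, assuming rows shaped like the ones Import-Excel returns; the sample object and its Amount property are made up for illustration.

    ```powershell
    # Hypothetical row standing in for the result of Import-Excel.
    $data = @(
        [pscustomobject]@{
            HEADER1 = "04.05 test`n04.06.01 test test"
            HEADER2 = 'test'
            HEADER3 = "TEST`nTEST"
            Amount  = 3          # non-string value, left untouched below
        }
    )

    foreach ($row in $data) {
        foreach ($prop in $row.psobject.Properties) {
            # Only strings have Replace(); numbers and dates are skipped.
            if ($prop.Value -is [string]) {
                $prop.Value = $prop.Value.Replace("`n", ", ")
            }
        }
    }

    $data | ConvertTo-Csv -Delimiter '|' -NoTypeInformation
    # "HEADER1"|"HEADER2"|"HEADER3"|"Amount"
    # "04.05 test, 04.06.01 test test"|"test"|"TEST, TEST"|"3"
    ```

    The type check matters because numeric cells do not arrive as strings, and calling Replace() on them would raise an error.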
    

    If you want to keep the empty leading row and column in the output, add them after converting the data with ConvertTo-Csv.

    $data | ConvertTo-Csv -Delimiter "|" -NoTypeInformation | ForEach-Object { "" } { "|" + $_ } | Out-File new-test.csv
    

    The output:

    
    |"HEADER1"|"HEADER2"|"HEADER3"
    |"04.05 test & t, 04.06.01 test test, 04.10.02 test & t, 04.07.01 test test"|"test"|"TEST, TEST"
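
    The one-liner relies on ForEach-Object accepting two script blocks: the first runs once before any input arrives (the -Begin block) and the second runs for each line (the -Process block). A sketch with the parameters spelled out, using hypothetical sample lines instead of real ConvertTo-Csv output:

    ```powershell
    # Hypothetical CSV lines standing in for the output of ConvertTo-Csv.
    $csvLines = '"HEADER1"|"HEADER2"', '"a, b"|"test"'

    # -Begin emits one empty line (the empty row);
    # -Process prefixes each line with "|" (the empty first column).
    $result = $csvLines | ForEach-Object -Begin { "" } -Process { "|" + $_ }

    $result.Count    # 3
    $result[1]       # |"HEADER1"|"HEADER2"
    ```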
    

    If the header row contains duplicate names, the -NoHeader parameter is required. In that case the header row is imported as data, so remove the automatically generated header line before saving.

    $data = Import-Excel test.xlsx -StartRow 4 -StartColumn 2 -NoHeader
    $data | foreach { $_.psobject.Properties } | where Value -is string | foreach { $_.Value = $_.Value.Replace("`n",", ").Replace("&amp;", "&") }
    
    $data | ConvertTo-Csv -Delimiter "|" -NoTypeInformation | Select-Object -Skip 1 | Out-File new-test.csv
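
    To see what Select-Object -Skip 1 removes, here is a sketch with hand-built objects standing in for a -NoHeader import. The P1, P2, P3 property names mirror what ImportExcel generates as far as I know; treat them, and the sample values, as assumptions.

    ```powershell
    # Hypothetical rows: with -NoHeader the sheet's header row arrives as data.
    $data = @(
        [pscustomobject]@{ P1 = 'HEADER1'; P2 = 'HEADER1'; P3 = 'HEADER3' }
        [pscustomobject]@{ P1 = "04.05 test`n04.06.01 test test"; P2 = 'test'; P3 = "TEST`nTEST" }
    )

    # Join multi-line values in every string property.
    $data | ForEach-Object { $_.psobject.Properties } |
        Where-Object { $_.Value -is [string] } |
        ForEach-Object { $_.Value = $_.Value.Replace("`n", ", ") }

    # The first CSV line is the generated "P1"|"P2"|"P3" header; skip it.
    $lines = $data | ConvertTo-Csv -Delimiter '|' -NoTypeInformation | Select-Object -Skip 1
    $lines
    # "HEADER1"|"HEADER1"|"HEADER3"
    # "04.05 test, 04.06.01 test test"|"test"|"TEST, TEST"
    ```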