Search code examples
excelpowershellnewlineline-breaks

Converting line breaks to commas in excel sheet using Powershell


I am working on powershell code to check each column in all rows in the excel spreadsheet and check if any line breaks are exist and convert them into commas. I have tried with below code but it is not updating the line breaks into commas.

Code:

# Load the Excel file
$path = "./NSG_Rules.xlsx"
$exceldata = Import-Excel -Path $path

# Iterate through each cell and replace line breaks with commas if present

`foreach ($row in $exceldata) {     foreach ($property in $row.psobject.Properties) {         # Check if the cell content has line breaks and replace them with commas         if ($property.Value -match "`r`n") {             $property.Value = $property.Value -replace "`r`n", ","         }     } }`

# Save the updated workbook.

$rows | Export-Excel $exceldata

My requirement is, I have excel spread sheet and it is having many rows and columns. the powershell code needs to check each cell in the excel spread sheet and if any line breaks(refer screenshot) found then it needs to convert it into commas and update the excel sheet.

In the uploaded image screenshot, the highlighted red color cell is having the line break. so it needs to convert into single line with comma like below: I have tried below code but it is not updating the excel spread sheet by removing the line breaks and converting them into comma.

Screenshots for reference: Present format: enter image description here Required outputformat: enter image description here


Solution

  • As mclayton points out:

    • you're not dealing with newlines (line breaks) inside cells (if you were, this answer would help you).

    • you're dealing with "overflow rows", where what is conceptually an additional cell value is actually contained in a separate row, i.e. in an additional cell below the cell of interest (with the other cells in a row of interest merged with their empty counterparts below).


    To append these "overflow values" to their corresponding main-row values, you can use the following approach:

    • Process rows conditionally, depending on whether they are "main" or "overflow" rows.

    • Detect the latter by whether their first cell is empty.

    • Among the remaining cells, append any non-empty values to their main-row counterparts.

    Important:

    • In order to be able to save changes to a workbook, it mustn't currently be open in Excel. You can ensure this manually or - assuming you're comfortable doing so - can automatically forcefully terminate any running Excel processes to ensure that saving succeeds, by adding the -KillExcel switch to the Export-Excel calls below.

    • The solution below uses Export-Excel's -ClearSheet switch to re-create the worksheet's data, which has two implications:

      • All formatting is lost; see the next section for a - more cumbersome - solution that preserves formatting.

      • If the workbook happened to have multiple sheets, the re-created sheet is placed last (arguably, this shouldn't happen); the -MoveToStart switch used below compensates for that.

    • Not using -ClearSheet isn't an option (without substantial additional effort that may require use of the Excel.Application COM Automation server instead), because:

      • Fundamentally, Export-Excel doesn't replace a worksheet's content, but simply updates it, with any preexisting formatting retained as-is, as well as any content unaffected by the update.

      • While you could - somewhat - compensate for the reduction in the count of rows due to the described processing - by appending additional, empty, rows that blank out the extra original ones, which requires passing objects that contain empty strings for each target column - merged cells present in the original worksheet would then effectively hide values. See the next section for a solution that requires more work.

    $path = "./NSG_Rules.xlsx"
    
    $rows = Import-Excel $path
    
    # Get the column (property) names, split into the first and all remaining ones.
    $firstPropName, $remainingPropNames = $rows[0].psobject.Properties.Name
    
    # Process all rows, merging the cells of "overflow" rows with the
    # corresponding cells of the main rows.
    $modifiedRows = 
      & {
        foreach ($row in $rows) {
          if ($null -eq $row.$firstPropName) { # an "overflow" row
            # Process all remaining columns and, for any that contain a value, append that
            # value to the most recent "main" row's corresponding column values.
            foreach ($propName in $remainingPropNames) {
              if ($null -ne ($val = $row.$propName)) {
                # Note: For appending to succeed, the main row's
                #       value must be converted to a string, if necessary.
                $prevRow.$propName = [string] $prevRow.$propName + ',' + $val
              }
            }
          } else { # a "main" row - possibly complete by itself, possibly complemented by subsequent rows.
            if ($prevRow) { $prevRow } # Output, if a previous row is now by definition complete.
            # Save this row - its column values may need to be appended to in the next iteration(s).
            $prevRow = $row
          }
        }
        # Output the last row, if necessary.
        if ($prevRow) { $prevRow }
      }
    
    # Export the modified rows back to the workbook,
    # re-creating the sheet - without formatting - in the process.
    $modifiedRows | Export-Excel -ClearSheet -MoveToStart $path 
    

    Solution that preserves formatting:

    Such a solution requires a two-pass approach:

    • First, update the cells in the "main" rows only; leave the "overflow" rows alone, but record their row indices in a list for later removal.

    • Then, via Export-Excel -PassThru - which is equivalent to calling Export-Excel (to write the updated values) followed by Open-ExcelPackage - obtain a reference to the Excel package (workbook) for advanced operations, which allows removal (deletion) of rows directly from the spreadsheet.

    $path = "./NSG_Rules.xlsx"
    $rows = Import-Excel $path
    
    # Get the column (property) names, split into the first and all remaining ones.
    $firstPropName, $remainingPropNames = $rows[0].psobject.Properties.Name
    
    $rowsToDeleteIndices = [System.Collections.Generic.List[int]] @()
    
    # Process all rows, merging the cells of "overflow" rows with the
    # corresponding cells of the main rows.
    $rowIndex = 1 # Start with 1, to account for the header row. 
    foreach ($row in $rows) {
      ++$rowIndex
      if ($null -eq $row.$firstPropName) {
        # an "overflow" row
        # Process all remaining columns and, for any that contain a value, append that
        # value to the most recent "main" row's corresponding column values.
        foreach ($propName in $remainingPropNames) {
          if ($null -ne ($val = $row.$propName)) {
            # Note: For appending to succeed, the main row's
            #       value must be converted to a string, if necessary.
            $prevMainRow.$propName = [string] $prevMainRow.$propName + ',' + $val
          }
        }
        # Leave the overflow rows alone for now, but record 
        # their (1-based) indices for later deletion.
        $rowsToDeleteIndices.Add($rowIndex) 
      }
      else {
        # a "main" row - possibly complete by itself, possibly complemented by subsequent rows.
        # Save this row - its column values may need to be appended to in the next iteration(s).
        $prevMainRow = $row
      }
    }
    
    # Export the modified values back to the worksheet, 
    # in place, with all formatting preserved.
    # Via -PassThru, obtain a reference to the package (workbook),
    # for advanced operations such as row deletion.
    $pkg = $rows | Export-Excel $path -PassThru
    
    # Obtain a reference to the first worksheet again
    # (the same one implicitly targeted by Import-Excel / Export-Excel).
    $ws = $pkg.Workbook.Worksheets[1]
    
    # Now delete the no longer needed overflow rows from it.
    $i = 0
    $rowsToDeleteIndices | ForEach-Object { $ws.DeleteRow($_ + $i--) }
    
    # Close and thereby implicitly save the package.
    Close-ExcelPackage $pkg