Read a file, count delimiters and output line number with mismatched delimiter


I have a file named test_file.txt. The second line has 4 pipe delimiters, and every other line has 3. I just want to output line 2, since it has one more delimiter than the other lines.

$colCnt = "C:\test.txt"
[int]$LastSplitCount = $Null
Get-Content $colCnt | ?{$_} | Select -Skip 1 | %{
    if ($LastSplitCount -and !($_.split("|").Count -eq $LastSplitCount)) {
        "Process stopped at line number $($_.psobject.Properties.value[5]) for column count mis-match."
        break
    }
    elseif (!$LastSplitCount) { $LastSplitCount = $_.split("|").Count }
}

Solution

  • If your text file looks anything like this:

    blah|using|three|delimiters
    blah|using|four |delimiter |characters
    blah|using|three|delimiters
    blah|using|four |delimiter |characters
    blah|using two  |delimiters
    

    The following code should output the lines with more (or fewer) than 3 | delimiters:

    $line = 0
    switch -Regex -File "C:\test.txt" {
        '^(?:[^|]*\|){3}[^|]*$' { $line++ }   # this line is OK, just increase the line counter
        default { "Bad delimiter count in line {0}: '{1}'" -f ++$line, $_ }
    }
    

    Output:

    Bad delimiter count in line 2: 'blah|using|four |delimiter |characters'
    Bad delimiter count in line 4: 'blah|using|four |delimiter |characters'
    Bad delimiter count in line 5: 'blah|using two  |delimiters'
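
    The switch above hardcodes 3 as the expected count. If the expected count is not known in advance, one option is to treat the most common count in the file as the correct one and report everything else. The following is only a sketch under that assumption: the function name Get-DelimiterMismatch and its parameters are hypothetical, not part of the original answer, and a tie between two counts is resolved arbitrarily.

    ```powershell
    # Hypothetical helper (not from the original answer): reports the lines whose
    # delimiter count differs from the most common count among all lines.
    function Get-DelimiterMismatch {
        param(
            [string[]]$Lines,
            [char]$Delimiter = '|'
        )

        # Count the delimiters on each line and remember the 1-based line number.
        $lineNumber = 0
        $counts = @(foreach ($text in $Lines) {
            $lineNumber++
            [pscustomobject]@{
                LineNumber = $lineNumber
                Delimiters = $text.Split($Delimiter).Count - 1
                Text       = $text
            }
        })

        # Assumption: the most frequent delimiter count is the "correct" one.
        $largestGroup = $counts | Group-Object Delimiters |
            Sort-Object Count -Descending | Select-Object -First 1
        $expected = [int]$largestGroup.Name

        $counts | Where-Object { $_.Delimiters -ne $expected }
    }

    # Sample data shaped like the question: only line 2 has an extra delimiter.
    $sample = 'a|b|c|d', 'a|b|c|d|e', 'a|b|c|d'
    Get-DelimiterMismatch -Lines $sample
    ```

    For a real file, the lines could be passed as (Get-Content "C:\test.txt"). Note that with the five sample lines shown earlier the counts 3 and 4 occur equally often, so the fixed-count switch is the safer choice there.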
    

    Regex details:

    ^           Assert position at the beginning of the string
    (?:         Match the regular expression below
       [^|]     Match any character that is NOT a “|”
          *     Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
       \|       Match the character “|” literally
    ){3}        Exactly 3 times
    [^|]        Match any character that is NOT a “|”
       *        Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
    $           Assert position at the end of the string (or before the line break at the end of the string, if any)
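
To see how the pattern behaves, it can be tried against individual strings with the -match operator. This is just a quick check using three of the sample lines from above; the variable names are illustrative.

```powershell
# Same pattern as used in the switch statement above.
$pattern = '^(?:[^|]*\|){3}[^|]*$'

# Sample lines taken from the example file in the answer.
$samples = @(
    'blah|using|three|delimiters'
    'blah|using|four |delimiter |characters'
    'blah|using two  |delimiters'
)

foreach ($sample in $samples) {
    # -match is $true only when the whole line contains exactly three '|'.
    '{0,-5} {1}' -f ($sample -match $pattern), $sample
}
```

Only the first sample matches. The second has four delimiters and the third has two, so both fall through to the default branch of the switch.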