Is there a PowerShell command that will print a specified file's EOL characters?


I have four text files in the following directory that have varying EOL characters:

C:\Sandbox (1.txt, 2.txt, 3.txt, 4.txt)

I would like to write a PowerShell script that loops through all files in the directory, determines the EOL characters used in each file, and prints the results into a new file named EOL.txt.

Sample contents of EOL.txt:

1.txt UNIX(LF)
2.txt WINDOWS(CRLF)
3.txt WINDOWS(CRLF)
4.txt UNIX(LF)

I know that looping through the files requires something like the following, but I'm not sure how to determine a file's EOL characters:

Get-ChildItem "C:\Sandbox" -Filter *.txt | 
Foreach-Object {
}

OR

Get-Content "C:\Sandbox\*"  -EOL | Out-File -FilePath "C:\Sandbox\EOL.txt"
## note that -EOL is not a valid Get-Content parameter

Solution

  • Try the following:

    Get-ChildItem C:\Sandbox\*.txt -Exclude EOL.txt |
      Get-Content -Raw |
        ForEach-Object {
          $newlines = [regex]::Matches($_, '\r?\n').Value | Select-Object -Unique
          $newLineDescr = 
            switch ($newlines.Count) {
              0 { 'N/A' }
              2 { 'MIXED' }
              default { ('UNIX(LF)', 'WINDOWS(CRLF)')[$newlines -eq "`r`n"] }
            }
          # Construct and output a custom object for the file at hand.
          [pscustomobject] @{
            FileName      = $_.PSChildName
            NewlineFormat = $newLineDescr
          }
        } # | Out-File ... to save to a file - see comments below.
    

    The above outputs something like:

    FileName NewlineFormat
    -------- -------------
    1.txt    UNIX(LF)
    2.txt    WINDOWS(CRLF)
    3.txt    N/A
    4.txt    MIXED
    

    N/A means that no newlines are present; MIXED means that both CRLF and LF newlines are present.
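To try the pipeline on all four cases (including N/A and MIXED) without touching C:\Sandbox, sample files with exact newline bytes can be generated first. This is a minimal sketch, assuming a throwaway directory under the system temp folder; the directory name and file contents are made up for illustration. [System.IO.File]::WriteAllText is used so that no cmdlet appends a newline of its own, and the results are sorted because file-system enumeration order is not guaranteed on every platform.

```powershell
# Scratch directory standing in for C:\Sandbox (hypothetical location under the temp folder).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ('EolDemo_' + [guid]::NewGuid())
$null = New-Item -ItemType Directory -Path $dir

# WriteAllText writes the strings verbatim (UTF-8, no BOM), so the newlines are exactly as given.
[System.IO.File]::WriteAllText((Join-Path $dir '1.txt'), "one`ntwo`n")         # LF only
[System.IO.File]::WriteAllText((Join-Path $dir '2.txt'), "one`r`ntwo`r`n")     # CRLF only
[System.IO.File]::WriteAllText((Join-Path $dir '3.txt'), 'no newline at all')  # no newline
[System.IO.File]::WriteAllText((Join-Path $dir '4.txt'), "one`r`ntwo`nthree")  # CRLF and LF

# Same logic as the answer's pipeline, collected into a variable.
$results = Get-ChildItem (Join-Path $dir '*.txt') -Exclude EOL.txt |
  Get-Content -Raw |
  ForEach-Object {
    # Get-Content attaches the file name to each string it outputs, as PSChildName.
    $fileName = $_.PSChildName
    # Distinct newline sequences found in the file: none, LF, CRLF, or both.
    $newlines = [regex]::Matches($_, '\r?\n').Value | Select-Object -Unique
    $newLineDescr =
      switch ($newlines.Count) {
        0 { 'N/A' }
        2 { 'MIXED' }
        default { ('UNIX(LF)', 'WINDOWS(CRLF)')[$newlines -eq "`r`n"] }
      }
    [pscustomobject] @{ FileName = $fileName; NewlineFormat = $newLineDescr }
  } |
  Sort-Object FileName   # enumeration order can differ by platform

Remove-Item -LiteralPath $dir -Recurse -Force
$results | Format-Table
```

The table printed at the end should match the shape of the sample output above, with one row per newline case.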

    You can save the output:

    • directly in the for-display format shown above by appending a > redirection or piping (|) to Out-File, as in your question.

    • alternatively, using a structured text format better suited to programmatic processing, such as CSV, e.g., by piping to Export-Csv -NoTypeInformation -Encoding utf8 C:\Sandbox\EOL.txt
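A minimal sketch of the CSV route, including reading the result back with Import-Csv. It assumes objects shaped like the ones the pipeline above emits (hard-coded here rather than computed from files) and a hypothetical output path in the temp folder in place of C:\Sandbox\EOL.txt:

```powershell
# Objects shaped like the pipeline's output (hard-coded here for illustration).
$rows = @(
  [pscustomobject] @{ FileName = '1.txt'; NewlineFormat = 'UNIX(LF)' }
  [pscustomobject] @{ FileName = '2.txt'; NewlineFormat = 'WINDOWS(CRLF)' }
)

# Hypothetical output path; the question uses C:\Sandbox\EOL.txt.
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) ('EOL_' + [guid]::NewGuid() + '.txt')

# CSV keeps each property in its own column, unlike the for-display table format.
$rows | Export-Csv -NoTypeInformation -Encoding utf8 $csvPath

# Reading the file back yields objects again, with the same property names.
$imported = Import-Csv $csvPath
$imported | Where-Object NewlineFormat -eq 'WINDOWS(CRLF)'

Remove-Item -LiteralPath $csvPath
```

Because Import-Csv turns each row back into an object with FileName and NewlineFormat properties, a later script can filter on the format directly instead of parsing the for-display text.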

    Note:

    • Short of reading the raw bytes of a text file one by one or in batches, the only way to analyze the newline format is to read the file in full and search for newline sequences. Get-Content -Raw reads a given file in full.

    • [regex]::Matches($_, '\r?\n').Value extracts all newline sequences - whether CRLF or LF - from the file's content, and Select-Object -Unique reduces them to the set of distinct sequences.

    • ('UNIX(LF)', 'WINDOWS(CRLF)')[$newlines -eq "`r`n"] is a convenient but somewhat obscure emulation of the following ternary conditional:

      • $newlines -eq "`r`n" ? 'WINDOWS(CRLF)' : 'UNIX(LF)', which can be used as-is in PowerShell (Core) 7+ but, unfortunately, isn't supported in Windows PowerShell.

      • The technique relies on a [bool] value getting coerced to an [int] value when used as an array index ($true -> 1, $false -> 0), thereby selecting the appropriate element from the input array.

      • If you don't mind the verbosity, you can use a regular if statement as an expression (i.e., you can assign its output directly to a variable: $foo = if ...), which works in both PowerShell editions:

        • if ($newlines -eq "`r`n") { 'WINDOWS(CRLF)' } else { 'UNIX(LF)' }
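As the first note says, the only way around reading a file in full is to inspect its raw bytes. The sketch below does that in fixed-size batches with a FileStream, carrying the last byte of each batch over so that a CRLF pair split across two batches is still recognized, and it stops early once both kinds of newline have been seen. It assumes an ASCII-compatible encoding such as ASCII, ANSI, or UTF-8 (UTF-16 files would need different handling), it ignores lone CR characters just as the '\r?\n' regex does, and the function name Get-NewlineFormatFromBytes is hypothetical. Byte-by-byte loops are slow in PowerShell, so for typical file sizes Get-Content -Raw is likely the faster choice.

```powershell
# Hypothetical helper: classifies a file's newlines by streaming its bytes in batches.
# Assumes an ASCII-compatible encoding (ASCII, ANSI, UTF-8); lone CRs are ignored.
function Get-NewlineFormatFromBytes {
  param(
    [Parameter(Mandatory)] [string] $LiteralPath,
    [int] $BufferSize = 64KB
  )
  $sawCrLf = $false
  $sawLf = $false
  $prev = 0   # last byte seen, carried across batches so a split CRLF is still detected

  $stream = [System.IO.File]::OpenRead($LiteralPath)
  try {
    $buffer = [byte[]]::new($BufferSize)
    while (($read = $stream.Read($buffer, 0, $buffer.Length)) -gt 0) {
      for ($i = 0; $i -lt $read; $i++) {
        $b = $buffer[$i]
        if ($b -eq 10) {   # LF byte
          if ($prev -eq 13) { $sawCrLf = $true } else { $sawLf = $true }
        }
        $prev = $b
      }
      if ($sawCrLf -and $sawLf) { break }   # MIXED already established; stop reading
    }
  }
  finally {
    $stream.Dispose()
  }

  if ($sawCrLf -and $sawLf) { 'MIXED' }
  elseif ($sawCrLf) { 'WINDOWS(CRLF)' }
  elseif ($sawLf) { 'UNIX(LF)' }
  else { 'N/A' }
}
```

For example: Get-ChildItem C:\Sandbox\*.txt -Exclude EOL.txt | ForEach-Object { [pscustomobject] @{ FileName = $_.Name; NewlineFormat = (Get-NewlineFormatFromBytes -LiteralPath $_.FullName) } }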

    Simpler alternative via WSL, if installed:

    WSL comes with the file utility, which analyzes the content of files and reports summary information, including newline formats.

    While you get no control over the output format, which invariably includes additional information, such as the file's character encoding, the command is much simpler:

    Set-Location C:\Sandbox
    wsl file *.txt
    

    Caveats:

    • This approach is fundamentally limited to files on local drives.
    • If changing to the target directory is not an option, relative paths need their \ characters translated to /, and full paths additionally need drive specifiers such as C: translated to /mnt/c (lowercase!).
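The translation rules in the second caveat can be captured in a small helper. This is a minimal sketch with a hypothetical name, ConvertTo-WslPath; it assumes either relative paths or fully qualified paths of the form C:\..., and, in line with the first caveat, it does not attempt UNC paths.

```powershell
# Hypothetical helper: applies the path translation rules from the caveat above.
# Assumes relative paths or fully qualified paths like C:\...; UNC paths are not handled.
function ConvertTo-WslPath {
  param([Parameter(Mandatory)] [string] $Path)

  # Rule 1: every \ becomes /.
  $converted = $Path -replace '\\', '/'

  # Rule 2: a leading drive spec such as C: becomes /mnt/c (lowercase drive letter).
  if ($converted -match '^([A-Za-z]):(.*)$') {
    $converted = '/mnt/' + $Matches[1].ToLowerInvariant() + $Matches[2]
  }
  $converted
}
```

For example: wsl file (ConvertTo-WslPath C:\Sandbox\1.txt)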

    Interpreting the output:

    • If the phrase "line terminators" (referring to newlines) does not appear in the output for a text file, only Unix (LF) newlines are implied.
    • Only Windows (CRLF) newlines are implied if you see "with CRLF line terminators".
    • With a mix of LF and CRLF, you'll see "with CRLF, LF line terminators".
    • In the absence of newlines, you'll see "with no line terminators".
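To get output in the EOL.txt format from the question, the rules above can be applied to each line that file prints. This is a minimal sketch with a hypothetical function name; it assumes file's usual "<name>: <description>" line format and text files only (for non-text files, the "no mention means LF" rule does not hold), and it labels any other line-terminator description, such as "with CR line terminators", as OTHER.

```powershell
# Hypothetical helper: turns one line of `file` output into a FileName/NewlineFormat object,
# following the interpretation rules above. Assumes "<name>: <description>" and text files only.
function ConvertFrom-FileOutputLine {
  param([Parameter(Mandatory)] [string] $Line)

  # Split at the first colon only, in case the description itself contains one.
  $name, $description = $Line -split ':\s*', 2

  $format =
    if ($description -match 'with CRLF, LF line terminators') { 'MIXED' }
    elseif ($description -match 'with CRLF line terminators') { 'WINDOWS(CRLF)' }
    elseif ($description -match 'with no line terminators') { 'N/A' }
    elseif ($description -match 'line terminators') { 'OTHER' }   # e.g., "with CR line terminators"
    else { 'UNIX(LF)' }                                            # no mention: LF only

  [pscustomobject] @{ FileName = $name; NewlineFormat = $format }
}
```

For example, from C:\Sandbox: wsl file *.txt | ForEach-Object { ConvertFrom-FileOutputLine $_ } | Format-Table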