
Python: Reading and Writing to a binary file in PowerShell from stdin


I have the following two programs written in Python:

# cat.py
import sys

filename = sys.argv[1]

with open(filename, "rb") as f:
    while c := f.read(1024 * 1024):
        sys.stdout.buffer.write(c)

This program reads a file and writes its contents to stdout as raw binary.

The following program is meant to read that data from stdin and print it as bytes.

# main.py
import sys
import io

if __name__ == '__main__':
    print(sys.stdin.buffer.read(io.DEFAULT_BUFFER_SIZE))
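The listings below show one hex value per line, whereas print() on a bytes object shows a bytes literal such as b'\x03\xfaU...'. The exact printing code is not shown, so here is a hypothetical variant of main.py (the helper name hex_lines is made up for illustration) that produces that one-byte-per-line format while still reading from the binary buffer:

```python
# Hypothetical variant of main.py: print each byte received on stdin in hex,
# one per line, matching the format of the output listings.
import io
import sys


def hex_lines(data: bytes) -> list[str]:
    r"""Return hex(b) for every byte b, e.g. b"\x03\xfa" -> ["0x3", "0xfa"]."""
    return [hex(b) for b in data]


if __name__ == "__main__":
    # Read up to io.DEFAULT_BUFFER_SIZE raw bytes from sys.stdin.buffer,
    # bypassing the text-decoding sys.stdin wrapper.
    data = sys.stdin.buffer.read(io.DEFAULT_BUFFER_SIZE)
    for line in hex_lines(data):
        print(line)
```

Note that a single read() call returns at most io.DEFAULT_BUFFER_SIZE bytes, so larger inputs would need a loop like the one in cat.py.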

However, I do not get the file contents in this case. Under Linux I get the exact contents, but on Windows I do not:

python cat.py .\inputs\input.bin | python main.py

Output on Windows (running under pwsh.exe):

0x3
0xc2
0xb7
0x55
0x12
0x20
0x66
0x67
0x50
0xc3
0x9e
0xc2
0xbd
0xd
0xa

Output on Linux (This is correct):

0x3
0xfa
0x55
0x12
0x20
0x66
0x67
0x50
0xe8
0xab

Any ideas why this may be the case? Is it line endings or something like that?

Also, if cat.py writes to a file rather than to stdout, the file contains the correct contents.


Update:

Okay, I have narrowed it down to a PowerShell issue. If I run this in cmd.exe I do not have any issues; if I run it under PowerShell, I do.


Solution

  • The two shells most likely treat the pipe differently, which results in different data streams: cmd.exe passes the raw bytes from one program to the next, while PowerShell applies a text encoding to them.

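    Unfortunately, reading stdin as binary (sys.stdin.buffer) does not help on its own, because the data is altered before it reaches the second program. PowerShell decodes the output of a native program as text using the console encoding ([Console]::OutputEncoding, which typically follows the system-wide OEM code page), then re-encodes that text with $OutputEncoding when piping it into the next native program, ending each line with a newline. This matches the Windows output: bytes above 0x7f came out as two-byte UTF-8 sequences, and 0xd 0xa was appended.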

    There is an existing answer that should help resolve this issue.
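Workarounds that keep PowerShell out of the byte path include running the pipeline through cmd.exe (for example cmd /c "python cat.py .\inputs\input.bin | python main.py"), writing to an intermediate file (which the question reports works), or upgrading: PowerShell 7.4 and later are reported to pass bytes between native commands unchanged, so check the release notes for your version. Another option is to let a small Python launcher start both programs and connect them with an OS-level pipe via subprocess. A minimal sketch, where the function name pipe_binary is hypothetical and the pattern follows the "replacing a shell pipeline" approach from the subprocess documentation:

```python
# Sketch: run "producer | consumer" with a direct OS pipe so the bytes never
# pass through a PowerShell pipeline. pipe_binary is a hypothetical helper.
import subprocess
import sys


def pipe_binary(producer: list[str], consumer: list[str]) -> bytes:
    """Run producer | consumer and return the consumer's stdout as bytes."""
    with subprocess.Popen(producer, stdout=subprocess.PIPE) as p1:
        with subprocess.Popen(consumer, stdin=p1.stdout, stdout=subprocess.PIPE) as p2:
            # Close our copy of the pipe's read end so the producer sees a
            # broken pipe if the consumer exits early.
            p1.stdout.close()
            out, _ = p2.communicate()
    # Both context managers have waited for their processes at this point.
    if p1.returncode != 0 or p2.returncode != 0:
        raise RuntimeError(
            f"pipeline failed: producer={p1.returncode}, consumer={p2.returncode}"
        )
    return out
```

For example, pipe_binary([sys.executable, "cat.py", r".\inputs\input.bin"], [sys.executable, "main.py"]) should return main.py's output as bytes, with main.py having received the file contents unmodified, because the shell only launches the launcher and never sees the binary data.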