Tags: python, python-3.x, powershell, pipe, powershell-3.0

Can you pipe non-text data from a python script via PowerShell?


I am fairly familiar with bash and know how to do some basic scripting involving pipes, which I use as a 'back end' to run Python scripts in sequence.

However, for a new project I've been tasked with, I can only use PowerShell. I've found that I can rewrite my previous shell scripts fine, but I hear that you can pipe non-text data in PowerShell too.

My question is:

Is it possible to pipe non-text output (primarily a pandas dataframe) from a python script into another python script via PowerShell?

Something similar to:

script1.py | script2.py

If so, what are the logistics on the Python side? I.e., can the script still write to sys.stdout?

EDIT:

To better explain the use case, in response to the comments I've received:

I have two python scripts, test1.py:

#test1.py
import pandas as pd
import sys


def main():
    columns = ['A', 'B', 'C']
    data = [
        ['hello', 0,  3.14],
        ['world', 1,  2.71],
        ['foo',   2,  0.577],
        ['bar',   3,  1.61]

    ]

    df = pd.DataFrame(data, columns=columns)
    return df


if __name__ == "__main__":
    main().to_csv(sys.stdout, index_label=False)

and test2.py:

#test2.py
import pandas as pd
import sys


def main():
    df = pd.read_csv(sys.stdin)
    print(df.dtypes)


if __name__ == "__main__":
    main()

I'm using PowerShell to do some automation, and need to pipe the output of one script to the other; python test1.py | python test2.py works perfectly fine.

My question is: I have heard that you can pipe non-text data in PowerShell, which you can't do in Bash (I think), so is it possible to pipe the DataFrame as-is (without having to convert it to CSV or some other string encoding)?


Solution

  • Update:

    • PowerShell (Core) v7.4+ now does support raw byte handling with external programs - see this answer.

    • The following therefore applies only to Windows PowerShell and PowerShell (Core) v7.3 and below.


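    In PowerShell versions up to v7.3.x, there is no support for binary data (raw bytes) in PowerShell's pipeline when external programs are involved. PowerShell's pipeline passes .NET objects only between PowerShell commands; a Python object such as a DataFrame cannot leave its process as-is in any shell and must first be serialized to bytes. When an external program such as python is piped to another, PowerShell decodes its output bytes into text line by line (using [Console]::OutputEncoding) and re-encodes that text (using $OutputEncoding) for the receiving program, which corrupts binary data.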
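A simplified simulation of a text-only relay (decode the bytes to text, process them line by line, re-encode) shows why this matters for binary data. This is a stdlib-only sketch, not PowerShell's actual implementation: a plain dict stands in for the DataFrame (pickle.dumps(df) also yields raw bytes), and the code pages (Windows-1252 in, UTF-8 out) are assumptions chosen for illustration.

```python
import pickle


def text_relay(data: bytes, decode_as: str, encode_as: str) -> bytes:
    """Mimic a text-only pipe: decode bytes to str, split into lines,
    then re-join with newlines and re-encode (original line endings are lost)."""
    lines = data.decode(decode_as, errors="replace").splitlines()
    return "\n".join(lines).encode(encode_as, errors="replace") + b"\n"


def can_unpickle(data: bytes) -> bool:
    """Return True if `data` still deserializes, False if it was corrupted."""
    try:
        pickle.loads(data)
        return True
    except Exception:  # corrupted pickles raise UnpicklingError, EOFError, etc.
        return False


if __name__ == "__main__":
    # Hypothetical stand-in for the DataFrame from test1.py.
    payload = pickle.dumps({"A": ["hello", "world"], "B": [0, 1], "C": [3.14, 2.71]})
    # Assumed code pages: decode as Windows-1252, re-encode as UTF-8.
    relayed = text_relay(payload, decode_as="cp1252", encode_as="utf-8")
    print("bytes unchanged:", payload == relayed)
    print("relayed data still loads:", can_unpickle(relayed))
```

The pickle header byte 0x80 alone is enough to break the round trip: Windows-1252 decodes it as "€", which UTF-8 then encodes as three different bytes.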

    The workaround is to use cmd.exe /c (on Windows; on Unix-like platforms, use /bin/sh -c):

    cmd /c 'script1.py | script2.py'
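With a byte-transparent pipe such as the cmd /c workaround above (or PowerShell 7.4+), the two scripts can exchange a binary serialization instead of CSV. A minimal sketch, assuming pickle as the format; `send` and `receive` are hypothetical helper names. The DataFrame is still serialized, just to bytes rather than text, and the scripts must use the binary layer sys.stdout.buffer / sys.stdin.buffer rather than the text streams. Only unpickle data from a trusted source, because unpickling can execute arbitrary code.

```python
import io
import pickle


def send(obj, stream) -> None:
    """Serialize obj as pickle bytes into a binary stream (e.g. sys.stdout.buffer)."""
    pickle.dump(obj, stream, protocol=pickle.HIGHEST_PROTOCOL)
    stream.flush()


def receive(stream):
    """Deserialize one pickled object from a binary stream (e.g. sys.stdin.buffer).
    Only use on trusted input: unpickling can run arbitrary code."""
    return pickle.load(stream)


if __name__ == "__main__":
    # In test1.py this would be:  send(main(), sys.stdout.buffer)
    # In test2.py this would be:  df = receive(sys.stdin.buffer); print(df.dtypes)
    # Here an in-memory buffer stands in for the pipe.
    buf = io.BytesIO()
    send({"A": ["hello", "world"], "B": [0, 1]}, buf)
    buf.seek(0)
    print(receive(buf))
```

With the scripts adapted that way, the pipeline would be run as cmd /c 'python test1.py | python test2.py', and test2.py receives an actual DataFrame, dtypes included, rather than re-parsing CSV.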
    

    Note:

    • If you additionally want to capture the raw byte output in PowerShell:

      • Include an output redirection (>) in the cmd /c command string; e.g.:

        cmd /c 'script1.py | script2.py > out.bin'
        
      • Then read that file as bytes with Get-Content -Encoding Byte (Windows PowerShell) / Get-Content -AsByteStream (PowerShell (Core) 7+)

    • If, by contrast, you want to capture the output from the cmd /c call as text (strings):

      • You have to (temporarily) set [Console]::OutputEncoding to the system's active ANSI code page, which Python defaults to when outputting to something other than the console (deviating from the usual behavior of using the active OEM code page).

        • In Windows PowerShell (versions up to 5.1), you can do this as follows:

          [Console]::OutputEncoding = [System.Text.Encoding]::Default
          
          • Note: In PowerShell (Core) 7+, more work is needed:

            [Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding([int] (Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP))
            
      • Note that you can also configure Python to output UTF-8 by default: see this answer; in that case, use the following:

         [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
        
      • See this answer for more information.
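For the text (CSV) route, another option is to make the encoding explicit on the Python side, so it no longer depends on the system's code page. A minimal sketch, assuming UTF-8 is wanted (which pairs with the [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new() setting above); `write_csv_utf8` is a hypothetical helper, and the stdlib csv module stands in for DataFrame.to_csv, which accepts the same kind of text stream.

```python
import csv
import io


def write_csv_utf8(rows, binary_stream) -> None:
    """Write rows as UTF-8 CSV to a binary stream, independent of the
    ANSI/OEM code page. newline='' lets the csv module control line endings."""
    text = io.TextIOWrapper(binary_stream, encoding="utf-8", newline="")
    csv.writer(text).writerows(rows)
    text.flush()
    text.detach()  # release the wrapper without closing binary_stream


if __name__ == "__main__":
    rows = [["A", "B", "C"], ["hello", 0, 3.14], ["grüße", 1, 2.71]]
    # In test1.py the target would be sys.stdout.buffer; a BytesIO stands in here.
    buf = io.BytesIO()
    write_csv_utf8(rows, buf)
    print(buf.getvalue())
```

Inside test1.py, calling sys.stdout.reconfigure(encoding="utf-8") before main().to_csv(sys.stdout, ...) has a similar effect (Python 3.7+), as does running Python in UTF-8 mode (python -X utf8, or the PYTHONUTF8=1 environment variable).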