Tags: python, powershell, character-encoding, pipe, cp1252

Pipe a cp1252 string from PowerShell to a Python (2.7) script


After a few days of poring over Stack Overflow and the Python 2.7 docs, I have come to no conclusion about this.

I'm running a Python script on a Windows server that must take a block of text as input. This block of text (unfortunately) has to be passed through a pipe. Something like:

PS > [something_that_outputs_text] | python .\my_script.py

So the problem is:

The server uses cp1252 encoding and I really cannot change it due to administrative regulations and whatnot. When I pipe the text to my Python script and read it, it already arrives with ? where characters like \xe1 should be.

What I have done so far:

Tested with UTF-8. Yep, chcp 65001 and $OutputEncoding = [Console]::OutputEncoding "solve it", as in Python gets the text perfectly and I can then decode it to unicode, etc. But apparently they won't let me do that on the server /sadface.

A little script to test what the hell is happening:

import codecs
import sys

def main(argv=None):
    if argv is None:
        argv = sys.argv

    # Echo any command-line arguments, decoded from cp1252.
    if len(argv) > 1:
        for arg in argv[1:]:
            print arg.decode('cp1252')

    # Wrap stdin so the piped bytes are decoded as cp1252 on read.
    sys.stdin = codecs.getreader('cp1252')(sys.stdin)
    text = sys.stdin.read().strip()
    print text
    return 0

if __name__ == "__main__":
    sys.exit(main())

Tried it with both the codecs wrapping and without it.
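To make clear what the codecs wrapper can and cannot do, here is a minimal sketch that decodes simulated pipe contents. The helper name read_piped_text and the in-memory io.BytesIO streams are assumptions for illustration, not part of the original script. It suggests that decoding works when the cp1252 bytes arrive intact, but cannot bring back a character that was already replaced with ? before Python read it.

```python
import io

def read_piped_text(stream, encoding="cp1252"):
    """Read every byte from a binary stream and decode it in one step.

    Hypothetical helper; the stream stands in for a piped stdin.
    """
    raw = stream.read()
    return raw.decode(encoding).strip()

# Simulated pipe contents: "Bla" with an accented a (0xe1 in cp1252) plus CRLF.
intact = read_piped_text(io.BytesIO(b"Bl\xe1\r\n"))

# Simulated pipe contents where the accent was already replaced upstream.
damaged = read_piped_text(io.BytesIO(b"Bl?\r\n"))

print(intact == u"Bl\xe1")  # the accented character survives decoding
print(damaged == u"Bl?")    # the question mark stays a question mark
```

If the second case matches what the script sees, the fix has to happen on the sending side of the pipe rather than in the Python decoding step.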

My input & output:

PS > echo "Blá" | python .\testinput.py blé
blé
Bl?

--> So there's no problem with the argument (blé) but the piped text (Blá) is no good :(

I even converted the text string to hex and, yes, it gets flooded with 3f (AKA mr ?), so it's not a problem with the print.
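One plausible explanation for the 3f bytes, which nothing in this question confirms, is that the text is encoded to ASCII with replacement before it reaches the pipe (Windows PowerShell's $OutputEncoding defaults to ASCII). The sketch below only shows what such a conversion would do to the bytes; the sample string is an assumption.

```python
import binascii

text = u"Bl\xe1"  # "Bla" with an accented a, as in the test above

# What the pipe should carry if the text were encoded as cp1252.
as_cp1252 = text.encode("cp1252")

# What it carries if the text is forced into ASCII with replacement:
# every non-ASCII character becomes "?" (byte 0x3f).
as_ascii = text.encode("ascii", "replace")

print(binascii.hexlify(as_cp1252))
print(binascii.hexlify(as_ascii))
```

The first hex dump ends in e1 and the second ends in 3f, which is the same pattern described above.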

[Also: it's my first question here... feel free to ask for more info about what I did]

EDIT

I don't know if this is relevant or not, but when I do sys.stdin.encoding it yields None
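In Python 2.7, sys.stdin.encoding is None whenever stdin is a pipe rather than a console, so the None by itself does not say which encoding the bytes are in. A minimal sketch of falling back to an explicit encoding in that case follows; effective_encoding and FakePipe are hypothetical names used only for illustration.

```python
import sys

def effective_encoding(stream, default="cp1252"):
    """Return the stream's declared encoding, or a default when none is set."""
    return getattr(stream, "encoding", None) or default

class FakePipe(object):
    """Stand-in for a piped stdin that reports no encoding."""
    encoding = None

class FakeConsole(object):
    """Stand-in for a console stdin that reports an encoding."""
    encoding = "cp850"

print(effective_encoding(FakePipe()))     # falls back to the default
print(effective_encoding(FakeConsole()))  # uses the declared encoding
```

In the real script the call would be effective_encoding(sys.stdin), with the result passed to codecs.getreader.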

Update: So... I have no problems with cmd. Checked sys.stdin.encoding while running the program on cmd and everything went fine. I think my head just exploded.


Solution

  • How about saving the data to a file and piping it to Python in a CMD session? Invoke both PowerShell and Python from CMD, like so:

    c:\>powershell -command "c:\genrateDataForPython.ps1 -output c:\data.txt"
    c:\>type c:\data.txt | python .\myscript.py
    

    Edit

    Another idea: convert the data to base64 in PowerShell and decode it in Python. Base64 is simple in PowerShell, and I guess it isn't hard in Python either. Like so,

    # Convert some accent chars to base64
    $s  = [Text.Encoding]::UTF8.GetBytes("éêèë")
    [System.Convert]::ToBase64String($s)
    # Output:
    w6nDqsOow6s=
    
    # Decode:
    $d  = [System.Convert]::FromBase64String("w6nDqsOow6s=")
    [Text.Encoding]::UTF8.GetString($d)
    # Output
    éêèë
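    For the Python side, a minimal sketch of the decoding step could look like the following. It assumes the PowerShell side encoded the text as UTF-8 before converting it to base64, as in the example above; decode_base64_text is a hypothetical helper name. Because base64 output is plain ASCII, it should pass through the pipe unchanged regardless of the code page.

```python
import base64

def decode_base64_text(payload, encoding="utf-8"):
    """Decode a base64 string back into text.

    Hypothetical helper; assumes the sender encoded the text with
    `encoding` before the base64 conversion.
    """
    # strip() drops the trailing newline the pipe is likely to add.
    return base64.b64decode(payload.strip()).decode(encoding)

# The base64 string produced by the PowerShell example above.
decoded = decode_base64_text("w6nDqsOow6s=\r\n")

# Compare against the four accented characters using escapes, so the
# check does not depend on the console's own encoding.
print(decoded == u"\xe9\xea\xe8\xeb")
```

    In the real script the payload would come from sys.stdin.read() instead of a literal string.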