Tags: python, c++, windows, subprocess, pipe

Passing bytes from Python to a C++ Subprocess via stdin fails on Windows


I have a Python program that runs iterations of a loop. Part of each iteration can run in parallel, so I decided to offload it to a pre-compiled C++ program that I call as a subprocess. This program needs an input of about 1 KB and returns a few bytes as an answer.

In the C++ program I have the following method to read the input bytes from stdin:

#include <iostream>
#include <cstdio>     // fread, feof, ferror, stdin
#include <cstdlib>    // malloc
#define VAR_NUM 1000         // number of bytes to read
using namespace std;

void read_vars(int* vars){
    char buf;
    int chk;

    for(int i=0; i<VAR_NUM; i++){
        chk = fread(&buf, sizeof(char), 1, stdin);
        std::cout << (int)(unsigned char)buf << "(" << chk << ") ";
        vars[i] = (int)(unsigned char)buf;
        if(chk==0){
            if(feof(stdin)) std::cout << "[EOF] ";
            if(ferror(stdin)) std::cout << "[ERROR] ";
        }
    }
    std::cout << endl;
    return;
}

int main(){
    int* vars = (int*) malloc(VAR_NUM*sizeof(int));
    for(int i=0; i<VAR_NUM; i++) vars[i] = 0;
    read_vars(vars);
    return 0;
}

It reads bytes from stdin and stores them as integers in the pre-allocated vars array. For now stdout is used for debugging, so every byte read is printed along with its fread return value and, if the read fails, the status of stdin.

On the Python side I run the following code to invoke the compiled C++ program:

import os
import random
from subprocess import run, Popen, PIPE, DEVNULL, STDOUT

def run_cpp(vars):
    command = os.path.join('.', 'program')
    inp = bytes(vars)

    proc = Popen(command, stdin=PIPE, stderr=PIPE, text=False) # will have stdout=PIPE in final code
    print('python wrote:', proc.stdin.write(inp),'bytes\n')
    print('vars read by c++:')
    res, err = proc.communicate()

    print('\n'+'stderr:', err, '\n')
    print('stdout:', res, '\n')
    return 

# random bytes
vars = [random.randint(0,255) for i in range(1000)]           # must have the same number of bytes as VAR_NUM in C++
print('vars written by python:')
print(vars,'\n')

print(run_cpp(vars))

It creates a process that runs the C++ program and writes the bytes to its stdin through a pipe. I do not call communicate() on the subprocess right away, because I want to create many subprocesses in a loop, pass each one its own input, and then wait for them while they all run in parallel. (proc.communicate() is in this code for testing only.)
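A minimal sketch of that intended pattern, assuming inputs and outputs small enough to fit in the OS pipe buffer (larger payloads would need threads or communicate() to avoid blocking). The child here is a hypothetical stand-in, a Python one-liner launched via sys.executable rather than the compiled C++ program, and start_worker, collect and run_batch are illustrative names:

```python
import sys
from subprocess import Popen, PIPE

# Hypothetical stand-in for the compiled C++ program: a child process that
# reads all of stdin as raw bytes and prints how many bytes it received.
CHILD = [sys.executable, "-c",
         "import sys; data = sys.stdin.buffer.read(); print(len(data))"]

def start_worker(payload):
    """Start one child, hand it its input, and return without waiting."""
    proc = Popen(CHILD, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    proc.stdin.write(payload)
    proc.stdin.close()  # flush and signal EOF so the child can finish reading
    return proc

def collect(proc):
    """Read one child's output, wait for it, and return (code, out, err)."""
    with proc:  # closes the pipes and waits for the child on exit
        out = proc.stdout.read()
        err = proc.stderr.read()
    return proc.returncode, out, err

def run_batch(payloads):
    """Start every child first so they run concurrently, then collect."""
    procs = [start_worker(p) for p in payloads]
    return [collect(p) for p in procs]
```

Closing stdin right after the write is what tells each child that its input is complete, so no communicate() call is needed before the next child is started.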

When I run the code above on Ubuntu Linux it runs as expected: all the bytes I write from Python are printed out on stdout correctly. On Windows I get an output like the following:

python wrote: 1000 bytes
17(1) 27(1) 23(1) 0(1) 27(1) 23(1) 15(1) 19(1) 16(1) 27(1) 11(1) 23(1) 28(1) 13(1) 28(1) 6(1) 27(1) 28(1) 6(1) 23(1) 4(1) 13(1) 3(1) 11(1) 22(1) 14(1) 11(1) 11(1) 4(1) 4(1) 13(1) 16(1) 19(1) 21(1) 21(0) [EOF] 21(0) [EOF] 21(0) [EOF] 21(0) [EOF] 21(0) [EOF] 21(0) [EOF] 21(0) [EOF] 21(0) [EOF] 21(0) [EOF] ...

proc.stdin.write(inp) reports the correct number of bytes written. The first few bytes are read by read_vars() correctly, but after a while fread seems to reach EOF prematurely (and the last byte stored in buf is then saved repeatedly into the vars array). I tried a few different configuration options for Popen (e.g. shell=False) and I tried flushing proc.stdin after writing to it, but the behavior does not change no matter what I try.

This happens when I write a large number of bytes: for 10 to 200 bytes it always works correctly, for 300 to 500 it fails sometimes, and for close to 1000, which is my target, it always fails. The number of bytes passed correctly also varies between runs: sometimes hundreds of bytes are read correctly, sometimes it fails after the first few. To try this out, change the #define VAR_NUM 1000 and vars = [random.randint(0,255) for i in range(1000)] statements in C++ and Python respectively.

I am guessing that I'm missing some required Python/pipe/g++ configuration to make this work on Windows.

Edit: made the example reproducible.


Solution

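  • The error is caused by stdin not being in "Binary translation mode" on Windows, as pointed out by Mark and Kenny in the comments. In the default text mode, the Windows C runtime treats the byte 0x1A (Ctrl-Z) as end of file and translates CR-LF pairs to LF, so arbitrary binary input can be cut short or altered.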

    Using freopen(NULL, "rb", stdin); (from here) did not work for me, but calling _setmode(_fileno(stdin), _O_BINARY) before reading worked perfectly (together with includes for <fcntl.h> and <io.h>, from here).

    Final working code for C++:

    #include <iostream>
    #include <io.h>
    #include <fcntl.h>
    #include <cstdio>     // C stdio: fread, feof, ferror, stdin, stderr
    #include <cstdlib>    // malloc
    #define VAR_NUM 1000         // number of bytes to read
    using namespace std;
    
    void read_vars(int* vars){
        char buf;
        int chk;
    
        for(int i=0; i<VAR_NUM; i++){
            chk = fread(&buf, sizeof(char), 1, stdin);
            //std::cout << (int)(unsigned char)buf << "(" << chk << ") ";
            if(chk==0){
                // the read failed: report why and stop without storing the stale buf
                if(feof(stdin)) fputs("[EOF error]", stderr);
                else if(ferror(stdin)) fputs("[stdin ERROR]", stderr);
                return;
            }
            vars[i] = (int)(unsigned char)buf;
        }
        return;
    }
    
    int main(){
        // switch stdin to binary mode so every byte arrives unmodified
        if (_setmode(_fileno(stdin), _O_BINARY) == -1){
            fputs("[stdin conversion ERROR]", stderr);
            return 1;
        }
    
        int* vars = (int*) malloc(VAR_NUM*sizeof(int));
        for(int i=0; i<VAR_NUM; i++) vars[i] = 0;
        read_vars(vars);
        return 0;
    }
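
To illustrate why the failure point varied from run to run, here is a rough Python model of what a text-mode reader sees. It is only a sketch: it assumes the text-mode behavior documented for the Microsoft C runtime (input ends at the first 0x1A byte, and CR-LF pairs are collapsed to LF), and text_mode_view and chance_of_clean_payload are hypothetical names, not part of any library.

```python
CTRL_Z = 0x1A  # treated as end of file by the Windows C runtime in text mode

def text_mode_view(data: bytes) -> bytes:
    """Simplified model of the bytes a text-mode reader receives on Windows."""
    cut = data.find(bytes([CTRL_Z]))
    visible = data if cut == -1 else data[:cut]  # input stops at the first Ctrl-Z
    return visible.replace(b"\r\n", b"\n")       # CR-LF pairs collapse to LF

def chance_of_clean_payload(n: int) -> float:
    """Probability that n uniformly random bytes contain no 0x1A byte."""
    return (255 / 256) ** n

if __name__ == "__main__":
    sample = b"ab\r\ncd\x1aef"
    print(text_mode_view(sample))  # everything from the 0x1A byte onward is lost
    for n in (10, 200, 500, 1000):
        print(n, round(chance_of_clean_payload(n), 3))
```

Under this model, a payload of 1000 random bytes has only about a 2% chance of containing no 0x1A byte at all, and the position of the first one differs on every run. That is at least qualitatively consistent with short inputs usually getting through and inputs near 1000 bytes almost always being cut off at a different point each time.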