Tags: python, node.js, stdin

How to read stdin buffer in advance before an EOF in python3?


In my Python code I wrote the following function to receive a custom binary package from stdin.

import json
import sys

def recvPkg():
    # The first 4 bytes give the length of the rest of the package
    Len = int.from_bytes(sys.stdin.buffer.read(4), byteorder='big', signed=True)
    # Then read the rest of the package and decode it as JSON
    data = json.loads(str(sys.stdin.buffer.read(Len), 'utf-8'))
    # do something...

while True:
    recvPkg()

Then, in another Node.js program, I spawn this Python program as a child process and send bytes to it.

childProcess = require('child_process').spawn('./python_code.py');
childProcess.stdin.write(someBinaryPackage)

I expect the child process to read from its stdin buffer as soon as a package is received and produce its output. But it doesn't work, and I think the reason is that the child process won't begin to read until its stdin receives a signal such as an EOF. As evidence, if I close childProcess's stdin after stdin.write, the Python code works and receives all the buffered packages at once. This is not what I want, because I need childProcess's stdin to stay open. So is there any other way for Node.js to signal childProcess that it should read from its stdin buffer?

(Sorry for my poor English.)


Solution

  • From Wikipedia (emphasis mine):

    Input from a terminal never really "ends" (unless the device is disconnected), but it is useful to enter more than one "file" into a terminal, so a key sequence is reserved to indicate end of input. In UNIX the translation of the keystroke to EOF is performed by the terminal driver, so a program does not need to distinguish terminals from other input files.

    There is no way to send an EOF character the way you are expecting, because EOF isn't really a character. In a terminal, you can press Ctrl+Z on Windows or Ctrl+D on UNIX-like environments. These produce control characters (code 26 on Windows, code 04 on UNIX) that are read by the terminal rather than passed through as data. Upon reading this code, the terminal essentially stops writing to the program's stdin and signals end of input.

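    In Python, calling .read() on a file object with no size argument reads until the end of the file, and .read(n) on a blocking stream such as a pipe waits until n bytes have arrived or the stream is closed. The EOF condition is that .read() returns an empty string ('' in text mode, b'' in binary mode). In some other languages, this might be -1, or some other condition.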

    Consider:

    >>> my_file = open("file.txt", "r")
    >>> my_file.read()
    'This is a test file'
    >>> my_file.read()
    ''
    

    The last character here isn't EOF; there's just nothing left. Python has read up to the end of the file and can't .read() any more.

    Because stdin is a special type of 'file', it doesn't have an end. You have to define that end. The terminal defines that end with control characters, but here you are not passing data to stdin via a terminal, so you'll have to manage it yourself.
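
The point about a pipe having no end until the writer closes it can be checked without Node at all. The following is a minimal sketch that uses `os.pipe()` as a stand-in for the pipe between the parent and the child's stdin; the variable names are illustrative and not taken from the question.

```python
import os

# An anonymous pipe, similar to what connects a parent to a child's stdin.
read_fd, write_fd = os.pipe()
reader = os.fdopen(read_fd, "rb")

# The writer sends 5 bytes and keeps the pipe open.
os.write(write_fd, b"hello")

# Asking for exactly the bytes that are already available returns at once.
first = reader.read(5)

# Asking for more now would block, because the pipe is still open and more
# data could arrive. Closing the write end is what creates "the end".
os.close(write_fd)
rest = reader.read()

print(first)  # b'hello'
print(rest)   # b''
reader.close()
```

If the `os.close(write_fd)` line were removed, the final `reader.read()` would wait indefinitely, which matches the behaviour described in the question.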

    Just closing the file

    Input [...] never really "ends" (unless the device is disconnected)

    Closing stdin is probably the simplest solution here. stdin is an infinite file, so once you're done writing to it, just close it.

    Expect your own control character

    Another option is to define your own control character. You can use whatever you want here, as long as it never appears in the data itself. The example below uses a NUL byte (0x00).

    Python
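    import sys

    class FileWithEOF:
        def __init__(self, file_obj):
            self.file = file_obj
            self.value = bytes()
        def __enter__(self):
            return self
        def __exit__(self, *args, **kwargs):
            pass
        def read(self):
            while True:
                # Read one byte at a time so we stop exactly at the marker
                val = self.file.buffer.read(1)
                # Stop at the NUL marker, or at a real EOF (empty read),
                # which would otherwise make this loop spin forever
                if val == b"\x00" or val == b"":
                    break
                self.value += val
            return self.value

    data = FileWithEOF(sys.stdin).read()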
    
    Node
    childProcess = require('child_process').spawn('./python_code.py');
    childProcess.stdin.write("Some text I want to send.");
    childProcess.stdin.write(Buffer.from([0x00]));
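
Since the question needs stdin to stay open for many packages, the same idea can be extended to yield one message per delimiter. This is a rough sketch, not the answer's original code: `iter_messages` is a hypothetical helper name, and `io.BytesIO` stands in for `sys.stdin.buffer` so the example runs on its own.

```python
import io

def iter_messages(stream, delimiter=b"\x00"):
    """Yield each delimiter-terminated message from a binary stream."""
    message = bytearray()
    while True:
        byte = stream.read(1)
        if byte == b"":
            # Real EOF: the writer closed the stream. Any partial,
            # unterminated message is dropped.
            return
        if byte == delimiter:
            yield bytes(message)
            message.clear()
        else:
            message += byte

# Stand-in for sys.stdin.buffer (assumption made for this example only).
fake_stdin = io.BytesIO(b"first\x00second\x00")
messages = list(iter_messages(fake_stdin))
print(messages)  # [b'first', b'second']
```

In the child process this would presumably be used as `for msg in iter_messages(sys.stdin.buffer): ...`. It only works if the payload can never contain the delimiter byte, so it suits text formats such as JSON better than arbitrary binary data.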
    

    You might be reading the wrong length

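    I think the value you're capturing in Len doesn't match the number of bytes actually being sent. If Len is larger than what Node has written so far, .read(Len) keeps waiting until the remaining bytes arrive or stdin is closed.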

    Python
    import sys
    
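    while True:
        # The first 2 characters are the length of the message that follows
        header = sys.stdin.read(2)
        if not header:
            break  # stdin was closed
        length = int(header)
        with open("test.txt", "a") as f:
            f.write(sys.stdin.read(length))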
    
    Node
    childProcess = require('child_process').spawn('./test.py');
    
    // Python reads the first 2 characters (`.read(2)`)
    childProcess.stdin.write("10"); 
    
    // Python receives 9 characters, but keeps waiting because it is
    // expecting 10. `stdin` is still capable of producing bytes from
    // Python's point of view.
    childProcess.stdin.write("123456789");
    
    // Writing the final byte hits 10 characters, and the contents
    // are written to `test.txt`.
    childProcess.stdin.write("A");
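
Applied to the framing in the question (a 4-byte big-endian signed length followed by UTF-8 JSON), the same reasoning can be sketched as below. The helper names `encode_pkg`, `read_exact`, and `recv_pkg` are hypothetical, and `io.BytesIO` again stands in for `sys.stdin.buffer`; treat it as an illustration under those assumptions rather than a drop-in fix.

```python
import io
import json
import struct

def encode_pkg(obj):
    """Serialize obj as JSON and prepend a 4-byte big-endian length."""
    payload = json.dumps(obj).encode("utf-8")
    # The length counts bytes of the encoded payload, not characters.
    return struct.pack(">i", len(payload)) + payload

def read_exact(stream, n):
    """Read exactly n bytes, or return None if the stream ends first."""
    data = bytearray()
    while len(data) < n:
        chunk = stream.read(n - len(data))
        if not chunk:
            return None
        data += chunk
    return bytes(data)

def recv_pkg(stream):
    """Read one package; return None on EOF or a truncated package."""
    header = read_exact(stream, 4)
    if header is None:
        return None
    (length,) = struct.unpack(">i", header)
    body = read_exact(stream, length)
    if body is None:
        return None
    return json.loads(body.decode("utf-8"))

# Stand-in for sys.stdin.buffer holding two packages back to back.
fake_stdin = io.BytesIO(encode_pkg({"id": 1}) + encode_pkg({"id": 2}))
first = recv_pkg(fake_stdin)
second = recv_pkg(fake_stdin)
third = recv_pkg(fake_stdin)
print(first, second, third)  # {'id': 1} {'id': 2} None
```

On the Node side, the prefix would need to be the byte length of the encoded payload (for example via `Buffer.byteLength`), not the string's character count. If the two differ for non-ASCII text, the reader ends up waiting for bytes that never arrive.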