python subprocess python-curses npyscreen

Why is junk data appearing in my Python subprocess's stdout?


I'm writing a Python app that runs a command on an AWS remote docker container, and saves the output to a file. The command that is being run remotely is generating binary data (a database dump).

The app works great if I start the download and don't touch anything. The problem is that if I hit Enter or scroll my mouse wheel in the terminal window while the download is running, the output file ends up containing a stray ^M or other unexpected characters.

Sample Code:

#!/usr/bin/env python3

import npyscreen
import curses
import curses.ascii  # curses.ascii.NL is used below; the submodule needs its own import
import subprocess

MY_REGION=...
MY_CLUSTER=...
MY_TASK=...
MY_CONTAINER=...

class ProgressForm(npyscreen.Popup):
    def create(self):
        self.progress = self.add(
            npyscreen.TitleSliderPercent, step=1, out_of=100, name="Progress"
        )

    def activate(self):
        cmd = subprocess.Popen(
            [
                "aws",
                "--region",
                MY_REGION,
                "ecs",
                "execute-command",
                "--cluster",
                MY_CLUSTER,
                "--task",
                MY_TASK,
                "--container",
                MY_CONTAINER,
                "--command",
                "python -c 'for i in range(500_000): print(i)'",
                "--interactive",
            ],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            bufsize=0,
        )

        total_size = 3889129
        downloaded = 0
        with open("out.log", "wb") as f:
            while True:
                chunk = cmd.stdout.read(1024)

                if not chunk:
                    break

                f.write(chunk)

                downloaded += len(chunk)

                self.progress.set_value(min(downloaded/total_size*100, 100))
                self.progress.display()

        self.parentApp.switchForm(None)

class MAIN(npyscreen.FormBaseNew):
    def create(self):
        self.items = self.add(
            npyscreen.GridColTitles,
            col_titles=["Column"],
            select_whole_line=True,
        )
        self.items.add_handlers({curses.ascii.NL: self.item_chosen})

    def activate(self):
        for i in range(4):
            self.items.values = [
                ["Row Data"]
            ]

        self.edit()

    def item_chosen(self, inpt):
        self.parentApp.switchForm("progressForm")

class App(npyscreen.NPSAppManaged):
    def onStart(self):
        self.addForm("MAIN", MAIN, name="My App")
        self.addForm("progressForm", ProgressForm)

if __name__ == "__main__":
    app = App().run()

Hitting Enter or scrolling the mouse wheel during the download results in this:

...
10667

10668
10669
...

and this:

...
17451
17452
17453
^[[<65;121;31M17454
17455
17456
17457
...

Why is my subprocess's stdout being littered with junk data?

Edit: The full output can be found here


Solution

  • When you don't specify what subprocess should do with stdin, the child inherits it from the parent process, so the child sees your Enter keypresses, mouse-wheel escape sequences, and anything else the terminal sends.

    A typical noninteractive process won't "local echo" its input back to its output, but you're passing --interactive here, so the echo is not surprising.

    Set stdin=subprocess.DEVNULL to explicitly route stdin from nowhere (stdin connected to /dev/null shows up as an immediate EOF on the first attempted read; most programs that aren't written to require input will handle this correctly).

    If the program requires there to be a stdin stream that isn't immediately closed, you might instead use stdin=subprocess.PIPE, and then leave cmd.stdin alone until it's time for the remote program to exit (at which point a cmd.stdin.close(), while not strictly mandatory, would not be remiss).
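The two options above can be sketched with a local stand-in for the remote command. This is a minimal illustration, not the asker's actual setup: CHILD is a hypothetical child program that copies whatever arrives on its stdin to its stdout (imitating the echo) and then prints its real output, and run_with_devnull and run_with_pipe are names invented for this example.

```python
import subprocess
import sys

# Hypothetical stand-in for the remote command: it copies anything that
# arrives on stdin to stdout (imitating the echo), then writes its real output.
CHILD = (
    "import sys\n"
    "leaked = sys.stdin.buffer.read()\n"
    "sys.stdout.buffer.write(leaked)\n"
    "sys.stdout.buffer.write(b'payload\\n')\n"
)


def run_with_devnull():
    """Child's stdin is the null device: it reads EOF at once, so nothing leaks."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    out = proc.stdout.read()
    proc.wait()
    return out


def run_with_pipe():
    """Child gets a private pipe that stays open until the parent closes it."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    # Nothing is ever written to proc.stdin, so terminal input cannot reach
    # the child. Closing it signals EOF when it is time for the child to exit.
    proc.stdin.close()
    out = proc.stdout.read()
    proc.wait()
    return out


if __name__ == "__main__":
    print(run_with_devnull())
    print(run_with_pipe())
```

In the question's code, the equivalent change would be adding stdin=subprocess.DEVNULL (or stdin=subprocess.PIPE) next to stdout=subprocess.PIPE in the Popen call. Whether aws ecs execute-command --interactive tolerates an immediately closed stdin is not confirmed here, which is why the PIPE variant is kept as a fallback.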