python json subprocess pcap tshark

Python File Writing Requires Manual Program Interruption for Completion


I'm having trouble with a program that writes parts of a pcap file to a new JSON file. The last 100 lines of the output file only appear when I manually stop the program in the terminal; they are written only if I forcefully kill it.

import datetime
import subprocess

file_name = "my_file.pcap"
output_file = 'output_json.json'

start_time = datetime.datetime(2023, 3, 24, 11, 57, 39)
end_time = datetime.datetime(2023, 3, 24, 11, 57, 40)

command = [
    "tshark", "-r", file_name, "-Y",
    f'(oran_fh_cus.extType == 11) && (frame.time >= "{start_time.strftime("%Y-%m-%d %H:%M:%S")}" && frame.time <= "{end_time.strftime("%Y-%m-%d %H:%M:%S")}")',
    "-T", "json", "-e", "frame.time", "-e", "frame.number", "-e", "oran_fh_cus.extType",
    "-e", "oran_fh_cus.bfwI", "-e", "oran_fh_cus.bfwQ"
]

with open(output_file, "w") as f:
    subprocess.run(command, stdout=f)

I encounter the following errors before stopping the program:

"Unexpected end of string" "Expected comma or closing bracket"

However, when I manually stop the program, the last lines are added successfully, and the line that triggered the error appears to be completely normal.


Solution

This is happening for two reasons:

    1. Your output is buffered. You can disable that with the -l flag (see the man page excerpt below), which makes tshark flush its output after each packet is printed.
    2. As @jasonharper explained in the comments, since you are emitting JSON, the output can't be valid or complete until the array of packets is closed with the final ]. If that doesn't work for you, I think you will have to switch to some other output format.
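To make the point about the closing bracket concrete, here is a minimal sketch. The sample string is made up for illustration and is not real tshark output, and parses_as_json is a hypothetical helper name. It shows that a JSON array read before its final ] fails to parse, which would be consistent with the "Unexpected end of string" style errors in the question.

```python
import json

# Hypothetical sample standing in for "-T json" output:
# a single array holding one object per packet.
complete = '[{"frame.number": "1"}, {"frame.number": "2"}]'

# What a reader might see while the writer is still running:
# the text stops partway through the second object.
partial = complete[:30]


def parses_as_json(text):
    """Return True if text is a complete, valid JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(parses_as_json(partial))
print(parses_as_json(complete))
```

The partial text is rejected even though every line in it looks normal on its own; only the full array, including the final ], is accepted.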

    -l
    
        Flush the standard output after the information for each packet is printed. (This is
        not, strictly speaking, line-buffered if -V was specified; however, it is the same as
        line-buffered if -V wasn't specified, as only one line is printed for each packet, and,
        as -l is normally used when piping a live capture to a program or script, so that
        output for a packet shows up as soon as the packet is seen and dissected, it should
        work just as well as true line-buffering. We do this as a workaround for a deficiency
        in the Microsoft Visual C++ C library.)
    
        This may be useful when piping the output of TShark to another program, as it means
        that the program to which the output is piped will see the dissected data for a packet
        as soon as TShark sees the packet and generates that output, rather than seeing it
        only when the standard output buffer containing that data fills up.
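Putting both points together, a possible version of the script from the question might look like the sketch below. It assumes tshark is on the PATH and that the field names from the question are valid for your capture. The helper names build_command and export_packets are hypothetical, introduced only for this example. The -l flag is added to the argument list, and the file is parsed only after tshark has exited, so the closing ] is already written.

```python
import datetime
import json
import subprocess


def build_command(file_name, start_time, end_time):
    """Build the tshark argument list from the question, with -l added."""
    fmt = "%Y-%m-%d %H:%M:%S"
    display_filter = (
        f'(oran_fh_cus.extType == 11) && '
        f'(frame.time >= "{start_time.strftime(fmt)}" && '
        f'frame.time <= "{end_time.strftime(fmt)}")'
    )
    return [
        "tshark", "-l", "-r", file_name, "-Y", display_filter,
        "-T", "json",
        "-e", "frame.time", "-e", "frame.number", "-e", "oran_fh_cus.extType",
        "-e", "oran_fh_cus.bfwI", "-e", "oran_fh_cus.bfwQ",
    ]


def export_packets(file_name, output_file, start_time, end_time):
    """Run tshark to completion, then parse the finished output file."""
    command = build_command(file_name, start_time, end_time)
    with open(output_file, "w") as f:
        # check=True raises CalledProcessError if tshark exits with an error.
        subprocess.run(command, stdout=f, check=True)
    # Parse only after tshark has exited and the file is closed,
    # so the closing "]" of the JSON array has been written.
    with open(output_file) as f:
        return json.load(f)
```

With the values from the question, the call would be export_packets("my_file.pcap", "output_json.json", start_time, end_time). If the file has to be read while tshark is still running, this approach does not help, and a different output format is probably needed, as noted above.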