How to use Popen with command-line arguments that contain single and double quotes?


I want to run the following jq command with subprocess.Popen() in Python 3.

$ jq  'INDEX(.images[]; .id) as $imgs | {
    "filename_with_label":[
         .annotations[]
        | select(.attributes.type=="letter" )
        | $imgs[.image_id] + {label:.text}
        | {id:.id} + {filename:.file_name} + {label:.label}
     ]
   }' image_data_annotation.json > image_data_annotation_with_label.json

Note that the first command-line argument contains dots, dollar signs, and double quotes inside single quotes. For reference, jq is a command-line utility for processing JSON files.

I wrote the following Python 3 script to automate JSON file processing with jq.

#!python3
# file name: letter_image_tool.py

import os, subprocess

"""
command line example to automate
$ jq  'INDEX(.images[]; .id) as $imgs | {
    "filename_with_label":[
         .annotations[]
        | select(.attributes.type=="letter" )
        | $imgs[.image_id] + {label:.text}
        | {id:.id} + {filename:.file_name} + {label:.label}
     ]
   }' image_data_annotation.json > image_data_annotation_with_label.json
"""

# define first command line argument
jq_filter='\'INDEX(.images[]; .id) as $imgs | { "filename_with_label" : [ .annotations[] | select(.attributes.type=="letter" ) | $imgs[.image_id] + {label:.text} | {id:.id} + {filename:.file_name} + {label:.label} ] }\''

input_json_files= [ "image_data_annotation.json"]
output_json_files= []

for input_json in input_json_files:
    print("Processing %s" %(input_json))
    filename, ext = os.path.splitext(input_json)
    output_json = filename + "_with_label" + ext
    output_json_files.append(output_json)
    print("output file is : %s" %(output_json))

    #jq_command ='jq' + " " +  jq_filter, input_json + ' > ' +  output_json
    jq_command =['jq', jq_filter,  input_json + ' > ' +  output_json]
    print(jq_command)
    subprocess.Popen(jq_command, shell=True)

Running the above Python script from bash results in the following:

$ ./letter_image_tool.py
Processing image_data_annotation.json
output file is : image_data_annotation_with_label.json
['jq', '\'INDEX(.images[]; .id) as $imgs | { "filename_with_label" : [ .annotations[] | select(.attributes.type=="letter" ) | $imgs[.image_id] + {label:.text} | {id:.id} + {filename:.file_name} + {label:.label} ] }\'', 'image_data_annotation.json > image_data_annotation_with_label.json']
jq - commandline JSON processor [version 1.6-124-gccc79e5-dirty]

Usage:  jq [options] <jq filter> [file...]
        jq [options] --args <jq filter> [strings...]
        jq [options] --jsonargs <jq filter> [JSON_TEXTS...]

jq is a tool for processing JSON inputs, applying the given filter to
its JSON text inputs and producing the filter's results as JSON on
standard output.

The simplest filter is ., which copies jq's input to its output
unmodified (except for formatting, but note that IEEE754 is used
for number representation internally, with all that that implies).

For more advanced filters see the jq(1) manpage ("man jq")
and/or https://stedolan.github.io/jq

Example:

        $ echo '{"foo": 0}' | jq .
        {
                "foo": 0
        }

For a listing of options, use jq --help.

The script does not handle the first argument of the jq utility:

'INDEX(.images[]; .id) as $imgs | {
    "filename_with_label":[
         .annotations[]
        | select(.attributes.type=="letter" )
        | $imgs[.image_id] + {label:.text}
        | {id:.id} + {filename:.file_name} + {label:.label}
     ]
   }'

The first argument should be enclosed in single quotes, as in the snippet above, but my script does not handle that.

I think the main problems are related to the dots, dollar signs, single quotes, and double quotes used in the first command-line argument (jq_filter in the above Python script), but I don't know how to handle these shell metacharacters in bash.

What should I do to solve the above problems?

Thanks for your kind reading.

Update with my solution

With triple quotes for the jq_filter definition and a space-separated join, as follows:

#!python3
# file name: letter_image_tool.py

import os, subprocess

"""
command line example to automate
$ jq  'INDEX(.images[]; .id) as $imgs | {
    "filename_with_label":[
         .annotations[]
        | select(.attributes.type=="letter" )
        | $imgs[.image_id] + {label:.text}
        | {id:.id} + {filename:.file_name} + {label:.label}
     ]
   }' image_data_annotation.json > image_data_annotation_with_label.json
"""

# define first command line argument with triple quotes
jq_filter=""" 'INDEX(.images[]; .id) as $imgs | { 
       "filename_with_label" : [ 
        .annotations[] 
       | select(.attributes.type=="letter" ) 
       | $imgs[.image_id] + {label:.text} 
       | {id:.id} + {filename:.file_name} + {label:.label} ] } ' """

input_json_files= [ "image_data_annotation.json"]
output_json_files= []

for input_json in input_json_files:
    print("Processing %s" %(input_json))
    filename, ext = os.path.splitext(input_json)
    output_json = filename + "_with_label" + ext
    output_json_files.append(output_json)
    print("output file is : %s" %(output_json))

    #jq_command ='jq' + " " +  jq_filter, input_json + ' > ' +  output_json
    # jq command composed with space separated join
    jq_command =' '.join(['jq', jq_filter,  input_json, ' > ',  output_json])
    print(jq_command)

    # shell keyword argument should be set True
    subprocess.Popen(jq_command, shell=True)

With triple double quotes, jq_filter is more readable as a multi-line definition than as a single-line definition.


Solution

  • The reason you need single quotes is to prevent the shell from expanding anything in your argument. This is only a problem when using shell=True. If it is not set, the shell never touches your arguments and there is no need to "protect" them.

    However, the shell is also responsible for the stdout redirect (i.e. [... '>', output_json]). Without the shell, the redirect has to be handled in the Python code instead. That is as simple as passing the argument stdout=... to Popen.

    All in all, this means that your code can be rewritten as:

    import os
    import subprocess
    
    # Still define first command line argument with triple quotes for readability
    # Note that there are no single quotes though
    jq_filter = """INDEX(.images[]; .id) as $imgs | {
           "filename_with_label" : [
            .annotations[]
           | select(.attributes.type=="letter" )
           | $imgs[.image_id] + {label:.text}
           | {id:.id} + {filename:.file_name} + {label:.label} ] }"""
    
    input_json_files = ["image_data_annotation.json"]
    output_json_files = []
    
    for input_json in input_json_files:
        print("Processing %s" % (input_json))
        filename, ext = os.path.splitext(input_json)
        output_json = filename + "_with_label" + ext
        output_json_files.append(output_json)
        print("output file is : %s" % (output_json))
    
        # Keep command as list, since this is what we need when NOT using shell=True
        # Note also that the redirect and the output file are not parts of the argument list
        jq_command = ['jq', jq_filter,  input_json]
    
        # shell keyword argument should NOT be set True
        # Instead redirect stdout to an out_file
        # (We must open the file for writing before redirecting)
        with open(output_json, "w") as out_file:
            subprocess.Popen(jq_command, stdout=out_file)
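
Note that Popen starts jq and returns immediately, without waiting for it to finish or checking its exit status. If the script needs the output file to be complete before it continues, one option is subprocess.run with check=True. The following is a minimal sketch, not a drop-in replacement: run_to_file is a hypothetical helper name, and a child Python process stands in for jq so that the example runs without jq installed. It also shows that the filter, with its dollar sign and double quotes, reaches the child unchanged when no shell is involved.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# The filter from the question, without the shell-style single quotes.
jq_filter = (
    'INDEX(.images[]; .id) as $imgs | { "filename_with_label" : [ '
    '.annotations[] | select(.attributes.type=="letter" ) '
    '| $imgs[.image_id] + {label:.text} '
    '| {id:.id} + {filename:.file_name} + {label:.label} ] }'
)


def run_to_file(command, output_path):
    """Run command without a shell, send its stdout to output_path, and wait."""
    with open(output_path, "w") as out_file:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, stdout=out_file, check=True)


# Stand-in for jq (an assumption for this demo): print the first argument.
echo_arg = [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])"]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "out.txt"
    run_to_file(echo_arg + [jq_filter], out_path)
    received = out_path.read_text()

# The child received the filter exactly as written in Python.
print(received == jq_filter)
```

With jq installed, the call in the loop would presumably become run_to_file(['jq', jq_filter, input_json], output_json).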
    

    In general, it is recommended not to use shell=True anyway, because it opens another attack vector: if untrusted input ends up in the command string, an injection attack can run arbitrary shell commands. A smaller benefit of not using the shell is that fewer processes are created, since no extra shell process is needed.
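
For completeness, here is why the original attempt printed the jq usage message. On POSIX, when a list is passed together with shell=True, only the first item is used as the command string and the remaining items become arguments to the shell itself, so jq was started with no filter at all. If the shell really is needed (for example, for the redirect), the command should be a single string, and shlex.quote from the standard library can add the quoting instead of hand-written single quotes. The sketch below only builds and checks the string. It assumes a POSIX shell and does not run jq.

```python
import shlex

# The filter from the question, without any hand-written shell quoting.
jq_filter = (
    'INDEX(.images[]; .id) as $imgs | { "filename_with_label" : [ '
    '.annotations[] | select(.attributes.type=="letter" ) '
    '| $imgs[.image_id] + {label:.text} '
    '| {id:.id} + {filename:.file_name} + {label:.label} ] }'
)
input_json = "image_data_annotation.json"
output_json = "image_data_annotation_with_label.json"

# shlex.quote wraps an argument so a POSIX shell treats it as one literal word.
quoted_args = " ".join(shlex.quote(arg) for arg in ["jq", jq_filter, input_json])

# The redirect operator itself must stay unquoted; only the file name is quoted.
shell_command = quoted_args + " > " + shlex.quote(output_json)
print(shell_command)

# Round trip: splitting with shell rules recovers the original arguments.
recovered = shlex.split(quoted_args)
print(recovered == ["jq", jq_filter, input_json])
```

The resulting string could then be passed as subprocess.run(shell_command, shell=True, check=True), although the list form without a shell remains the simpler choice.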