
Python Read from Stdin with Arguments


I want to read from stdin in Python but also accept command-line options in my program. When I try to pass an option to my program, I get a "No such file or directory" error and my arguments are discarded.

For parsing the arguments I use the following code:

import argparse

parser = argparse.ArgumentParser(description='Training and Testing Framework')

parser.add_argument('--text', dest='text',
                    help='The text model', required=True)
parser.add_argument('--features', dest='features',
                    help='The features model', required=True)
parser.add_argument('--test', dest='testingset',
                    help='The testing set.', required=True)
parser.add_argument('--vectorizer', dest='vectorizer',
                    help='The vectorizer.', required=True)
args = vars(parser.parse_args())

For reading from the stdin I use the following code:

import sys
from preprocess import preprocess

for line in sys.stdin.readlines():
    print(preprocess(line, 1))

Command Line

echo "dsfdsF" |python ensemble.py -h
/usr/local/lib/python2.7/dist-packages/pandas/io/excel.py:626: UserWarning: Installed openpyxl is not supported at this time. Use >=1.6.1 and <2.0.0.
  .format(openpyxl_compat.start_ver, openpyxl_compat.stop_ver))
Traceback (most recent call last):
  File "ensemble.py", line 38, in <module>
    from preprocess import preprocess
  File "/home/nikos/experiments/mentions/datasets/preprocess.py", line 7, in <module>
    with open(sys.argv[1], 'rb') as csvfile:
IOError: [Errno 2] No such file or directory: '-h'

Solution

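  • Your preprocess.py file reads sys.argv[1] and opens it as a file.

    It does this at module level (line 7 of preprocess.py), so that code runs as soon as ensemble.py executes `from preprocess import preprocess`, before your argparse code ever sees the arguments. If you pass -h on the command line, sys.argv[1] is "-h", so it tries to open a file with that name and fails.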
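    To see why the traceback appears before argparse gets a chance to handle -h, here is a small Python 3 reproduction. The module demo_preprocess is a hypothetical stand-in whose only content mimics line 7 of preprocess.py, and the helper import_with_argv is made up for this demo; neither comes from the question.

```python
import errno
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for preprocess.py: like line 7 of the real file,
# it opens sys.argv[1] at module level, so this runs during the import.
MODULE_SOURCE = (
    "import sys\n"
    "with open(sys.argv[1], 'rb') as csvfile:\n"
    "    data = csvfile.read()\n"
)


def import_with_argv(argv):
    """Import the stand-in module with sys.argv temporarily set to argv.

    Returns the OSError raised during the import, or None if it succeeded.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        with open(os.path.join(tmpdir, "demo_preprocess.py"), "w") as fh:
            fh.write(MODULE_SOURCE)
        importlib.invalidate_caches()
        sys.path.insert(0, tmpdir)
        old_argv = sys.argv
        sys.argv = argv
        try:
            importlib.import_module("demo_preprocess")
            return None
        except OSError as exc:  # Python 2's IOError is an alias of OSError in Python 3
            return exc
        finally:
            # Restore global state so the demo leaves no traces behind.
            sys.argv = old_argv
            sys.path.remove(tmpdir)
            sys.modules.pop("demo_preprocess", None)


# Mimics `python ensemble.py -h`: the import fails before argparse sees "-h".
err = import_with_argv(["ensemble.py", "-h"])
print(type(err).__name__, err.errno == errno.ENOENT, err.filename)
```

    Under Python 3 this prints `FileNotFoundError True -h`, the Python 3 counterpart of the IOError in the question's traceback.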

    Split command line parsing from processing

    Your preprocess function should not care about command line parameters; it should receive an already open file object as an argument.

    So after you parse the command line parameters, it is your job to provide that file object; in your case it will be sys.stdin.
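    A minimal Python 3 sketch of that split for preprocess.py, assuming it reads CSV data (its line 7 opens a csvfile; the rest of the file, including the preprocess function itself, is not shown, so it is left out here). The names read_rows and main are hypothetical. Reading moves into a function that accepts any open stream, and the only code touching sys.argv runs under `if __name__ == "__main__":`, so importing the module no longer opens anything:

```python
import csv
import sys


def read_rows(csvfile):
    """Return all CSV rows from an already open text stream.

    The stream can be a regular file, sys.stdin or anything else that yields
    lines; this function never looks at sys.argv.
    """
    return list(csv.reader(csvfile))


def main(args):
    """Command line entry point: the only code that deals with arguments."""
    if not args:
        print('usage: preprocess.py <input>   (use "-" to read from stdin)')
        return 2
    if args[0] == "-":
        rows = read_rows(sys.stdin)
    else:
        # Python 3's csv module wants text mode with newline="" (not "rb").
        with open(args[0], newline="") as csvfile:
            rows = read_rows(csvfile)
    print(len(rows), "rows read")
    return 0


if __name__ == "__main__":
    # Runs only when executed as a script, never on
    # `from preprocess import preprocess`, so importing opens no file.
    main(sys.argv[1:])
```

    Because read_rows only iterates over its argument, it can be fed sys.stdin, an open file, or even a plain list of strings in a test.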

    Sample solution using docopt

    There is nothing wrong with argparse. My favourite parser is docopt, and I will use it to illustrate the typical split into three parts: command line parsing, preparing the final function call, and the final function call itself. You can achieve the same with argparse too.
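    For comparison, a rough argparse equivalent in Python 3. It is a sketch, not code from the question: the option names and defaults are copied from the docopt usage text of this answer, while build_parser, process and the "-" handling are illustrative choices. The positional input is optional so that, as with docopt, running without it only prints the usage line.

```python
import argparse
import sys


def build_parser():
    # Options and defaults mirror the docopt usage text in this answer.
    parser = argparse.ArgumentParser(description="Training and Testing Framework")
    parser.add_argument("--text", default="text.txt", help="Text model")
    parser.add_argument("--features", default="features.txt", help="Features model")
    parser.add_argument("--test", dest="testingset", default="testset.txt",
                        help="Testing set")
    parser.add_argument("--vectorizer", default="vector.txt", help="The vectorizer")
    parser.add_argument("input", nargs="?",
                        help='input file, or "-" to read from stdin')
    return parser


def process(f, text, features, testingset, vectorizer):
    """Read lines from any open stream; knows nothing about sys.argv.

    The model arguments only mirror the docopt version; this sketch ignores them.
    """
    return [line.rstrip("\n") for line in f]


def main(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)  # argv=None means "use sys.argv[1:]"
    if args.input is None:
        # Like the docopt version: no <input> given, so only show usage.
        parser.print_usage()
        return []
    models = (args.text, args.features, args.testingset, args.vectorizer)
    if args.input == "-":
        lines = process(sys.stdin, *models)
    else:
        # The with-block closes the input file when processing is done.
        with open(args.input) as f:
            lines = process(f, *models)
    for line in lines:
        print(line)
    return lines


if __name__ == "__main__":
    main()
```

    With this layout `echo "dsfdsF" | python ensemble.py -` reads stdin, and `-h` is handled by argparse because no module opens files at import time.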

    First install docopt:

    $ pip install docopt
    

    Here comes the fromstdin.py code:

    """fromstdin - Training and Testing Framework
    Usage: fromstdin.py [options] <input>
    
    Options:
        --text=<textmodel>         Text model [default: text.txt]
        --features=<features>      Features model [default: features.txt]
        --test=<testset>           Testing set [default: testset.txt]
        --vectorizer=<vectorizer>  The vectorizer [default: vector.txt]
    
    Read data from <input> file. Use "-" for reading from stdin.
    """
    from __future__ import print_function  # same output on Python 2 and 3

    import sys
    
    
    def main(fname, text, features, test, vectorizer):
        # "-" means: read from stdin instead of opening a file
        if fname == "-":
            process(sys.stdin, text, features, test, vectorizer)
        else:
            # the with-block closes the file once processing is finished
            with open(fname) as f:
                process(f, text, features, test, vectorizer)
        print("main func done")
    
    
    def process(f, text, features, test, vectorizer):
        # works with any open stream; knows nothing about the command line
        print("processing")
        print("input parameters", text, features, test, vectorizer)
        print("reading input stream")
        for line in f:
            print(line.rstrip("\n"))
        print("processing done")
    
    
    if __name__ == "__main__":
        from docopt import docopt
        args = docopt(__doc__)
        print(args)
        infile = args["<input>"]
        textfile = args["--text"]
        featuresfile = args["--features"]
        testfile = args["--test"]
        vectorizer = args["--vectorizer"]
        main(infile, textfile, featuresfile, testfile, vectorizer)
    

    Can be called like:

    $ python fromstdin.py
    Usage: fromstdin.py [options] <input>
    

    Show the help:

    $ python fromstdin.py -h
    fromstdin - Training and Testing Framework
    Usage: fromstdin.py [options] <input>
    
    Options:
        --text=<textmodel>         Text model [default: text.txt]
        --features=<features>      Features model [default: features.txt]
        --test=<testset>           Testing set [default: testset.txt]
        --vectorizer=<vectorizer>  The vectorizer [default: vector.txt]
    
    Read data from <input> file. Use "-" for reading from stdin.
    

    Use it, feeding from stdin:

    $ ls | python fromstdin.py -
    {'--features': 'features.txt',
     '--test': 'testset.txt',
     '--text': 'text.txt',
     '--vectorizer': 'vector.txt',
     '<input>': '-'}
    processing
    input parameters text.txt features.txt testset.txt vector.txt
    reading input stream
    bcmd.py
    callit.py
    fromstdin.py
    scrmodule.py
    processing done
    main func done