
How to make a system call in Python and store the output in a given output directory?


I am working with Stanford CoreNLP. Right now I run the CoreNLP toolkit from the command line with the following command:

    java -cp stanford-corenlp-2012-07-09.jar:stanford-corenlp-2012-07-06-models.jar:xom.jar:joda-time.jar \
        -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP \
        -annotators tokenize,ssplit,pos,lemma,ner \
        -filelist file_list.txt -outputDirectory <OUTPUT DIRECTORY PATH>
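To see how this command line maps onto the list form that subprocess expects, Python's standard shlex.split can tokenize it roughly the way a POSIX shell would. This is a sketch only; the output path /tmp/corenlp_out is a hypothetical stand-in for <OUTPUT DIRECTORY PATH>.

```python
import shlex

# The command from above as a single string. The output path is a
# hypothetical placeholder for <OUTPUT DIRECTORY PATH>.
command = (
    'java -cp stanford-corenlp-2012-07-09.jar:stanford-corenlp-2012-07-06-models.jar:'
    'xom.jar:joda-time.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP '
    '-annotators tokenize,ssplit,pos,lemma,ner '
    '-filelist file_list.txt -outputDirectory /tmp/corenlp_out'
)

# shlex.split breaks the string on whitespace (honouring shell-style quotes),
# producing one list element per argument, which is what subprocess takes.
args = shlex.split(command)
print(args)
```

Each flag and each flag value becomes its own element, so '-outputDirectory' is immediately followed by the directory path in the list.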

This generates XML files with the required annotations. Now I need to use this command inside a Python function so that it stores the output in output_dir. The function looks like this:

    def preprocess(file_list, output_dir):

I have read about system calls and the subprocess module, but I don't quite understand how to use it so that the output is written to the given output_dir.

Please help!!!


Solution

  • This does not really have much to do with subprocess; it is about how Stanford CoreNLP is used from the CLI. Assuming the -outputDirectory flag tells it where to store its output, it is simply a matter of passing the right CLI argument. subprocess takes the command as a list with one element per argument, so each flag and its value go in separately. Here is one possible approach:

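    import subprocess
    
    def preprocess(fname, output_dir):
        # Pass each command-line token as its own list element; no shell is
        # involved, so paths containing spaces need no extra quoting.
        subprocess.check_call([
            'java',
            '-cp',
            'stanford-corenlp-2012-07-09.jar:stanford-corenlp-2012-07-06-models.jar:xom.jar:joda-time.jar',
            '-Xmx3g',
            # The trailing comma matters: without it, Python silently joins
            # this string with '-annotators' into a single argument.
            'edu.stanford.nlp.pipeline.StanfordCoreNLP',
            '-annotators', 'tokenize,ssplit,pos,lemma,ner',
            '-filelist', fname,               # file listing the input documents
            '-outputDirectory', output_dir,   # CoreNLP writes its XML files here
        ])
        # check_call raises subprocess.CalledProcessError if java exits non-zero.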
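A slightly more defensive variant is sketched below, assuming Python 3. It separates building the argument list from running it (so the list can be inspected or tested without Java installed), creates the output directory up front, and uses os.pathsep so the classpath separator is correct on Windows as well. The helper build_corenlp_args and the corenlp_dir parameter are hypothetical names introduced for illustration; the jar names are the ones from the question.

```python
import os
import subprocess

# Jar names as given in the question; adjust to match your CoreNLP download.
JARS = [
    'stanford-corenlp-2012-07-09.jar',
    'stanford-corenlp-2012-07-06-models.jar',
    'xom.jar',
    'joda-time.jar',
]


def build_corenlp_args(filelist, output_dir):
    """Return the java command as a list of arguments (hypothetical helper)."""
    return [
        'java',
        '-cp', os.pathsep.join(JARS),  # ':' on Linux/macOS, ';' on Windows
        '-Xmx3g',
        'edu.stanford.nlp.pipeline.StanfordCoreNLP',
        '-annotators', 'tokenize,ssplit,pos,lemma,ner',
        '-filelist', filelist,
        '-outputDirectory', output_dir,
    ]


def preprocess(filelist, output_dir, corenlp_dir=None):
    """Run CoreNLP so that its XML output lands in output_dir.

    corenlp_dir (hypothetical parameter) is the directory holding the jars.
    It becomes the child process's working directory, so the relative jar
    names in JARS resolve there.
    """
    # Absolute paths keep filelist and output_dir valid if cwd is changed.
    filelist = os.path.abspath(filelist)
    output_dir = os.path.abspath(output_dir)
    # Create the target directory in case it does not exist yet.
    os.makedirs(output_dir, exist_ok=True)
    subprocess.check_call(build_corenlp_args(filelist, output_dir), cwd=corenlp_dir)
```

Called as preprocess('file_list.txt', 'annotated'), this should leave the XML files described in the question inside annotated/. One caveat: when corenlp_dir is set, relative paths listed inside file_list.txt are presumably resolved against that directory rather than the caller's, so absolute paths in the file list are the safer choice.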