python, pos-tagger

Python: open treetagger in script


How can I use TreeTagger from a Python script?

I have a sentence that TreeTagger should analyze. On the command line, I can do the following:

echo 'This is a test!' | cmd/tree-tagger-english-utf8  

But how can I do this in a Python script?

The output of the command above is the following:

echo 'This is a test!' | cmd/tree-tagger-english
    reading parameters ...
    tagging ...
     finished.
This    DT  this
is  VBZ be
a   DT  a
test    NN  test
!   SENT    !

In my script, I need the tags, i.e. "DT", "VBZ", "DT", "NN", "SENT", which I'd like to save in a list. I need them later to insert into a string.

Thanks for any help! :)


Solution

  • Look at the subprocess module. A simple example follows:

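    $ cat test.py
    #!/usr/bin/env python3
    import subprocess
    
    list_of_lists = []
    
    # stdin is not redirected, so the tagger inherits this script's stdin.
    # Pipe the sentence in: echo 'This is a test!' | python3 test.py
    process = subprocess.Popen(
        ["cmd/tree-tagger-english-utf8"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # read the output as text, not bytes
    )
    (output, err) = process.communicate()
    for line in output.splitlines():
        # Assumes token lines look like word<TAB>tag<TAB>lemma. Filtering on
        # that shape skips status lines such as "reading parameters ..."
        # without relying on a fixed count of header lines.
        new_list = line.split("\t")
        if len(new_list) == 3:
            list_of_lists.append(new_list)
    exit_code = process.wait()
    print(list_of_lists)
    $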
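
If the sentence is already a string inside the script rather than piped in from the shell, it can be passed to the tagger directly. The following is a minimal sketch, not a definitive implementation. It assumes Python 3.6 or later, that the tagger writes tab-separated word/tag/lemma lines, and that the script path from the question is valid relative to the working directory. The function names `parse_tags` and `tag_sentence` are made up for this example.

```python
import subprocess


def parse_tags(tagger_output):
    """Return the POS tags found in TreeTagger-style output.

    Assumes each token line has the form word<TAB>tag<TAB>lemma;
    any other line (e.g. a status message) is ignored.
    """
    tags = []
    for line in tagger_output.splitlines():
        fields = line.split("\t")
        if len(fields) == 3:
            tags.append(fields[1])
    return tags


def tag_sentence(sentence, command=("cmd/tree-tagger-english-utf8",)):
    """Pipe one sentence to the tagger script and return its tags.

    The default command is the path used in the question; adjust it
    to wherever the tagger is installed.
    """
    result = subprocess.run(
        list(command),
        input=sentence + "\n",
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # discard anything written to stderr
        encoding="utf-8",           # exchange text rather than bytes
        check=True,                 # raise if the tagger exits non-zero
    )
    return parse_tags(result.stdout)


if __name__ == "__main__":
    # Sample output copied from the question, with tab separators assumed.
    sample = "This\tDT\tthis\nis\tVBZ\tbe\na\tDT\ta\ntest\tNN\ttest\n!\tSENT\t!\n"
    print(parse_tags(sample))  # ['DT', 'VBZ', 'DT', 'NN', 'SENT']
```

With the tagger installed, `tag_sentence('This is a test!')` should return the same kind of list, which can then be joined or formatted into a string as needed. Keeping the parsing in its own function means it can be checked against saved output without running the tagger.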