
Term list / term vector POS tagging in R


I have a .csv file with only one column containing 1000 rows. Each row contains a single word (bag-of-words model). Now I want to find out, for each word, whether it is a noun, verb, adjective, etc. I would like to have a second column (also with 1000 rows) in which each row contains the part of speech (noun, verb, ...) of the word in column 1.

I have already imported the CSV into R. What do I have to do now?

Here is an example: I have words like these and I want to find out whether each one is a noun, verb, etc.


Solution

  • You could use spacyr, an R wrapper for the Python package spaCy.

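    Note: spacyr needs a Python installation with spaCy and a spaCy language model available. Load the package and point it at that Python executable: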

    library(spacyr)
    
    # Start spaCy; replace the path with the Python executable that has spaCy installed
    spacy_initialize(python_executable = '/path/to/python')
    

    Then put your terms in a data frame:

    Terms <- data.frame(Term = c("unit",
                        "determine",
                        "generate",
                        "digital",
                        "mount",
                        "control",
                        "position",
                        "input",
                        "output",
                        "user"), stringsAsFactors = FALSE)
    

    Use spacy_parse() to tag your terms and add the tags as a new column of your data frame:

    Terms$POS_TAG <- spacy_parse(Terms$Term)$pos
    

    The result is:

            Term POS_TAG
    1       unit    NOUN
    2  determine    VERB
    3   generate    VERB
    4    digital     ADJ
    5      mount    VERB
    6    control    NOUN
    7   position    NOUN
    8      input    NOUN
    9     output    NOUN
    10      user    NOUN
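The one-liner above works because every term comes out of spaCy as exactly one token, so spacy_parse() returns one row per term. If some entry is split into several tokens (a hyphenated word, for example), the row counts no longer match and the pos column cannot be assigned directly. Below is a minimal sketch for the question's setup (a one-column CSV of words), under a few assumptions: the file has no header row, and read_terms(), first_pos() and the file names are hypothetical. It reads the file with header = FALSE because read.csv() defaults to header = TRUE, which would turn the first word into the column name. It then names each term so that spacyr uses those names as doc_id values, and keeps the tag of the first token of each document. Keep in mind that each word is tagged in isolation, so for words that can be either a noun or a verb (such as control or position) the tag is only the model's best guess without context.

```r
# Sketch, assuming a one-column CSV with no header row.
# read_terms() and first_pos() are hypothetical helper names.

# Read the words into a data frame with a single character column "Term";
# strip.white = TRUE drops stray spaces around each word
read_terms <- function(path) {
  read.csv(path, header = FALSE, col.names = "Term",
           colClasses = "character", strip.white = TRUE)
}

# Return one POS tag per document ID, taken from the first token of that
# document in a spacy_parse() result; NA if a document produced no tokens
first_pos <- function(parsed, doc_ids) {
  first_rows <- parsed[!duplicated(parsed$doc_id), ]
  first_rows$pos[match(doc_ids, first_rows$doc_id)]
}

# Usage (requires spacyr and a working spaCy installation):
# library(spacyr)
# spacy_initialize(python_executable = '/path/to/python')
# Terms <- read_terms("words.csv")
# docs <- setNames(Terms$Term, paste0("term", seq_len(nrow(Terms))))
# parsed <- spacy_parse(docs, lemma = FALSE, entity = FALSE)
# Terms$POS_TAG <- first_pos(parsed, names(docs))
# write.csv(Terms, "words_tagged.csv", row.names = FALSE)
```

Matching on doc_id rather than relying on row order means that a term which spaCy splits or drops still gets exactly one entry (possibly NA) in POS_TAG.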