Term list / term vector POS tagging in R

I have a .csv file with a single column of 1000 rows. Each row contains one word (bag-of-words model). For each word, I want to find out whether it is a noun, verb, adjective, etc. I would like to have a second column (also with 1000 rows) that holds this information (noun, verb, ...) for the word in column 1.

I have already imported the CSV into R. What do I have to do next?

[Image: Here is an example. I have these words and I want to find out whether each one is a noun, verb, etc.]

Solution

You could use spacyr, which is an R wrapper for the Python package spaCy.

Note: you will have to

set up spaCy: https://spacy.io/usage/
install the English language models: https://spacy.io/usage/models

library(spacyr)

spacy_initialize(python_executable = '/path/to/python')

Then for your terms:

Terms <- data.frame(Term = c("unit",
                    "determine",
                    "generate",
                    "digital",
                    "mount",
                    "control",
                    "position",
                    "input",
                    "output",
                    "user"), stringsAsFactors = FALSE)

Use the function spacy_parse() to tag your terms and add them to your dataframe:

Terms$POS_TAG <- spacy_parse(Terms$Term)$pos

The result is:

        Term POS_TAG
1       unit    NOUN
2  determine    VERB
3   generate    VERB
4    digital     ADJ
5      mount    VERB
6    control    NOUN
7   position    NOUN
8      input    NOUN
9     output    NOUN
10      user    NOUN
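
The one-line assignment above works because every term here is parsed into exactly one token. If a term is split into several tokens (a hyphenated or multi-word entry, for example), spacy_parse() returns more rows than there are terms and the assignment fails. Below is a minimal sketch of one way to guard against that, assuming spacy_initialize() has already been called. The helper names align_pos() and tag_terms() and the "term1", "term2", ... ids are made up for illustration and are not part of spacyr; the sketch keeps only the tag of the first token of each term, which is a simplification.

```r
# Hypothetical helper: keep the tag of the first token of each term,
# returned in the same order as `ids`. `parsed` is expected to have the
# doc_id and pos columns that spacy_parse() returns. Terms without any
# parsed token get NA.
align_pos <- function(ids, parsed) {
  first <- parsed[!duplicated(parsed$doc_id), ]
  first$pos[match(ids, first$doc_id)]
}

# Hypothetical wrapper: name each term explicitly so that the parsed rows
# can be matched back to the input, then align the tags.
tag_terms <- function(terms) {
  ids <- paste0("term", seq_along(terms))
  parsed <- spacyr::spacy_parse(setNames(terms, ids), pos = TRUE,
                                lemma = FALSE, entity = FALSE)
  align_pos(ids, parsed)
}
```

With the data frame from above, this would be called as Terms$POS_TAG <- tag_terms(Terms$Term). For the original 1000-row file, the column could first be read with something like read.csv("terms.csv", header = FALSE, col.names = "Term", stringsAsFactors = FALSE), where "terms.csv" is a placeholder file name and header = FALSE assumes the file has no header row.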