Take n-gram tokens from within the same line only in R

Using R, I need to extract n-gram tokens (n = 2) from a file.

The problem is that the lines get combined, so some tokens have one part at the end of a line and the other part at the start of the next line.

Req_tok <- jobs %>% unnest_tokens(ngram, POSITION, token = "ngrams", n = 2)

The first two lines of the file jobs are:

it architect

it helpdesk support agents

I get tokens like:

it architect
architect it
it helpdesk
and so on...

What should I do to avoid getting tokens like "architect it"?

I want to tokenize every line separately.

Solution

Just add collapse = FALSE to your unnest_tokens() call so that each row is tokenized on its own instead of being merged with its neighbours:

library(tidytext)
library(dplyr)

jobs %>% 
  unnest_tokens(ngram, POSITION, token = "ngrams", n = 2, collapse = FALSE)

Result:

               ngram
1       it architect
2        it helpdesk
2.1 helpdesk support
2.2   support agents

Remember to convert your text column to character if it is a factor variable, otherwise unnest_tokens will throw an error.
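
As a minimal sketch of that conversion, assuming the column was read in as a factor (here forced with stringsAsFactors = TRUE purely for illustration), plain base R is enough:

```r
# Illustrative data: the text column is deliberately created as a factor.
jobs <- data.frame(
  POSITION = c("it architect", "it helpdesk support agents"),
  stringsAsFactors = TRUE
)

is.factor(jobs$POSITION)      # the column is a factor at this point

# Convert the factor back to a plain character vector before tokenizing.
jobs$POSITION <- as.character(jobs$POSITION)

is.character(jobs$POSITION)   # the column is now character
```

After this step the column holds the original strings, not the factor's integer codes.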

Data:

jobs <- data.frame(POSITION = c("it architect", "it helpdesk support agents"), stringsAsFactors = FALSE)
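
If the collapse argument does not behave this way in your tidytext version (as far as I know, newer releases expect NULL or a character vector of column names there and no longer collapse rows by default), the same per-line idea can be sketched in base R. This is only an illustration, not how tidytext does it internally: line_bigrams is a hypothetical helper name, and it splits on whitespace only, so punctuation handling will differ from unnest_tokens.

```r
# Hypothetical helper (not part of tidytext): bigrams from one line only.
line_bigrams <- function(line) {
  # Lowercase, trim, and split the line on runs of whitespace.
  words <- strsplit(tolower(trimws(line)), "\\s+")[[1]]
  words <- words[nzchar(words)]
  # A line with fewer than two words has no bigrams.
  if (length(words) < 2) return(character(0))
  # Pair each word with the word that follows it on the same line.
  paste(head(words, -1), tail(words, -1))
}

jobs <- data.frame(
  POSITION = c("it architect", "it helpdesk support agents"),
  stringsAsFactors = FALSE
)

# Tokenize every line on its own, so no bigram can span two lines.
per_line <- lapply(jobs$POSITION, line_bigrams)

# Flatten into a data frame, keeping track of the source line number.
bigrams <- data.frame(
  line  = rep(seq_along(per_line), lengths(per_line)),
  ngram = unlist(per_line),
  stringsAsFactors = FALSE
)

print(bigrams)
```

Because each line is split and paired independently, a token like "architect it" cannot appear; the line column records which row each bigram came from.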