r, text-mining, tm

a list of multiple lists of 2 for synonyms


I want to read synonyms from a CSV file, where the first word in each record is the "main" word and the remaining words in the same record are its synonyms.

Basically, I want to create a list like the one I would write by hand in R:

synonyms <- list(
  list(word="ss", syns=c("yy","yyss")),
  list(word="ser", syns=c("sert","sertyy","serty"))
)

This gives me the following list:

synonyms
[[1]]
[[1]]$word
[1] "ss"

[[1]]$syns
[1] "yy"   "yyss"


[[2]]
[[2]]$word
[1] "ser"

[[2]]$syns
[1] "sert"   "sertyy" "serty"

This is essentially a list of lists, each with a "word" and its "syns". How do I create a similar list when the words and synonyms are read from a CSV file?

Any pointers would help. Thanks!


Solution

  • This process should return what you want.

    # read in data using readLines
    myStuff <- readLines(textConnection(temp))
    

    This returns a character vector with one element per line of the file. Note that textConnection is not needed when reading from an actual file; just supply the file path to readLines. Next, split each element into a vector of words using strsplit, which returns a list. The example data is space-separated; for a comma-separated file, use split="," instead.

    myList <- strsplit(myStuff, split=" ")
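
    Since the question mentions a CSV file, here is a minimal sketch of the same two steps on comma-separated input. The file contents and the names csv_path, csv_lines and csv_fields are made up for illustration, and the sketch assumes simple records with no quoted fields.

    ```r
    # Hypothetical file for illustration: one record per line, comma-separated
    csv_path <- tempfile(fileext = ".csv")
    writeLines(c("ss,yy,yyss", "ser, sert, sertyy, serty"), csv_path)

    # Read the file directly from its path; no textConnection needed
    csv_lines <- readLines(csv_path)

    # Split each line on commas, then trim stray whitespace around each field
    csv_fields <- lapply(strsplit(csv_lines, split = ",", fixed = TRUE), trimws)

    csv_fields
    ```

    The trimws call only matters when the file has spaces after the commas, as in the second record here.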
    

    Now, separate the first element from the remaining elements of each vector in the list.

    result <- lapply(myList, function(x) list(word=x[1], synonyms=x[-1]))
    

    This returns the desired result. lapply iterates over the list items. For each item, it returns a named list whose first element, word, is the first element of the vector, and whose second element, synonyms, holds the remaining elements. To match the names in the question exactly, use syns instead of synonyms.
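
    As a variation on the lapply step, the sketch below uses the field name syns from the question and drops empty fields, which can appear when rows are padded with extra commas. The helper name to_entry and the sample records are assumptions for illustration, not part of the original answer.

    ```r
    # Example input: already-split records, as strsplit would return them
    # (the empty string mimics a padded field; this is an assumed case)
    records <- list(c("ss", "yy", "yyss", ""),
                    c("ser", "sert", "sertyy", "serty"))

    # Hypothetical helper: first field is the word, the rest are its synonyms
    to_entry <- function(x) {
      x <- x[nzchar(x)]              # drop empty fields
      list(word = x[1], syns = x[-1])
    }

    synonyms <- lapply(records, to_entry)
    str(synonyms)
    ```

    Individual entries can then be accessed as synonyms[[1]]$word and synonyms[[1]]$syns.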

    result
    [[1]]
    [[1]]$word
    [1] "ss"
    
    [[1]]$synonyms
    [1] "yy"   "yyss"
    
    
    [[2]]
    [[2]]$word
    [1] "ser"
    
    [[2]]$synonyms
    [1] "sert"   "sertyy" "serty" 
    
    
    [[3]]
    [[3]]$word
    [1] "at"
    
    [[3]]$synonyms
    [1] "ate"  "ater" "ates"
    
    
    [[4]]
    [[4]]$word
    [1] "late"
    
    [[4]]$synonyms
    [1] "lated" "lates" "latee"
    

    data

    temp <- 
    "ss yy yyss
    ser sert sertyy serty
    at ate ater ates
    late lated lates latee"