
Change MapReduce output format using R


I ran into a problem. The MapReduce output I get back is in {key: value} format, where each value is a list.

For example, the MapReduce output is:

key value

a   [111,112,114]

b   [111,122,134]

c   [125]

I want to change the format to one row per key/value pair, like this:

a  111

a  112

a  114

b  111

b  122

b  134

c  125

How can I do this conversion in R?


Solution

  • You could try:

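     library(stringr)
     # extract every run of digits from each string, then convert to numeric
     l1 <- lapply(str_extract_all(dat$value, "[0-9]+"), as.numeric)
     #library(stringi)
     #l1 <- lapply(stri_extract_all_regex(dat$value, "[0-9]+"), as.numeric)  #would be faster
     # repeat each key once per extracted number and flatten the list
     data.frame(key=rep(dat$key, lengths(l1)), value=unlist(l1))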
     #     key value
     #1   a   111
     #2   a   112
     #3   a   114
     #4   b   111
     #5   b   122
     #6   b   134
     #7   c   125
    

    Or

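    library(data.table)
    library(devtools)
    # the gist defines cSplit(); the function also ships with the splitstackshape package
    source_gist(11380733)
    # split on every non-digit character, stack the pieces in long form, drop empty pieces
    cSplit(dat, "value", "[^0-9]", fixed=FALSE, direction="long")[value!="" ]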
    #   key value
    #1:   a   111
    #2:   a   112
    #3:   a   114
    #4:   b   111
    #5:   b   122
    #6:   b   134
    #7:   c   125
    

    Data:

    dat <- data.frame(key = c("a", "b", "c"),
                      value = c("[111,112,114]", "[111,122,134]", "[125]"),
                      stringsAsFactors = FALSE)
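
If installing packages is not an option, the same idea can probably be done in base R alone. This is only a sketch, assuming the values arrive as bracketed strings like in the sample data above. The names `nums` and `long` are made up for illustration. `regmatches()` with `gregexpr()` takes the place of `str_extract_all()`.

```r
# Sample data mirroring the question (values stored as bracketed strings)
dat <- data.frame(
  key   = c("a", "b", "c"),
  value = c("[111,112,114]", "[111,122,134]", "[125]"),
  stringsAsFactors = FALSE
)

# Extract every run of digits; the result is a list with one
# character vector per input row
nums <- regmatches(dat$value, gregexpr("[0-9]+", dat$value))

# Repeat each key once per extracted number, then flatten the list
long <- data.frame(
  key   = rep(dat$key, lengths(nums)),
  value = as.numeric(unlist(nums)),
  stringsAsFactors = FALSE
)

print(long)
```

`lengths()` needs R 3.2.0 or later. On older versions `sapply(nums, length)` does the same job.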