Below is my text file content:
name , tag/tags , location, id
xyz, abc;nhj;xygf;xyz;ajsd, jhdwegyugagdwg, T1
xasdiaos, abcd, jhdwegyugagdwg0 , T3
xyzasihd, jsdh;sdgwyi, jhdwegyugagdasodpg, T2
xyzasihd, jsdh;jadh;ahsg;sdgwyi, jhdwegyugagdasodpg, T4
I want to output each id together with its total number of tags. The desired output is as follows.
T1 , 5
T3 , 1
T2 , 2
T4 , 4
I have written the following code for mapreduce.
library(rmr2)
query1= function(input, output = "/user/mtech/15CS60R13/OutputP2"){
  q1.map=
    function(., lines){
      print(lines)
      keyval(unlist(strsplit(lines,split=","))[4],
             length(unlist(strsplit(unlist(strsplit(lines,split=","))[2],split=";"))))
    }
  mapreduce(
    input = input ,
    output = output,
    input.format = "text",
    map = q1.map)
}
query1("/user/xyz/file.txt")
results <- from.dfs ("/user/mtech/15CS60R13/Output")
I am getting results as follows.
print(results)
$key
[1] "T4" "T1"
$val
[1] 4 5
However, when I change the map function to
keyval(lines,1)
I get all 4 lines. Please explain why I get only 2 records when I use strsplit.
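This was the mistake in map: the function receives a whole vector of lines, not a single line, so the fields have to be extracted line by line: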
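q1.map=
function(., lines){
  # lines is a character vector; split every line into its own fields
  fields = strsplit(lines, split=",")
  ids = trimws(sapply(fields, `[`, 4))
  tags = sapply(fields, `[`, 2)
  # return a single keyval holding one key and one tag count per line
  keyval(ids, lengths(strsplit(tags, split=";")))
}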
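The difference between the two versions of the map can be checked in plain R, without Hadoop. This is only a minimal sketch: it assumes that rmr2 hands the map function several lines at once as a character vector, and the names `chunk`, `flattened`, `first_key`, `first_val` and `per_line` are made up for illustration.

```r
# Hypothetical chunk: the four data lines of the sample file,
# as one map call might receive them (assumption, not rmr2 output).
chunk <- c(
  "xyz, abc;nhj;xygf;xyz;ajsd, jhdwegyugagdwg, T1",
  "xasdiaos, abcd, jhdwegyugagdwg0 , T3",
  "xyzasihd, jsdh;sdgwyi, jhdwegyugagdasodpg, T2",
  "xyzasihd, jsdh;jadh;ahsg;sdgwyi, jhdwegyugagdasodpg, T4"
)

# First version: unlist() merges the fields of all lines into one vector,
# so [4] and [2] only ever pick up the first line of the chunk.
flattened <- unlist(strsplit(chunk, split = ","))
first_key <- trimws(flattened[4])
first_val <- length(unlist(strsplit(flattened[2], split = ";")))

# Per-line version: strsplit() without unlist() keeps one list element
# per line, so every line yields its own id and tag count.
fields <- strsplit(chunk, split = ",")
per_line <- data.frame(
  id   = trimws(sapply(fields, `[`, 4)),
  tags = lengths(strsplit(sapply(fields, `[`, 2), split = ";")),
  stringsAsFactors = FALSE
)

print(per_line)
```

With this chunk, `first_key` is "T1" and `first_val` is 5, a single record for four lines, while `per_line` has one row per line with the counts 5, 1, 2 and 4. If the real job splits the file into two chunks, the first version would produce two records, which would match the output I was seeing.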
Thank you!