Hi there I have transform my arules into data frame for further analysis but the problem is my data frame looks like this:
df <- data.frame(rules=c("{45107} => {62557}","{17759} => {60521 }",
"{53721} => {53720}","{63830} => {17753}","{45413} => {45412}",
"{3885,59800,17759} => {4749}","{17721,55906} => {9314}"))
rules
{45107} => {62557}
{17759} => {60521 }
{53721} => {53720}
{63830} => {17753}
{45413} => {45412}
{3885,59800,17759} => {4749}
{17721,55906} => {9314}
Can you help me change my data frame into this format?
lhs1 lhs2 lhs3 rhs
45107 62557
17759 60521
53721 53720
63830 17753
45413 45412
3885 59800 17759 4749
17721 55906 9314
With your data.frame df and putting all numbers after =>
in rhs
:
# define the number of maximum "lhs", there is 2 options :
# option 1, if there are few rules and number of maximum "lhs" is obvious :
maxlhs<-3
# option 2, if there are many many rules and you don't want to count all "lhs" :
maxlhs<-max(sapply(df$rules,FUN=function(x)length(gregexpr(',',x)[[1]]))) + 1
# create your new data.frame by "reformatting" the rules
newdf<-t(apply(df,1,function(rule,maxlhs){
split1<-strsplit(gsub("[ }{]","",rule),"=>")[[1]]
split2<-strsplit(split1[1],",")[[1]]
split2<-c(split2,rep(NA,maxlhs-length(split2)))
return(as.numeric(c(split2,split1[2])))
},maxlhs=maxlhs))
# name the new data.frame's columns
colnames(newdf)<-c(paste0("lhs",1:maxlhs),"rhs")
> newdf
lhs1 lhs2 lhs3 rhs
[1,] 45107 NA NA 62557
[2,] 17759 NA NA 60521
[3,] 53721 NA NA 53720
[4,] 63830 NA NA 17753
[5,] 45413 NA NA 45412
[6,] 3885 59800 17759 4749
[7,] 17721 55906 NA 9314
Is that ok or do you want the new data.frame to be exactly like the one displayed in your question ?