Search code examples
rcsvimportdelimiterseparator

Multiple Separators for the same file input R


I've had a look for answers, but have only found things referring to C or C#. I realise that much of R is written in C but my knowledge of it is non-existent. I am also relatively new to R. I am using the current Rstudio.

This is similar to what I want, I think. Read the data efficiently with multiple separating lines in R

I have a csv file but one variable is a string with values separated by _ and - And I would like to know if there is a package or extra code which does the following on the read. command.

"1","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",0,218,4,93,1377907200000
"2","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",0,390,5,157,1377993600000
"3","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",0,376,5,193,1.37808e+12
"4","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",1,35,1,15,1377907200000
"5","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",12,11258,117,2843,1377993600000
"6","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",5,4659,56,1826,1.37808e+12
"7","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_ANDROID","2013-08-31 13:39:55.0","2013-10-16 13:58:00.0",7,7296,136,2684,1377907200000
"8","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_IOS_IPAD","2013-08-31 13:18:21.0","2013-10-16 13:58:00.0",0,4533,35,1632,1377907200000
"9","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_IOS_IPAD","2013-08-31 13:18:21.0","2013-10-16 13:58:00.0",0,421,6,161,1377993600000
"10","Client1","Name2","*Name3_Name1_KB_MobApp_M-13-44_AU_PI Likes by KB_IOS_IPAD","2013-08-31 13:18:21.0","2013-10-16 13:58:00.0",0,57,2,23,1.37808e+12

Example row:

Name    Name1   *XYZ_Name3_KB_MobApp_M-18-25_AU_PI ANDROID  2013-09-32 14:39:55.0   2013-10-16 13:58:00.0   0   218 4   93  1377907200000

So it's easy enough to read in

results <- read.delim("~/results", header=F)

but then I still have the string *XYZ_Name3_KB_MobApp_M-18-25_AU_PI

Desired output(separate by _ and by -):

Name    Name1   *XYZ   Name3  KB   MobApp   M 18 25  AU  PI ANDROID 2013-09-32 14:39:55.0   2013-10-16 13:58:00.0   0   218 4   93  1377907200000

but not split up the time string.

---- Thanks @Henrik and @AnandaMahto for the code and package. ----

library(splitstackshape)

# split concatenated column by `_`
df4 <- concat.split(data = df3, split.col = "V3", sep = "_", drop = TRUE)

# split the remaining concatenated part by `-`
df5 <- concat.split(data = df4, split.col = "V3_5", sep = "-", drop = TRUE)

Solution

  • Try this:

    # dummy data
    df <- read.table(text="
    Name    Name1   *XYZ_Name3_KB_MobApp_M-18-25_AU_PI ANDROID  2013-09-32 14:39:55.0   2013-10-16 13:58:00.0   0   218 4   93  1377907200000
    Name    Name2   *CCC_Name3_KB_MobApp_M-18-25_AU_PI ANDROID  2013-09-32 14:39:55.0   2013-10-16 13:58:00.0   0   218 4   93  1377907200000
    ", as.is = TRUE)
    
    # replace "_" to "-"
    df_V3 <- gsub(pattern="_", replacement="-", df$V3, fixed = TRUE)
    
    # strsplit, make dataframe
    df_V3 <- do.call(rbind.data.frame, strsplit(df_V3, split = "-"))
    
    # output, merge columns
    output <- cbind(df[, c(1:2)],
                    df_V3,
                    df[, c(4:ncol(df))])
    

    Building on the comments below, here is another related option, but one which uses read.table instead of strsplit.

    splitCol <- "V3"
    temp <- read.table(text = gsub("-", "_", df[, splitCol]), sep = "_")
    names(temp) <- paste(splitCol, seq_along(temp), sep = "_")
    cbind(df[setdiff(names(df), splitCol)], temp)