Ultimately, I need to create the following table:
ID Age Gender
1 25 F
2 14 M
and so on. . .
I got my data in many txt files, where the ID is the name of the file. Age and Gender are separated by new line, for example:
Age: 25
Gender: F
Now, I merge all the files into a one data table with the file path as the ID column (later I'll get rid of the whole path with stringr)
all_test <- list.files(path = "my/file/path", full.names=T)
dtt <- rbindlist( sapply(all_test, fread, simplify = FALSE, sep=":", sep2=" "),use.names = TRUE, idcol = "ID" )
And here is my problem - I'm getting the following table:
ID V1 V2
1 Age 25
2 Gender F
I tried using sep = "\n"
but it did nothing. How can I tell R each variable is in a different line? (preferably with data.table)
A possible approach is to reshape into wide format with dcast
and use type.convert
to get the columns into the right classes:
DTnew <- dcast(DT, ID ~ V1, value.var = 'V2')[, (2:3) := lapply(.SD, type.convert), .SDcols = 2:3][]
which gives:
> DTnew ID Age Gender 1: 1 25 F 2: 2 14 M > str(DTnew) Classes ‘data.table’ and 'data.frame': 2 obs. of 3 variables: $ ID : int 1 2 $ Age : int 25 14 $ Gender: Factor w/ 2 levels "F","M": 1 2 - attr(*, ".internal.selfref")=<externalptr> - attr(*, "sorted")= chr "ID"
Example data:
DT1 <- fread('Age: 25
Gender: F', sep = ':')
DT2 <- fread('Age: 14
Gender: M', sep = ':')
DT <- rbindlist(list(DT1, DT2), idcol = 'ID')