I'm working on code to batch process Excel worksheets for import into a relational database. Each Excel sheet represents data for a different elephant family, with a set of individuals and their presence/absence on different dates. I need this to be generalisable code because I have 50+ sheets for each year, and >10 years to import.
The number of individuals in a family varies, as does the number of dates they were observed. I need to transpose each tibble element so that I can replace the 1s with the individual codes (that part has already been answered on StackOverflow), and then re-gather the result into a single list for each family, as below.
Data currently in Excel:
Ind  Date1  Date2  Date3  Date4
A    1      1             1
B    1      1
C    1             1
D    1             1
And I'm trying to get it to:
Date1 A
Date1 B
Date1 C
Date1 D
Date2 A
Date2 B
Date3 C
Date3 D
Date4 A
I think I need a for loop to do this, because each element varies in length, and each of my efforts with map*(), gather() or t() has failed.
"mysheets" is a list of 50 tibbles, one for each family, the largest of which has 60 rows and 93 columns. An example:
dput(head(mysheets, 4))
list(AA = structure(list(Date = c("Famsize", "Grpsize", "ALY68",
"AME16", "AME12", "AME99", "AME90", "ANN12", "ANN03", "ALF16",
"AME81", "ANH16", "ANH11", "ALI79", "AST97", "ALI98", "ART14",
"ART10", "ALI02", "ARD13", "ALI12", "AGA82", "ALT14", "ALT02",
"AGA93", "ALX15", "ALX11", "AMY85", "ANG15", "ANG11", "AMB10",
"AUD94", "ABR12", "ART17"), `42761` = c(4, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 1, 1, 1, 1, NA, NA, NA), `42767` = c(12,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, 1, 1,
1, 1, 1, NA, NA, NA, NA, NA, NA, 1, 1, 1, 1, NA, NA, NA), `42770` = c(15,
NA, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, NA, NA, 1, 1, 1, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, NA), `42773` = c(20,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, NA), `42777` = c(6,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, 1, 1, 1, NA),
`42782` = c(6, 7, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, 1, 1, 1, NA,
NA, NA, NA, NA, NA, NA), `42802...8` = c(6, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, 1, 1, 1, NA), `42802...9` = c(8,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1,
1, 1, 1, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA), `42809` = c(3, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 1, 1, 1, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), `42816` = c(22, NA,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, NA, NA, NA, 1, 1,
1, NA, NA, NA, NA, NA, NA, 1, 1, 1, 1, 1, 1, NA), `42850...12` = c(8,
NA, 1, 1, 1, 1, NA, NA, NA, 1, 1, 1, 1, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA), `42850...13` = c(14, 16, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 1, 1, 1, 1, 1, 1, 1, 1, NA, NA, NA, NA, NA,
NA, 1, 1, 1, 1, 1, 1, NA), `42859` = c(2, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
`42860...15` = c(2, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 1, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA), `42860...16` = c(6, 14,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, NA, 1, 1,
NA), `42862` = c(8, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 1, 1, NA, NA, NA, 1, 1, 1, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 1, 1, 1, NA), `42864` = c(3, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, NA, NA, NA, NA), `42866` = c(6,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1,
1, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, NA, NA,
NA, NA), `42870` = c(8, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 1, NA, NA, NA, 1, 1, 1, 1, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 1, 1, NA), `42880` = c(6, 11, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, 1, 1, 1, NA, NA,
1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), `42784...22` = c(8,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1,
1, 1, 1, 1, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA), `42784...23` = c(2, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, NA), `42823` = c(8,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1,
1, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, NA, 1,
1, NA), `42817` = c(6, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, 1, 1,
1, NA, NA, NA, NA, NA, NA, NA), `42896` = c(6, 16, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, 1, 1, 1, NA), `42933...27` = c(14,
27, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1,
1, 1, 1, 1, 1, NA, NA, NA, NA, NA, NA, 1, 1, 1, 1, 1, 1,
NA), `43057` = c(7, NA, NA, NA, NA, 1, 1, 1, 1, NA, 1, NA,
1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA), `43082` = c(7, NA, NA, NA, NA,
1, 1, 1, 1, NA, 1, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), `42928` = c(7,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA,
NA, NA, 1, 1, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1,
1, NA), `42933...31` = c(11, 24, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 1, 1, NA, NA, NA, 1, 1, 1, NA, NA, NA,
NA, NA, NA, 1, 1, 1, 1, 1, 1, NA), `42935...32` = c(3, 21,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1,
1, NA), `42935...33` = c(4, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 1, 1, 1, 1, NA, NA, NA), `42936` = c(11, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, NA, NA,
NA, 1, 1, 1, NA, NA, NA, NA, NA, NA, 1, 1, 1, 1, 1, 1, NA
), `42949...35` = c(4, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 1, 1, NA), `42949...36` = c(3, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1,
1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA), `42952` = c(2, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), `43319` = c(5, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, NA, 1, 1, NA
), `42959...39` = c(6, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, 1, 1,
1, NA, NA, NA, NA, NA, NA, NA), `42959...40` = c(10, NA,
1, NA, 1, 1, 1, 1, 1, 1, 1, 1, 1, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), `42966` = c(4, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 1, 1, 1, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 1, NA, NA, NA), `42978` = c(10, NA, 1, NA,
1, 1, 1, 1, 1, 1, 1, 1, 1, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), `42986` = c(2,
NA, 1, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA), `42992...44` = c(6, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 1, 1, 1, 1, 1, 1, NA), `42992...45` = c(3,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1,
1, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA), `42997` = c(6, 10, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1,
1, 1, 1, NA, NA, NA, NA, NA, NA, NA), `43007` = c(3, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, NA, NA,
NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA), `43015` = c(6, 7, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, NA, 1, 1, 1,
1, NA, NA, NA, NA, NA, NA, NA), `43046` = c(3, 14, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, NA, NA, NA, NA),
`41222` = c(3, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1,
NA, NA, NA, NA, NA, NA, NA), `43048...51` = c(5, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, 1, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA
), `43048...52` = c(3, 7, NA, NA, 1, NA, NA, NA, NA, NA,
1, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), `43054` = c(5, 10, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
1, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, NA
), `43068` = c(3, 6, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 1, 1, 1, NA, NA, NA, NA), `43073` = c(8, 10, NA, NA,
1, 1, 1, 1, NA, NA, 1, NA, 1, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
`43076...56` = c(12, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, NA, 1, 1, 1,
1, 1, 1, 1, NA, 1, 1, NA), `43076...57` = c(2, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, 1, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), `43085...58` = c(3, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 1, 1, 1, NA, NA, NA, NA), `43085...59` = c(6, NA,
NA, NA, 1, 1, 1, 1, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA), `43092...60` = c(3, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 1, NA, 1, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 1), `43092...61` = c(8,
9, NA, NA, 1, 1, 1, 1, 1, NA, 1, NA, 1, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA), `43093` = c(15, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 1, NA, 1, 1, 1, 1, 1, NA, 1, 1, 1, 1,
NA, NA, NA, 1, 1, 1, 1), `43099` = c(8, 26, NA, NA, 1, 1,
1, 1, 1, NA, 1, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA,
-34L), class = c("tbl_df", "tbl", "data.frame")), AC = structure(list(
Date = c("Famsize", "Grpsize", "WAR67", "ABI13", "ABI05",
"AGA93", "AXA17", "AXA13", "ABI82", "ANW15", "ANW10", "WAR79",
"ANA12"), `42880` = c(2, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 1, 1), `42888` = c(6, 14, NA, 1, NA, 1, NA, 1, 1,
1, 1, NA, NA), `42978...4` = c(3, 5, NA, NA, NA, NA, NA,
NA, 1, 1, 1, NA, NA), `42978...5` = c(3, 7, NA, 1, NA, 1,
NA, 1, NA, NA, NA, NA, NA), `42997` = c(6, 8, NA, 1, NA,
1, NA, 1, 1, 1, 1, NA, NA), `43007` = c(3, 4, NA, NA, NA,
NA, NA, NA, 1, 1, 1, NA, NA), `43025` = c(6, 11, NA, 1, NA,
1, NA, 1, 1, 1, 1, NA, NA), `43069` = c(2, 9, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 1, 1), `43081` = c(3, NA, NA, NA,
NA, 1, 1, 1, NA, NA, NA, NA, NA), `43083` = c(4, NA, NA,
1, NA, 1, 1, 1, NA, NA, NA, NA, NA), `43087` = c(4, 6, NA,
1, NA, NA, NA, NA, 1, 1, 1, NA, NA), `43092` = c(3, 17, NA,
NA, NA, NA, NA, NA, 1, 1, 1, NA, NA), `43096` = c(7, NA,
NA, 1, NA, 1, 1, 1, 1, 1, 1, NA, NA), `43057` = c(4, 8, NA,
1, NA, NA, NA, NA, 1, 1, 1, NA, NA), `43082...16` = c(4,
NA, NA, 1, NA, 1, 1, 1, NA, NA, NA, NA, NA), `43082...17` = c("4",
"6", NA, NA, "?", NA, NA, NA, "1", "1", "1", NA, NA)), row.names = c(NA,
-13L), class = c("tbl_df", "tbl", "data.frame")), BB = structure(list(
Date = c("Famsize", "Grpsize", "BAR", "BAR01", "BDU14", "BAR87",
"BEC16", "BEC11", "BON83", "BRL11", "BON01", "BOL15", "BON93",
"BIL16", "BIL12", "BEV90", "BAA12", "BAA03", "BEV97", "BOD12",
"BRN96", "BLL15", "BLL10", "BEA00", "BEL87", "BOG16", "BOG11",
"BOG04", "Extra12F"), `42943` = c(3, NA, NA, NA, NA, 1, 1,
1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA), `43001` = c(9, 10, 1, 1, 1,
1, 1, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
1, 1, 1, NA, NA, NA, NA, NA, NA), `43008` = c(14, 16, 1,
1, 1, 1, 1, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 1, 1, 1, 1, 1, 1, 1, 1, 1)), row.names = c(NA, -29L), class = c("tbl_df",
"tbl", "data.frame")), BB2 = structure(list(Date = c("Famsize",
"Grpsize", "BET70", "BNT12", "BNT05", "BNT83", "BRY15", "BRY11"
), `42761` = c(6, 17, 1, 1, 1, 1, 1, 1), `42786` = c(6, 7, 1,
1, 1, 1, 1, 1), `42865` = c(6, NA, 1, 1, 1, 1, 1, 1), `42866` = c(6,
NA, 1, 1, 1, 1, 1, 1), `42871` = c(6, NA, 1, 1, 1, 1, 1, 1),
`42944` = c(6, 10, 1, 1, 1, 1, 1, 1), `43099` = c(6, NA,
1, 1, 1, 1, 1, 1)), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame")))
transposed <- as.list(for(family in mysheets$family){
gather(family, na.rm = FALSE)
})
transposed comes out as NULL: no error is thrown, but the object is empty.
Can anyone help me understand how to transpose each tibble in the list, so that I can continue with the rest of the problem? Thanks
How about this (where X is the structure from your question)?
I'm sure that it could become more polished (with map) but here goes:
library(tidyverse)
AA <- X[[1]]
AC <- X[[2]]
BB <- X[[3]]
BB2 <- X[[4]]
data_new <- function(data, tag){
data %>%
filter(!Date %in% c('Famsize', 'Grpsize')) %>%
rename('EleID' = Date) %>%
gather(key = 'Date', value = 'Value', -EleID) %>%
filter(!is.na(Value)) %>%
select(-Value) %>%
mutate(dataset = tag)
}
AA_new <- data_new(AA, "AA")
AC_new <- data_new(AC, "AC")
BB_new <- data_new(BB, "BB")
BB2_new <- data_new(BB2, "BB2")
data_combined <- bind_rows(AA_new, AC_new, BB_new, BB2_new)
... which generates:
glimpse(data_combined)
Observations: 537
Variables: 3
$ EleID <chr> "AMY85", "ANG15", "ANG11", "AMB10", "ALI79", "AST97", "ALI98…
$ Date <chr> "42761", "42761", "42761", "42761", "42767", "42767", "42767…
$ dataset <chr> "AA", "AA", "AA", "AA", "AA", "AA", "AA", "AA", "AA", "AA", …
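As for the more polished map version mentioned above, here is a minimal sketch of how it might look when applied to the whole list at once. The names mysheets_demo and reshape_family are made up for illustration, and the two small tibbles only stand in for the real mysheets. I have used pivot_longer() rather than gather(), and converted the date columns to character first, on the assumption that some sheets mix numeric and text columns (as AC does in the question).

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical two-family stand-in for `mysheets`
mysheets_demo <- list(
  F1 = tibble(Date    = c("Famsize", "Grpsize", "A", "B"),
              `42761` = c(2, NA, 1, 1),
              `42767` = c(1, 3, 1, NA)),
  F2 = tibble(Date    = c("Famsize", "Grpsize", "C"),
              `42761` = c(1, NA, 1))
)

# Reshape one family sheet into one row per (individual, date) sighting
reshape_family <- function(data) {
  data %>%
    filter(!Date %in% c("Famsize", "Grpsize")) %>%   # drop the summary rows
    rename(EleID = Date) %>%
    mutate(across(-EleID, as.character)) %>%         # make column types agree
    pivot_longer(-EleID, names_to = "Date", values_to = "Value",
                 values_drop_na = TRUE) %>%          # keep only recorded cells
    select(-Value)
}

# Apply to every sheet; the list names become the `dataset` column
data_combined_demo <- mysheets_demo %>%
  map(reshape_family) %>%
  bind_rows(.id = "dataset")
```

With the real data this would be mysheets %>% map(reshape_family) %>% bind_rows(.id = "dataset"), so the number of families no longer matters. Note that any non-empty cell counts as a sighting here, including the "?" in AC, so that still needs cleaning.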
And (after a little cleaning, as you say) I think you could mutate, using the excel_numeric_to_date function in the package janitor, in order to get R-type dates from Excel versions.
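A rough sketch of that cleaning and conversion step, under a couple of assumptions: that the "...8"-style suffixes on duplicated column names can simply be stripped, and that the workbooks use Excel's default 1900 date system, where serial numbers count days from 1899-12-30. I have used base as.Date() with that origin here to keep the example dependency-light; as far as I know janitor's excel_numeric_to_date() should give the same result for that date system. dates_raw is a made-up stand-in for a few rows of data_combined.

```r
library(dplyr)

# Hypothetical stand-in for a few rows of data_combined
dates_raw <- tibble(
  EleID = c("AMY85", "ALI79", "ALI98"),
  Date  = c("42761", "42802...8", "42802...9")
)

dates_clean <- dates_raw %>%
  mutate(
    # strip the "...N" suffix attached to duplicated column names
    Date = sub("\\.\\.\\.[0-9]+$", "", Date),
    # Excel serial number -> R Date (assumes the 1900 date system)
    Date = as.Date(as.numeric(Date), origin = "1899-12-30")
  )
```

Stripping the suffix means two sightings on the same day end up with the same Date, so if the distinction between them matters it would be worth keeping the original column name alongside.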
I hope that this helps you.