I am trying to edit a table with data collected from a google form. In several columns there is data about weekly amounts of physical activity in minutes and hours. Except numeric values there is also text in cells representing minutes or hours of physical activity. The goal is to have only numeric values in these cells without the text and to have only minutes (i.e. hours need to be converted).
Example of a column with respective data
I've tried with the following formula to get rid of the text which works just fine for this part of the problem: =SUBSTITUTE(SUBSTITUTE(J52;"min";"");"h";"")
There is a problem when I am trying to nest the formula and get only minutes in cells as there are minutes (represented as: min), hours (represented as: h) and some values that have a > sign in front (e.g. > 4 hour is determined a priori to be 4:30, that is 270 minutes).
In order to potentially solve this in R, I am providing session info along with a reproducible example: R version 4.0.4 (2021-02-15) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
1 LC_COLLATE=Croatian_Croatia.1250 LC_CTYPE=Croatian_Croatia.1250 LC_MONETARY=Croatian_Croatia.1250
[4] LC_NUMERIC=C LC_TIME=Croatian_Croatia.1250
system code page: 1252
attached base packages: 1 stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
1 rstudioapi_0.13 knitr_1.33 magrittr_2.0.1 tidyselect_1.1.1 munsell_0.5.0 colorspace_2.0-0
[7] R6_2.5.0 rlang_0.4.11 fansi_0.4.2 dplyr_1.0.5 tools_4.0.4 grid_4.0.4
[13] gtable_0.3.0 xfun_0.22 tinytex_0.31 utf8_1.2.1 cli_2.5.0 DBI_1.1.1
[19] htmltools_0.5.1.1 ellipsis_0.3.1 digest_0.6.27 yaml_2.2.1 assertthat_0.2.1 tibble_3.1.1
[25] lifecycle_1.0.0 crayon_1.4.1 purrr_0.3.4 ggplot2_3.3.3 vctrs_0.3.7 evaluate_0.14
[31] glue_1.4.2 rmarkdown_2.7 compiler_4.0.4 pillar_1.6.0 generics_0.1.0 scales_1.1.1
[37] pkgconfig_2.0.3
data <- structure(list(id = c("100213", "100998", "100494", "100758",
"100984", "100461", "100374", "140859", "150950", "140433", "200481",
"200892", "300487", "300319", "400258", "400075", "200712", "120114",
"500046", "500649", "600614", "700001", "300812", "700986", "800665",
"700882", "800470", "400090", "900350", "700283", "600455", "800400",
"100814", "700027", "600612", "30076", "500671", "200052", "600675",
"800786", "110092", "800193", "800947", "500749", "800665", "400562",
"150066", "200013", "700419", "400468", "600584", "ID 600366",
"300379", "300035", "150346", "140860", "900508", "600751", "110825",
"300848", "200990", "100237", "500137", "400751", "600347", "700289",
"700000", "150320", "700319", "300643", "400871", "600963", "400295",
"120120", "200875", "500209", "700380", "600646", "150415", "150415",
"600640", "120999", "200693", "600626", "700003", "400493", "700849",
"200544", "200001", "700801", "200084", "110951", "100371", "120114"
), vpa = c("0 min", "", "15 min", "0 min", "0 min", "45 min",
"15 min", "0 min", "30 min", "30 min", "0 min", "0 min", "0 min",
"0 min", "45 min", "3:00 h", "0 min", "", "15 min", "0 min",
"1:00 h", "0 min", "45 min", "0 min", "0 min", "30 min", "1:00 h",
"0 min", "1:00 h", "2:15 h", "30 min", "15 min", "0 min", "0 min",
"0 min", "0 min", "0 min", "45 min", "1:00 h", "0 min", "1:30 h",
"1:00 h", "0 min", "45 min", "0 min", "0 min", "0 min", "0 min",
"15 min", "", "1:00 h", "0 min", "1:00 h", "30 min", "30 min",
"30 min", "0 min", "30 min", "30 min", "45 min", "0 min", ">4 h",
"15 min", "0 min", "45 min", "30 min", "30 min", "0 min", "1:30 h",
"15 min", "45 min", "0 min", "15 min", "0 min", "0 min", "0 min",
"0 min", "30 min", "2:00 h", "2:00 h", "15 min", "", "0 min",
"0 min", "0 min", "0 min", "0 min", "0 min", "0 min", "0 min",
"1:30 h", "15 min", "0 min", "4:00 h")), class = "data.frame", row.names = c(NA,
-94L))
The expected output would need to look like this:
data <- structure(list(id = c("100213", "100998", "100494", "100758",
"100984", "100461", "100374", "140859", "150950", "140433", "200481",
"200892", "300487", "300319", "400258", "400075", "200712", "120114",
"500046", "500649", "600614", "700001", "300812", "700986", "800665",
"700882", "800470", "400090", "900350", "700283", "600455", "800400",
"100814", "700027", "600612", "30076", "500671", "200052", "600675",
"800786", "110092", "800193", "800947", "500749", "800665", "400562",
"150066", "200013", "700419", "400468", "600584", "ID 600366",
"300379", "300035", "150346", "140860", "900508", "600751", "110825",
"300848", "200990", "100237", "500137", "400751", "600347", "700289",
"700000", "150320", "700319", "300643", "400871", "600963", "400295",
"120120", "200875", "500209", "700380", "600646", "150415", "150415",
"600640", "120999", "200693", "600626", "700003", "400493", "700849",
"200544", "200001", "700801", "200084", "110951", "100371", "120114"
), vpa_2 = c("0", "", "15", "0", "0", "45",
"15", "0", "30", "30", "0", "0", "0",
"0", "45", "180", "0", "", "15", "0",
"60", "0", "45", "0", "0", "30", "60",
"0", "60", "135", "30", "15", "0", "0",
"0", "0", "0", "45", "60", "0", "90",
"60", "0", "45", "0", "0", "0", "0",
"15", "", "60", "0", "60", "30", "30",
"30", "0", "30", "30", "45", "0", "270",
"15", "0", "45", "30", "30", "0", "90",
"15", "45", "0", "15", "0", "0", "0",
"0", "30", "120", "120", "15", "", "0",
"0", "0", "0", "0", "0", "0", "0",
"90", "15", "0", "240")), class = "data.frame", row.names = c(NA, -94L))
It is doable using base R. I put the solution first:
# make a copy of your input data
test <- data
# remove the letters from the time string
for (i in 1:nrow(test)){
test$vpa[[i]] <- gsub("min|h"," ",test$vpa[[i]])
}
# if a string is in the hour unit (strings appearing as "xx:yy"), replace its value with the calculated minutes
for (i in 1:nrow(test)){
if (grepl(":",test$vpa[[i]]) == TRUE){
test$vpa[[i]] <- as.numeric(unlist(strsplit(test$vpa[[i]],":"))[1])*60 + as.numeric(unlist(strsplit(test$vpa[[i]],":"))[2])
}
}
# check results
# The result "test" will look like your expected output
print(test)
I am not using any packages here, but you may need some knowledge of gsub
,grepl
and str_split
to understand what I did. Basically, my logic is to play around the strings
.