when i use pivot_wider() on my data, there are some numeric values that are duplicates, forcing conversion to a list with nested values.
here's an example:
dfx <- data.frame("Arb_PersonID" = c(1,1,2,2,2,2),
"LabComponentName" = c("2ch", "4ch", "2ch", "4ch", "2ch", "4ch"),
"LabResult" = c(56, 55, 50, 54, 56, 60))
class(dfx1$`2ch`)
this shows:
> tibble(dfx)
# A tibble: 6 × 3
Arb_PersonID LabComponentName LabResult
<dbl> <chr> <dbl>
1 1 2ch 56
2 1 4ch 55
3 2 2ch 50
4 2 4ch 54
5 2 2ch 56
6 2 4ch 60
> class(dfx$LabResult)
[1] "numeric"
Then pivot_wider:
dfx1 <- dfx %>%
pivot_wider(names_from = LabComponentName, values_from = LabResult)
Arb_PersonID `2ch` `4ch`
1 56 55
2 c(50,56) c(54,60)
> class(dfx1$`2ch`)
[1] "list"
I'd like to use the average of the 2 values (ie 53 instead of c(50,56)) and keep the class numeric.
Thanks in advance for any help. I'm using a very large dataset of 14,000 rows and these duplicates are about 1/500. Manually reviewing and correcting is quite painful, but not impossible.
There's a handy values_fn
option within the pivot_wider
call which will allow you do simply do the mean whenever there are multiple values:
library(tidyverse)
dfx <- data.frame("Arb_PersonID" = c(1,1,2,2,2,2),
"LabComponentName" = c("2ch", "4ch", "2ch", "4ch", "2ch", "4ch"),
"LabResult" = c(56, 55, 50, 54, 56, 60))
dfx %>%
pivot_wider(names_from = LabComponentName, values_from = LabResult,
values_fn = mean)
#> # A tibble: 2 × 3
#> Arb_PersonID `2ch` `4ch`
#> <dbl> <dbl> <dbl>
#> 1 1 56 55
#> 2 2 53 57