For example I have some data
x | y | Owner | H |
---|---|---|---|
1 | 2 | A | 1 |
2 | 5 | A | NA |
2 | 2 | A | 5 |
2 | 5 | A | 4 |
2 | 5 | B | 4 |
2 | 5 | C | 3 |
3 | 3 | C | 6 |
3 | 3 | C | NA |
3 | 3 | D | NA |
I want to replace the Nans with the most common/mode when you group by (x,y,Owner) to get
x | y | Owner | H |
---|---|---|---|
1 | 2 | A | 1 |
2 | 5 | A | 4 |
2 | 2 | A | 5 |
2 | 5 | A | 4 |
2 | 5 | B | 4 |
2 | 5 | C | 3 |
3 | 3 | C | 6 |
3 | 3 | C | 6 |
3 | 3 | D | NA |
I tried this but I think its replacing everything when I only want to replace the Nas new_data <- my_data %>% group_by('Owner', 'x', 'y') %>% mutate(H=mean(H, na.rm=TRUE))
I've seen this: but it doesn't workhttps://stackoverflow.com/questions/45741879/can-i-replace-nans-with-the-mode-of-a-column-in-a-grouped-data-frame
new_data <- my_data %>% group_by('Owner', 'x', 'y') %>% mutate(H=mean(H, na.rm=TRUE)
But this replaces all the values of H with the mean. Also sometimes there will be no rows which match ie the group will only have 1 row and that has an Na (or maybe that its more than one but they just have NAs), is there a way to just keep the NA?
(I want it to find all the rows which match theOwner, x and y of the row with the NA and take the mode of their H values)
Use the Mode
function defined here and currently included in the gtools
package
df %>%
group_by(Owner, x, y) %>%
mutate(H = replace_na(H, gtools::stat_mode(H, ties = 'first')))
# A tibble: 8 × 4
# Groups: Owner, x, y [6]
x y Owner H
<int> <int> <chr> <int>
1 1 2 A 1
2 2 5 A 4
3 2 2 A 5
4 2 5 A 4
5 2 5 B 4
6 2 5 C 3
7 3 3 C 6
8 3 3 C 6