I have a dataframe A consisting of 'name', 'measure'. I have another dataframe B consisting of 'type', 'measure'. I want to perform a summarise operation using each 'name' group of A on each 'type' group of B using the 'measure' field. For example:
A has:
Name | Measure
George 5
George 6
Tyrone 7
Tyrone 3
B has:
Type | Measure
cold 3
cold 2
hot 1
hot 5
I want to do a summarise on George and cold, George and hot, Tyrone and cold, Tyrone and hot, and in each summarise I find the minimum absolute difference (George on cold would be min(abs(5-3, 5-2, 6-3, 6-2)) = 2, and then find the 'Type' for each 'name' that has the lowest such score. How exactly do I do this for large datasets with many groups?
There's probably a simpler way, but you could do:
library(tidyverse)
crossing(
distinct(A, Name),
distinct(B, Type)
) %>%
left_join(A, by = 'Name') %>%
left_join(B, by = 'Type') %>%
group_by(Name, Type) %>%
summarise(minAbsDiff = min(abs(Measure.x - Measure.y))) %>%
group_by(Name) %>%
slice(which.min(minAbsDiff))
Output:
# A tibble: 2 x 3
# Groups: Name [2]
Name Type minAbsDiff
<fct> <fct> <int>
1 George hot 0
2 Tyrone cold 0