I have a data set but it's kinda weird. So I'm trying to fix it right.
a_name b_name latitude longitude min.distance
kangseo 37.1562 126.9762 73.950928
kangbook 38.1255 126.1952 128.179185
jongro 37.8226 127.1724 95.626161
dongmoon 37.1161 127.1345 12.531519
jamsil 36.1215 127.9722 175.154161
nolboo 37.3213 127.3211 85.151616
ilsan 37.1111 127.1331 16.115151
jongone 38.1241 126.1215 95.626161
jongtwo 37.1161 127.1345 12.531519
jongthree 37.1562 126.9762 73.950928
jongfour 37.1551 127.6262 17.124115
jongfive 36.1515 127.9627 175.154161
This data is sepearated into two parts and I want to make it into clean data.
a_name b_name latitude longitude min.distance
jongthree kangseo 37.1562 126.9762 73.950928
kangbook 38.1255 126.1952 128.179185
jongone jongro 38.1241 126.1215 95.626161
jongtwo dongmoon 37.1161 127.1345 12.531519
jongfive jamsil 36.1515 127.9627 175.154161
nolboo 37.3213 127.3211 85.151616
jongfour ilsan 37.1515 127.6262 17.124115
I tried left_join and merge but it didn't work well.
left_join(A,B, by="min.distance")
I need your help.
We can group by 'min.distance' and modify the NA
elements to the non-NA values and get the distinct
rows without any need for joining
library(dplyr)
df1 %>%
group_by(min.distance) %>%
mutate(across(c(a_name, b_name), ~ (.x[!is.na(na_if(.x, ""))][1]))) %>%
ungroup %>%
distinct(min.distance, .keep_all = TRUE)
-output
# A tibble: 8 × 5
a_name b_name latitude longitude min.distance
<chr> <chr> <dbl> <dbl> <dbl>
1 jongthree kangseo 37.2 127. 74.0
2 <NA> kangbook 38.1 126. 128.
3 jongone jongro 37.8 127. 95.6
4 jongtwo dongmoon 37.1 127. 12.5
5 jongfive jamsil 36.1 128. 175.
6 <NA> nolboo 37.3 127. 85.2
7 <NA> ilsan 37.1 127. 16.1
8 jongfour <NA> 37.2 128. 17.1
df1 <- structure(list(a_name = c(NA, NA, NA, NA, NA, NA, NA, "jongone",
"jongtwo", "jongthree", "jongfour", "jongfive"), b_name = c("kangseo",
"kangbook", "jongro", "dongmoon", "jamsil", "nolboo", "ilsan",
NA, NA, NA, NA, NA), latitude = c(37.1562, 38.1255, 37.8226,
37.1161, 36.1215, 37.3213, 37.1111, 38.1241, 37.1161, 37.1562,
37.1551, 36.1515), longitude = c(126.9762, 126.1952, 127.1724,
127.1345, 127.9722, 127.3211, 127.1331, 126.1215, 127.1345, 126.9762,
127.6262, 127.9627), min.distance = c(73.950928, 128.179185,
95.626161, 12.531519, 175.154161, 85.151616, 16.115151, 95.626161,
12.531519, 73.950928, 17.124115, 175.154161)), class = "data.frame",
row.names = c(NA,
-12L))