Search code examples
rdataframemergeleft-join

How can I merge this unbalanced data in r?


I have a data set but it's kinda weird. So I'm trying to fix it right.

a_name     b_name         latitude      longitude        min.distance
           kangseo        37.1562        126.9762         73.950928
           kangbook       38.1255        126.1952         128.179185
           jongro         37.8226        127.1724         95.626161
           dongmoon       37.1161        127.1345         12.531519
           jamsil         36.1215        127.9722         175.154161
           nolboo         37.3213        127.3211         85.151616
           ilsan          37.1111        127.1331         16.115151
jongone                   38.1241        126.1215         95.626161
jongtwo                   37.1161        127.1345         12.531519
jongthree                 37.1562        126.9762         73.950928
jongfour                  37.1551        127.6262         17.124115
jongfive                  36.1515        127.9627         175.154161

This data is sepearated into two parts and I want to make it into clean data.

a_name     b_name         latitude      longitude        min.distance
jongthree  kangseo        37.1562        126.9762         73.950928
           kangbook       38.1255        126.1952         128.179185
jongone    jongro         38.1241        126.1215         95.626161
jongtwo   dongmoon        37.1161        127.1345         12.531519
jongfive   jamsil         36.1515        127.9627         175.154161
           nolboo         37.3213        127.3211         85.151616
jongfour   ilsan          37.1515        127.6262         17.124115
                                             

I tried left_join and merge but it didn't work well.

left_join(A,B, by="min.distance")

I need your help.


Solution

  • We can group by 'min.distance' and modify the NA elements to the non-NA values and get the distinct rows without any need for joining

    library(dplyr)
    df1 %>% 
       group_by(min.distance) %>%
       mutate(across(c(a_name, b_name), ~ (.x[!is.na(na_if(.x, ""))][1]))) %>% 
       ungroup %>%
       distinct(min.distance, .keep_all = TRUE)
    

    -output

    # A tibble: 8 × 5
      a_name    b_name   latitude longitude min.distance
      <chr>     <chr>       <dbl>     <dbl>        <dbl>
    1 jongthree kangseo      37.2      127.         74.0
    2 <NA>      kangbook     38.1      126.        128. 
    3 jongone   jongro       37.8      127.         95.6
    4 jongtwo   dongmoon     37.1      127.         12.5
    5 jongfive  jamsil       36.1      128.        175. 
    6 <NA>      nolboo       37.3      127.         85.2
    7 <NA>      ilsan        37.1      127.         16.1
    8 jongfour  <NA>         37.2      128.         17.1
    

    data

    df1 <- structure(list(a_name = c(NA, NA, NA, NA, NA, NA, NA, "jongone", 
    "jongtwo", "jongthree", "jongfour", "jongfive"), b_name = c("kangseo", 
    "kangbook", "jongro", "dongmoon", "jamsil", "nolboo", "ilsan", 
    NA, NA, NA, NA, NA), latitude = c(37.1562, 38.1255, 37.8226, 
    37.1161, 36.1215, 37.3213, 37.1111, 38.1241, 37.1161, 37.1562, 
    37.1551, 36.1515), longitude = c(126.9762, 126.1952, 127.1724, 
    127.1345, 127.9722, 127.3211, 127.1331, 126.1215, 127.1345, 126.9762, 
    127.6262, 127.9627), min.distance = c(73.950928, 128.179185, 
    95.626161, 12.531519, 175.154161, 85.151616, 16.115151, 95.626161, 
    12.531519, 73.950928, 17.124115, 175.154161)), class = "data.frame", 
    row.names = c(NA, 
    -12L))