
Combining two columns in order to get one column in R


I'm looking for a way to combine two columns into one. The columns are mutually exclusive, so a valid value in one column means an NA in the other.

lebanon <- structure(list(income_under_median = c(NA, "751.000 - 1.000.000", 
"751.000 - 1.000.000", "Below 451.000", NA, NA, NA, NA, "451.000 - 750.000", 
NA), income_above_median = c("2.501.000 - 3.000.000", NA, NA, 
NA, "Below 1.501.000", "Below 1.501.000", "2.001.000 - 2.500.000", 
"1.501.000 - 2.000.000", NA, "3.001.000 - 4.000.000")), row.names = c(NA, 
10L), class = "data.frame")

   income_under_median   income_above_median
1                 <NA> 2.501.000 - 3.000.000
2  751.000 - 1.000.000                  <NA>
3  751.000 - 1.000.000                  <NA>
4        Below 451.000                  <NA>
5                 <NA>       Below 1.501.000
6                 <NA>       Below 1.501.000
7                 <NA> 2.001.000 - 2.500.000
8                 <NA> 1.501.000 - 2.000.000
9    451.000 - 750.000                  <NA>
10                <NA> 3.001.000 - 4.000.000

I want to combine these into a single net-income column, which I can then easily convert to a quasi-metric scale.

I tried the following, based on a related question, but it did not give the result I wanted:

lebanon$test <- paste(lebanon$income_under_median, lebanon$income_above_median)

 [1] "NA 2.501.000 - 3.000.000" "751.000 - 1.000.000 NA"   "751.000 - 1.000.000 NA"  
 [4] "Below 451.000 NA"         "NA Below 1.501.000"       "NA Below 1.501.000"      
 [7] "NA 2.001.000 - 2.500.000" "NA 1.501.000 - 2.000.000" "451.000 - 750.000 NA"    
[10] "NA 3.001.000 - 4.000.000"

Does anyone know a solution for this problem?

Greetings


Solution

  • One solution is dplyr's coalesce function, which returns the first non-missing value at each position:

    lebanon$test <- dplyr::coalesce(lebanon$income_under_median, lebanon$income_above_median)
    

    Or, within a pipeline:

    library(dplyr)
    lebanon %>%
      mutate(test = coalesce(income_under_median, income_above_median))
    

    Output

    #    income_under_median   income_above_median                  test
    # 1                 <NA> 2.501.000 - 3.000.000 2.501.000 - 3.000.000
    # 2  751.000 - 1.000.000                  <NA>   751.000 - 1.000.000
    # 3  751.000 - 1.000.000                  <NA>   751.000 - 1.000.000
    # 4        Below 451.000                  <NA>         Below 451.000
    # 5                 <NA>       Below 1.501.000       Below 1.501.000
    # 6                 <NA>       Below 1.501.000       Below 1.501.000
    # 7                 <NA> 2.001.000 - 2.500.000 2.001.000 - 2.500.000
    # 8                 <NA> 1.501.000 - 2.000.000 1.501.000 - 2.000.000
    # 9    451.000 - 750.000                  <NA>     451.000 - 750.000
    # 10                <NA> 3.001.000 - 4.000.000 3.001.000 - 4.000.000
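
For readers who want to see why paste() fails here and what coalesce does for two columns, the following is a minimal base R sketch. The vector names under, above and combined are assumptions made for illustration; the values are taken from the question's data. It is not how dplyr implements coalesce, only an equivalent for this two-column case.

```r
# Small vectors mirroring the question's two mutually exclusive columns
# (names are illustrative, values copied from the question)
under <- c(NA, "Below 451.000", NA)
above <- c("Below 1.501.000", NA, "2.001.000 - 2.500.000")

# paste() turns NA into the literal string "NA" before concatenating,
# which is why the original attempt produced "NA Below 1.501.000" etc.
pasted <- paste(under, above)

# Two-column equivalent of coalesce: keep `under` unless it is NA,
# in which case fall back to `above`
combined <- ifelse(is.na(under), above, under)
combined
```

With more than two columns the nested ifelse() calls become unwieldy, which is one reason to prefer coalesce.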