The dplyr documentation for mutating joins describes the "by" argument as "A character vector of variables to join by".
This, however, does not seem to be the case.
The join seems to work with a numeric (dbl) column. By this I mean the common column (identifier) that is used to join the dataframes.
This is a glimpse of a portion of the two dataframes. nomem_encr is the common identifier.
df1
Rows: 2,525
Columns: 11
$ nomem_encr <dbl> 800054, 800170, 800186, 800204, 800228, 800274,
$ mj16a093 <dbl+lbl> 4, 4, 3, 4, 2, 5, 5, 3, 5, 3, 6,
$ mj16a094 <dbl+lbl> 3, 4, 2, 4, 2, 5, 5, 2, 3, 2, 6,
df2
Rows: 6,092
Columns: 3
$ nomem_encr <dbl> 800009, 800015, 800042, 800054, 800057, 800085,
$ cv16h101 <dbl+lbl> 2, 3, 5, 6, 7, 6, 0, 6, 5,
$ cv16h044 <dbl+lbl> 6, 7, 7, 8, 0, 7, 5, 4, 7, 8, 7,
df3 <- left_join(df1, df2, by = "nomem_encr")
Is it best to convert the column to character, or does it not matter? My assumption was that the values just needed to be unique identifiers.
"by" : "A character vector of variables to join by"
This means that when you write the join, the value you pass to the by argument needs to be a character vector of column names. The columns themselves do NOT need to be character. For example:
left_join(d1, d2, by = c('ID', 'Latitude'))
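Here you are passing a character vector of column names to the by argument. It does not matter that the columns themselves are numeric, as long as the key column has a compatible type in both dataframes.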
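To make the distinction concrete, here is a minimal sketch using two small made-up tibbles. The column names are borrowed from the question, but the values and the reduced shape are assumptions for illustration only. It shows that the value given to by is a character vector while the key column stays numeric after the join.

```r
library(dplyr)

# Toy data with made-up values; nomem_encr is a numeric (double) key in both.
df1 <- tibble(nomem_encr = c(800054, 800170, 800186),
              mj16a093   = c(4, 4, 3))
df2 <- tibble(nomem_encr = c(800009, 800054, 800186),
              cv16h101   = c(2, 6, 5))

# `by` takes the NAME of the key column as a character vector.
key <- "nomem_encr"
df3 <- left_join(df1, df2, by = key)

is.character(key)            # the value passed to `by` is character
is.numeric(df3$nomem_encr)   # the key column itself is still numeric
```

Rows of df1 with no match in df2 are kept, with NA in the columns that come from df2, so there is no need to convert the identifier to character before joining.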