I have two dataframes (df_1 and df_2) containing one variable each (var1 and var2). They are both global maps, in the same crs and at the same spatial resolution (0.083 on the global grid):
Var1
Var2.
If it can help, I created both dataframes starting from NetCDF files at different spatial resolutions and extents, using package terra
.
I need the two variables to be in the same dataframe to perform analyses on them. However, I've been trying to merge the dataframes based on spatial coordinates (columns lon and lat) but it returns a column of NAs.
join
from dplyr
:> df <- df_1 %>%
+ left_join(df_2, by = c("lon", "lat"))
> summary(df)
lon lat var1 var2
Min. :-179.96 Min. :-55.458 Min. : 0.000 Min. : NA
1st Qu.: -63.04 1st Qu.: 8.208 1st Qu.: 9.398 1st Qu.: NA
Median : 31.21 Median : 37.791 Median :15.052 Median : NA
Mean : 20.81 Mean : 30.864 Mean :16.223 Mean :NaN
3rd Qu.: 93.12 3rd Qu.: 56.791 3rd Qu.:21.898 3rd Qu.: NA
Max. : 179.96 Max. : 83.041 Max. :62.238 Max. : NA
NA's :2075418
sf
and then used st_join
:> sf_1 <- st_as_sf(df_1,
+ coords = c("lon", "lat"),
+ crs = 4326)
> sf_2 <- st_as_sf(df_2,
+ coords = c("lon", "lat"),
+ crs = 4326)
> df_spatialjoin <- st_join(sf_1, sf_2)
> summary(df_spatialjoin)
var1 var2 geometry
Min. : 0.000 Min. : NA POINT :2075418
1st Qu.: 9.398 1st Qu.: NA epsg:4326 : 0
Median :15.052 Median : NA +proj=long...: 0
Mean :16.223 Mean :NaN
3rd Qu.:21.898 3rd Qu.: NA
Max. :62.238 Max. : NA
NA's :2075418
sf
back to tibble and then use dplyr
:> df_join <- left_join(as_tibble(sf_1),
+ as_tibble(sf_2),
+ by = "geometry")
> summary(df_join)
var1 geometry var2
Min. : 0.000 POINT :2075418 Min. : NA
1st Qu.: 9.398 epsg:4326 : 0 1st Qu.: NA
Median :15.052 +proj=long...: 0 Median : NA
Mean :16.223 Mean :NaN
3rd Qu.:21.898 3rd Qu.: NA
Max. :62.238 Max. : NA
NA's :2075418
I also tried geo_join
from fuzzyjoin
as explained in this thread but it also didn't work.
I had this problem in the past, but I would solve it using cbind()
instead of any join()
function. Unfortunately, here cbind()
won't work as the two dataframes originate from different NetCDF files and have thus different row numbers.
This worked for me. I think there was a problem with the original maps with which the dataframes were created. So I did:
library(terra)
library(tidyterra)
library(tidyverse)
# transform dataframes back to maps
map_1 <- as_spatraster(df_1, xycols = 1:2, crs = 4326, digits = 6)
map_2 <- as_spatraster(df_2, xycols = 1:2, crs = 4326, digits = 6)
plot(map_int)
# resample one dataframe to match the other
map_1 <- terra::resample(map_1, map_2)
# convert both maps back to dataframes
df_1 <- terra::as.data.frame(map_1, xy = TRUE)
df_2 <- terra::as.data.frame(map_2, xy = TRUE)
# merge with dplyr
df <- df_1 %>%
inner_join(df_2,
by = c("x", "y"))
I hope this could help other people that had the same problem.