Search code examples
rdataframedplyrr-sfgeo

How to group aggregate data based of latitude and longitude


I have 2 dataframes in R that both have this type of structure:

Lon Lat Measurement
5 7 15
5 8 20

The numbers in the actual dataframes are different from this example but I don't think that should impact the question or response.

I want to find the average of the measurement column in a 10 meter by 10 meter area, and then take the ratio of the average in each area between the 2 dataframes. I have already been able to take the average in a 10m x 10m area for a single dataframe, but I've had trouble adding in the information from the second dataframe. Here is the code I have so far:

#some of the packages are here for plotting later
library(sf)
library(dplyr)
library(leaflet)
library(RColorBrewer)
library(osmdata)

# Example dataframes `df` and `df2`
# df <- data.frame(
#   lat = c(7, 8, 9),
#   lon = c(3, 3, 3),
#   measurement = c(10, 20, 30)  # Example measurement values
# )

# df2 <- data.frame(
#   lat = c(7.00001, 8.00001, 9.00001),
#   lon = c(3, 3, 3),
#   measurement = c(5, 15, 25)  # Example measurement values
# )

# Convert dataframes to sf objects with coordinates and CRS
df_sf <- st_as_sf(df, coords = c("lon", "lat"), crs = 4326)
df2_sf <- st_as_sf(df2, coords = c("lon", "lat"), crs = 4326)

# Transform to a metric CRS
df_sf <- st_transform(df_sf, crs = 32630)
df2_sf <- st_transform(df2_sf, crs = 32630)

# Define bounding box
bbox <- st_bbox(df_sf)

# Create a common grid of 10m x 10m cells
grid <- st_make_grid(df_sf, cellsize = c(10, 10), square = TRUE)
grid_sf <- st_sf(geometry = grid)

# Spatial join: assign points to grid cells
df_joined <- st_join(df_sf, grid_sf, join = st_intersects)
df2_joined <- st_join(df2_sf, grid_sf, join = st_intersects)

# Aggregate data within each grid cell for both dataframes
aggregated_data_df <- df_joined %>%
  group_by(geometry) %>%
  summarize(measurement_avg_df = mean(measurement, na.rm = TRUE), .groups = 'drop') %>%
  st_as_sf()

aggregated_data_df2 <- df2_joined %>%
  group_by(geometry) %>%
  summarize(measurement_avg_df2 = mean(measurement, na.rm = TRUE), .groups = 'drop') %>%
  st_as_sf()

# Perform a join to merge the two aggregated datasets based on the grid cell geometry
merged_data <- st_join(aggregated_data_df, aggregated_data_df2, by = "geometry")

# Compute the ratio
merged_data <- merged_data %>%
  mutate(ratio = measurement_avg_df / measurement_avg_df2)

# Compute center points of the merged data
merged_data <- st_centroid(merged_data)

# Convert the merged data back to the original CRS
merged_data <- st_transform(merged_data, crs = 4326)

# Extract longitude and latitude from the geometry column
merged_data <- merged_data %>%
  mutate(lon = st_coordinates(geometry)[, 1],
         lat = st_coordinates(geometry)[, 2]) %>%
  select(-geometry)

When I run this code, here is what the merged_data dataframe looks like after this code is run:

measurement_avg_df measurement_avg_df2 ratio lon lat geometry
5.5 NA NA 3 7 Point (3 7)
15.5 NA NA 3 8 Point (3 8)

What should I do to prevent the NA values from appearing in the merged table?


Solution

  • You could use a different geometry predicate function when calling sf_join() to create merged_data? For example if you do:

    # Perform a join to merge the two aggregated datasets based on the grid cell geometry
    merged_data <- st_join(aggregated_data_df, aggregated_data_df2, join=st_nearest_feature)
    

    instead of the default st_intersects, you get the following:

    Simple feature collection with 3 features and 5 fields
    Geometry type: POINT
    Dimension:     XY
    Bounding box:  xmin: 3 ymin: 7 xmax: 3 ymax: 9
    Geodetic CRS:  WGS 84
    # A tibble: 3 × 6
      measurement_avg_df measurement_avg_df2 ratio   lon   lat    geometry
                   <dbl>               <dbl> <dbl> <dbl> <dbl> <POINT [°]>
    1                 10                   5  2     3        7       (3 7)
    2                 20                  15  1.33  3        8       (3 8)
    3                 30                  25  1.2   3.00     9       (3 9)
    

    I don't know if moving to st_nearest_feature works with your real-world question or not.