Search code examples
rdplyrgeospatialr-sf

sf & dplyr: Grouped mean by same coordinates doesn't work


I have a data set extracted from S2 satellite rasters (10 x 10 meters) with 12 values (ras.df.ll), but 6 is in a tile (T21JYG) and the second in another (T21JYG). I'd like to calculate the mean in the same (x,y coordinates) between tiles without success. I don't find any way to recognise 1st row in the first tile is the same coordinate of 1st row in the second tile just the end of my data set. In my example:

library(sf)
library(sfheaders)
library(dplyr)

# Raster (10x10 meters) info in data frame 1 - crs +proj=utm +zone=21 +south +datum=WGS84 +units=m +no_defs
x <-c(789385,789395,789405,789415,789425,789435)
y <-c(6626865,6626865,6626865,6626865,6626865,6626865)
tile <- rep("T21JYG",6)
values <-c(321,249,234,238,224,244)
ras.ds1<-data.frame(x,y,tile,values)
ras.ds1.sf <- st_as_sf(ras.ds1, coords = c("x", "y"), crs = 32721, agr = "constant")
ras.ds1.sf.ll <- st_transform(ras.ds1.sf, crs=4326)
ras.ds1.sf.ll
#Simple feature collection with 6 features and 2 fields
#Attribute-geometry relationship: 2 constant, 0 aggregate, 0 identity
#Geometry type: POINT
#Dimension:     XY
#Bounding box:  xmin: -53.98638 ymin: -30.45564 xmax: -53.98586 ymax: -30.45562
#Geodetic CRS:  WGS 84
#    tile values                    geometry
# 1 T21JYG    321 POINT (-53.98638 -30.45564)
# 2 T21JYG    249 POINT (-53.98628 -30.45563)
# 3 T21JYG    234 POINT (-53.98617 -30.45563)
# 4 T21JYG    238 POINT (-53.98607 -30.45563)
# 5 T21JYG    224 POINT (-53.98596 -30.45563)
# 6 T21JYG    244 POINT (-53.98586 -30.45562)

# Raster (10x10 meters) info in data frame - crs +proj=utm +zone=22 +south +datum=WGS84 +units=m +no_defs 
x <-c(213285,213295,213305,213315,213325,213335)
y <-c(6626955,6626955,6626955,6626955,6626955,6626955)
tile <- rep("T22JBM",6)
values <-c(336,355,363,426,341,308)
ras.ds2 <-data.frame(x,y,tile,values)
ras.ds2.sf <- st_as_sf(ras.ds2, coords = c("x", "y"), crs = 32722, agr = "constant")
ras.ds2.sf.ll <- st_transform(ras.ds2.sf, crs=4326)
ras.ds2.sf.ll
# Simple feature collection with 6 features and 2 fields
# Attribute-geometry relationship: 2 constant, 0 aggregate, 0 identity
# Geometry type: POINT
# Dimension:     XY
# Bounding box:  xmin: -53.98638 ymin: -30.45564 xmax: -53.98586 ymax: -30.45562
# Geodetic CRS:  WGS 84
#     tile values                    geometry
# 1 T21JYG    321 POINT (-53.98638 -30.45564)
# 2 T21JYG    249 POINT (-53.98628 -30.45563)
# 3 T21JYG    234 POINT (-53.98617 -30.45563)
# 4 T21JYG    238 POINT (-53.98607 -30.45563)
# 5 T21JYG    224 POINT (-53.98596 -30.45563)
# 6 T21JYG    244 POINT (-53.98586 -30.45562)


# Join information
ras.ds.sf.ll <- rbind(ras.ds1.sf.ll, ras.ds2.sf.ll)
ras.df.ll <- sf_to_df(ras.ds.sf.ll, fill = TRUE, unlist = NULL)


# Mean values by tile 
ras.df.ll %>% 
  group_by(x,y) %>% dplyr::summarise(values=mean(values))
# `summarise()` has grouped output by 'x'. You can override using the `.groups` argument.
# # A tibble: 12 x 3
# # Groups:   x [12]
#        x     y values
#    <dbl> <dbl>  <dbl>
#  1 -54.0 -30.5    321
#  2 -54.0 -30.5    249
#  3 -54.0 -30.5    234
#  4 -54.0 -30.5    238
#  5 -54.0 -30.5    224
#  6 -54.0 -30.5    244
#  7 -54.0 -30.5    336
#  8 -54.0 -30.5    355
#  9 -54.0 -30.5    363
# 10 -54.0 -30.5    426
# 11 -54.0 -30.5    341
# 12 -54.0 -30.5    308


# Nothing happened!!

# But if I try to change x and y values using accuracy
round_any = function(x, accuracy, f=round){f(x/ accuracy) * accuracy}
ras.df.ll2 <- ras.df.ll %>%
  mutate(x = round_any(x, accuracy = 0.001),
         y = round_any(y, accuracy = 0.001))
ras.df.ll2 %>% 
  group_by(x,y) %>% dplyr::summarise(values=mean(values))
`summarise()` has grouped output by 'x'. You can override using the `.groups` argument.
# # A tibble: 3 x 3
# # Groups:   x [2]
#       x     y values
#   <dbl> <dbl>  <dbl>
# 1 -54.0 -30.5   252.
# 2 -54.0 -30.5   370 
# 3 -54.0 -30.5   324.

# Hapens something but is wrong!!

Please, is there any way to make this corrected mean extraction? Thanks in advance Alexandre


Solution

  • Let's start by noticing that what you wrote is unnecessarily complicated. I suggest you do it like this.

    fst = function(data)
      st_as_sf(data, coords = c("x", "y"),
               crs = data$crs, agr = "constant") %>%
      st_transform(crs=4326) %>%
      sf_to_df(fill = TRUE, unlist = NULL)
    
    df = tibble(
        tile = rep(c("T21JYG", "T22JBM"), each = 6) %>% fct_inorder(),
        values  = c(321,249,234,238,224,244,336,355,363,426,341,308),
        x = c(789385,789395,789405,789415,789425,789435,
              213285,213295,213305,213315,213325,213335),
        y = c(6626865,6626865,6626865,6626865,6626865,6626865,
              6626955,6626955,6626955,6626955,6626955,6626955),
        crs = rep(c(32721, 32722), each = 6)
      ) %>% group_by(tile) %>%
        nest(data=x:crs) %>%
        mutate(st = map(data, ~ fst(.x))) %>%
        unnest(st) %>% 
        mutate(
          x = x %>% plyr::round_any(accuracy = 0.001) %>% paste(),
          y = y %>% plyr::round_any(accuracy = 0.001) %>% paste(),
        ) %>% group_by(x,y) 
    
    

    output

    # A tibble: 12 x 8
    # Groups:   x, y [3]
       tile   values data               crs sfg_id point_id x       y      
       <fct>   <dbl> <list>           <dbl>  <int>    <int> <chr>   <chr>  
     1 T21JYG    321 <tibble [1 x 3]> 32721      1        1 -53.986 -30.456
     2 T21JYG    249 <tibble [1 x 3]> 32721      1        1 -53.986 -30.456
     3 T21JYG    234 <tibble [1 x 3]> 32721      1        1 -53.986 -30.456
     4 T21JYG    238 <tibble [1 x 3]> 32721      1        1 -53.986 -30.456
     5 T21JYG    224 <tibble [1 x 3]> 32721      1        1 -53.986 -30.456
     6 T21JYG    244 <tibble [1 x 3]> 32721      1        1 -53.986 -30.456
     7 T22JBM    336 <tibble [1 x 3]> 32722      1        1 -53.986 -30.455
     8 T22JBM    355 <tibble [1 x 3]> 32722      1        1 -53.986 -30.455
     9 T22JBM    363 <tibble [1 x 3]> 32722      1        1 -53.986 -30.455
    10 T22JBM    426 <tibble [1 x 3]> 32722      1        1 -53.986 -30.455
    11 T22JBM    341 <tibble [1 x 3]> 32722      1        1 -53.985 -30.455
    12 T22JBM    308 <tibble [1 x 3]> 32722      1        1 -53.985 -30.455
    

    However, I completely do not understand what you see incorrect in it

    df %>% dplyr::summarise(values=mean(values))
    

    output

    # A tibble: 3 x 3
    # Groups:   x [2]
      x       y       values
      <chr>   <chr>    <dbl>
    1 -53.985 -30.455   324.
    2 -53.986 -30.455   370 
    3 -53.986 -30.456   252.
    

    This is exactly what we do, which is the mean of values in the x, y groups. Make clear what you want to achieve. And don't write it in the comments, but in the body of the post!

    Hmm. I don't know anything about the functions st_as_sf, st_transform, sf_to_df. I don't know what they do or how to interpret their results. Note, however, that they are placed in a pipeline (in my fst function) so they do the necessary calculations and then pass the results to each other. These results must therefore be correct.

    Perhaps the problem is that you originally specified two different crs parameters in the st_as_sf function. I did this by putting this variable in tibble. However, you only pass one value of crs = 4326 to the st_transform function. Perhaps two different values should also be conveyed here?

    Finally, I got six results when I set the accuracy in the round_any function to 0.00025.

    Take a look at these answers.

    # A tibble: 6 x 3
    # Groups:   x [6]
      x         y         values
      <chr>     <chr>      <dbl>
    1 -53.98525 -30.4555    308 
    2 -53.9855  -30.4555    377.
    3 -53.98575 -30.4555    312.
    4 -53.986   -30.45575   231 
    5 -53.98625 -30.45575   242.
    6 -53.9865  -30.45575   321 
    

    In conclusion, I showed you the correct way to program it.