I am creating a raster layer for an area with multiple environmental variables. The data have usually come as netCDF files (arrays) containing lat, long, date and the variable in question, in this case sea_ice_fraction.
The data for sea surface temperature (sst) came in an understandable format, at least from the point of view of trying to make a prediction grid:
, , Date = 2019-11-25
Long
Lat 294.875 295.125 295.375 295.625 295.875 296.125 296.375 296.625 296.875 297.125
-60.125 2.23000002 2.04 1.83 1.53 1.18 1.00 0.9800000 1.06 1.25 1.40999997
-60.375 2.06999993 1.79 1.60 1.31 1.09 0.97 1.0000000 1.15 1.30 1.42999995
-60.625 1.93999994 1.64 1.45 1.28 1.14 1.02 0.9899999 1.03 1.10 1.13000000
Each row is a single latitude (at the resolution of the data) and each column is a single longitude, with one such lat/long slice per date.
My goal is to calculate the mean over all dates for each coordinate cell, which in the array case is easy:
sst.c1 <- apply(sst.c1, c(1,2), mean)
Then I project the result to a Raster layer.
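As a minimal sketch of that array step, here is the same `apply` call on a made-up toy array (the dimensions, names and values are assumptions for illustration, not the real sst data):

```r
# Toy 3-D array: 2 latitudes x 3 longitudes x 4 dates (values are made up)
sst <- array(
  seq_len(24),
  dim = c(2, 3, 4),
  dimnames = list(
    Lat  = c("-60.125", "-60.375"),
    Long = c("294.875", "295.125", "295.375"),
    Date = c("2019-11-25", "2019-11-26", "2019-11-27", "2019-11-28")
  )
)

# Keep dimensions 1 (Lat) and 2 (Long); average over the remaining Date dimension
sst_mean <- apply(sst, c(1, 2), mean, na.rm = TRUE)

dim(sst_mean)  # one mean per Lat/Long cell
```

The result is a plain Lat x Long matrix, which is the shape needed before converting to a raster.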
However, the sea ice data come as a dataframe with 4 columns: time, lat, lon, and sea_ice_fraction:
time lat lon sea_ice_fraction
<chr> <dbl> <dbl> <dbl>
1 2019-11-25T12:00:00Z -66.1 -65.1 0.580
2 2019-11-25T12:00:00Z -66.1 -65.1 NA
3 2019-11-25T12:00:00Z -66.1 -65.0 NA
4 2019-11-25T12:00:00Z -66.1 -65.0 NA
5 2019-11-25T12:00:00Z -66.1 -64.9 NA
How can I turn this dataframe into an array similar to the sst data? Or directly into a raster, finding the mean of the values across dates for each cell in the dataframe?
Can you not just do this using dplyr?
The following should work fine:
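library(dplyr)
# One row per lat/lon cell, averaging over all dates.
# na.rm = TRUE skips missing values; cells with no data at all come out as NaN.
df %>%
group_by(lat, lon) %>%
summarize(sea_ice_fraction = mean(sea_ice_fraction, na.rm = TRUE)) %>%
ungroup()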
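If a lat x lon matrix like the averaged sst array is wanted instead of a long table, one possible base R route is `tapply`. This is only a sketch on made-up toy data; the data frame `ice` and its values are assumptions standing in for the real sea ice tibble:

```r
# Toy data in the same shape as the sea ice tibble (values are made up)
ice <- data.frame(
  time = rep(c("2019-11-25T12:00:00Z", "2019-11-26T12:00:00Z"), each = 4),
  lat  = rep(c(-66.1, -66.1, -66.0, -66.0), times = 2),
  lon  = rep(c(-65.1, -65.0, -65.1, -65.0), times = 2),
  sea_ice_fraction = c(0.58, NA, 0.40, 0.20, 0.62, NA, NA, 0.30)
)

# Mean over all dates for each lat/lon cell, returned as a Lat x Long matrix
ice_mean <- tapply(
  ice$sea_ice_fraction,
  list(Lat = ice$lat, Long = ice$lon),
  mean,
  na.rm = TRUE
)

# Cells with no valid observations give NaN; turn those back into NA
ice_mean[is.nan(ice_mean)] <- NA

ice_mean
```

For the raster itself, `raster::rasterFromXYZ()` takes a data frame with columns in x, y, z order, so the long-format means (lon, lat, value) from the dplyr pipeline could probably be passed to it after reordering the columns, provided the grid is regular.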