I have the following data frame:
> head(df)
# A tibble: 6 x 6
# Groups: lat, decade [2]
lat long date year decade month_day
<dbl> <dbl> <date> <chr> <chr> <chr>
1 55 18 1952-02-03 1952 1950-1959 02-03
2 55 18 1958-02-08 1958 1950-1959 02-08
3 55 18 1958-02-08 1958 1950-1959 02-08
4 55 18 1958-02-08 1958 1950-1959 02-08
5 55 18 1965-02-07 1965 1960-1969 02-07
6 55 18 1966-03-03 1966 1960-1969 03-03
> summary(df)
lat long date year decade
Min. :55.00 Min. :18 Min. :1951-03-22 Length:1414 Length:1414
1st Qu.:56.00 1st Qu.:18 1st Qu.:1987-01-01 Class :character Class :character
Median :58.00 Median :18 Median :2004-04-02 Mode :character Mode :character
Mean :59.07 Mean :18 Mean :1999-02-16
3rd Qu.:62.00 3rd Qu.:18 3rd Qu.:2014-01-01
Max. :68.00 Max. :18 Max. :2021-03-28
Class :character
Mode :character
I would like to get the median month_day by degree of latitude (lat) and per decade.
I have tried this but cannot get past an error:
df = df %>%
  group_by(lat, decade) %>%
  summarise(across(month_day, median))
Error in `summarise()`:
! Problem while computing `..1 = across(month_day, median)`.
Caused by error:
! `month_day` must return compatible vectors across groups.
i Result type for group 1 (lat = 55, decade = "1950-1959"): <double>.
i Result type for group 2 (lat = 55, decade = "1960-1969"): <character>.
I do not know how to solve it, thank you very much for your help.
> ds_filtered_median[ds_filtered_median$lat == '57', ]
# A tibble: 124 x 6
lat long date year decade month_day
<dbl> <dbl> <date> <chr> <chr> <chr>
1 57 18 1955-04-08 1955 1950-1959 04-08
2 57 18 1957-02-19 1957 1950-1959 02-19
3 57 18 1958-04-06 1958 1950-1959 04-06
4 57 18 1959-01-01 1959 1950-1959 01-01
5 57 18 1960-01-03 1960 1960-1969 01-03
6 57 18 1961-01-02 1961 1960-1969 01-02
7 57 18 1962-01-02 1962 1960-1969 01-02
8 57 18 1963-01-01 1963 1960-1969 01-01
9 57 18 1964-01-19 1964 1960-1969 01-19
10 57 18 1965-01-12 1965 1960-1969 01-12
# ... with 114 more rows
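As background on the error itself: as far as I can tell, it comes from how base R's median() treats character vectors. For a group with an odd number of rows it returns the middle string, while for a group with an even number of rows it tries to average the two middle strings, which gives a numeric NA with a warning. That mix of character and double results is what summarise() complains about. A minimal base R sketch (the sample vectors and variable names are made up for illustration):

```r
# Odd-length character vector: median() returns the middle element as a string.
odd_group <- c("02-07", "03-03", "01-12")
med_odd <- median(odd_group)

# Even-length character vector: median() calls mean() on the two middle
# strings, which warns and returns NA_real_ (a double).
even_group <- c("02-03", "02-08")
med_even <- suppressWarnings(median(even_group))

class(med_odd)   # "character"
class(med_even)  # "numeric"
```

So the fix is less about across() and more about turning month_day into something numeric or date-like before taking the median.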
What you can do is convert each date to the number of days since the start of its year. From that number you can easily calculate the median, then convert the days back using any 1 January as a reference. You can be off by one in leap years, though. For date manipulation I used lubridate.
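library(dplyr)
library(lubridate)

data %>%
  mutate(
    date = ymd(date),
    # days elapsed since 1 January of the same year (0 for 1 January)
    days_since_january = as.numeric(date - ymd(paste(year(date), 1, 1, sep = "-")))
  ) %>%
  group_by(lat, decade) %>%
  summarise(across(days_since_january, median), .groups = "keep") %>%
  # map the median offset back onto a reference year to get the month-day
  mutate(median_month_date = format(ymd("1960-01-01") + days(floor(days_since_january)), "%m-%d"))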
Result for the six rows shown in head(df):
# A tibble: 2 x 4
# Groups: lat, decade [2]
lat decade days_since_january median_month_date
<dbl> <chr> <dbl> <chr>
1 55 1950-1959 38 02-08
2 55 1960-1969 49 02-19
Result for the ten lat = 57 rows shown in the question:
# A tibble: 2 x 4
# Groups: lat, decade [2]
lat decade days_since_january median_month_date
<int> <chr> <dbl> <chr>
1 57 1950-1959 72 03-13
2 57 1960-1969 1.5 01-02
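To make the leap-year caveat concrete, here is a small base R sketch. It does not use lubridate, so it is not the exact code from this answer; the sample dates come from the question and the variable names are my own. It shows the one-day shift when a date from a non-leap year is mapped back onto the leap year 1960, and one possible way around it: put every month_day into the same reference year first and take the median of those dates. Using 2000 as the reference year is an assumption; any leap year should work, so that 02-29 still parses.

```r
# A date after February in a non-leap year
d <- as.Date("1966-03-03")

# Days since 1 January of the same year (%j is the 1-based day of year)
offset <- as.integer(format(d, "%j")) - 1L

# Mapping the offset back onto the leap year 1960 lands one day early
shifted <- format(as.Date("1960-01-01") + offset, "%m-%d")
shifted  # "03-02"

# Possible workaround: move every month_day into one reference year,
# then take the median of the resulting dates
month_day <- c("02-03", "02-08", "02-08", "02-08")
ref_dates <- as.Date(paste0("2000-", month_day))
median_month_day <- format(median(ref_dates), "%m-%d")
median_month_day  # "02-08"
```

In a dplyr pipeline the same idea would presumably go inside summarise() after group_by(lat, decade). When a group has an even number of rows and the two middle dates differ, the median can fall between two days, and format() then drops the fractional part.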