Create an intervals' column in R using quantiles and compute an average after that

I have a dataset like that, representing contracts [invented data here due to confidentiality] :

field	subgroup	duration_days	amount_per_day
72145	7214598	700	8095800.2
72145	7214598	541	5039689.1
72145	7214554	98	3801372.8
72145	7214554	900	2444928.7
90385	9038529	100	2861857.0
90385	9038529	284	1736789.4

I'd like to obtain 2 other columns : one concerning intervals of days, based on 33th and 66th quantiles, and another that compute the average amount per day grouped by field and subgroup.

Desired output

field	subgroup	days_intervals	avg_amount_per_day
72145	7214598	0-33th quantile	Value1
72145	7214598	33th-66th quantile	Value2
72145	7214598	66th quantile - (duration of the longest contract)	Value3

And so on for all the field and subgroup, with of course Value 1,2,3 that are numbers as well as the values of quantiles.

When i use the following code :

f <- data.frame(df %>%  dplyr::group_by(field, subgroup) %>% dplyr::summarize(quants = quantile(duration_days, probs = c(0.33, 0.66))))

I obtain a dataframe with 3 columns : field, subgroup, quants, with 1 row per grouping and quantile value (so 2 rows everytime).

I'm wondering if R is a good tool to do that, but i can use Python if it's easier for this.

Solution

You may take help of cut -

library(dplyr)

df %>%  
  group_by(field, subgroup) %>%
  group_by(quants = cut(duration_days, quantile(duration_days, c(0, .33, .66, 1)), 
                        labels = c('33%', '66%', '100%'), include.lowest = TRUE), .add = TRUE) %>%
  summarise(avg_amount_per_day = mean(amount_per_day, na.rm = TRUE), .groups = 'drop')