How to find difference between values based on condition by group in r and create new column with that value?

I need to calculate the thickness of the hypoxic layer (hypoxia == ODO_mgL < 2.0) by CRN. I need the depth (in meters) of how thick hypoxia is for DO < 2.0 mgL (create new column for this thickness of the hypoxic layer) by CRN. For example, if I have a CRN (site) with DO > 2 from depth 0m to depth 9m and DO drops below 2 from depth 9m to depth 10m the thickness would be 1m. If there is no hypoxia for a particular site then the hypoxia thickness should be 0m. I need to create a new column that calculates the difference between the depth where hypoxia (DO<2) starts and the maximum depth for each unique CRN.

I have spent over 4 hours trying to do this (still learning r) and feel like I am getting close, but have not figured it out. I have looked in SO and other resources, but maybe I am not wording this correctly.

Example df:

structure(list(DATE = c("8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", 
"8/16/2021", "8/16/2021"), TIME = c("9:00:37 AM", "9:00:38 AM", 
"9:00:39 AM", "9:00:40 AM", "9:00:41 AM", "9:00:42 AM", "9:00:43 AM", 
"9:00:44 AM", "9:00:45 AM", "9:00:46 AM", "9:00:47 AM", "9:00:48 AM", 
"9:00:49 AM", "9:00:50 AM", "9:00:51 AM", "9:00:52 AM", "9:00:53 AM", 
"9:00:54 AM", "9:00:55 AM", "9:00:56 AM", "9:00:57 AM", "9:00:58 AM", 
"9:00:59 AM", "9:01:00 AM", "9:01:01 AM", "9:01:02 AM", "9:01:03 AM", 
"9:01:04 AM", "9:01:05 AM", "9:01:06 AM", "9:01:07 AM", "9:01:08 AM", 
"9:01:09 AM", "9:01:10 AM", "9:01:11 AM", "9:01:12 AM", "9:01:13 AM", 
"9:01:14 AM", "9:01:15 AM", "9:01:16 AM", "9:01:17 AM", "9:01:18 AM", 
"9:01:19 AM", "9:01:20 AM", "9:01:21 AM", "9:01:22 AM", "9:01:23 AM", 
"9:01:24 AM", "9:01:25 AM", "9:01:26 AM", "9:01:27 AM", "9:01:28 AM", 
"9:01:29 AM", "9:01:30 AM", "9:01:31 AM", "9:01:32 AM", "9:01:33 AM", 
"9:01:34 AM", "9:01:35 AM", "9:01:36 AM", "9:01:37 AM", "9:01:38 AM", 
"9:01:39 AM", "9:01:40 AM", "9:01:41 AM", "9:01:42 AM", "9:01:43 AM", 
"9:01:44 AM", "9:01:45 AM", "9:01:46 AM", "9:01:47 AM", "9:01:48 AM", 
"9:01:49 AM", "9:01:50 AM", "9:01:51 AM", "9:01:52 AM", "9:01:53 AM", 
"9:01:54 AM", "9:01:55 AM", "9:01:56 AM"), CRN = c(801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801
), ODO_mgL = c(8.84, 8.8, 8.76, 8.72, 8.69, 8.65, 8.63, 8.59, 
8.57, 8.54, 8.49, 8.44, 8.39, 8.35, 8.31, 8.28, 8.25, 8.23, 8.21, 
8.19, 8.17, 8.15, 8.14, 8.12, 8.11, 8.1, 8.09, 7.65, 7.35, 7.1, 
7.11, 7.08, 7.01, 6.56, 6.41, 6.28, 6.22, 6.08, 5.66, 5.53, 5.38, 
5.16, 5.19, 5.1, 5.02, 4.59, 4.43, 4.39, 4.46, 4.44, 4.37, 4.31, 
4.25, 3.8, 3.71, 3.6, 3.57, 3.51, 3.47, 3.43, 3.39, 3.34, 2.95, 
2.81, 2.66, 2.59, 2.51, 2.44, 2.38, 2.32, 2.27, 1.91, 1.85, 1.78, 
1.76, 1.72, 1.7, 1.67, 1.66, 1.63), Depth_m = c(0.039, 0.043, 
0.052, 0.757, 0.678, 0.764, 0.764, 0.833, 0.837, 0.838, 0.857, 
0.893, 2.01, 2.155, 2.368, 2.368, 2.368, 2.406, 4.205, 4.299, 
4.265, 4.265, 4.252, 4.252, 4.253, 4.259, 4.27, 5.291, 5.498, 
5.387, 5.479, 5.479, 5.486, 5.513, 5.562, 5.628, 5.668, 5.722, 
5.772, 5.82, 5.855, 5.917, 5.958, 6, 6.036, 6.06, 6.102, 7.063, 
7.059, 7.035, 6.982, 6.984, 6.997, 7.032, 7.729, 7.581, 7.629, 
7.629, 7.649, 7.68, 7.756, 7.844, 7.972, 8.041, 8.1, 8.159, 8.225, 
8.316, 9.063, 9.218, 9.183, 9.137, 9.159, 9.188, 9.315, 9.52, 
9.625, 9.698, 9.754, 9.816), Vertical_Position_m = c(0.073, 0.223, 
0.662, 0.766, 0.684, 0.725, 0.892, 0.926, 0.781, 0.784, 1.013, 
1.467, 1.848, 2.035, 2.273, 2.359, 2.42, 3.132, 3.827, 4.226, 
4.254, 4.18, 4.227, 4.283, 4.272, 4.29, 4.65, 5.081, 5.352, 5.396, 
5.452, 5.488, 5.568, 5.742, 5.737, 5.761, 5.956, 6.049, 6.161, 
6.163, 6.142, 6.421, 6.426, 6.47, 6.468, 6.37, 6.81, 6.995, 7.052, 
7.033, 6.981, 7.002, 7.12, 7.518, 7.685, 7.637, 7.602, 7.674, 
7.702, 7.926, 8.146, 8.14, 8.34, 8.341, 8.475, 8.724, 8.741, 
8.952, 9.005, 9.166, 9.168, 9.135, 9.353, 9.556, 9.736, 9.874, 
9.917, 9.902, 10.15, 10.221), Temp_C = c(23, 23.0555555555556, 
23.1111111111111, 23.1666666666667, 23.2222222222222, 23.2777777777778, 
23.2777777777778, 23.3888888888889, 23.4444444444444, 23.4444444444444, 
23.5, 23.5555555555556, 23.6666666666667, 23.6666666666667, 23.7777777777778, 
23.7777777777778, 23.8333333333333, 23.8333333333333, 23.8333333333333, 
23.8333333333333, 23.8333333333333, 23.8333333333333, 23.8333333333333, 
23.8333333333333, 23.8333333333333, 23.8333333333333, 23.8333333333333, 
23.8333333333333, 23.8333333333333, 23.8333333333333, 22.5, 22.3888888888889, 
22.3888888888889, 22.2777777777778, 22.2222222222222, 22.0555555555556, 
21.9444444444444, 21.8333333333333, 21.5555555555556, 21.3888888888889, 
21.3333333333333, 21.1111111111111, 19.9444444444444, 19.8888888888889, 
19.7777777777778, 19.7777777777778, 19.7777777777778, 17.8333333333333, 
17.4444444444444, 17.3333333333333, 17.3333333333333, 17.3333333333333, 
17.3333333333333, 17.3333333333333, 17.3333333333333, 17.2777777777778, 
17.2222222222222, 17.1666666666667, 17.1111111111111, 17.0555555555556, 
17, 17, 16.8888888888889, 16.8888888888889, 16.8333333333333, 
16.7777777777778, 16.7222222222222, 16.6666666666667, 16.6666666666667, 
16.6111111111111, 16.5555555555556, 16.5, 16.4444444444444, 16.3333333333333, 
16.2777777777778, 16.2222222222222, 16.1666666666667, 16.1111111111111, 
16, 15.9444444444444)), row.names = c(NA, 80L), class = "data.frame")

I have tried the following:

dfso %>% 
  group_by(CRN) %>%
  summarise(thick = Depth_m[])

this just gives me the same values in the Depth_m column, so not helpful

dfso %>% 
  group_by(CRN, DO2 = ifelse(ODO_mgL < 2, "below", "above")) %>% 
  mutate(HypoxThick = Depth_m-lag(Depth_m))

this does that lag/stepwise (not sure how you call it) difference, which is not what I need

dfso %>% 
  group_by(CRN, DO2thick = ifelse(ODO_mgL < 2, "below", "above")) %>% 
  mutate(HypThick = max(Depth_m))

this just gives me the maximum depth by CRN

dfso %>%
  group_by(CRN) %>%
  mutate(thick = case_when(ODO_mgL<2 ~ max(Depth_m)-Depth_m))

this is not helpful

dfso %>% 
  group_by(CRN, DO2 = ifelse(ODO_mgL < 2, "below", "above")) %>% 
  mutate(thick = max(Depth_m) - min(Depth_m))

This last one is extremely close to what I need. In this case the hypoxia thickness is 0.679m , but how do I get this value to also be applied to that ODO_mgL cutoff? That whole "thick" column should have only the 0.679 value.

I looked at the following posts: this, this, this, this, this, and many other SO posts, and blog posts elsewhere. Thank you for your time and help!

Solution

This solution assumes that your data is sorted by depth (i.e. that the depth where hypoxia starts is the maximum depth for that hypoxic layer). It goes a bit further than your last attempt in that it has thickness of 0 if the row is not in a hypoxic layer, and it should also work if you have multiple hypoxic layers in the data.

My approach was:

create a new variable hypoxic_layer which begins at 0 and increments at the start of every hypoxic layer.
then set this colum = 0 whenever the ODO_mgL is above 2, so 0 means no hypoxic layer, and all hypoxic layers get their own number
next group by CRN and hypoxic_layer and calculate the thickness as the difference between min() and max() of Depth_m within each group.

library(dplyr)

want <- dfso %>% 
    mutate(hypoxic_layer=cumsum(if_else(CRN==lag(CRN) & ODO_mgL<2 & lag(ODO_mgL)>=2,1,0)),       # this column increments at the beginning of each new hypoxic layer
           hypoxic_layer=if_else(ODO_mgL>=2,0,hypoxic_layer)) %>%                # set hypoxic layer = 0 if ODOmgL>=2
    group_by(CRN,hypoxic_layer) %>%                                              # group by each hypoxic layer in each CRN 
    mutate(thickness=if_else(hypoxic_layer==0,0,max(Depth_m)-min(Depth_m)))      # difference between max and min depth in each group is the thickness

Edit

As per clarification that the thickness should be applied to all rows in each CRN, I have updated the code. I use summarise instead of mutate to calculate the thickness with just 1 row per hypoxic layer, then use right_join to merge the thickness column with the original data.

want <- dfso %>% 
    mutate(hypoxic_layer=cumsum(if_else(CRN==lag(CRN) & ODO_mgL<2 & lag(ODO_mgL)>=2,1,0)),       # this column increments at the beginning of each new hypoxic layer
           hypoxic_layer=if_else(ODO_mgL>=2,0,hypoxic_layer)) %>%                # set hypoxic layer = 0 if ODOmgL>=2
    group_by(CRN,hypoxic_layer) %>%                                              # group by each hypoxic layer in each CRN 
    summarise(thickness=max(Depth_m)-min(Depth_m))  %>%                          # difference between max and min depth in each group is the thickness
    filter(hypoxic_layer != 0) %>%                                               # remove rows which are not hypoxic layers
    group_by(CRN) %>%                                                            #
    summarise(thickness=max(thickness)) %>%                                      # in case of multiple hypoxic layers in any CRN we take the maximum thickness
    right_join(dfso,by='CRN')                                                    # join this to our original data

I included the group_by and summarise lines before the final right_join to catch any possible instance where there might be more than 1 hypoxic layer in an individual CRN. If this ever occurs, then the maximum thickness will be used.