I need to calculate the thickness of the hypoxic layer (hypoxia == ODO_mgL < 2.0) by CRN. I need the depth (in meters) of how thick hypoxia is for DO < 2.0 mgL (create new column for this thickness of the hypoxic layer) by CRN. For example, if I have a CRN (site) with DO > 2 from depth 0m to depth 9m and DO drops below 2 from depth 9m to depth 10m the thickness would be 1m. If there is no hypoxia for a particular site then the hypoxia thickness should be 0m. I need to create a new column that calculates the difference between the depth where hypoxia (DO<2) starts and the maximum depth for each unique CRN.
I have spent over 4 hours trying to do this (still learning r) and feel like I am getting close, but have not figured it out. I have looked in SO and other resources, but maybe I am not wording this correctly.
Example df:
structure(list(DATE = c("8/16/2021", "8/16/2021", "8/16/2021",
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021",
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021",
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021",
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021",
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021",
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021",
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021",
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021",
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021",
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021",
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021",
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021",
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021",
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021",
"8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021", "8/16/2021",
"8/16/2021", "8/16/2021"), TIME = c("9:00:37 AM", "9:00:38 AM",
"9:00:39 AM", "9:00:40 AM", "9:00:41 AM", "9:00:42 AM", "9:00:43 AM",
"9:00:44 AM", "9:00:45 AM", "9:00:46 AM", "9:00:47 AM", "9:00:48 AM",
"9:00:49 AM", "9:00:50 AM", "9:00:51 AM", "9:00:52 AM", "9:00:53 AM",
"9:00:54 AM", "9:00:55 AM", "9:00:56 AM", "9:00:57 AM", "9:00:58 AM",
"9:00:59 AM", "9:01:00 AM", "9:01:01 AM", "9:01:02 AM", "9:01:03 AM",
"9:01:04 AM", "9:01:05 AM", "9:01:06 AM", "9:01:07 AM", "9:01:08 AM",
"9:01:09 AM", "9:01:10 AM", "9:01:11 AM", "9:01:12 AM", "9:01:13 AM",
"9:01:14 AM", "9:01:15 AM", "9:01:16 AM", "9:01:17 AM", "9:01:18 AM",
"9:01:19 AM", "9:01:20 AM", "9:01:21 AM", "9:01:22 AM", "9:01:23 AM",
"9:01:24 AM", "9:01:25 AM", "9:01:26 AM", "9:01:27 AM", "9:01:28 AM",
"9:01:29 AM", "9:01:30 AM", "9:01:31 AM", "9:01:32 AM", "9:01:33 AM",
"9:01:34 AM", "9:01:35 AM", "9:01:36 AM", "9:01:37 AM", "9:01:38 AM",
"9:01:39 AM", "9:01:40 AM", "9:01:41 AM", "9:01:42 AM", "9:01:43 AM",
"9:01:44 AM", "9:01:45 AM", "9:01:46 AM", "9:01:47 AM", "9:01:48 AM",
"9:01:49 AM", "9:01:50 AM", "9:01:51 AM", "9:01:52 AM", "9:01:53 AM",
"9:01:54 AM", "9:01:55 AM", "9:01:56 AM"), CRN = c(801, 801,
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801,
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801,
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801,
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801,
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801,
801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801, 801
), ODO_mgL = c(8.84, 8.8, 8.76, 8.72, 8.69, 8.65, 8.63, 8.59,
8.57, 8.54, 8.49, 8.44, 8.39, 8.35, 8.31, 8.28, 8.25, 8.23, 8.21,
8.19, 8.17, 8.15, 8.14, 8.12, 8.11, 8.1, 8.09, 7.65, 7.35, 7.1,
7.11, 7.08, 7.01, 6.56, 6.41, 6.28, 6.22, 6.08, 5.66, 5.53, 5.38,
5.16, 5.19, 5.1, 5.02, 4.59, 4.43, 4.39, 4.46, 4.44, 4.37, 4.31,
4.25, 3.8, 3.71, 3.6, 3.57, 3.51, 3.47, 3.43, 3.39, 3.34, 2.95,
2.81, 2.66, 2.59, 2.51, 2.44, 2.38, 2.32, 2.27, 1.91, 1.85, 1.78,
1.76, 1.72, 1.7, 1.67, 1.66, 1.63), Depth_m = c(0.039, 0.043,
0.052, 0.757, 0.678, 0.764, 0.764, 0.833, 0.837, 0.838, 0.857,
0.893, 2.01, 2.155, 2.368, 2.368, 2.368, 2.406, 4.205, 4.299,
4.265, 4.265, 4.252, 4.252, 4.253, 4.259, 4.27, 5.291, 5.498,
5.387, 5.479, 5.479, 5.486, 5.513, 5.562, 5.628, 5.668, 5.722,
5.772, 5.82, 5.855, 5.917, 5.958, 6, 6.036, 6.06, 6.102, 7.063,
7.059, 7.035, 6.982, 6.984, 6.997, 7.032, 7.729, 7.581, 7.629,
7.629, 7.649, 7.68, 7.756, 7.844, 7.972, 8.041, 8.1, 8.159, 8.225,
8.316, 9.063, 9.218, 9.183, 9.137, 9.159, 9.188, 9.315, 9.52,
9.625, 9.698, 9.754, 9.816), Vertical_Position_m = c(0.073, 0.223,
0.662, 0.766, 0.684, 0.725, 0.892, 0.926, 0.781, 0.784, 1.013,
1.467, 1.848, 2.035, 2.273, 2.359, 2.42, 3.132, 3.827, 4.226,
4.254, 4.18, 4.227, 4.283, 4.272, 4.29, 4.65, 5.081, 5.352, 5.396,
5.452, 5.488, 5.568, 5.742, 5.737, 5.761, 5.956, 6.049, 6.161,
6.163, 6.142, 6.421, 6.426, 6.47, 6.468, 6.37, 6.81, 6.995, 7.052,
7.033, 6.981, 7.002, 7.12, 7.518, 7.685, 7.637, 7.602, 7.674,
7.702, 7.926, 8.146, 8.14, 8.34, 8.341, 8.475, 8.724, 8.741,
8.952, 9.005, 9.166, 9.168, 9.135, 9.353, 9.556, 9.736, 9.874,
9.917, 9.902, 10.15, 10.221), Temp_C = c(23, 23.0555555555556,
23.1111111111111, 23.1666666666667, 23.2222222222222, 23.2777777777778,
23.2777777777778, 23.3888888888889, 23.4444444444444, 23.4444444444444,
23.5, 23.5555555555556, 23.6666666666667, 23.6666666666667, 23.7777777777778,
23.7777777777778, 23.8333333333333, 23.8333333333333, 23.8333333333333,
23.8333333333333, 23.8333333333333, 23.8333333333333, 23.8333333333333,
23.8333333333333, 23.8333333333333, 23.8333333333333, 23.8333333333333,
23.8333333333333, 23.8333333333333, 23.8333333333333, 22.5, 22.3888888888889,
22.3888888888889, 22.2777777777778, 22.2222222222222, 22.0555555555556,
21.9444444444444, 21.8333333333333, 21.5555555555556, 21.3888888888889,
21.3333333333333, 21.1111111111111, 19.9444444444444, 19.8888888888889,
19.7777777777778, 19.7777777777778, 19.7777777777778, 17.8333333333333,
17.4444444444444, 17.3333333333333, 17.3333333333333, 17.3333333333333,
17.3333333333333, 17.3333333333333, 17.3333333333333, 17.2777777777778,
17.2222222222222, 17.1666666666667, 17.1111111111111, 17.0555555555556,
17, 17, 16.8888888888889, 16.8888888888889, 16.8333333333333,
16.7777777777778, 16.7222222222222, 16.6666666666667, 16.6666666666667,
16.6111111111111, 16.5555555555556, 16.5, 16.4444444444444, 16.3333333333333,
16.2777777777778, 16.2222222222222, 16.1666666666667, 16.1111111111111,
16, 15.9444444444444)), row.names = c(NA, 80L), class = "data.frame")
I have tried the following:
dfso %>%
group_by(CRN) %>%
summarise(thick = Depth_m[])
this just gives me the same values in the Depth_m column, so not helpful
dfso %>%
group_by(CRN, DO2 = ifelse(ODO_mgL < 2, "below", "above")) %>%
mutate(HypoxThick = Depth_m-lag(Depth_m))
this does that lag/stepwise (not sure how you call it) difference, which is not what I need
dfso %>%
group_by(CRN, DO2thick = ifelse(ODO_mgL < 2, "below", "above")) %>%
mutate(HypThick = max(Depth_m))
this just gives me the maximum depth by CRN
dfso %>%
group_by(CRN) %>%
mutate(thick = case_when(ODO_mgL<2 ~ max(Depth_m)-Depth_m))
this is not helpful
dfso %>%
group_by(CRN, DO2 = ifelse(ODO_mgL < 2, "below", "above")) %>%
mutate(thick = max(Depth_m) - min(Depth_m))
This last one is extremely close to what I need. In this case the hypoxia thickness is 0.679m , but how do I get this value to also be applied to that ODO_mgL cutoff? That whole "thick" column should have only the 0.679 value.
I looked at the following posts: this, this, this, this, this, and many other SO posts, and blog posts elsewhere. Thank you for your time and help!
This solution assumes that your data is sorted by depth (i.e. that the depth where hypoxia starts is the maximum depth for that hypoxic layer). It goes a bit further than your last attempt in that it has thickness of 0 if the row is not in a hypoxic layer, and it should also work if you have multiple hypoxic layers in the data.
My approach was:
hypoxic_layer
which begins at 0 and increments at the start of every hypoxic layer.CRN
and hypoxic_layer
and calculate the thickness
as the difference between min()
and max()
of Depth_m
within each group.library(dplyr)
want <- dfso %>%
mutate(hypoxic_layer=cumsum(if_else(CRN==lag(CRN) & ODO_mgL<2 & lag(ODO_mgL)>=2,1,0)), # this column increments at the beginning of each new hypoxic layer
hypoxic_layer=if_else(ODO_mgL>=2,0,hypoxic_layer)) %>% # set hypoxic layer = 0 if ODOmgL>=2
group_by(CRN,hypoxic_layer) %>% # group by each hypoxic layer in each CRN
mutate(thickness=if_else(hypoxic_layer==0,0,max(Depth_m)-min(Depth_m))) # difference between max and min depth in each group is the thickness
As per clarification that the thickness should be applied to all rows in each CRN, I have updated the code. I use summarise
instead of mutate to calculate the thickness with just 1 row per hypoxic layer, then use right_join
to merge the thickness
column with the original data.
want <- dfso %>%
mutate(hypoxic_layer=cumsum(if_else(CRN==lag(CRN) & ODO_mgL<2 & lag(ODO_mgL)>=2,1,0)), # this column increments at the beginning of each new hypoxic layer
hypoxic_layer=if_else(ODO_mgL>=2,0,hypoxic_layer)) %>% # set hypoxic layer = 0 if ODOmgL>=2
group_by(CRN,hypoxic_layer) %>% # group by each hypoxic layer in each CRN
summarise(thickness=max(Depth_m)-min(Depth_m)) %>% # difference between max and min depth in each group is the thickness
filter(hypoxic_layer != 0) %>% # remove rows which are not hypoxic layers
group_by(CRN) %>% #
summarise(thickness=max(thickness)) %>% # in case of multiple hypoxic layers in any CRN we take the maximum thickness
right_join(dfso,by='CRN') # join this to our original data
I included the group_by
and summarise
lines before the final right_join
to catch any possible instance where there might be more than 1 hypoxic layer in an individual CRN. If this ever occurs, then the maximum thickness will be used.