How to calculate the column product for specific columns matching a pattern and meet some conditions, row-by-row?

Suppose I have the following data.frame:

dt = tibble::tibble(
  id_0 = rep(123, 6),
  name_0 = rep("A", 6),
  id_1 = c(rep(321, 3), rep(322, 3)),
  name_1 = c(rep("B", 3), rep("C", 3)),
  p_1 = c(rep(0.7, 3), rep(0.3, 3)),
  id_2 = c(NA, 323:326, NA),
  name_2 = c(NA, "D", "E", "J", "G", NA),
  p_2 = c(NA, 0.8, 0.2, 0.9, 0.1, NA),
  id_3 = c(NA, NA, 323, NA, NA, NA),
  na_3 = c(NA, NA, "H", NA, NA, NA),
  p_3 = c(NA, NA, 1, NA, NA, NA),
)

Which looks like this:

# A tibble: 6 x 11
   id_0 name_0  id_1 name_1   p_1  id_2 name_2   p_2  id_3 na_3 
  <dbl> <chr>  <dbl> <chr>  <dbl> <int> <chr>  <dbl> <dbl> <chr>
1   123 A        321 B        0.7    NA NA      NA      NA NA   
2   123 A        321 B        0.7   323 D        0.8    NA NA   
3   123 A        321 B        0.7   324 E        0.2   323 H    
4   123 A        322 C        0.3   325 J        0.9    NA NA   
5   123 A        322 C        0.3   326 G        0.1    NA NA   
6   123 A        322 C        0.3    NA NA      NA      NA NA

I need to take the row-by-row product from al the p_* columns. In this case it would be Product = p_1 * p_2 * p_3, but in general it could be any product from p_1 to p_* (this data.frame changes from case to case, I mean Product = product(p_1, p_2, ..., p_n)). Note that p_* is always greater than cero and less or equal than one (p_ > 0 & p_ <= 1). So the task I need to accomplish have two things: The Product1) must omit the NAs and 2) be general for any number of p_* present in the data.frame.

The desireble output should look like this:

# A tibble: 6 x 12
   id_0 name_0  id_1 name_1   p_1  id_2 name_2   p_2  id_3 na_3    p_3 Product
  <dbl> <chr>  <dbl> <chr>  <dbl> <int> <chr>  <dbl> <dbl> <chr> <dbl>   <dbl>
1   123 A        321 B        0.7    NA NA      NA      NA NA       NA    0.7 
2   123 A        321 B        0.7   323 D        0.8    NA NA       NA    0.56
3   123 A        321 B        0.7   324 E        0.2   323 H         1    0.14
4   123 A        322 C        0.3   325 J        0.9    NA NA       NA    0.27
5   123 A        322 C        0.3   326 G        0.1    NA NA       NA    0.03
6   123 A        322 C        0.3    NA NA      NA      NA NA       NA    0.3

Solution

I suggest a reshaping operation:

library(dplyr)
library(tidyr) # pivot_longer

# preserve a row-wise "id"
dt <- mutate(dt, rn = row_number())

dt %>%
  pivot_longer(-rn, names_pattern = c("(.*)_([0-9])"), names_to = c(".value", "num"))
# # A tibble: 24 x 6
#       rn num      id name      p na   
#    <int> <chr> <dbl> <chr> <dbl> <chr>
#  1     1 0       123 A      NA   <NA> 
#  2     1 1       321 B       0.7 <NA> 
#  3     1 2        NA <NA>   NA   <NA> 
#  4     1 3        NA <NA>   NA   <NA> 
#  5     2 0       123 A      NA   <NA> 
#  6     2 1       321 B       0.7 <NA> 
#  7     2 2       323 D       0.8 <NA> 
#  8     2 3        NA <NA>   NA   <NA> 
#  9     3 0       123 A      NA   <NA> 
# 10     3 1       321 B       0.7 <NA> 
# # ... with 14 more rows

With this, we can easily group_by, calculate the product, ...

dt %>%
  pivot_longer(-rn, names_pattern = c("(.*)_([0-9])"), names_to = c(".value", "num")) %>%
  group_by(rn) %>%
  summarize(Product = prod(p, na.rm = TRUE))
# # A tibble: 6 x 2
#      rn Product
#   <int>   <dbl>
# 1     1   0.7  
# 2     2   0.560
# 3     3   0.140
# 4     4   0.27 
# 5     5   0.03 
# 6     6   0.3

... and then join it back in to dt.

dt %>%
  pivot_longer(-rn, names_pattern = c("(.*)_([0-9])"), names_to = c(".value", "num")) %>%
  group_by(rn) %>%
  summarize(Product = prod(p, na.rm = TRUE)) %>%
  left_join(dt, ., by = "rn") %>%
  select(-rn)
# # A tibble: 6 x 12
#    id_0 name_0  id_1 name_1   p_1  id_2 name_2   p_2  id_3 na_3    p_3 Product
#   <dbl> <chr>  <dbl> <chr>  <dbl> <int> <chr>  <dbl> <dbl> <chr> <dbl>   <dbl>
# 1   123 A        321 B        0.7    NA <NA>    NA      NA <NA>     NA   0.7  
# 2   123 A        321 B        0.7   323 D        0.8    NA <NA>     NA   0.560
# 3   123 A        321 B        0.7   324 E        0.2   323 H         1   0.140
# 4   123 A        322 C        0.3   325 F        0.9    NA <NA>     NA   0.27 
# 5   123 A        322 C        0.3   326 G        0.1    NA <NA>     NA   0.03 
# 6   123 A        322 C        0.3    NA <NA>    NA      NA <NA>     NA   0.3

(Side note: based on your comment of "any number of p_*", it might make sense to keep your data in the longer format (out of pivot_longer) for more of your processing.)