Search code examples
rdataframe

Identify the Columns with Value and Fill the Blanks with 0 only for a defined range in R


The data frame I have is given below:

structure(list(Family = c("POP", "POP", "POP", "POP", "POP", 
"POP", "POP", "POP", "POP", "POP", "QOP", "QOP", "QOP", "QOP", 
"QOP", "QOP", "QOP", "QOP", "QOP", "QOP"), `Sub Family` = c("ABC", 
"ABC", "ABC", "ABC", "ABC", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", 
"PRQ", "PRQ", "PRQ", "PRQ", "PRQ", "STV", "STV", "STV", "STV", 
"STV"), Size = c("1H", "2H", "3H", "4H", "5H", "1H", "2H", "3H", 
"4H", "5H", "1H", "2H", "3H", "4H", "5H", "1H", "2H", "3H", "4H", 
"5H"), Num1 = c(NA, NA, NA, NA, NA, 0.25, 0.25, 0.25, 0.25, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Num2 = c(0.5, 0.2, 0.3, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, 
NA)), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"
))

The expectation is that, we check all the Sub Category and then go through each size from 1H to 5H. In case we cant find any value in the Num1 for the Sub Family value we keep it as blank.

But if we find a value for each Sub Category, we keep the value as is and then populate the rest of the sizes as 0.

For example, Sub Family XYZ has value in size 1H, 2H, 3H, 4H and no value on 5H. So we keep all the values as for 1H, 2H, 3H, 4H and then put 0 on 5H instead of blank.

Another case: Sub Family ABC has value in size 1H, 2H, 3H, 4H, 5H as no value and it should remain as blank.

For visual purpose, The input looks like this.

enter image description here

The expected output should look like this.

enter image description here


Solution

  • You can try

    library(dplyr)
    df %>%
      mutate(
        across(
          starts_with("Num"),
          ~ if (!all(is.na(.x))) replace_na(.x, 0) else .x
        ),
        .by = c(Family, `Sub Family`)
      )
    

    where

    • starts_with select the columns whose name starts with the given pattern
    • .by = divide the dataframe rows into groups which are characterized by Family and Sub Family
    • In each group, if there is no non-NA values, then keep them as they are, otherwise replace_na replaces NAs with 0s.

    such that it gives

    # A tibble: 20 × 5
       Family `Sub Family` Size   Num1  Num2
       <chr>  <chr>        <chr> <dbl> <dbl>
     1 POP    ABC          1H    NA      0.5
     2 POP    ABC          2H    NA      0.2
     3 POP    ABC          3H    NA      0.3
     4 POP    ABC          4H    NA      0
     5 POP    ABC          5H    NA      0
     6 POP    XYZ          1H     0.25  NA
     7 POP    XYZ          2H     0.25  NA
     8 POP    XYZ          3H     0.25  NA
     9 POP    XYZ          4H     0.25  NA
    10 POP    XYZ          5H     0     NA
    11 QOP    PRQ          1H    NA     NA  
    12 QOP    PRQ          2H    NA     NA
    13 QOP    PRQ          3H    NA     NA
    14 QOP    PRQ          4H    NA     NA
    15 QOP    PRQ          5H    NA     NA
    16 QOP    STV          1H    NA      1
    17 QOP    STV          2H    NA      0
    18 QOP    STV          3H    NA      0
    19 QOP    STV          4H    NA      0
    20 QOP    STV          5H    NA      0