Search code examples
rnamissing-datadplyracross

Replace NA values with preexisting value based on multiple conditions


I am working with the following data. It pertains to the dimensions of items and the boxes that they are shipped in.

       Box_Height     Box_Length     Box_Width Item_Height Item_Length Item_Width
 1             NA             74             4          NA          NA         NA
 2             NA             42            NA           6          42          6
 3              6             NA            NA           6          22          6
 4              6             NA            NA           6          42          6
 5              6             NA            NA           6          42          6
 6             NA             NA            NA          NA          NA         NA

According to the shipment company, when one of the box columns has NA values, it means that the item already comes in a box and is being shipped as is. So, I'd just need to replace the missing Box_Height, for instance, with the Item_Height.

I wrote the following code to do this:

df$Box_Height[is.na(df$Box_Height) & !is.na(df$Item_Height)] <- df$Item_Height

I'm ultimately trying to test when a row is missing a box dimension and the specific item dimension is not missing, then replace the missing box dimension with the item dimension.

I am receiving this error:

Error in df$Box_Height[is.na(df$Box_Height) &  : 
  NAs are not allowed in subscripted assignments

This is somewhat confusing since that's what I'm trying to replace.

If anyone has any advice as how to properly do this or where I'm going wrong I'd greatly appreciate the help.


Solution

  • Try using ifelse() to apply the same condition.

    df$Box_Height <- ifelse(is.na(df$Box_Height) & !is.na(df$Item_Height), df$Item_Height, df$Box_Height)
    

    The ifelse() function requires you to provide values for cases where the condition is true and false separately which ensures that the vector lengths will match. Subsetting df$Box_Height with [ probably results in a vector that is shorter than df$Item_Height, which is unsubsetted.