Search code examples
rstringdplyrreplace

Replacing values in a dataframe, based on column names containing some characters


I have this dataset containing a number of columns starting with similar names

`CropType<-maize, beans,peas
NumberSM<-200,300,NA
PercentageSM<-90%,50%,NA
NumberLM<-600,NA,234
PercentageLM<-20%,NA,78%
df1<-data.frame(CropType,NumberSM,PercentageSM,NumberLM,PercentageLM)`

I would wish to request a single line of code that replaces any missing value in the column that starts with Number, with 0.0, and replaces any column missing any value in the column that starts with Percentage, with 0.0%.

My desired output is

`CropType<-maize, beans,peas
NumberSM<-200,300,0.0
PercentageSM<-90%,50%,0.0%
NumberLM<-600,0.0,234
PercentageLM<-20%,0.0%,78%
df1<-data.frame(CropType,NumberSM,PercentageSM,NumberLM,PercentageLM)`

I have tried this

`df1 %>%
  mutate(NumberLM=replace(NumberLM,is.na(NumberLM),0.0),
         PercentageLM=replace(PercentageLM,is.na(PercentageLM),'0.0%'))`

But I have several columns that need replacements, and I am avoiding referencing each column to replace


Solution

  • Here is a way with mutate/across. See code comments for explanations.

    CropType <- c("maize", "beans", "peas")
    NumberSM <- c(200,300,NA)
    PercentageSM <- c("90%","50%",NA)
    NumberLM <- c(600,NA,234)
    PercentageLM <- c("20%",NA,"78%")
    df1 <- data.frame(CropType,NumberSM,PercentageSM,NumberLM,PercentageLM)
    
    library(dplyr)
    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    
    # here mutate all columns where is.numeric returns TRUE
    df1 %>%
      mutate(across(where(is.numeric), ~replace(.x, is.na(.x), 0.0)))
    #>   CropType NumberSM PercentageSM NumberLM PercentageLM
    #> 1    maize      200          90%      600          20%
    #> 2    beans      300          50%        0         <NA>
    #> 3     peas        0         <NA>      234          78%
    
    # to change the percentages, first determine which
    # columns have a '%' in them, then a mutate/across 
    # similar to the one above does the trick
    i_perc <- sapply(df1, \(x) any(grepl("%", x)))
    
    df1 %>%
      mutate(across(which(i_perc), ~ replace(.x, is.na(.x), "0.0%")))
    #>   CropType NumberSM PercentageSM NumberLM PercentageLM
    #> 1    maize      200          90%      600          20%
    #> 2    beans      300          50%       NA         0.0%
    #> 3     peas       NA         0.0%      234          78%
    

    Created on 2024-03-18 with reprex v2.1.0