
Replace inequalities in a dataframe with different types of elements in R


I have a data frame with several columns that contain many inequalities. I would like an R script that identifies these inequalities and replaces them with actual values. More specifically, suppose a cell contains "<2" and we want to replace it with half its value ("<2" -> 1.0). Is there a generic way to do this, so that I do not need to find and replace all the inequalities in the data frame manually?

A simple example might be the following:

Col1, Col2,  Col3, Col4
3.4,  RHO_1, <5,   NA
2,    RHO_2, 5,    1.3

And I want to get something like this:

Col1, Col2,  Col3, Col4
3.4,  RHO_1, 2.5,  NA
2,    RHO_2, 5,    1.3
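A minimal sketch of a generic approach for the sample above, assuming that a column should be converted only when every non-NA entry is a plain decimal number, optionally prefixed by "<". Columns that are already numeric, or that contain labels such as RHO_1, are returned unchanged, so nothing has to be excluded by name. The helper names `is_value_column` and `convert_inequalities` are hypothetical, and the sketch uses plain arithmetic instead of `eval(parse())`.

```r
# The sample data from the question
df <- data.frame(
  Col1 = c(3.4, 2),
  Col2 = c("RHO_1", "RHO_2"),
  Col3 = c("<5", "5"),
  Col4 = c(NA, 1.3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when every non-NA entry looks like a number,
# optionally prefixed by "<"
is_value_column <- function(x) {
  x <- trimws(as.character(x))
  x <- x[!is.na(x)]
  all(grepl("^<?\\s*[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?$", x))
}

# Hypothetical helper: replace "<v" with factor * v in value columns;
# numeric columns and label columns are returned as they are
convert_inequalities <- function(df, factor = 0.5) {
  df[] <- lapply(df, function(x) {
    if (is.numeric(x) || !is_value_column(x)) return(x)
    x <- trimws(as.character(x))
    is_lt <- !is.na(x) & startsWith(x, "<")
    val <- as.numeric(sub("^<\\s*", "", x))   # "<5" -> 5, "5" -> 5, NA -> NA
    val[is_lt] <- factor * val[is_lt]         # halve only the "<" cells
    val
  })
  df
}

convert_inequalities(df)
```

The `factor` argument makes the replacement rule explicit; for example, `convert_inequalities(df, factor = 0.55)` would use 0.55 instead of one half.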

When all elements are numeric values (e.g. numeric values instead of RHO_1, RHO_2 and NA), the following command works:

# df[] <- keeps df a data frame (a plain df <- would turn it into a list)
df[] <- lapply(df, function(x) sapply(sub("<", "0.5*", x, fixed = TRUE),
                                      function(y) eval(parse(text = y))))
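To show why the one-liner works for numbers but not for labels, here is a single cell traced step by step. The variable names `cell`, `expr_text`, `value` and `result` are only for illustration.

```r
# sub() rewrites "<5" as the text "0.5*5"
cell <- "<5"
expr_text <- sub("<", "0.5*", cell, fixed = TRUE)
expr_text                                  # "0.5*5"

# parse() turns the text into an R expression and eval() computes it
value <- eval(parse(text = expr_text))
value                                      # 2.5

# A label such as "RHO_1" parses as a variable name, so eval() searches for
# an object called RHO_1 and raises an error
result <- tryCatch(eval(parse(text = "RHO_1")), error = function(e) e)
inherits(result, "error")                  # TRUE, unless an object RHO_1 exists
```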

However, this command does not work when the data contain NA values or strings such as RHO_1; for example, eval(parse(text = "RHO_1")) looks for an R object named RHO_1 and fails. I have tried to find the location of the value-only elements, after converting all non-values into NA, using the following command:

value_ind<- which(!is.na(as.matrix(df)), arr.ind = TRUE, useNames = TRUE) 

but I did not manage to use this information successfully. For reference, the actual data frame df has many rows and columns.
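One way the `which(..., arr.ind = TRUE)` idea can be put to use is sketched below, assuming the data can be handled as a character matrix for the replacement step. The two-column index matrix (`row`, `col`) returned by `which()` can index the matrix directly, so only the "<" cells are touched. The names `m`, `is_lt`, `lt_ind` and `df_fixed` are hypothetical.

```r
# The sample data from the question
df <- data.frame(
  Col1 = c(3.4, 2),
  Col2 = c("RHO_1", "RHO_2"),
  Col3 = c("<5", "5"),
  Col4 = c(NA, 1.3),
  stringsAsFactors = FALSE
)

m <- as.matrix(df)                      # character matrix (numbers formatted as text)
is_lt <- grepl("^<", m)                 # FALSE for NA cells; the result has no dims
dim(is_lt) <- dim(m)                    # restore the matrix shape
lt_ind <- which(is_lt, arr.ind = TRUE)  # one row per "<" cell: columns "row" and "col"
lt_ind

# Use the index matrix directly to halve the limit in exactly those cells
m[lt_ind] <- as.character(0.5 * as.numeric(substring(m[lt_ind], 2)))

# Back to a data frame; type.convert() turns all-numeric text columns into numbers
df_fixed <- as.data.frame(m, stringsAsFactors = FALSE)
df_fixed[] <- lapply(df_fixed, type.convert, as.is = TRUE)
df_fixed
```

Going through a character matrix is convenient for cell-wise indexing, but numeric columns are round-tripped through text, so working column by column may be preferable on large data.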


Solution

  • I managed to fix the issue. I took a subset of the original data frame (here named dataBase2) that excludes the columns containing text (e.g. the column with RHO_1); the reduced data frame is named dataBase6. Then I converted other symbols (e.g. "-", "_", etc.) to NA and applied the function. Below is the code from the actual dataset:

    # names of the columns to remove (they contain text, not values)
    out <- c("Code-Medsal","Number","Code_National","Projection","date","Notes")
    dataBase6 <- dataBase2[, !(colnames(dataBase2) %in% out) ]
    # replace placeholder symbols such as "-" with NA
    dataBase6[dataBase6=="-"] <- NA
    # apply the function to the remaining numeric values and NA
    dataBase6[] <-  lapply(dataBase6, function(x) sapply(sub("<", "0.55*", x, fixed = TRUE),
                                      function(y) eval(parse(text = y))))
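A variant of the steps above, sketched under the assumption that the value columns hold only numbers, "<number" entries, "-" placeholders, or NA. It converts the value columns in place, so the text columns listed in `out` stay in the result, and it uses arithmetic instead of `eval(parse())`, so cell contents are never run as R code. The small `dataBase2` below (with hypothetical columns `Pb` and `Zn`) and the helper `to_value` are made up for illustration; the factor 0.55 matches the code above.

```r
# Hypothetical stand-in for dataBase2: one text column plus two value columns
dataBase2 <- data.frame(
  Notes = c("ok", "check", "ok"),
  Pb    = c("<2", "-", "3.1"),
  Zn    = c("12", "<10", NA),
  stringsAsFactors = FALSE
)
out <- c("Notes")                            # text columns to leave alone
value_cols <- setdiff(names(dataBase2), out)

# Hypothetical helper: "<v" -> factor * v, numeric strings -> numbers,
# "-" and NA -> NA
to_value <- function(x, factor = 0.55) {
  x <- trimws(as.character(x))
  x[x == "-"] <- NA
  is_lt <- !is.na(x) & startsWith(x, "<")
  val <- as.numeric(sub("^<", "", x))        # any other text becomes NA with a warning
  val[is_lt] <- factor * val[is_lt]
  val
}

dataBase2[value_cols] <- lapply(dataBase2[value_cols], to_value)
dataBase2
```

Unlike the `eval(parse())` version, an unexpected symbol such as "_" does not stop the conversion here; it is turned into NA and R emits a coercion warning.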