I have a data frame with several columns that contain many values written as inequalities. I would like an R script that identifies these inequalities and replaces them with actual values. More specifically, given "<2", I want to replace it with half of its value ("<2" -> 1.0). Is there a generic way to do this, so that I do not have to find and replace all the inequalities in the data frame manually?
A simple example might be the following:
Col1, Col2, Col3, Col4
3.4, RHO_1, <5, NA
2, RHO_2, 5, 1.3
And I want to get something like this:
Col1, Col2, Col3, Col4
3.4, RHO_1, 2.5, NA
2, RHO_2, 5, 1.3
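A minimal sketch of one generic approach, assuming every inequality has the form "<number" and should become half of that number. Instead of eval(parse()), it strips the leading "<" with a regular expression and converts the rest with as.numeric(). It only converts columns in which every non-missing entry is a number or "<number", so text columns such as Col2 are left untouched. The function names `half_lt` and `convert_df` are hypothetical.

```r
# Hypothetical helper: "<number" -> factor * number; plain numbers are parsed;
# anything else (e.g. "RHO_1") becomes NA.
half_lt <- function(x, factor = 0.5) {
  x <- trimws(as.character(x))
  is_lt <- grepl("^<", x)                        # FALSE for NA entries
  val <- suppressWarnings(as.numeric(sub("^<\\s*", "", x)))
  val[is_lt] <- factor * val[is_lt]
  val
}

# Convert only the columns whose non-missing entries are all numbers or "<number".
convert_df <- function(df, factor = 0.5) {
  df[] <- lapply(df, function(col) {
    conv <- half_lt(col, factor)
    # keep the original column if conversion would turn text into NA
    if (all(is.na(conv) == is.na(col))) conv else col
  })
  df
}

# The example data from the question
df <- data.frame(Col1 = c(3.4, 2),
                 Col2 = c("RHO_1", "RHO_2"),
                 Col3 = c("<5", "5"),
                 Col4 = c(NA, 1.3),
                 stringsAsFactors = FALSE)
res <- convert_df(df)
res
```

Because no column names are listed, the same call should work on a wide data frame; the `factor` argument lets a different multiplier be used if half is not the desired substitution.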
When every element is numeric (i.e. numbers in place of RHO_1, RHO_2 and NA), the following command works:
df[] <- lapply(df, function(x) sapply(sub("<", "0.5*", x, fixed = TRUE),
                                      function(y) eval(parse(text = y)),
                                      USE.NAMES = FALSE))
However, this command does not work when the data frame also contains NA values and strings (e.g. RHO_1).
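The strings are the part that breaks eval(parse()): after sub(), an entry such as RHO_1 is parsed as a variable name, and evaluating it stops with an "object not found" error. A minimal sketch of a guarded variant of the same command, assuming that any entry which cannot be evaluated should become NA (the name `safe_eval` is hypothetical). Unlike the original call, it turns text cells into NA, so it should only be applied to columns that are meant to be numeric; note also that eval(parse()) executes whatever code the cells contain.

```r
# Hypothetical guarded evaluator: returns NA instead of stopping on entries
# that cannot be parsed or evaluated (e.g. "RHO_1").
safe_eval <- function(y) {
  if (is.na(y)) return(NA_real_)                 # skip missing cells explicitly
  tryCatch(as.numeric(eval(parse(text = y))),
           error = function(e) NA_real_)         # e.g. object 'RHO_1' not found
}

x <- c("<5", "5", "RHO_1", NA)
res <- vapply(sub("<", "0.5*", x, fixed = TRUE), safe_eval,
              numeric(1), USE.NAMES = FALSE)
res
```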
I tried to find the locations of the numeric-only elements, after converting all non-numeric entries to NA, with the following command:
value_ind <- which(!is.na(as.matrix(df)), arr.ind = TRUE, useNames = TRUE)
but I could not work out how to use these indices to do the replacement.
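The index matrix returned by which(..., arr.ind = TRUE) can be used directly as a subscript: indexing a matrix with a two-column (row, column) matrix selects exactly those cells. A minimal sketch, assuming the data are held as a character matrix (typed in by hand here to mirror the example) and that only cells starting with "<" need to change:

```r
# Character matrix mirroring the example data (assumed layout, filled by column).
m <- matrix(c("3.4", "2", "RHO_1", "RHO_2", "<5", "5", NA, "1.3"),
            nrow = 2, dimnames = list(NULL, paste0("Col", 1:4)))

# Logical matrix marking the "<number" cells; grepl() drops dim, so restore it.
lt <- matrix(grepl("^<", m), nrow = nrow(m))

# Two-column (row, col) index of those cells.
idx <- which(lt, arr.ind = TRUE)

# Replace only the selected cells with half of the number after "<".
m[idx] <- as.character(0.5 * as.numeric(sub("^<", "", m[idx])))
m
```

The result is still a character matrix; it can be turned back into a data frame with as.data.frame() and the numeric columns converted with as.numeric().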
Note that the actual data frame df has many rows and columns.
I have managed to fix the issue. I took a subset of the original data frame (here named dataBase2) that excludes the columns containing characters (e.g. RHO_1); the reduced data frame is named dataBase6. I then converted other symbols (e.g. "-", "_") to NA and applied the function. Below is the code for the actual dataset:
# names of the columns to remove (they contain character data)
out <- c("Code-Medsal","Number","Code_National","Projection","date","Notes")
dataBase6 <- dataBase2[, !(colnames(dataBase2) %in% out) ]
# replace placeholder symbols with NA
dataBase6[dataBase6 == "-"] <- NA
# apply the conversion to the remaining numeric values and NAs
dataBase6[] <- lapply(dataBase6, function(x) sapply(sub("<", "0.55*", x, fixed = TRUE),
                                                    function(y) eval(parse(text = y)),
                                                    USE.NAMES = FALSE))
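The code above maps only "-" to NA, although other placeholders such as "_" are mentioned. A minimal sketch of handling several placeholder symbols in one pass with %in%; the small data frame, its column names and the `placeholders` vector are hypothetical stand-ins for dataBase2 and its actual symbols.

```r
# Hypothetical stand-in for dataBase2.
dataBase2 <- data.frame(Number = c("A1", "A2", "A3"),
                        Val1 = c("<2", "-", "4.1"),
                        Val2 = c("_", "0.8", "<0.5"),
                        stringsAsFactors = FALSE)

out <- c("Number")                 # text columns to exclude
placeholders <- c("-", "_")        # symbols that should become NA (assumed)

dataBase6 <- dataBase2[, !(colnames(dataBase2) %in% out), drop = FALSE]

# Map every placeholder to NA; %in% returns FALSE for NA, so existing NAs stay.
dataBase6[] <- lapply(dataBase6, function(x) replace(x, x %in% placeholders, NA))
dataBase6
```

The `drop = FALSE` keeps the result a data frame even if only one column survives the subsetting.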