Tags: r, function, replace, min, zero

How can I replace zeros with half the minimum value within a column?


I am trying to replace the 0s in my data frame, which has thousands of rows and columns, with half the minimum value greater than zero from the same column. I also want to exclude the first four columns, as they are indexes.

So if I start with something like this:

index <- c("100p", "200p", "300p", "400p")
ratio <- c(5, 4, 3, 2)
gene <- c("gapdh", NA, NA, "actb")
species <- c("mouse", NA, NA, "rat")
a1 <- c(0, 3, 5, 2)
b1 <- c(0, 0, 4, 6)
c1 <- c(1, 2, 3, 4)

q <- data.frame(index, ratio, gene, species, a1, b1, c1)

index ratio gene  species a1 b1 c1
100p    5   gapdh mouse   0  0  1
200p    4    NA    NA     3  0  2
300p    3    NA    NA     5  4  3
400p    2   actb  rat     2  6  4

I would hope to gain a result such as this:

index ratio gene  species a1 b1 c1
100p    5   gapdh mouse   1  2  1
200p    4    NA    NA     3  2  2
300p    3    NA    NA     5  4  3
400p    2   actb  rat     2  6  4

I have tried the following code:

apply(q[-4], 2, function(x) "[<-"(x, x==0, min(x[x > 0]) / 2))

but I keep getting this error:

Error in min(x[x > 0])/2 : non-numeric argument to binary operator

Any help on this? Thank you very much!


Solution

  • We can use lapply to replace the 0 values in each selected column with half of that column's smallest positive value.

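    # Columns to modify (the first four are identifiers)
    cols <- 5:7
    # Replace zeros with half the smallest positive value in each column
    q[cols] <- lapply(q[cols], function(x) replace(x, x == 0, min(x[x > 0], na.rm = TRUE) / 2))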
    
    q
    #  index ratio  gene species a1 b1 c1
    #1  100p     5 gapdh   mouse  1  2  1
    #2  200p     4  <NA>    <NA>  3  2  2
    #3  300p     3  <NA>    <NA>  5  4  3
    #4  400p     2  actb     rat  2  6  4
    

    In dplyr, we can use mutate_at

    library(dplyr)
    q %>% mutate_at(cols, ~ replace(., . == 0, min(.[. > 0], na.rm = TRUE) / 2))
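
    Since dplyr 1.0.0, mutate_at is superseded by across(). Below is a minimal sketch of the same replacement written with across(), assuming dplyr >= 1.0.0. The data frame is rebuilt from the question with character columns instead of factors, and half_min is a hypothetical helper name, not part of dplyr.

    ```r
    library(dplyr)

    # Example data from the question (character columns instead of factors)
    q <- data.frame(
      index = c("100p", "200p", "300p", "400p"),
      ratio = c(5, 4, 3, 2),
      gene = c("gapdh", NA, NA, "actb"),
      species = c("mouse", NA, NA, "rat"),
      a1 = c(0, 3, 5, 2),
      b1 = c(0, 0, 4, 6),
      c1 = c(1, 2, 3, 4)
    )

    # half_min is a hypothetical helper: replace zeros with half the
    # smallest strictly positive value in the vector
    half_min <- function(x) replace(x, x == 0, min(x[x > 0], na.rm = TRUE) / 2)

    # Apply it to columns 5 to 7 only, leaving the first four untouched
    result <- q %>% mutate(across(5:7, half_min))
    result
    ```

    Selecting by position (5:7) mirrors cols above; with thousands of columns, a selection such as across(where(is.numeric), half_min) might be more convenient, but it would also touch ratio, so the positional form is kept here.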
    

    data

    q <- structure(list(index = structure(1:4, .Label = c("100p", "200p", 
    "300p", "400p"), class = "factor"), ratio = c(5, 4, 3, 2), gene = structure(c(2L, 
    NA, NA, 1L), .Label = c("actb", "gapdh"), class = "factor"), 
    species = structure(c(1L, NA, NA, 2L), .Label = c("mouse", 
    "rat"), class = "factor"), a1 = c(0, 3, 5, 2), b1 = c(0, 
    0, 4, 6), c1 = c(1, 2, 3, 4)), class = "data.frame", row.names = c(NA, -4L))
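
As for the error in the question: apply() first converts the data frame to a matrix, and because q[-4] still contains character (or factor) columns, every column ends up as character. min() on a character vector returns a string, and dividing a string by 2 raises "non-numeric argument to binary operator". Note also that q[-4] drops only the fourth column, not the first four. A minimal sketch of the coercion, using a small stand-in data frame (df is a hypothetical name):

```r
# Stand-in data frame with one character and one numeric column
df <- data.frame(index = c("100p", "200p"), a1 = c(0, 3))

# apply() calls as.matrix() internally; mixed columns are coerced to character
m <- as.matrix(df)
is.character(m)                      # TRUE: the numbers are now strings

# Restricting to the numeric column keeps the matrix numeric
only_numeric <- as.matrix(df["a1"])
is.numeric(only_numeric)             # TRUE
```

This is why the lapply version works: it processes each selected column as its own vector, so no coercion across columns takes place.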