How to categorize numerical ranges in R


I have a data frame in which each column corresponds to a patient ID and each row holds the values for a particular gene.

df <- data.frame(Hugo_Symbol=c("CDKN2A", "JUN", "IRS2","MTOR",
                           "NRAS"),
                  A183=c(-0.19,NA,2.01,0.4,1.23),
                  A185=c(0.11,2.45,NA,NA,1.67),
                  A186=c(1.19,NA,2.41,0.78,1.93),
                  A187=c(2.78,NA,NA,0.7,2.23),
                  A188=c(NA,NA,NA,2.4,1.23))
head(df)

  Hugo_Symbol  A183 A185 A186 A187 A188
1      CDKN2A -0.19 0.11 1.19 2.78   NA
2         JUN    NA 2.45   NA   NA   NA
3        IRS2  2.01   NA 2.41   NA   NA
4        MTOR  0.40   NA 0.78 0.70 2.40
5        NRAS  1.23 1.67 1.93 2.23 1.23

I would like to assign the following categories to each value:

  • if the value is in the range (-Inf, -2), assign category "1"
  • if the value is in the range (-2, 2), assign category "2"
  • if the value is in the range (2, Inf), assign category "3"
  • if the value is NA, assign category "0"

I tried to use the cut function to do that. My code looks like this:

df2<- df[cut(df,
             breaks=c(-Inf,-2,2,Inf),
             labels=c("1","2","3"))]

However, I received the following error:

Error in cut.default(df, breaks = c(-Inf, -2, 2, Inf), labels = c("1", : 'x' must be numeric

I believe it's because I have NA values in my table. I don't know how to assign the category "0" to NA values. The desired output should look like this:

  Hugo_Symbol A183 A185 A186 A187 A188
1      CDKN2A    2    2    2    1    0
2         JUN    0    1    0    0    0
3        IRS2    1    0    1    0    0
4        MTOR    2    0    2    2    1
5        NRAS    2    2    2    1    2

How can I fix this error and replace each value with the predefined categories I mentioned above?

Thank you for your help!

Olha


Solution

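  • The error occurs because cut expects a numeric vector, but df is a whole data frame (including the character column Hugo_Symbol); the NA values are not the cause. We can use findInterval in base R, applied column by column: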

    df[-1] <- lapply(df[-1], findInterval, c(-Inf, -2, 2, Inf))
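Note that findInterval returns NA for NA input, so the line above on its own leaves the missing values as NA rather than "0". A minimal sketch of one way to cover that case follows. It rebuilds the sample data from the question (the line above overwrites df), and the helper name categorize and the result names df2 and df3 are hypothetical, chosen only for illustration. The second half shows how cut could be made to work by applying it to one column at a time.

```r
# Sample data from the question
df <- data.frame(
  Hugo_Symbol = c("CDKN2A", "JUN", "IRS2", "MTOR", "NRAS"),
  A183 = c(-0.19, NA, 2.01, 0.4, 1.23),
  A185 = c(0.11, 2.45, NA, NA, 1.67),
  A186 = c(1.19, NA, 2.41, 0.78, 1.93),
  A187 = c(2.78, NA, NA, 0.7, 2.23),
  A188 = c(NA, NA, NA, 2.4, 1.23)
)

# Hypothetical helper (not part of the original answer):
# bin a numeric vector into 1/2/3 and map NA to 0.
categorize <- function(x, breaks = c(-Inf, -2, 2, Inf)) {
  out <- findInterval(x, breaks)  # NA input stays NA here
  out[is.na(out)] <- 0L           # assign category 0 to missing values
  out
}

# Apply to every column except the gene symbols
df2 <- df
df2[-1] <- lapply(df2[-1], categorize)
df2

# Alternative sketch with cut(), one numeric column at a time
df3 <- df
df3[-1] <- lapply(df3[-1], function(x) {
  out <- as.integer(cut(x, breaks = c(-Inf, -2, 2, Inf),
                        labels = c("1", "2", "3")))
  out[is.na(out)] <- 0L
  out
})
df3
```

The two variants treat boundary values differently: findInterval uses left-closed intervals by default, so exactly -2 falls into category 2 and exactly 2 into category 3, while cut uses right-closed intervals by default, so -2 falls into category 1 and 2 into category 2. The question's ranges are open at both ends, so which behaviour is wanted is an assumption to check; the sample data contains no boundary values.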