I have a data frame in which each column (apart from Hugo_Symbol) corresponds to a patient ID and each row corresponds to a particular gene.
df <- data.frame(Hugo_Symbol=c("CDKN2A", "JUN", "IRS2", "MTOR", "NRAS"),
A183=c(-0.19,NA,2.01,0.4,1.23),
A185=c(0.11,2.45,NA,NA,1.67),
A186=c(1.19,NA,2.41,0.78,1.93),
A187=c(2.78,NA,NA,0.7,2.23),
A188=c(NA,NA,NA,2.4,1.23))
head(df)
  Hugo_Symbol  A183 A185 A186 A187 A188
1      CDKN2A -0.19 0.11 1.19 2.78   NA
2         JUN    NA 2.45   NA   NA   NA
3        IRS2  2.01   NA 2.41   NA   NA
4        MTOR  0.40   NA 0.78 0.70 2.40
5        NRAS  1.23 1.67 1.93 2.23 1.23
I would like to assign the following categories to each value:
I tried to use the cut() function to do that. My code looks like this:
df2<- df[cut(df,
breaks=c(-Inf,-2,2,Inf),
labels=c("1","2","3"))]
However, I received the following error:
Error in cut.default(df, breaks = c(-Inf, -2, 2, Inf), labels = c("1", : 'x' must be numeric
I believe it's because I have NA values in my table. I don't know how to assign the category "0" to NA values. The desired output should look like this:
  Hugo_Symbol A183 A185 A186 A187 A188
1      CDKN2A    2    2    2    1    0
2         JUN    0    1    0    0    0
3        IRS2    1    0    1    0    0
4        MTOR    2    0    2    2    1
5        NRAS    2    2    2    1    2
How can I fix this error and replace each value with the predefined category I mentioned above?
Thank you for your help!
Olha
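The error is not caused by the NA values. cut() needs a single numeric vector, but cut(df, ...) passes it the whole data frame, which is a list of columns (and includes the character column Hugo_Symbol), so its "'x' must be numeric" check fails. A minimal sketch to confirm this with the question's data; the name a183_cat is only illustrative:

```r
# The question's data frame
df <- data.frame(Hugo_Symbol = c("CDKN2A", "JUN", "IRS2", "MTOR", "NRAS"),
                 A183 = c(-0.19, NA, 2.01, 0.4, 1.23),
                 A185 = c(0.11, 2.45, NA, NA, 1.67),
                 A186 = c(1.19, NA, 2.41, 0.78, 1.93),
                 A187 = c(2.78, NA, NA, 0.7, 2.23),
                 A188 = c(NA, NA, NA, 2.4, 1.23))

# A data frame is a list of columns, not a numeric vector: returns FALSE
is.numeric(df)

# A numeric column stays numeric even when it contains NA: returns TRUE
is.numeric(df$A183)

# cut() on a single column works; the NA entry simply stays NA
a183_cat <- cut(df$A183, breaks = c(-Inf, -2, 2, Inf), labels = c("1", "2", "3"))
a183_cat  # categories 2, NA, 3, 2, 2
```

So the fix is to apply the binning column by column, and to handle NA separately.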
We can use findInterval from base R on every column except Hugo_Symbol:
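df[-1] <- lapply(df[-1], findInterval, c(-Inf, -2, 2, Inf))  # bin each patient column
df[-1][is.na(df[-1])] <- 0  # findInterval() returns NA for NA; map those to category 0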
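If you would rather keep cut(), a minimal sketch is to call it on each column separately and then turn the NA results into 0. The helper name categorize is hypothetical, and the breaks and labels are simply the ones from the question's attempt; adjust labels to whatever category scheme you need.

```r
df <- data.frame(Hugo_Symbol = c("CDKN2A", "JUN", "IRS2", "MTOR", "NRAS"),
                 A183 = c(-0.19, NA, 2.01, 0.4, 1.23),
                 A185 = c(0.11, 2.45, NA, NA, 1.67),
                 A186 = c(1.19, NA, 2.41, 0.78, 1.93),
                 A187 = c(2.78, NA, NA, 0.7, 2.23),
                 A188 = c(NA, NA, NA, 2.4, 1.23))

# Hypothetical helper: bin one numeric vector with cut(), then map NA to 0.
# Assumption: breaks and labels are copied from the question's cut() attempt.
categorize <- function(x, breaks = c(-Inf, -2, 2, Inf), labels = c("1", "2", "3")) {
  # cut() accepts NA input and returns NA at those positions
  cat_chr <- as.character(cut(x, breaks = breaks, labels = labels))
  cat_chr[is.na(cat_chr)] <- "0"  # category "0" for missing values
  as.integer(cat_chr)
}

df2 <- df
# Apply the helper to every column except Hugo_Symbol
df2[-1] <- lapply(df2[-1], categorize)
df2
```

One design difference to keep in mind: cut() uses right-closed intervals by default, such as (-2, 2], whereas findInterval() uses left-closed ones, such as [-2, 2), so a value of exactly 2 lands in different categories. Passing right = FALSE to cut() makes the two agree.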