I have data that contains twelve rows and more than 500 variables. I want to keep only the variables whose value in row 9 is greater than 5 times the value in row 10.
Example of data:
Name ClassType Col1 Col2 Col3
---------------------------------------
A Class1 10 50 12
B Class2 7 20 12
C Class1 8 12 8
D Class1 9 14 17
E Class2 3 15 14
F Class2 10 15 16
G Class2 12 22 15
H Class1 10 28 10
I NA 50 10 30
J NA 8 5 2
Desired result: Col2 is dropped because its value in row 9 is less than 5 times its value in row 10:
Name ClassType Col1 Col3
-------------------------------
A Class1 10 12
B Class2 7 12
C Class1 8 8
D Class1 9 17
E Class2 3 14
F Class2 10 16
G Class2 12 15
H Class1 10 10
I NA 50 30
J NA 8 2
I tried a loop with an if condition, but it didn't give me good results, so I would like to know if there is another way.
The code I tried:
data_4 <- as.data.frame(data_3[, 1, drop = FALSE])
for (i in 2:640) {
  a <- as.numeric(data_3[9, i])
  b <- as.numeric(data_3[10, i])
  print(b)
  c <- as.numeric(b * 5)
  if (a > c) {
    data_4 <- cbind(data_4[, , drop = FALSE], data_3[, i, drop = FALSE])
  }
}
Thank you
We may use select to keep the character columns together with the numeric columns where the condition holds, i.e. the 9th element of the column is greater than 5 times the last value:
library(dplyr)
# Keep character columns, plus numeric columns whose 9th value
# exceeds 5 times the last value.
df1 <- df1 %>%
  dplyr::select(where(is.character),
                where(~ is.numeric(.x) && nth(.x, 9) > 5 * last(.x)))
Output:
df1
Name ClassType Col1 Col3
1 A Class1 10 12
2 B Class2 7 12
3 C Class1 8 8
4 D Class1 9 17
5 E Class2 3 14
6 F Class2 10 16
7 G Class2 12 15
8 H Class1 10 10
9 I <NA> 50 30
10 J <NA> 8 2
Data:
df1 <- structure(list(Name = c("A", "B", "C", "D", "E", "F", "G", "H",
"I", "J"), ClassType = c("Class1", "Class2", "Class1", "Class1",
"Class2", "Class2", "Class2", "Class1", NA, NA), Col1 = c(10L,
7L, 8L, 9L, 3L, 10L, 12L, 10L, 50L, 8L), Col2 = c(50L, 20L, 12L,
14L, 15L, 15L, 22L, 28L, 10L, 5L), Col3 = c(12L, 12L, 8L, 17L,
14L, 16L, 15L, 10L, 30L, 2L)), class = "data.frame", row.names = c(NA,
-10L))
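If the real data has rows after row 10 (the question mentions twelve rows), last() would no longer point at row 10. A minimal base R sketch that indexes both rows explicitly might look like the following. The helper name keep_by_rows and its parameters num_row, den_row and mult are hypothetical, not part of any package, and unlike the dplyr call above it keeps every non-numeric column rather than only character ones.

```r
# Example data from the question (rows A..J).
df1 <- data.frame(
  Name = LETTERS[1:10],
  ClassType = c("Class1", "Class2", "Class1", "Class1", "Class2",
                "Class2", "Class2", "Class1", NA, NA),
  Col1 = c(10L, 7L, 8L, 9L, 3L, 10L, 12L, 10L, 50L, 8L),
  Col2 = c(50L, 20L, 12L, 14L, 15L, 15L, 22L, 28L, 10L, 5L),
  Col3 = c(12L, 12L, 8L, 17L, 14L, 16L, 15L, 10L, 30L, 2L),
  stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption for illustration): keep all non-numeric
# columns, plus numeric columns whose value in row `num_row` is greater than
# `mult` times the value in row `den_row`.
keep_by_rows <- function(df, num_row = 9, den_row = 10, mult = 5) {
  is_num <- vapply(df, is.numeric, logical(1))
  # isTRUE() treats an NA comparison as "do not keep" instead of erroring.
  passes <- vapply(
    df[is_num],
    function(col) isTRUE(col[num_row] > mult * col[den_row]),
    logical(1)
  )
  keep <- !is_num
  keep[is_num] <- passes
  df[, keep, drop = FALSE]
}

res <- keep_by_rows(df1)
print(names(res))
```

Because the character columns are never passed through as.numeric(), this also avoids the NA that the loop in the question produces when it reaches ClassType.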