In R, I want to classify each row of a data frame by binning its values and using the number of values in each bin to assign the row to one of 2 groups (classes) with if-else logic.
First, I created a data frame x:
n1 <- c(1, 7); n2 <- c(2, 11); n3 <- c(10, 14); n4 <- c(23, 32); n5 <- c(37, 37); n6 <- c(45, 41)
x <- data.frame(n1, n2, n3, n4, n5, n6)
x
  n1 n2 n3 n4 n5 n6
1  1  2 10 23 37 45
2  7 11 14 32 37 41
The 1st row should be classified as "P", because it has 1 pair of values (1, 2) falling in the same bin, 1..9.
The 2nd row should be classified as "PP", because it has 2 pairs of values (11, 14 and 32, 37) falling in 2 bins: 10..19 and 30..39, respectively.
So, after creating the data frame x, I created a for-loop:
for(i in nrow(x)){
  # binning the data:
  bins <- split(as.numeric(x[i, ]), cut(as.numeric(x[i, ]), c(0, 9, 19, 29, 39, 49)))
  # creating the rule for p (1 pair of numbers falling in the same range)
  p <- (sum(lengths(bins) == 2) == 1 & sum(lengths(bins) == 1) == 4)
  # creating the rule for pp (2 different pairs, each has 2 numbers falling in the same range)
  pp <- (sum(lengths(bins) == 2) == 2 & sum(lengths(bins) == 1) == 2 & sum(lengths(bins) == 0) == 1)
  if(p){
    x$types <- "P"
  } else if(pp){
    x$types <- "PP"
  } else{
    stop("error")
  }
}
print(x)
I want to create a new column named types, holding the class P or PP:
  n1 n2 n3 n4 n5 n6 types
1  1  2 10 23 37 45     P
2  7 11 14 32 37 41    PP
Instead the code returned only PP:
  n1 n2 n3 n4 n5 n6 types
1  1  2 10 23 37 45    PP
2  7 11 14 32 37 41    PP
This is because the loop runs twice over the rows. But if it runs only once, all the rows are classified as "P" instead of "PP". I expect it's something very simple, but I have not been able to figure it out so far.
The error in your for loop is that you don't use i when you assign types. x$types <- "P" assigns the entire types column to be "P", and x$types <- "PP" assigns the whole types column to be "PP". So, whatever the last result is, that will be the value for your entire column.
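To see the difference between assigning to the whole column and assigning to one position, here is a minimal sketch. The data frame d and the vector label are made up purely for this illustration and are not part of your code.

```r
# Hypothetical toy data frame, used only for this illustration
d <- data.frame(a = 1:3)

# A single value assigned to a column is recycled to every row,
# so the second assignment overwrites the first for all rows
d$label <- "P"
d$label <- "PP"

# Indexing a separate vector by position changes only that element
label <- character(nrow(d))
label[2] <- "PP"

print(d$label)  # every row holds "PP"
print(label)    # only the second element holds "PP"
```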
Also, using the full row x[i, ] is dangerous after you add the types column. Presumably you don't want to try to convert the "P" and "PP" values of types to numeric and bin them. I would suggest making types a separate vector, and only adding it as a column after the loop. Before the loop: types <- character(nrow(x)). Inside the loop: types[i] <- instead of x$types <-. After the loop: x$types <- types.
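You are also making the classic mistake of writing for (i in nrow(x)) when you mean for (i in 1:nrow(x)). The first form is valid syntax, but nrow(x) is a single number, so the loop body runs only once, with i equal to the index of the last row.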
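A small sketch of what each loop header actually iterates over. The visited_* vectors are names introduced here just to record the values of i. The seq_len() variant is my own suggestion rather than something your code needs: it behaves like 1:nrow(x) here, but it also copes with a data frame that has zero rows.

```r
x <- data.frame(n1 = c(1, 7), n2 = c(2, 11))

# nrow(x) is the single number 2, so the body runs once, with i = 2
visited_wrong <- integer(0)
for (i in nrow(x)) visited_wrong <- c(visited_wrong, i)

# 1:nrow(x) is the vector 1, 2, so every row is visited
visited_right <- integer(0)
for (i in 1:nrow(x)) visited_right <- c(visited_right, i)

# seq_len(nrow(x)) visits the same rows; with zero rows it is empty,
# whereas 1:0 would iterate over 1 and then 0
visited_safe <- integer(0)
for (i in seq_len(nrow(x))) visited_safe <- c(visited_safe, i)

print(visited_wrong)
print(visited_right)
print(visited_safe)
```

In your original loop this means only row 2 was ever evaluated. It is a "PP" row, and that value was then written to the whole column.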
Fixing all of these:
n1 <- c(1, 7); n2 <- c(2, 11); n3 <- c(10, 14); n4 <- c(23, 32); n5 <- c(37, 37); n6 <- c(45, 41)
x <- data.frame(n1, n2, n3, n4, n5, n6)
types <- character(nrow(x))
for(i in 1:nrow(x)){
  # binning the data:
  bins <- split(as.numeric(x[i, ]), cut(as.numeric(x[i, ]), c(0, 9, 19, 29, 39, 49)))
  # creating the rule for p (1 pair of numbers falling in the same range)
  p <- (sum(lengths(bins) == 2) == 1 & sum(lengths(bins) == 1) == 4)
  # creating the rule for pp (2 different pairs, each has 2 numbers falling in the same range)
  pp <- (sum(lengths(bins) == 2) == 2 & sum(lengths(bins) == 1) == 2 & sum(lengths(bins) == 0) == 1)
  if(p){
    types[i] <- "P"
  } else if(pp){
    types[i] <- "PP"
  } else{
    stop("error")
  }
}
x$types <- types
x
#   n1 n2 n3 n4 n5 n6 types
# 1  1  2 10 23 37 45     P
# 2  7 11 14 32 37 41    PP
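If you would rather avoid the explicit loop, one possible alternative is to move the rules into a function and apply it to each row. This is only a sketch: classify_row and num_cols are hypothetical names introduced here, and I return NA for rows that match neither rule instead of calling stop("error"), which is a change from your code that you may or may not want.

```r
n1 <- c(1, 7); n2 <- c(2, 11); n3 <- c(10, 14)
n4 <- c(23, 32); n5 <- c(37, 37); n6 <- c(45, 41)
x <- data.frame(n1, n2, n3, n4, n5, n6)

# Hypothetical helper: classify one row of numbers using the same rules
classify_row <- function(values, breaks = c(0, 9, 19, 29, 39, 49)) {
  # table() on the factor from cut() counts every bin, including empty ones
  counts <- table(cut(values, breaks))
  n_pairs   <- sum(counts == 2)
  n_singles <- sum(counts == 1)
  n_empty   <- sum(counts == 0)
  if (n_pairs == 1 && n_singles == 4) {
    "P"
  } else if (n_pairs == 2 && n_singles == 2 && n_empty == 1) {
    "PP"
  } else {
    NA_character_  # assumption: flag unmatched rows rather than stopping
  }
}

# Name the numeric columns explicitly so a types column never gets binned
num_cols <- c("n1", "n2", "n3", "n4", "n5", "n6")
x$types <- vapply(
  seq_len(nrow(x)),
  function(i) classify_row(unlist(x[i, num_cols], use.names = FALSE)),
  character(1)
)
print(x)
```

Selecting the numeric columns by name also makes the code safe to re-run after types has been added.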