Nested for loops, different in R

d3:

Col1     Col2
PBR569   23
PBR565   22
PBR565   22
PBR565   22

I am using this loop:

for ( i in 1:(nrow (d3)-1) ){
    for (j in (i+1):nrow(d3)) {
      if(c(i) == c(j)) {
        print(c(j))
        # d4 <- subset.data.frame(c(j))
      }
    }
  }

I want to compare all the rows in Col1 and eliminate the ones that are not the same. Then I want to output a data frame with only the rows that share the same value in Col1.

Expected Output:

    Col1     Col2
    PBR565   22
    PBR565   22
    PBR565   22

Not sure what's up with my nested loop. Is it because I don't specify the column names?

Solution

The OP has requested to compare all the rows in Col1 and eliminate the ones that are not the same.

If I understand correctly, the OP wants to remove all rows where the value in Col1 appears only once and to keep only those rows where the value appears two or more times.

This can be accomplished by finding duplicated values in Col1. The duplicated() function marks the second and subsequent appearances of a value as duplicated, so the first appearance is not flagged. Therefore, we need to scan forward and backward and combine both results:

d3[duplicated(d3$Col1) | duplicated(d3$Col1, fromLast = TRUE), ]

    Col1 Col2
2 PBR565   22
3 PBR565   22
4 PBR565   22
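
To make the forward and backward scans visible, here is a minimal sketch that builds the sample data with base R only and stores the intermediate logical vectors. The names fwd, bwd, keep and d4 are introduced just for this illustration.

```r
# Sample data from the question, rebuilt with base R only
d3 <- data.frame(
  Col1 = c("PBR569", "PBR565", "PBR565", "PBR565"),
  Col2 = c(23, 22, 22, 22)
)

# Forward scan: flags the 2nd and later appearances of a value
fwd <- duplicated(d3$Col1)
# Backward scan: flags every appearance except the last one
bwd <- duplicated(d3$Col1, fromLast = TRUE)
# A row is kept if either scan flagged it
keep <- fwd | bwd

print(rbind(fwd, bwd, keep))
d4 <- d3[keep, ]
print(d4)
```

The first PBR565 row is only caught by the backward scan and the last one only by the forward scan, which is why both are needed.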

The same can be achieved by counting the appearances using the table() function as suggested by Ryan. Here, the counts are filtered to keep only those entries which appear two or more times.

t <- table(d3$Col1)
d3[d3$Col1 %in% names(t)[t >= 2], ]

Please note that this is different from Ryan's solution, which keeps only the rows whose value appears most often. Only one value is picked, even in case of ties. (For the given small sample dataset, both approaches return the same result.)

Ryan's answer can be rewritten in a slightly more concise way:

d3[d3$Col1 == names(which.max(t)), ]
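
The difference between the count filter and the which.max() approach only shows up when there is a tie. The following sketch uses a small made-up data frame (d5, with hypothetical values "A", "B" and "C" that are not part of the question) to illustrate it:

```r
# Hypothetical data with a tie: "A" and "B" both appear twice, "C" once
d5 <- data.frame(
  Col1 = c("A", "A", "B", "B", "C"),
  Col2 = 1:5
)
counts <- table(d5$Col1)

# Count filter: keeps every value that appears two or more times
by_count <- d5[d5$Col1 %in% names(counts)[counts >= 2], ]

# which.max() returns only the first maximum, so only one value survives
by_max <- d5[d5$Col1 == names(which.max(counts)), ]

print(by_count)
print(by_max)
```

Here the count filter keeps the four "A" and "B" rows, while the which.max() variant keeps only the two "A" rows.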

Data

d3 <- data.table::fread(
"Col1     Col2
PBR569   23
PBR565   22
PBR565   22
PBR565   22", data.table = FALSE)
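
A note on the nested loop

As for the loop in the question: c(i) == c(j) compares the loop indices themselves, not the values in Col1. Since j always starts at i + 1, the two are never equal and nothing is printed. If a loop is preferred over the vectorised solutions above, a minimal sketch could look like this (keep and d4 are names chosen for illustration, and the data frame is assumed to have at least one row):

```r
# Sample data from the question, rebuilt with base R only
d3 <- data.frame(
  Col1 = c("PBR569", "PBR565", "PBR565", "PBR565"),
  Col2 = c(23, 22, 22, 22)
)

# One flag per row, all FALSE to start with
keep <- logical(nrow(d3))

for (i in seq_len(nrow(d3) - 1)) {
  for (j in (i + 1):nrow(d3)) {
    # Compare the values in Col1, not the indices i and j
    if (d3$Col1[i] == d3$Col1[j]) {
      keep[c(i, j)] <- TRUE  # both rows of a matching pair are kept
    }
  }
}

d4 <- d3[keep, ]
print(d4)
```

This does the same pairwise comparison the original loop was aiming for, but it needs on the order of n^2 comparisons, so the duplicated() or table() approaches are likely the better choice for larger data.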