Find rows in common between columns belonging to specific groups using R

I have a data frame of 8 columns and 6 rows. All rows are numeric and contains positive and negative values.

I have another file with 8 columns and associated group name (A, B, C or D) for each column. The first 2 columns are group A, the following 2 columns are group B, the next 2 columns are group C, and the last 2 columns are group D.

I want a code in R that allows me to calculate for each column, how many of the rows with values above 0 (X > 0) in a column have also values above 0 (x > 0) in the other columns. And the output should be grouped based on the four groups. Something like this:

Data file:

    S1  S2  S3  S4  S5  S6  S7  S8
R1  2   1   2  -1  -3   5   4  -3 
R2  4  -6   1   2   1   2   1   5
R3  3   2  -3  -9  -5  -1   4   9
R4  4  -4  -4  -6   4  -7   6   6
R5  6  -5   2   2  -7  -6   7  -6
R6  4   4  -3   3  -2   3  -4   2

Group file:

S1        S2       S3       S4       S5      S6      S7     S8
GroupA   GroupA  GroupB   GroupB   GroupC  GroupC  GroupD  GroupD

Expected output file

           S1   S2   S3  S4  S5  S6  S7  S8
Group A    6    3    3   3   2   3   5   4
Group B    4    2    3   3   1   3   3   2
Group C    4    1    2   2   2   3   3   3
Group D    6    3    3   3   2   3   5   5

Explanation for the values obtained in the expected output file:

Example 1: S1 and GroupB

The value obtained is 4, this is because S1 has values greater than 0 in all 6 rows, while R1, R2, R5 and R6 are greater than 0 in at least one of the samples of group B (S3 and S4).

Example 2: S3 and GroupD

The value obtained is 3, this is because S3 has values greater than 0 in R1, R2 and R5, and the rows are also greater than 0 in at least one of the samples of group D (S7 and S8).

Solution

I hope I understood correctly what you want to achieve.

df <- read.table(
  text = "    S1  S2  S3  S4  S5  S6  S7  S8
R1  2   1   2  -1  -3   5   4  -3 
R2  4  -6   1   2   1   2   1   5
R3  3   2  -3  -9  -5  -1   4   9
R4  4  -4  -4  -6   4  -7   6   6
R5  6  -5   2   2  -7  -6   7  -6
R6  4   4  -3   3  -2   3  -4   2"
)

groupings_new <- matrix(c(rep("Group A", 2), rep("Group B", 2), rep("Group C", 2), rep("Group D", 2)), byrow = TRUE, nrow = 1)
colnames(groupings_new) <- paste0("S", 1:8)

# get TRUE/FALSE if a value is greater than 0
info_df <- df > 0

res <- lapply(seq_len(ncol(info_df)), function(i) {
  grouping_info <- groupings_new[, , drop = TRUE]
  
  # check if a value in the matrix and in the column of interest are both greater
  # than 0
  compare_df <- info_df & matrix(rep(info_df[, i], ncol(info_df)), nrow = nrow(info_df))
  
  # split by groups
  res <- lapply(unique(grouping_info), function(one_group) {
    group_index <- grouping_info == one_group
    # check which rows are of interest (values greater than 0) and how many are
    # there
    sum(rowSums(compare_df[, group_index, drop = FALSE]) > 0)
  })
  res_clean <- data.frame(unlist(res))
  colnames(res_clean) <- colnames(info_df[, i, drop = FALSE])
  rownames(res_clean) <- unique(grouping_info)
  res_clean
})

do.call(cbind, res)
#>         S1 S2 S3 S4 S5 S6 S7 S8
#> Group A  6  3  3  3  2  3  5  4
#> Group B  4  2  3  3  1  3  3  2
#> Group C  4  2  2  2  2  3  3  3
#> Group D  6  3  3  3  2  3  5  4

^{Created on 2023-02-27 by the reprex package (v1.0.0)}

My output is not completely the same as your expected output because I think there are a few mistakes.