Search code examples
rsummary

r data pattern generation


I have a dataset like this.

ID   Col1    Col2    Col3    Col4    Score
1    1       1       1       1       30.14
2    1       1       1       1       30.26
3    1       1       1       1       30.14
4    1       1       1       1       30.14
5    1       3       3       3       38.78
6    1       3       3       3       38.78
7    1       1       1       2       38.34
8    1       1       1       2       38.34

I like to create a summary of data patterns like this.

  Col1    Col2    Col3    Col4    Score
  1       1       1       1       30.14, 30.26
  1       3       3       3       38.78
  1       1       1       2       38.34

I am not sure how to generate such a pattern. I tried the md.pattern function from mice but it does not give a collapsed column of Scores. Doing a unique also will not work. Any suggestion on how to create this summary is much apricated.


Solution

  • library(dplyr)
    data.frame(
             ID = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L),
            Col1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
            Col2 = c(1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L),
            Col3 = c(1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L),
            Col4 = c(1L, 1L, 1L, 1L, 3L, 3L, 2L, 2L),
           Score = c(30.14, 30.26, 30.14, 30.14, 38.78, 38.78, 38.34, 38.34)
    ) %>%
      distinct(Col1, Col2, Col3, Col4, Score) %>%
      group_by(Col1, Col2, Col3, Col4) %>%
      summarize(Score = paste(Score, collapse = ", "), .groups = "drop")
    

    result (ordered by the Col values instead of orig order of appearance)

    # A tibble: 3 × 5
       Col1  Col2  Col3  Col4 Score       
      <int> <int> <int> <int> <chr>       
    1     1     1     1     1 30.14, 30.26
    2     1     1     1     2 38.34       
    3     1     3     3     3 38.78