
Summary statistics for a variable based on another variable


I'm trying to find how many X values each ID has (some values are repeated within an ID), and then, based on that new result, find the overall min, max, IQR, and median.

ID <- c("ID004", "ID004", "ID004", "ID004", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID009", "ID009", "ID009", "ID009", "ID009", "ID009", "ID020", "ID020")
D <- c("CMP-001", "CMP-001","CMP-001","CMP-001","CMP-001", "CMP-001","CMP-002", "CMP-002", "CMP-002", "CMP-003", "CMP-003", "CMP-003", "CMP-004", "CMP-004", "CMP-004", "CMP-001", "CMP-001", "CMP-001", "CMP-001", "CMP-002", "CMP-002", "CMP-001", "CMP-001")
X <- c(3,3,3,3,1,1,3,3,3,1,1,1,4,4,4,4,4,4,4,2,2,2,2)
data <- data.frame(ID, D, X)

First, we find how many X values each ID has:

ID      No. of X values
ID004   1
ID006   4
ID009   2
ID020   1
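The counts in this table match the number of runs of consecutive equal X values within each ID, not the number of distinct values: ID006 has only three distinct values (1, 3 and 4) but four runs (1, 3, 1, 4). A minimal base-R sketch of both counts follows, assuming the rows are already in the intended order within each ID, since `rle()` only merges adjacent repeats. The names `n_runs` and `n_distinct` are illustrative.

```r
# Data from the question (only ID and X are used here)
ID <- c(rep("ID004", 4), rep("ID006", 11), rep("ID009", 6), rep("ID020", 2))
X  <- c(3,3,3,3,1,1,3,3,3,1,1,1,4,4,4,4,4,4,4,2,2,2,2)

# rle() collapses adjacent repeats: for ID006 the run values are 1, 3, 1, 4
rle(X[ID == "ID006"])

# Number of runs of consecutive equal X values per ID (1, 4, 2, 1)
n_runs <- tapply(X, ID, function(x) length(rle(x)$lengths))

# Number of distinct X values per ID, for comparison (ID006 gives 3, not 4)
n_distinct <- tapply(X, ID, function(x) length(unique(x)))

n_runs
n_distinct
```

With this data, counting distinct D codes per ID also gives 1, 4, 2, 1, so the example alone does not distinguish that reading from the run-count one.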

Then, based on this result, we should get the following:

                     Min.   Median   Max.   IQR
Number of X per ID   1      1.5      4      3-1

I think we need to create a new variable that holds the number of X values per ID, and then find the summary statistics for that new variable.
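A base-R sketch of that idea: build a small data frame with one row per ID holding the new variable (called `n_X` here, a hypothetical name), then compute the four statistics on it. It assumes the count is the number of runs of consecutive equal X values, and it uses R's default `IQR()`, which is based on `quantile(type = 7)`.

```r
ID <- c(rep("ID004", 4), rep("ID006", 11), rep("ID009", 6), rep("ID020", 2))
X  <- c(3,3,3,3,1,1,3,3,3,1,1,1,4,4,4,4,4,4,4,2,2,2,2)
data <- data.frame(ID, X)

# New variable: one row per ID with its number of runs of X
per_id <- aggregate(X ~ ID, data = data,
                    FUN = function(x) length(rle(x)$lengths))
names(per_id)[names(per_id) == "X"] <- "n_X"

# One-row summary laid out like the expected result
result <- data.frame(Min = min(per_id$n_X), Median = median(per_id$n_X),
                     Max = max(per_id$n_X), IQR = IQR(per_id$n_X),
                     row.names = "Number of X per ID")
result
```

With the example data this gives Min = 1, Median = 1.5, Max = 4 and IQR = 1.5.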

Thank you for your help


Solution

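  • Hope this answers your question. `rle()` collapses runs of consecutive equal values, so the length of its first component (`lengths`) is the number of X values per ID: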

    > library(dplyr)
    > data %>% group_by(ID) %>% summarise(Min = min(X), Median = median(X), Max = max(X), IQR = IQR(X), No_of_X_values = length(rle(X)[[1]]))
    `summarise()` ungrouping output (override with `.groups` argument)
    # A tibble: 4 x 6
      ID      Min Median   Max   IQR No_of_X_values
      <chr> <dbl>  <dbl> <dbl> <dbl>          <int>
    1 ID004     3      3     3   0                1
    2 ID006     1      3     4   2.5              4
    3 ID009     2      4     4   1.5              2
    4 ID020     2      2     2   0                1
    

    You can store each ID and its number of X values in a new data frame, and then take the summary statistics of the number of X values:

    > x_values <- data %>% group_by(ID) %>% summarise(No_of_X_values = length(rle(X)[[1]]))
    `summarise()` ungrouping output (override with `.groups` argument)
    > x_values
    # A tibble: 4 x 2
      ID    No_of_X_values
      <chr>          <int>
    1 ID004              1
    2 ID006              4
    3 ID009              2
    4 ID020              1
    > summary(x_values$No_of_X_values)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
        1.0     1.0     1.5     2.0     2.5     4.0
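A note on the IQR: `summary()` and `IQR()` use R's default quantile definition (`type = 7`), which gives a 1st quartile of 1 and a 3rd quartile of 2.5 for the counts 1, 4, 2, 1, so the IQR is 1.5. The "3-1" in the question matches Tukey's hinges instead, which `fivenum()` returns. A short sketch comparing the two (`n_X` and `h` are illustrative names):

```r
# Number of X values per ID (ID004, ID006, ID009, ID020)
n_X <- c(1, 4, 2, 1)

# Default quantiles (type 7), as used by summary() and IQR()
quantile(n_X, c(0.25, 0.75))   # 1.0 and 2.5
IQR(n_X)                       # 1.5

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
h <- fivenum(n_X)              # 1.0 1.0 1.5 3.0 4.0
h[4] - h[2]                    # 3 - 1 = 2
```

Which spread measure to report is a choice; `quantile()` also accepts other `type` values (1 to 9) if the result has to match another tool.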