How to make a frequency table from a data frame in R


The data frame looks like this:

header: system
Row 1:  00000000000000000503_0
Row 2:  00000000000000000503_1
Row 3:  00000000000000000503_2
Row 4:  00000000000000000503_3
Row 5:  000000000000000004e7_0
Row 6:  000000000000000004e7_1
Row 7:  00000000000000000681_0
Row 8:  00000000000000000681_1
Row 9:  00000000000000000681_2

I want to generate a frequency table that counts how often each code (the part before the "_") occurs, so that:

"00000000000000000503" appears 4 times, "000000000000000004e7" appears 2 times, and so on.

How do I do this in R?


Solution

  • Remove the underscore and everything after it with sub, then use table to count how often each remaining code occurs:

    table(sub("_.*", "", data$col1))
    #Also
    #table(sub("(.*)_.*", "\\1", data$col1))
    
    #000000000000000004e7 00000000000000000503 00000000000000000681 
    #                   2                    4                    3 
    

    If the final output needs to be a data frame, use stack:

    stack(table(sub("_.*", "", data$col1)))
    
    #  values                  ind
    #1      2 000000000000000004e7
    #2      4 00000000000000000503
    #3      3 00000000000000000681
    

    data

    data <- structure(list(col1 = structure(c(3L, 4L, 5L, 6L, 1L, 2L, 7L, 
    8L, 9L), .Label = c("000000000000000004e7_0", "000000000000000004e7_1", 
    "00000000000000000503_0", "00000000000000000503_1", 
    "00000000000000000503_2", 
    "00000000000000000503_3", "00000000000000000681_0", 
    "00000000000000000681_1", 
    "00000000000000000681_2"), class = "factor")), class = "data.frame", 
    row.names = c(NA, -9L))
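
    As a variation on the approach above, the following is a minimal sketch that builds the same counts with as.data.frame() instead of stack(), which makes it easy to name the output columns. It assumes the column is called system, as in the question's header; the names df, code, freq and n are only illustrative.

```r
# Toy data mirroring the question; the column name `system` comes from the
# question's header, the object name `df` is an assumption for this sketch.
df <- data.frame(
  system = c("00000000000000000503_0", "00000000000000000503_1",
             "00000000000000000503_2", "00000000000000000503_3",
             "000000000000000004e7_0", "000000000000000004e7_1",
             "00000000000000000681_0", "00000000000000000681_1",
             "00000000000000000681_2"),
  stringsAsFactors = FALSE
)

# Drop the first underscore and everything after it, keeping only the code.
code <- sub("_.*", "", df$system)

# Count each code and convert the table to a data frame.
# `responseName` sets the name of the count column (illustrative name `n`).
freq <- as.data.frame(table(code), responseName = "n",
                      stringsAsFactors = FALSE)

# Sort by descending count so the most frequent code comes first.
freq <- freq[order(-freq$n), ]
print(freq)
```

    Because sub() replaces only the first match and "_.*" runs to the end of the string, this keeps the text before the first underscore; if a code could itself contain underscores, the pattern would need adjusting.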