The data frame is like this: enter image description here
header: system
Row 1: 00000000000000000503_0
Row 2: 00000000000000000503_1
Row 3: 00000000000000000503_2
Row 4: 00000000000000000503_3
Row 5: 000000000000000004e7_0
Row 6: 000000000000000004e7_1
Row 7: 00000000000000000681_0
Row 8: 00000000000000000681_1
Row 9: 00000000000000000681_2
I want to generate a frequency table with the quantities of the code before string "_" such that:
"00000000000000000503" appears 4 times, "000000000000000004e7" appears 2 times, and so on.
How do I do this in R?
Remove everything after underscore and use table
to count frequency
table(sub("_.*", "", data$col1))
#Also
#table(sub("(.*)_.*", "\\1", data$col1))
#000000000000000004e7 00000000000000000503 00000000000000000681
# 2 4 3
If final output needs to be a dataframe use stack
stack(table(sub("_.*", "", data$col1)))
# values ind
#1 2 000000000000000004e7
#2 4 00000000000000000503
#3 3 00000000000000000681
data
data <- structure(list(col1 = structure(c(3L, 4L, 5L, 6L, 1L, 2L, 7L,
8L, 9L), .Label = c("000000000000000004e7_0", "000000000000000004e7_1",
"00000000000000000503_0", "00000000000000000503_1",
"00000000000000000503_2",
"00000000000000000503_3", "00000000000000000681_0",
"00000000000000000681_1",
"00000000000000000681_2"), class = "factor")), class = "data.frame",
row.names = c(NA, -9L))