library(pdfsearch)
Characters <- c("Ben", "John")
keyword_search('location of file',
keyword = Characters,
path = TRUE)
keyword page_num
1 Ben 1
2 Ben 1
3 John 1
4 John 2
How can I make R count how many times each keyword appears on each page (page_num), producing a data frame like this:
name page count
1 Ben 1 2
2 John 1 1
3 John 2 1
I know I can use nrow() to count one keyword/page combination at a time, but is there a faster way?
nrow(dataframe[dataframe$keyword == "Ben" & dataframe$page_num == 1, ])
Base R supports a wide variety of ways to perform grouped operations (probably too many, as it makes choosing the appropriate method harder):
my_data <- data.frame(name = c("Ben", "Ben", "John", "John"), page_num = c(1,1,1,2))
> my_data
name page_num
1 Ben 1
2 Ben 1
3 John 1
4 John 2
# table()
> table(my_data)
page_num
name 1 2
Ben 2 0
John 1 1
> as.data.frame(table(my_data))
name page_num Freq
1 Ben 1 2
2 John 1 1
3 Ben 2 0
4 John 2 1
# xtabs
> xtabs(~ name + page_num, data = my_data)
page_num
name 1 2
Ben 2 0
John 1 1
> as.data.frame(xtabs(~ name + page_num, data = my_data))
name page_num Freq
1 Ben 1 2
2 John 1 1
3 Ben 2 0
4 John 2 1
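Both tabulations above include a zero-count row for Ben on page 2, which the desired output in the question does not have. A minimal sketch of one way to tidy that up, assuming the same my_data as above (the variable name counts is only illustrative):

```r
# Same example data as above
my_data <- data.frame(name = c("Ben", "Ben", "John", "John"),
                      page_num = c(1, 1, 1, 2))

# Tabulate, naming the count column "count" instead of the default "Freq"
counts <- as.data.frame(table(my_data),
                        responseName = "count",
                        stringsAsFactors = FALSE)

# table() stores the page numbers as labels, so convert them back to numbers
counts$page_num <- as.numeric(counts$page_num)

# Drop combinations that never occur, then sort and renumber the rows
counts <- counts[counts$count > 0, ]
counts <- counts[order(counts$name, counts$page_num), ]
rownames(counts) <- NULL

counts
```

This leaves one row per keyword/page combination that actually occurs in the data.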
Other functions for performing grouped operations include by(), tapply(), ave() and more.
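As a rough illustration of a few of these alternatives, here is a sketch using tapply(), ave() and, as one more base R option not named above, aggregate(). It assumes the same my_data as before; the helper column count and the result names agg, tab and per_row are made up for the example.

```r
# Same example data as above
my_data <- data.frame(name = c("Ben", "Ben", "John", "John"),
                      page_num = c(1, 1, 1, 2))

# aggregate(): add a helper column of ones and sum it within each group;
# only combinations present in the data appear in the result
agg <- aggregate(count ~ name + page_num,
                 data = cbind(my_data, count = 1),
                 FUN = sum)

# tapply(): returns a name-by-page matrix, with NA for combinations that never occur
tab <- tapply(my_data$name, list(my_data$name, my_data$page_num), length)

# ave(): returns one value per original row (the size of that row's group)
per_row <- ave(seq_len(nrow(my_data)), my_data$name, my_data$page_num, FUN = length)

agg
```

Of the three, aggregate() is the one that directly gives a data frame with one row per keyword/page combination.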
The popular dplyr package also has a syntax for performing grouped operations on data.frame objects without transformation:
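library(dplyr)
# `group_by()`, `summarise()`, `%>%`, and `n()` are exports from `dplyr`
my_data %>%
  group_by(name, page_num) %>%
  summarise(count = n(), .groups = "drop")
# n() returns the number of rows in the current group.
# summarise() yields one row per group; mutate(count = n()) would instead
# keep every original row and repeat the group count on each of them.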
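dplyr also provides count(), which groups and tallies in one step. A short sketch, assuming dplyr is installed and using the same my_data as above; the rename() call is only there to match the column name asked for in the question:

```r
library(dplyr)

# Same example data as above
my_data <- data.frame(name = c("Ben", "Ben", "John", "John"),
                      page_num = c(1, 1, 1, 2))

# count() tallies rows per (name, page_num) combination in a column called n
counts <- my_data %>%
  count(name, page_num) %>%
  rename(count = n)

counts
```

Like aggregate(), this only returns combinations that actually occur, so no zero-count rows need to be filtered out.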