
How to count the number of rows in each file of a corpus in R


Simple question... given for example:

library(tm)
data("crude")

which is a corpus with 20 text documents, how do I get something like:

1  4
2  6
3  5
4  3
etc...

where the second column is the number of rows in each document of the corpus "crude"? Even a plain vector of row counts would work.

NROW/nrow don't seem to work.

Thanks for looking!


Solution

  • Hi, you can count the line feeds (LF) with:

    library(stringr)
    str_count(string = crude[[1]], pattern = "\\n")
    # [1] 11
    

    crude[[1]] has 12 rows on my computer (11 line feeds plus 1), so for the whole corpus you can do this:

    sapply(crude, FUN = function(x) str_count(string = x, pattern = "\\n") + 1)
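If you would rather not depend on stringr, the same "line feeds + 1" idea can be sketched in base R. This is only an illustration: `docs` is a made-up character vector standing in for the document contents (it is not the real crude data), `count_newlines` is a hypothetical helper name, and the count assumes each text does not end with a trailing line feed.

```r
# Hypothetical stand-in for document contents: one string per document,
# with lines separated by "\n" (NOT the real crude corpus).
docs <- c(doc1 = "line one\nline two\nline three",
          doc2 = "single line",
          doc3 = "a\nb")

# Hypothetical helper: count "\n" occurrences in each string, base R only.
# gregexpr() finds every match; regmatches() extracts them (zero-length
# when there is no match), so lengths() gives the number of line feeds.
count_newlines <- function(x) {
  lengths(regmatches(x, gregexpr("\n", x, fixed = TRUE)))
}

# Rows = line feeds + 1 (assumes no trailing line feed at the end).
rows <- count_newlines(docs) + 1

# Two-column layout like the one asked for in the question:
# document index in the first column, row count in the second.
result <- data.frame(doc = seq_along(docs), rows = unname(rows))
print(result)
```

With a real corpus you would need to pull the text out of each document first; how to do that depends on your tm version, so treat the vector above as a placeholder.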