r, dplyr, r-xlsx

Split xlsx file into small files based on count of rows


I have a dataset with more than 20000 rows, which I can't import into SharePoint because uploads are limited to 20000 rows in total. There are 17 columns with 6694 rows each, 113798 in total (17 × 6694).

So I want to split this xlsx file into smaller ones, each with fewer than 20000 rows in total.

How can I do this?

Sample Data:

df2 <- data.frame(a = seq(1,6694), b = seq(1,6694), c = seq(1,6694),
                  d = seq(1,6694), e = seq(1,6694), f = seq(1,6694),
                  g = seq(1,6694), h = seq(1,6694), i = seq(1,6694),
                  k = seq(1,6694), l = seq(1,6694), m = seq(1,6694),
                  n = seq(1,6694), o = seq(1,6694), p = seq(1,6694),
                  q = seq(1,6694), replace = TRUE)  # `replace` becomes the 17th column (all TRUE)

Solution

  • We could use gl to create a grouping index and split the data into a list of data frames with at most 20000 rows each (if the total number of rows is not a multiple of 20000, the last list element holds the remaining rows)

    n <- 20000
    lst1 <- split(df2, as.integer(gl(nrow(df2), n, nrow(df2))))
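
The split only produces the list; each element still has to be written to its own file. A minimal sketch of that step follows. It assumes the writexl package is installed (any xlsx writer would do); the helper split_rows, the chunk size of 2500 and the chunk_<i>.xlsx file names in a temporary directory are made up for illustration, and the sample data is cut down to three columns.

```r
# Hypothetical helper: split a data frame into chunks of at most n rows,
# using the same gl() grouping index as above.
split_rows <- function(df, n) {
  split(df, as.integer(gl(nrow(df), n, nrow(df))))
}

# Stand-in data with the question's number of rows (columns reduced for brevity).
df2 <- data.frame(a = seq(1, 6694), b = seq(1, 6694), c = seq(1, 6694))

# A smaller n than the question's 20000, chosen only so that the sample data
# actually produces several chunks.
lst1 <- split_rows(df2, 2500)
chunk_rows <- sapply(lst1, nrow)  # rows per chunk; the last one holds the remainder

# Assumed output location and file names: chunk_1.xlsx, chunk_2.xlsx, ...
paths <- file.path(tempdir(), sprintf("chunk_%d.xlsx", seq_along(lst1)))

# Write one workbook per chunk, assuming the writexl package is installed.
if (requireNamespace("writexl", quietly = TRUE)) {
  for (i in seq_along(lst1)) {
    writexl::write_xlsx(lst1[[i]], path = paths[i])
  }
}
```

With n <- 20000 and the 6694-row sample data, the split returns a single chunk, so nothing is actually divided. If the SharePoint limit applies to a different count than data frame rows, n would need to be lowered accordingly.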