r tidyr spread

Huge dataframe to spread


I have a huge data frame with dimensions (58556185 x 2):

user page  like
  1    A    1
  1    B    1
  1    C    1
  2    A    1
  2    C    1
  3    B    1
  .    .    .

There are 100,000 unique users and 50,000 unique pages. I want to spread the data into:

user/page
   A   B   C ...
1  1   1   0 ...
2  1   0   1 ...
3  0   1   0 ...
.
.

I used this code, and it works for a small dataset:

library(dplyr)
library(tidyr)

data <- data %>%
  group_by(user) %>%
  spread(page, like, fill = 0, drop = TRUE)

But when I apply it to the huge data frame, it fails with: Error: cannot allocate vector of size 21626.2 Gb

Any suggestions? Thanks


Solution

  • I used a sparse matrix (from the Matrix package) to solve this problem.

    library(Matrix)
    mat <- sparseMatrix(i = as.integer(factor(data.fbpage$uid)), j = as.integer(factor(data.fbpage$pageId)), x = data.fbpage$like)
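
For anyone who wants to try the sparse-matrix approach on the toy data from the question, here is a minimal sketch. The column names `user`, `page` and `like` are taken from the question's example table (the answer above uses `uid` and `pageId` instead), and `user_f`, `page_f` and `dense_gib` are just illustrative names. Keeping the factors around makes it possible to label the rows and columns, which the one-liner above does not do.

```r
library(Matrix)

# Toy data mirroring the example in the question (column names assumed).
data <- data.frame(
  user = c(1, 1, 1, 2, 2, 3),
  page = c("A", "B", "C", "A", "C", "B"),
  like = 1,
  stringsAsFactors = FALSE
)

# Convert the keys to factors once so integer codes and labels stay in sync.
user_f <- factor(data$user)
page_f <- factor(data$page)

# Only the non-zero (user, page) pairs are stored; missing pairs are implicit zeros.
mat <- sparseMatrix(
  i = as.integer(user_f),
  j = as.integer(page_f),
  x = data$like,
  dims = c(nlevels(user_f), nlevels(page_f)),
  dimnames = list(levels(user_f), levels(page_f))
)

print(mat)

# Rough size of a dense numeric matrix for the full problem:
# 100,000 users x 50,000 pages x 8 bytes per double, in GiB.
dense_gib <- 100000 * 50000 * 8 / 1024^3
```

Individual cells can then be looked up by name, for example `mat["2", "C"]`. Even a dense 100,000 x 50,000 numeric matrix would need tens of GiB (`dense_gib`), so a sparse representation is likely the more practical choice here. One caveat: by default `sparseMatrix` adds up the `x` values of duplicated (i, j) pairs, so if the same user/page pair can appear more than once you may want to deduplicate first.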