I have a data frame with dimensions 325,928 x 2.
Below is a very small subset of that data:
Destination = c('A60001', 'A60001','A60001','A60001','A60001','A60001','A60001','A60001',
'A60001','A60001','A60001','A60001','A60001','A60001','A60001','A60001',
'A60001','A60001','A60001','A60001','A60001','A60001','A60001','A60001',
'A60001', 'A60002', 'A60002','A60002','A60002','A60003')
Source = c('AA53', 'AA582', 'AA18', 'AA388', 'AA841', 'AA72', 'AA19', 'AA77', 'AA78', 'AA20', 'AA21',
'AA12', 'AA412', 'AA634', 'AA591', 'AA859', 'AA157', 'AA254', 'AA167', 'AA176',
'AA428', 'AA538', 'AA268', 'AA196', 'AA1250', 'AA23', 'AA16', 'AA692', 'AA196',
'AA22')
df = data.frame(Destination, Source)
> df
Destination Source
1 A60001 AA53
2 A60001 AA582
3 A60001 AA18
4 A60001 AA388
5 A60001 AA841
6 A60001 AA72
7 A60001 AA19
8 A60001 AA77
9 A60001 AA78
10 A60001 AA20
11 A60001 AA21
12 A60001 AA12
13 A60001 AA412
14 A60001 AA634
15 A60001 AA591
16 A60001 AA859
17 A60001 AA157
18 A60001 AA254
19 A60001 AA167
20 A60001 AA176
21 A60001 AA428
22 A60001 AA538
23 A60001 AA268
24 A60001 AA196
25 A60001 AA1250
26 A60002 AA23
27 A60002 AA16
28 A60002 AA692
29 A60002 AA196
30 A60003 AA22
The ultimate goal is to transform this data frame into a wide indicator table, ideally with something similar to dcast, because dcast cannot handle this amount of data.
Here is the code I originally tried with this data frame:
test <- dcast(cbind(df, V1 = rep(1, nrow(df))), Source ~ Destination, value.var = 'V1', fun.aggregate = length)
Output:
Source A60001 A60002 A60003
1 AA12 1 0 0
2 AA1250 1 0 0
3 AA157 1 0 0
4 AA16 0 1 0
5 AA167 1 0 0
6 AA176 1 0 0
7 AA18 1 0 0
8 AA19 1 0 0
9 AA196 1 1 0
10 AA20 1 0 0
11 AA21 1 0 0
12 AA22 0 0 1
13 AA23 0 1 0
14 AA254 1 0 0
15 AA268 1 0 0
16 AA388 1 0 0
17 AA412 1 0 0
18 AA428 1 0 0
19 AA53 1 0 0
20 AA538 1 0 0
21 AA582 1 0 0
22 AA591 1 0 0
23 AA634 1 0 0
24 AA692 0 1 0
25 AA72 1 0 0
26 AA77 1 0 0
27 AA78 1 0 0
28 AA841 1 0 0
29 AA859 1 0 0
It works with the small subset provided above, but when I run it on the full dataset (325,928 x 2), R crashes. Is there a better function that can produce the same output while handling larger amounts of data? If this isn't enough information, I can provide the full dataset privately to whoever thinks they can solve this (I can't post it here because Stack Overflow can't handle all the data) so you can test the issue directly from the source.
Any help would be great, thanks!
Thanks to @Imo's suggestion, here is the solution:
If your dataset is very large/wide, convert your data frame to a data.table and use data.table's dcast from there:
library(data.table)
df1 <- setDT(df)                  # convert the data.frame to a data.table by reference
df1[, value := 1]                 # add an indicator column to spread into the wide table
trial <- dcast(df1, Source ~ Destination, value.var = "value",
               fun.aggregate = length, fill = 0)
This gives the same result and can handle large amounts of data.
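If even the cast result is too wide to hold as a dense table, another option worth trying (a sketch only, not part of the accepted answer, and untested on the full 325,928-row dataset) is a sparse contingency table built with base R's xtabs() and its sparse argument, which uses the Matrix package:

library(Matrix)
# Sparse Source x Destination count matrix; assumes df still has the two
# columns Destination and Source shown above.
m <- xtabs(~ Source + Destination, data = df, sparse = TRUE)
dim(m)        # rows = unique Source values, columns = unique Destination values
m[1:5, 1:3]   # peek at a corner; entries are the same counts dcast produced

The sparse matrix stores only the non-zero entries, so it stays small in memory even when the wide table would have thousands of Destination columns.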