I am trying to create a large empty data.frame and insert a groups of row. I have seen a few similar questions on numerous forums, however I have been unable to apply any of them successfully to the specific formatting issue I am having.
I started with rbind(df,allic) # allic is the data frame I would like to insert into df # however, given the size of my dataset the operation takes 5 1/2 minutes to complete. I understand that creating the data frame at the beginning and replacing rows improves efficiency, however I have been unable to make it work for my problem. Code is as follows:
Initial data:
Order.ID Product
1 193505 Onion Rings
2 193505 Pineapple Cheddar Burger
3 193623 Fountain Soda
4 193623 French Fries
5 193623 Hamburger
6 193623 Hot Dog
7 193631 French Fries
8 193631 Hamburger
9 193631 Milkshake
The products won't match to below, however this being a formatting issue I figured it best to show the formatting that brought me to where I am now.
nb$Order.ID <- as.factor(nb$Order.ID)
plist <- aggregate(nb$Product,list(nb$Order.ID),list)
allp <- unique(unlist(plist$x))
allic <- expand.grid(plist$x[[1]], Var2=plist$x[[1]], Var3=1)
Var1 Var2 Var3
1 Onion Rings Onion Rings 1
2 Pineapple Cheddar Burger Onion Rings 1
3 Onion Rings Pineapple Cheddar Burger 1
4 Pineapple Cheddar Burger Pineapple Cheddar Burger 1
Now I create an empty dataframe (df) using:
df <- data.frame(factor=rep(NA, rcnt), factor=rep(NA,rcnt), stringsAsFactors=FALSE)
rcnt being a large, arbitrary number which I plan to trim once the operation is complete. My issue comes when I try to insert these lines using:
df[1:4,] <- allic
head(df, n=10)
factor factor.1
1 47 47
2 51 47
3 47 51
4 51 51
5 NA NA
6 NA NA
7 NA NA
8 NA NA
How can I insert rows in a dataframe without losing the format of my values? I would greatly appreciate any help I can get at this point.
EDIT Per comment below:
>df[i] <- for(i in 1:nrow(plist)) {
> allic <- expand.grid(plist$x[[i]], Var2=plist$x[[i]], Var3=1)
> df[i:nrow(allic),] <- sapply(allic, as.character)
I'm still very new with R, however this was working when I was using df <- rbind(df,allic). nrow(df) is 4096.
Try wrapping allic in as.character
as follows:
df[1:4,] <- sapply(allic, as.character)
> df
factor factor.1
1 Onion Rings Onion Rings
2 Pineapple Cheddar Burger Onion Rings
3 Onion Rings Pineapple Cheddar Burger
4 Pineapple Cheddar Burger Pineapple Cheddar Burger
5 <NA> <NA>
6 <NA> <NA>
7 <NA> <NA>
8 <NA> <NA>
9 <NA> <NA>
10 <NA> <NA>