short questions: I want to move the data in one column one row up by group: within the group, the first row is replaced by second row data, second replace by third etc. The last row of the new column is 0. I have 40,000 grouping levels and 230,000 row in total.
Long questions: I have a panel data regarding consumer purchase, multiple users purchase at multiple times. The group variable is the consumer with brand choice data. I want to replicate a second choice data to calculate the repurchase rate for a certain brand of the same user. To define the repurchase : the repurchase of the "1st purchase" is the "2nd purchase"; And the repurchase of the "2nd purchase" is the "3rd purchase" The last purchase has no repurchase.
My code takes me 8 minutes for this simple data manipulations in r, but only 1 second in Excel.
Data is like(sorted by day shopper first and day):
Day Shopper Choice
1 A Coke
2 A Coke
1 B Sprite
1 C Coke
2 C Pepsi
3 C Coke
1 D Sprite
2 D Sprite
The desired output:
Day Shopper Choice choice 2
1 A apple *apple*
2 A apple 0
1 B Banana 0
1 C apple Banana
2 C Banana apple
3 C apple 0
1 D berry *berry*
2 D berry 0
My original code is
# sort the data by user first and then by day
# choice.2 is the new column name
n<-nrow(dt)
for (i in 1:n) {
if (df$shopper[i]==dt$shopper[i+1])
{choice.2[i]<-choice[i+1]}
else {choice.2[i]<-0}}
If you mean that you want to group by shopper and then lead Choice
, filling with zeros, with dplyr,
library(dplyr)
df %>% group_by(Shopper) %>% mutate(choice2 = lead(as.character(Choice), default = '0'))
## Source: local data frame [8 x 4]
## Groups: Shopper [4]
##
## Day Shopper Choice choice2
## <int> <fctr> <fctr> <chr>
## 1 1 A apple apple
## 2 2 A apple 0
## 3 1 B Banana 0
## 4 1 C apple Banana
## 5 2 C Banana apple
## 6 3 C apple 0
## 7 1 D berry berry
## 8 2 D berry 0
or with data.table, something like
library(data.table)
setDT(df)[, choice2 := shift(as.character(Choice), type = 'lead', fill = '0') , by = Shopper][]
## Day Shopper Choice choice2
## 1: 1 A apple apple
## 2: 2 A apple 0
## 3: 1 B Banana 0
## 4: 1 C apple Banana
## 5: 2 C Banana apple
## 6: 3 C apple 0
## 7: 1 D berry berry
## 8: 2 D berry 0
or in base,
df$choice2 <- ave(as.character(df$Choice), df$Shopper, FUN = function(x){c(x[-1], '0')})
df
## Day Shopper Choice choice2
## 1 1 A apple apple
## 2 2 A apple 0
## 3 1 B Banana 0
## 4 1 C apple Banana
## 5 2 C Banana apple
## 6 3 C apple 0
## 7 1 D berry berry
## 8 2 D berry 0
If Choice
is a factor, all versions coerce choice2
to character instead, which comes with some time penalties. If you add "0"
to the factor levels, the same approaches should work without coercion.