I have read many of the threads and do not think my question has been asked before. I have a data.frame
in R related to advertisements shown to customers as such:.. I have many customers, 8 different products.. so this is just a sample
mydf <- data.frame(Cust = c(1, 1), age = c(24, 24),
state = c("NJ", "NJ"), Product = c(1, 1), cost = c(400, 410),
Time = c(35, 25), Purchased = c("N", "Y"))
mydf
# Cust age state Product cost Time Purchased
# 1 1 24 NJ 1 400 35 N
# 2 1 24 NJ 1 410 23 Y
And I want to transform it to look as such ...
Cust | age | state | Product | cost.1 | time.1 | purch.1 | cost.2 | time.2 | purch.2
1 | 24 | NJ | 1 | 400 | 35 | N | 410 | 23 | Y
How can I do this? There are a few static variables for each customer such as age, state and a few others... and then there are the details associated with each offer that was presented to a given customer, the product # in the offer, the cost, the time, and if they purchased it... I want to get all of this onto 1 line for each customer to perform analysis.
It is worth noting that the number of products maxes out at 7, but for some customers it ranges from 1 to 7.
I have no sample code to really show. I have tried using the aggregate
function, but I do not want to aggregate, or do any SUMs. I just want to do some joins. Research suggests the cbind
, and tapply
functions may be useful.
Thank you for your help. I am very new to R.
You are essentially asking to do a "long" to "wide" reshape of your data.
It looks to me like you're using "Cust", "age", "state", and "Product" as your ID variables. You don't have a an actual "time" variable though ("time" as in the sequential count of records by the IDs mentioned above). However, such a variable is easy to create:
mydf$timevar <- with(mydf,
ave(rep(1, nrow(mydf)),
Cust, age, state, Product, FUN = seq_along))
mydf
# Cust age state Product cost Time Purchased timevar
# 1 1 24 NJ 1 400 35 N 1
# 2 1 24 NJ 1 410 23 Y 2
From there, this is pretty straightforward with the reshape
function in base R.
reshape(mydf, direction = "wide",
idvar=c("Cust", "age", "state", "Product"),
timevar = "timevar")
# Cust age state Product cost.1 Time.1 Purchased.1 cost.2 Time.2 Purchased.2
# 1 1 24 NJ 1 400 35 N 410 23 Y