I have a large dataset in CSV:
After loading the CSV into RStudio and applying unclass(), I apply as(..., "transactions").
The result is something like this:
# transactions in sparse format with
# 5 transactions (rows) and
# 1455 items (columns)
Instead of 50,000 transactions, there are only 5 now.
Where have all the transactions gone? Was the matrix somehow transposed (as the row count in the result equals the column count of my CSV)?
This may be a data pre-processing problem, but according to this post my input data should have the right format.
[I'm posting for the first time here and am fairly new to R/RStudio.]
Have a look at the coercion methods in the man page ?transactions. You will see that you need either a binary incidence matrix, a list of transactions, or a data.frame containing only categorical variables. Your data is none of these, so as(..., "transactions") will not produce the transactions you expect.
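As a rough illustration of where the rows can go: unclass() on a data.frame returns a list with one element per column, and the list coercion treats each list element as one transaction. The sketch below uses a made-up two-column data.frame (the column names and items are hypothetical, not taken from the question's CSV) and assumes the columns are character vectors without repeated values.

```r
library(arules)

# Hypothetical toy data: 4 rows (baskets), 2 columns (item slots)
df <- data.frame(
  slot1 = c("bread", "milk", "eggs", "beer"),
  slot2 = c("butter", "jam", "flour", "chips"),
  stringsAsFactors = FALSE
)

# unclass() gives a list with one element per COLUMN of the data.frame
cols <- unclass(df)
length(cols)

# Each list element becomes one transaction, so the count equals ncol(df)
tr_cols <- as(cols, "transactions")
length(tr_cols)

# Building the list row by row gives one transaction per row instead
rows <- lapply(seq_len(nrow(df)), function(i) unlist(df[i, ], use.names = FALSE))
tr_rows <- as(rows, "transactions")
length(tr_rows)
```

With this toy input the first coercion yields as many transactions as there are columns, which would match the "5 transactions" symptom if the CSV has 5 columns.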
I think read.transactions can read your data.
library(arules)
# create and write some data
data <- paste(
"item1,item2,,,",
"item1,,,,",
"item2,item3,,,",
sep="\n")
write(data, file = "demo_basket")
# read the data
tr <- read.transactions("demo_basket", format = "basket", sep=",")
inspect(tr)
items
[1] {item1,item2}
[2] {item1}
[3] {item2,item3}
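To confirm that no transactions were lost, it may help to compare the transaction count with the number of lines in the file. This is only a sketch on the same demo data, written to a temporary file (the path variable is illustrative); for a real CSV with repeated items in a row, the rm.duplicates argument of read.transactions may also be needed.

```r
library(arules)

# Same demo baskets as above, written to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("item1,item2,,,", "item1,,,,", "item2,item3,,,"), path)

tr <- read.transactions(path, format = "basket", sep = ",")

# One transaction per line of the file
n_lines <- length(readLines(path))
length(tr) == n_lines

# Rows are transactions, columns are distinct items
dim(tr)
itemLabels(tr)
```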