I have a data set that has already been discretized, and I want to coerce it to transactions for use with the arules package.
CLUST_K <- structure(list(LONGITUDE = c(118.5, 118.5, 118.5, 118.5, 118.5,
118.5), LATITUDE = c(-11.5, -11.5, -11.5, -11.5, -11.5, -11.5
), DATE_START = structure(c(1419897600, 1419984000, 1420070400,
1420156800, 1420243200, 1420329600), class = c("POSIXct", "POSIXt"
)), DATE_END = structure(c(1420502400, 1420588800, 1420675200,
1420761600, 1420848000, 1420934400), class = c("POSIXct", "POSIXt"
)), FLAG = c(2, 1, 2, 2, 2, 2), SURFSKINTEMP = c(13L, 1L, 16L,
16L, 7L, 13L), SURFAIRTEMP = c(6L, 6L, 6L, 6L, 6L, 6L), TOTH2OVAP = c(5L,
17L, 17L, 17L, 17L, 17L), TOTO3 = c(16L, 16L, 16L, 10L, 7L, 7L
), TOTCO = c(12L, 12L, 8L, 4L, 12L, 12L), TOTCH4 = c(13L, 14L,
6L, 6L, 11L, 7L), OLR_ARIS = c(10L, 4L, 4L, 7L, 5L, 10L), CLROLR_ARIS = c(10L,
4L, 4L, 7L, 5L, 10L), OLR_NOAA = c(10L, 10L, 10L, 10L, 7L, 9L
), MODIS_LST = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("LONGITUDE",
"LATITUDE", "DATE_START", "DATE_END", "FLAG", "SURFSKINTEMP",
"SURFAIRTEMP", "TOTH2OVAP", "TOTO3", "TOTCO", "TOTCH4", "OLR_ARIS",
"CLROLR_ARIS", "OLR_NOAA", "MODIS_LST"), row.names = c(NA, 6L
), class = "data.frame")
Printing the data set CLUST_K gives:
LONGITUDE LATITUDE DATE_START DATE_END FLAG SURFSKINTEMP SURFAIRTEMP TOTH2OVAP TOTO3 TOTCO TOTCH4 OLR_ARIS CLROLR_ARIS OLR_NOAA MODIS_LST
1 118.5 -11.5 2014-12-30 2015-01-06 2 13 6 5 16 12 13 10 10 10 1
2 118.5 -11.5 2014-12-31 2015-01-07 1 1 6 17 16 12 14 4 4 10 1
3 118.5 -11.5 2015-01-01 2015-01-08 2 16 6 17 16 8 6 4 4 10 1
4 118.5 -11.5 2015-01-02 2015-01-09 2 16 6 17 10 4 6 7 7 10 1
5 118.5 -11.5 2015-01-03 2015-01-10 2 7 6 17 7 12 11 5 5 7 1
6 118.5 -11.5 2015-01-04 2015-01-11 2 13 6 17 7 12 7 10 10 9 1
Columns 1 to 5 of the data set hold the transaction information, and columns 6 to 15 are the transaction data, which has already been discretized.
When I try to coerce the data set to transactions, I get an error:
CLUST_K_R <- CLUST_K[,6:15]
CLUST_K_R_T <- as(CLUST_K_R,"transactions")
Error in asMethod(object) :
column(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 not logical or a factor. Discretize the columns first.
But the data set has already been discretized.
When I use split, the result also looks wrong:
> s1 <- split(CLUST_K$SURFSKINTEMP, CLUST_K$SURFAIRTEMP,CLUST_K$TOTH2OVAP, CLUST_K$TOTO3)
> Tr <- as(s1,"transactions")
Warning message:
In asMethod(object) : removing duplicated items in transactions
> Tr
transactions in sparse format with
1 transactions (rows) and
4 items (columns)
Only 1 transaction is left, but there should be 6 transactions in my case.
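Since you have already discretized the data (via clustering), you only need to make sure that each column is encoded as nominal values (a factor) rather than as numbers (integers). The coercion to transactions accepts only logical or factor columns, which is why it asks you to discretize first.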
for(i in 1:ncol(CLUST_K_R)) CLUST_K_R[[i]] <- as.factor(CLUST_K_R[[i]])
CLUST_K_R_T <- as(CLUST_K_R,"transactions")
summary(CLUST_K_R_T)
transactions as itemMatrix in sparse format with
6 rows (elements/itemsets/transactions) and
30 columns (items) and a density of 0.3333333
most frequent items:
SURFAIRTEMP=6 MODIS_LST=1 TOTH2OVAP=17 TOTCO=12 OLR_NOAA=10 (Other)
6 6 5 4 4 35
element (itemset/transaction) length distribution:
sizes
10
6
Min. 1st Qu. Median Mean 3rd Qu. Max.
10 10 10 10 10 10
includes extended item information - examples:
labels variables levels
1 SURFSKINTEMP=1 SURFSKINTEMP 1
2 SURFSKINTEMP=7 SURFSKINTEMP 7
3 SURFSKINTEMP=13 SURFSKINTEMP 13
includes extended transaction information - examples:
transactionID
1 1
2 2
3 3
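As a side note, here is a minimal sketch of why the split() attempt collapsed to one transaction, together with a loop-free way to do the factor conversion. The data frame `bins` is a hypothetical stand-in built from three of the columns in the question, and the arules step is guarded because it assumes the package is installed. As far as I can tell, split() groups only by its second argument, so the additional vectors passed after it do not define further groups.

```r
# Hypothetical stand-in for three of the discretized columns (values from the question).
bins <- data.frame(
  SURFSKINTEMP = c(13L, 1L, 16L, 16L, 7L, 13L),
  SURFAIRTEMP  = c(6L, 6L, 6L, 6L, 6L, 6L),
  TOTH2OVAP    = c(5L, 17L, 17L, 17L, 17L, 17L)
)

# split() groups by its second argument only. SURFAIRTEMP is constant,
# so all six rows end up in a single group, i.e. a single transaction.
groups <- split(bins$SURFSKINTEMP, bins$SURFAIRTEMP)
length(groups)        # 1
unique(groups[[1]])   # 13 1 16 7: the 4 items left once duplicates are removed

# Convert every column to a factor in one step; `bins[]` keeps the data.frame.
bins[] <- lapply(bins, as.factor)
stopifnot(all(vapply(bins, is.factor, logical(1))))

# Coerce to transactions only if arules is available (assumption: it is installed).
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- as(bins, "transactions")
  print(dim(trans))   # one row per original row, one column per variable=level item
}
```

The `lapply` form does the same thing as the `for` loop above; which one to use is a matter of taste.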