java cluster-analysis data-mining k-means mining

KMeans algorithm for mining human activity patterns from smart home data


I am working on a data mining project to mine human activity patterns from smart meter data. I cannot work out how to apply the KMeans algorithm to cluster this data.

The data looks like this: each day is divided into 48 slots of 30 minutes each, and each slot records the appliance that was active during it.

_Click here to see snapshot of Dataset_

Now I want to create clusters such as time of day (morning, afternoon, evening, night), weekday, week and/or month of the year, and season. What approach should I follow to get this result using KMeans?


Solution

  • KMeans cannot be applied to this data in any obvious, meaningful way.

    The algorithm is designed for continuous variables: it computes the mean of each cluster (hence the name) and minimizes the squared deviations from that mean. But your data is not continuous-valued. Neither the mean appliance ID nor the squared deviation from it makes sense.
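
To make the point concrete, here is a minimal Java sketch of the two quantities KMeans works with, applied to appliance IDs. The appliance names and ID numbers are hypothetical (the real dataset's encoding is not shown in the question), and the class and method names are made up for illustration only.

```java
import java.util.Arrays;

public class MeanOfApplianceIds {
    // Hypothetical appliance IDs (assumption): the numbers are arbitrary labels.
    static final int KETTLE = 1;
    static final int TV = 2;
    static final int WASHING_MACHINE = 3;

    // The centroid update KMeans performs on one dimension: the arithmetic mean.
    static double mean(int[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    // The quantity KMeans minimizes: the sum of squared deviations from the mean.
    static double sumOfSquaredDeviations(int[] values) {
        double m = mean(values);
        double sum = 0.0;
        for (int v : values) {
            sum += (v - m) * (v - m);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two slots: kettle in one, washing machine in the other.
        int[] ids = {KETTLE, WASHING_MACHINE};
        // The "mean appliance" is 2.0, the TV's ID, although the TV never ran.
        System.out.println("mean ID = " + mean(ids));
        System.out.println("sum of squared deviations = " + sumOfSquaredDeviations(ids));

        // Same two appliances, but the washing machine is relabeled as ID 7.
        int[] relabeled = {KETTLE, 7};
        // Nothing about the household changed, yet both numbers change.
        System.out.println("mean ID = " + mean(relabeled));
        System.out.println("sum of squared deviations = " + sumOfSquaredDeviations(relabeled));
    }
}
```

With the first labeling the mean is 2.0 and the sum of squared deviations is 2.0; after relabeling they become 4.0 and 18.0. The result depends entirely on how the IDs happened to be assigned, which is why the mean and the squared deviation carry no meaning for this kind of data.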