I am preparing my dataset for DBSCAN clustering. Before doing this, I need to convert all my features to numbers so that I can use StandardScaler(). My problem is that I am struggling with the timestamp and datetime columns. I dropped the day and timestamp columns and kept only the Time column in seconds, which appears to be an integer. However, I still get an error like this:
X = StandardScaler().fit_transform(X)
TypeError: float() argument must be a string or a number, not 'Timestamp'
Thanks a lot in advance. Here are the dtypes of my columns:
duration float64
power float64
duration_2 float64
duration_2_energy float64
time2 int64
dtype: object
Don't standard-scale everything. It is more often a bad idea than a good one, because you destroy information.
Instead, read the article on generalized DBSCAN by the DBSCAN authors. It shows how to use more complex data correctly.
Sander, Jörg; Ester, Martin; Kriegel, Hans-Peter; Xu, Xiaowei (1998).
Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications.
Data Mining and Knowledge Discovery. Berlin: Springer-Verlag. 2 (2): 169–194. doi:10.1023/A:1009745219419.
Here, you will probably want to use multiple epsilon thresholds. For example, you want one threshold on the time of day and an additional threshold on the numeric attributes.
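To make the idea of multiple thresholds concrete, here is a minimal sketch of a GDBSCAN-style clustering in plain Python. It is not the reference implementation from the paper, only an illustration under some assumptions: each point is a hypothetical pair (time of day in seconds, tuple of numeric features), the time of day is treated as circular so that 23:59 is close to 00:01, and the names time_of_day_gap, is_neighbor, gdbscan, eps_time and eps_numeric are made up for this example. The sample values are invented, not taken from your data.

```python
from collections import deque

SECONDS_PER_DAY = 24 * 60 * 60


def time_of_day_gap(a, b):
    """Circular gap in seconds between two times of day."""
    d = abs(a - b) % SECONDS_PER_DAY
    return min(d, SECONDS_PER_DAY - d)


def is_neighbor(p, q, eps_time, eps_numeric):
    """Neighborhood predicate: BOTH thresholds must hold.

    p and q are (time_of_day_seconds, numeric_features) pairs (assumed layout).
    """
    if time_of_day_gap(p[0], q[0]) > eps_time:
        return False
    dist = sum((x - y) ** 2 for x, y in zip(p[1], q[1])) ** 0.5
    return dist <= eps_numeric


def gdbscan(points, eps_time, eps_numeric, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    # O(n^2) neighbor lists; fine for a sketch, too slow for large data.
    neighbors = [
        [j for j in range(n)
         if is_neighbor(points[i], points[j], eps_time, eps_numeric)]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point, keep expanding
    return labels


# Invented sample: (time2 in seconds since midnight, (duration, power))
points = [
    (100, (1.0, 10.0)),
    (200, (1.1, 10.2)),
    (86300, (0.9, 9.9)),    # just before midnight, close to the first two
    (43200, (1.0, 10.0)),   # same numeric values, but at noon
    (150, (50.0, 500.0)),   # close in time, far in the numeric attributes
]
labels = gdbscan(points, eps_time=600, eps_numeric=1.0, min_pts=2)
print(labels)
```

The point of the design is that the time column never has to be put on the same scale as the other attributes: each threshold is expressed in its own units (seconds for the time, original units for the numeric features), so nothing is lost to scaling.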