Given a graphlab
SFrame
where there's a column with dates, e.g.:
+-------+------------+---------+-----------+
| Store | Date | Sales | Customers |
+-------+------------+---------+-----------+
| 1 | 2015-07-31 | 5263.0 | 555.0 |
| 2 | 2015-07-31 | 6064.0 | 625.0 |
| 3 | 2015-07-31 | 8314.0 | 821.0 |
| 4 | 2015-07-31 | 13995.0 | 1498.0 |
| 3 | 2015-07-20 | 4822.0 | 559.0 |
| 2 | 2015-07-10 | 5651.0 | 589.0 |
| 4 | 2015-07-11 | 15344.0 | 1414.0 |
| 5 | 2015-07-23 | 8492.0 | 833.0 |
| 2 | 2015-07-19 | 8565.0 | 687.0 |
| 10 | 2015-07-09 | 7185.0 | 681.0 |
+-------+------------+---------+-----------+
[986159 rows x 4 columns]
Is there an easy way in graphlab / other python function to convert the Date column to Year|Month|Day?
+-------+------+----+----+---------+-----------+
| Store | YYYY | MM | DD | Sales | Customers |
+-------+------+----+----+---------+-----------+
| 1 | 2015 | 07 | 31 | 5263.0 | 555.0 |
| 2 | 2015 | 07 | 31 | 6064.0 | 625.0 |
| 3 | 2015 | 07 | 31 | 8314.0 | 821.0 |
+-------+------------+---------+-----------+
[986159 rows x 4 columns]
In pandas
, I can do this: Which is the fastest way to extract day, month and year from a given date?
But to convert an SFrame into Panda to split date and convert back into SFrame is quite a chore.
You could also do it with the split-datetime method. It gives you a bit more flexibility.
sf.add_columns(sf['Date'].split_datetime(column_name_prefix = ''))
The split_datetime
method itself is on the SArray
(a single column of the SFrame) and it returns an SFrame which you can then add back to the original data (at basically 0 cost)