I have a classification problem: predicting hotel cancellations (in Python).
I'm stuck on one of the first steps.
I have some variables regarding hotel reservations, and some of them are:
'ArrivalDateYear' contains only 3 distinct years, so I assume I have to handle this variable as a 'categorical' or 'non_metric' feature.
For the other two variables I'm stuck: one has 31 days and the other has xx weeks. Should I deal with them as 'numerical' features? Should I just ignore them? Or should I handle them as 'categorical' features?
For the programming part, should I write: data['ArrivalDateYear'] = data['ArrivalDateYear'].astype('category') (...)?
Is there any other way to handle 'month', 'day', etc. variables in a simple supervised machine learning problem?
I would make these variables categorical. There are two approaches that you can try to implement depending on your case:
Convert them into separate binary variables (one-hot encoding), where each variable represents a unique week number or day of the month. This helps if there are non-linear relationships between the dates and the target.
Extract new features from them: for example, you can derive features like "IsWeekend" or "IsHoliday". This will be more helpful (IMO).
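
Both approaches can be sketched roughly as follows with pandas. This is only a minimal illustration on toy data, not a definitive recipe. Apart from 'ArrivalDateYear', every column name here ('ArrivalDateMonth', 'ArrivalDateWeekNumber', 'ArrivalDateDayOfMonth') is an assumption, the month is assumed to be stored as a number, and the holiday list is made up for the example. Note that "IsWeekend" cannot be derived from the day of the month alone, so the sketch first rebuilds the full arrival date.

```python
import pandas as pd

# Toy data. Every column name except 'ArrivalDateYear' is an assumption,
# and the month is assumed to be numeric (1-12).
data = pd.DataFrame({
    "ArrivalDateYear": [2015, 2016, 2017, 2016],
    "ArrivalDateMonth": [7, 12, 1, 12],
    "ArrivalDateWeekNumber": [27, 52, 1, 53],
    "ArrivalDateDayOfMonth": [4, 25, 2, 31],
})

# Approach 1: one binary column per distinct week number / day of month.
one_hot = pd.get_dummies(
    data,
    columns=["ArrivalDateWeekNumber", "ArrivalDateDayOfMonth"],
    prefix=["Week", "Day"],
)

# Approach 2: rebuild the full arrival date, then derive features from it.
date_parts = data[
    ["ArrivalDateYear", "ArrivalDateMonth", "ArrivalDateDayOfMonth"]
].rename(columns={
    "ArrivalDateYear": "year",
    "ArrivalDateMonth": "month",
    "ArrivalDateDayOfMonth": "day",
})
arrival = pd.to_datetime(date_parts)

# dayofweek: Monday=0 ... Sunday=6, so 5 and 6 are the weekend.
data["IsWeekend"] = arrival.dt.dayofweek >= 5

# Hypothetical holiday list, for illustration only.
holidays = pd.to_datetime(["2016-12-25", "2017-01-01"])
data["IsHoliday"] = arrival.isin(holidays)

print(one_hot.columns.tolist())
print(data[["IsWeekend", "IsHoliday"]])
```

With 31 possible days and around 50 weeks, the first approach adds many sparse columns, which is one reason the derived features may work better in practice.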