machine-learning classification categorical-data supervised-learning

Numerical and Categorical Features in a Classification Problem

I have a classification problem: predicting hotel cancellations (in Python).

I'm stuck on one of the first steps.

I have some variables regarding hotel reservations, and some of them are:

ArrivalDateYear: Year of the arrival date
ArrivalDateWeekNumber: Week number of the arrival date
ArrivalDateDayOfMonth: Day of the month of the arrival date

'ArrivalDateYear' takes only 3 distinct values, so I assume I have to handle this variable as a categorical (non-metric) feature.

I'm stuck on the other two variables: one has 31 possible days and the other has xx weeks. Should I treat them as numerical features? Should I just ignore them? Or should I handle them as categorical features?

For the programming part, should I write: data['ArrivalDateYear'] = data['ArrivalDateYear'].astype('category') (...)?
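For reference, here is a minimal sketch of what that line does, using a small DataFrame named data with the three columns from the question (the values are made up). astype('category') only changes how pandas stores the column; many scikit-learn estimators do not use the category dtype itself, so the sketch also one-hot encodes the year with pd.get_dummies as one common way to hand it to a model.

```python
import pandas as pd

# Hypothetical toy data: column names follow the question,
# the values are made up for illustration only.
data = pd.DataFrame({
    "ArrivalDateYear": [2015, 2016, 2017, 2016],
    "ArrivalDateWeekNumber": [27, 10, 35, 27],
    "ArrivalDateDayOfMonth": [1, 5, 31, 3],
})

# Mark the year as categorical: pandas stores the distinct values
# as categories and each row as an integer code into them.
data["ArrivalDateYear"] = data["ArrivalDateYear"].astype("category")
print(data["ArrivalDateYear"].cat.categories.tolist())  # [2015, 2016, 2017]

# One-hot encode the year so each year becomes its own 0/1 column;
# the other columns are left unchanged.
encoded = pd.get_dummies(data, columns=["ArrivalDateYear"], prefix="Year")
print(encoded.columns.tolist())
```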

Is there any other way to handle 'month', 'day', etc. variables in a simple supervised machine learning problem?

Solution

I would make these variables categorical. Depending on your case, there are two approaches you can try:

Convert them into separate binary (one-hot) variables, where each column represents one week number or one day of the month. This helps if the relationship between the arrival date and the target is non-linear.
Extract new features from them: for example, you can derive features like "IsWeekend" or "IsHoliday". In my opinion, this will be more helpful.
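A minimal sketch of both approaches with pandas, under a few assumptions: the data is made up, and it assumes an extra ArrivalDateMonth column (not among the variables listed in the question) because "IsWeekend" and "IsHoliday" need a full calendar date. The US federal holiday calendar from pandas.tseries.holiday is only a stand-in for whatever holiday calendar actually applies to the hotel.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Hypothetical reservations. "ArrivalDateMonth" is an ASSUMED extra column:
# a full date is needed to derive weekday and holiday flags.
data = pd.DataFrame({
    "ArrivalDateYear": [2015, 2016, 2017],
    "ArrivalDateMonth": [8, 7, 3],
    "ArrivalDateWeekNumber": [33, 27, 10],
    "ArrivalDateDayOfMonth": [15, 4, 8],
})

# Approach 1: one binary column per week number and per day of month.
one_hot = pd.get_dummies(
    data,
    columns=["ArrivalDateWeekNumber", "ArrivalDateDayOfMonth"],
    prefix=["Week", "Day"],
)

# Approach 2: rebuild the arrival date, then derive new features from it.
arrival = pd.to_datetime(pd.DataFrame({
    "year": data["ArrivalDateYear"],
    "month": data["ArrivalDateMonth"],
    "day": data["ArrivalDateDayOfMonth"],
}))
# dayofweek: Monday=0 ... Saturday=5, Sunday=6
data["IsWeekend"] = (arrival.dt.dayofweek >= 5).astype(int)

# Stand-in holiday calendar (assumption): swap in the one for the hotel's country.
holidays = USFederalHolidayCalendar().holidays(start=arrival.min(), end=arrival.max())
data["IsHoliday"] = arrival.isin(holidays).astype(int)

print(data[["IsWeekend", "IsHoliday"]])
```

One design note: pd.get_dummies only creates columns for values present in the data it is given, so encoding the training and test sets separately can produce different columns. Fitting scikit-learn's OneHotEncoder(handle_unknown="ignore") on the training set and reusing it on the test set avoids that mismatch.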