I'm trying to use a custom estimator RNN to predict whether a customer on my website is going to buy an item, based on their click behavior. The dataset looks like this:
session_id  page_type  event          since_previous_click (s)  will_buy
1           search     SelectCountry  null                      0
1           search     SelectCountry  2                         0
1           search     SortResults    4                         0
1           product    SelectColor    20                        0
2           search     SelectCountry  null                      1
2           search     SortResults    10                        1
2           product    SelectSize     5                         1
2           product    SelectColor    23                        1
2           inmarket   EnterName      8                         1
2           inmarket   Booked         34                        1
So "will_buy" is the label, and page_type, event and since_previous_click are the input features. My problem is that I do not know how to structure my input dataset. I know the dimensions should be [#data points, #time steps, #features], where the time-step dimension has to be padded because the sessions are not all the same length. But I can't build this 3D object as a tensor (or NumPy array), since the columns have multiple dtypes (string and int32). Any help?
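Convert page_type and event to one-hot vectors. Then all your data will be numeric and can be stored as int32, provided the null values in since_previous_click are also replaced by a number.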
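Something like the following might work as a starting point. It is only a sketch using plain NumPy on the rows from the question, and it makes a few assumptions that are mine, not the asker's: the null delay on the first click of a session is replaced by 0, shorter sessions are zero-padded at the end, and the helper and variable names (`encode_row`, `lengths`, and so on) are made up for illustration.

```python
import numpy as np

# Rows copied from the question:
# (session_id, page_type, event, since_previous_click, will_buy)
rows = [
    (1, "search", "SelectCountry", None, 0),
    (1, "search", "SelectCountry", 2, 0),
    (1, "search", "SortResults", 4, 0),
    (1, "product", "SelectColor", 20, 0),
    (2, "search", "SelectCountry", None, 1),
    (2, "search", "SortResults", 10, 1),
    (2, "product", "SelectSize", 5, 1),
    (2, "product", "SelectColor", 23, 1),
    (2, "inmarket", "EnterName", 8, 1),
    (2, "inmarket", "Booked", 34, 1),
]

# Build a fixed vocabulary for each categorical column.
page_types = sorted({r[1] for r in rows})
events = sorted({r[2] for r in rows})
page_index = {p: i for i, p in enumerate(page_types)}
event_index = {e: i for i, e in enumerate(events)}
n_features = len(page_types) + len(events) + 1


def encode_row(page_type, event, since_prev):
    """One-hot encode both categoricals and append the click delay."""
    vec = np.zeros(n_features, dtype=np.int32)
    vec[page_index[page_type]] = 1
    vec[len(page_types) + event_index[event]] = 1
    # Assumption: a null delay (first click of a session) becomes 0.
    vec[-1] = 0 if since_prev is None else since_prev
    return vec


# Group the encoded rows by session, keeping their original order.
sessions, labels = {}, {}
for sid, page_type, event, since_prev, will_buy in rows:
    sessions.setdefault(sid, []).append(encode_row(page_type, event, since_prev))
    labels[sid] = will_buy  # the label is constant within a session

session_ids = sorted(sessions)
max_steps = max(len(sessions[s]) for s in session_ids)

# Zero-pad every session to max_steps: [#sessions, #time steps, #features].
X = np.zeros((len(session_ids), max_steps, n_features), dtype=np.int32)
lengths = np.zeros(len(session_ids), dtype=np.int32)
for i, sid in enumerate(session_ids):
    steps = np.stack(sessions[sid])
    X[i, : len(steps)] = steps
    lengths[i] = len(steps)

y = np.array([labels[s] for s in session_ids], dtype=np.int32)
```

Keeping the true sequence lengths next to the padded array lets the model mask out or ignore the padded time steps later, since an all-zero row is otherwise indistinguishable from real input.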