python tensorflow recurrent-neural-network sequential

How should I build my TensorFlow dataset to feed it into a custom RNN estimator?


I'm trying to use a custom RNN estimator to predict whether a customer on my website is going to buy an item, based on their click behavior. The dataset looks like this:

session_id  page_type   event            since_previous_click (s)   will_buy
1           search      SelectCountry    null                        0
1           search      SelectCountry    2                           0
1           search      SortResults      4                           0
1           product     SelectColor      20                          0
2           search      SelectCountry    null                        1
2           search      SortResults      10                          1
2           product     SelectSize       5                           1
2           product     SelectColor      23                          1
2           inmarket    EnterName        8                           1
2           inmarket    Booked           34                          1

So "will_buy" is the label, and page_type, event and since_previous_click are the input features. My problem is that I don't know how to structure the input dataset. I know the dimensions should be [#data points, #time steps, #features], with the time-step axis padded because the sessions are not all the same length. But I can't construct this 3D object as a tensor (or NumPy array), since the columns have multiple dtypes (string and int32). Any help?


Solution

  • Convert page_type and event to one-hot vectors. Then all your data will be int32 and can be stored in a single 3D array.
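
As a rough illustration of the suggestion above, here is a minimal NumPy sketch that one-hot encodes page_type and event, groups rows by session_id, and zero-pads each session to the longest one. The rows are copied from the table in the question. The names (`encode_row`, `features`, `lengths`), the choice to map a null since_previous_click to 0, and the use of zero padding are all assumptions made for this example, not something stated in the original answer.

```python
import numpy as np

# Rows from the question: (session_id, page_type, event, since_previous_click, will_buy)
rows = [
    (1, "search", "SelectCountry", None, 0),
    (1, "search", "SelectCountry", 2, 0),
    (1, "search", "SortResults", 4, 0),
    (1, "product", "SelectColor", 20, 0),
    (2, "search", "SelectCountry", None, 1),
    (2, "search", "SortResults", 10, 1),
    (2, "product", "SelectSize", 5, 1),
    (2, "product", "SelectColor", 23, 1),
    (2, "inmarket", "EnterName", 8, 1),
    (2, "inmarket", "Booked", 34, 1),
]

# Build a vocabulary (value -> column index) for each categorical column.
page_types = sorted({r[1] for r in rows})
events = sorted({r[2] for r in rows})
page_index = {name: i for i, name in enumerate(page_types)}
event_index = {name: i for i, name in enumerate(events)}

# One column per page type, one per event, plus one for the time delta.
n_features = len(page_types) + len(events) + 1


def encode_row(page_type, event, since_prev):
    """Return one time step as an int32 vector: [page one-hot | event one-hot | delta]."""
    vec = np.zeros(n_features, dtype=np.int32)
    vec[page_index[page_type]] = 1
    vec[len(page_types) + event_index[event]] = 1
    # Assumption: the first click of a session (null) is encoded as 0 seconds.
    vec[-1] = 0 if since_prev is None else since_prev
    return vec


# Group encoded steps by session, keeping the original click order.
sessions, labels = {}, {}
for sid, page_type, event, since_prev, will_buy in rows:
    sessions.setdefault(sid, []).append(encode_row(page_type, event, since_prev))
    labels[sid] = will_buy  # the label is constant within a session

session_ids = sorted(sessions)
max_steps = max(len(steps) for steps in sessions.values())

# Zero-padded array of shape [#data points, #time steps, #features].
features = np.zeros((len(session_ids), max_steps, n_features), dtype=np.int32)
lengths = np.zeros(len(session_ids), dtype=np.int32)
for i, sid in enumerate(session_ids):
    steps = np.stack(sessions[sid])
    features[i, : len(steps)] = steps
    lengths[i] = len(steps)

y = np.array([labels[sid] for sid in session_ids], dtype=np.int32)

print(features.shape, lengths, y)
```

Keeping the true sequence lengths next to the padded array lets the model ignore the padded steps if it supports masking. Arrays like these could then be passed to something like `tf.data.Dataset.from_tensor_slices` inside the estimator's input function, though the exact wiring depends on how the custom estimator is written.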