python tensorflow deep-learning autoencoder

How to mix categorical, discrete and continuous data as input in tensorflow?


I am new to tensorflow. I have a dataset that has continuous, discrete and categorical values. The sample data is as follows:

     col1    col2    col3  col4  col5  col6  Class
0    22    23.40   45.60  11    1.0   0.0    0.0
1   346    67.40  235.60  23    1.0   1.0    0.0
2    22    67.34  364.66  17    0.0   0.0    1.0
3  1231   124.44  213.89  14    1.0   0.0    1.0

col1 and col4 are discrete variables. col2 and col3 are continuous variables. col5 and col6 are categorical variables. Class is the target variable.

I was wondering if I can pass along the above data directly as input to the placeholder X.

X = tf.placeholder(tf.float32, [None, numFeatures])

I do not have to apply tf.one_hot, since my categorical variables are binary, correct?

How does tensorflow detect that col5 and col6 are categorical variables?

Any help would be appreciated. Thank you!


Solution

  • Since your categorical variables are binary, it's OK to treat them as integers. You have to create placeholders, which you will later feed with batches during training.

    Here is how you could declare your tensorflow placeholders so that they have the right dtype.

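    import tensorflow as tf  # TF 1.x API; in TF 2.x use tf.compat.v1.placeholder

    # Assumed shape: unknown batch size, one column per placeholder.
    shape = [None, 1]

    # Discrete variables (col1, col4)
    var1 = tf.placeholder(tf.int32, shape)
    var4 = tf.placeholder(tf.int32, shape)

    # Continuous variables (col2, col3)
    var2 = tf.placeholder(tf.float32, shape)
    var3 = tf.placeholder(tf.float32, shape)

    # Binary categorical variables (col5, col6)
    var5 = tf.placeholder(tf.int32, shape)
    var6 = tf.placeholder(tf.int32, shape)

    # Target variable (Class)
    class_ = tf.placeholder(tf.int32, shape)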
    

    To feed this set of variables to a model, you will later have to concatenate them. Before that, cast the integer tensors to float so that every tensor has the same dtype, because tf.concat requires matching dtypes.

    var1 = tf.cast(var1, tf.float32)
    ...
    data = tf.concat([var1,var4, var2,var3, var5, var6], axis=1)
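
The steps above can be put together into a small end-to-end sketch. It makes a few assumptions that are not in the answer: each placeholder holds a single column (shape `[None, 1]`), the TF 1.x graph API is reached through `tensorflow.compat.v1` so that it also runs on TensorFlow 2.x, and the names `rows`, `column` and `feed` are hypothetical helpers that exist only for this illustration. The four rows are the sample data from the question.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF 1.x-style graph API, also shipped with TF 2.x

tf.disable_eager_execution()  # placeholders only work in graph mode

# Assumption: unknown batch size, one column per placeholder.
shape = [None, 1]

# Discrete (col1, col4) and binary categorical (col5, col6) inputs as integers.
var1 = tf.placeholder(tf.int32, shape)
var4 = tf.placeholder(tf.int32, shape)
var5 = tf.placeholder(tf.int32, shape)
var6 = tf.placeholder(tf.int32, shape)

# Continuous inputs (col2, col3) as floats.
var2 = tf.placeholder(tf.float32, shape)
var3 = tf.placeholder(tf.float32, shape)

# Cast every integer tensor to float32 so that tf.concat sees a single dtype.
discrete = [tf.cast(v, tf.float32) for v in (var1, var4)]
categorical = [tf.cast(v, tf.float32) for v in (var5, var6)]

# Same column order as in the answer: var1, var4, var2, var3, var5, var6.
data = tf.concat(discrete + [var2, var3] + categorical, axis=1)

# Sample rows from the question (col1 .. col6, without the Class column).
rows = np.array([
    [22, 23.40, 45.60, 11, 1.0, 0.0],
    [346, 67.40, 235.60, 23, 1.0, 1.0],
    [22, 67.34, 364.66, 17, 0.0, 0.0],
    [1231, 124.44, 213.89, 14, 1.0, 0.0],
])


def column(index, dtype):
    """Return column `index` of `rows` as a (batch, 1) array of `dtype`."""
    return rows[:, index:index + 1].astype(dtype)


feed = {
    var1: column(0, np.int32), var2: column(1, np.float32),
    var3: column(2, np.float32), var4: column(3, np.int32),
    var5: column(4, np.int32), var6: column(5, np.int32),
}

with tf.Session() as sess:
    batch = sess.run(data, feed_dict=feed)

print(batch.shape, batch.dtype)
```

The resulting `data` tensor is a float32 matrix with one row per example and six feature columns, which is the same shape and dtype as the single placeholder `X` from the question. TensorFlow does not infer from the values that col5 and col6 are categorical; after the cast they are treated as ordinary 0/1 numbers.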