I am new to TensorFlow. I have a dataset that has continuous, discrete and categorical values. The sample data is as follows:
   col1    col2    col3  col4  col5  col6  Class
0    22   23.40   45.60    11   1.0   0.0    0.0
1   346   67.40  235.60    23   1.0   1.0    0.0
2    22   67.34  364.66    17   0.0   0.0    1.0
3  1231  124.44  213.89    14   1.0   0.0    1.0
col1 and col4 are discrete variables. col2 and col3 are continuous variables. col5 and col6 are categorical variables. Class is the target variable.
I was wondering whether I can pass the above data directly as input to the placeholder X:
X = tf.placeholder(tf.float32, [None, numFeatures])
I do not have to apply tf.one_hot, correct? My categorical variables are binary.
How does tensorflow detect that col5 and col6 are categorical variables?
Any help would be appreciated. Thank you!
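Since your categorical variables are binary, it's OK to treat them as ints; no one-hot encoding is needed. TensorFlow does not infer which columns are categorical, so you declare the dtype yourself: you have to create placeholders, which you will later feed with batches during training.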
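As a quick check of why a binary column needs no one-hot step, here is a minimal sketch in plain Python. The `one_hot` helper is hypothetical and written only for this illustration (it is not `tf.one_hot`); the values are col5 from the sample rows.

```python
def one_hot(value, depth):
    """Hypothetical helper: one-hot list of length `depth` with 1.0 at index `value`."""
    return [1.0 if i == value else 0.0 for i in range(depth)]

# col5 from the four sample rows in the question.
col5 = [1, 1, 0, 1]

# One-hot encode each binary value into two components.
encoded = [one_hot(v, 2) for v in col5]

# The second component is the original column, and the first is just
# 1 minus it, so the encoding adds no information for a binary feature.
second = [row[1] for row in encoded]
first = [row[0] for row in encoded]
print(second == [float(v) for v in col5])      # True
print(first == [1.0 - float(v) for v in col5])  # True
```

This only holds for two-valued columns; a categorical column with three or more levels would still need an encoding step.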
Here is how you could declare your tensorflow placeholders so that they have the right dtype.
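import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder)

# Assumed shape: one column per placeholder, any batch size.
shape = [None, 1]

# Discrete columns
var1 = tf.placeholder(tf.int32, shape)
var4 = tf.placeholder(tf.int32, shape)
# Continuous columns
var2 = tf.placeholder(tf.float32, shape)
var3 = tf.placeholder(tf.float32, shape)
# Binary categorical columns
var5 = tf.placeholder(tf.int32, shape)
var6 = tf.placeholder(tf.int32, shape)
# Target
class_ = tf.placeholder(tf.int32, shape)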
To feed these variables to a model, you will later have to concatenate them. Before that, cast the integer tensors to float so that every tensor has the same dtype, since tf.concat requires matching dtypes.
var1 = tf.cast(var1, tf.float32)
...
data = tf.concat([var1,var4, var2,var3, var5, var6], axis=1)
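The same cast-then-concatenate step can be tried on the sample rows without building a graph. The sketch below uses NumPy as a stand-in (`astype` in place of `tf.cast`, `np.concatenate` in place of `tf.concat`); it is only an illustration of the shapes and dtypes involved, assuming each column is stored as a `[batch, 1]` array, and is not part of the original answer.

```python
import numpy as np

# Sample rows from the question, one array per column, each shaped [batch, 1].
col1 = np.array([[22], [346], [22], [1231]], dtype=np.int32)   # discrete
col4 = np.array([[11], [23], [17], [14]], dtype=np.int32)      # discrete
col2 = np.array([[23.40], [67.40], [67.34], [124.44]], dtype=np.float32)
col3 = np.array([[45.60], [235.60], [364.66], [213.89]], dtype=np.float32)
col5 = np.array([[1], [1], [0], [1]], dtype=np.int32)          # binary
col6 = np.array([[0], [1], [0], [0]], dtype=np.int32)          # binary

# Cast every column to float32, then join along axis 1
# (same column order as the tf.concat call above).
columns = [col1, col4, col2, col3, col5, col6]
data = np.concatenate([c.astype(np.float32) for c in columns], axis=1)

print(data.shape)  # (4, 6)
print(data.dtype)  # float32
```

An array like `data` has the `[None, numFeatures]` layout and float32 dtype of the single placeholder X from the question, so it could also be passed to X directly through `feed_dict`.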