
TensorFlow Embedding using Continuous and Categorical Variables


Based on this post, I tried to create another model where I add both categorical and continuous variables. Please find the code below:

from __future__ import print_function
import pandas as pd
import tensorflow as tf
import numpy as np
from sklearn.preprocessing import LabelEncoder

if __name__ == '__main__':

    # 1 categorical input feature and a binary output
    df = pd.DataFrame({'cat2': np.array(['o', 'm', 'm', 'c', 'c', 'c', 'o', 'm', 'm', 'm']),
                       'num1': np.random.rand(10),
                       'label': np.array([0, 0, 1, 1, 0, 0, 1, 0, 1, 1])})

    encoder = LabelEncoder()
    encoder.fit(df.cat2.values)

    X1 = encoder.transform(df.cat2.values).reshape(-1,1)
    X2 = np.array(df.num1.values).reshape(-1,1)
#     X = np.concatenate((X1,X2), axis=1)
    Y = np.zeros((len(df), 2))
    Y[np.arange(len(df)), df.label.values] = 1

    # Neural net parameters
    training_epochs = 5
    learning_rate = 1e-3
    cardinality = len(np.unique(X1))
    embedding_size = 2
    input_X_size = 1
    n_labels = len(np.unique(Y))
    n_hidden = 10

    # Placeholders for input, output
    cat2 = tf.placeholder(tf.int32, [None], name='cat2')
    x = tf.placeholder(tf.float32, [None, 1], name="input_x")
    y = tf.placeholder(tf.float32, [None, 2], name="input_y")

    embed_matrix = tf.Variable(
                tf.random_uniform([cardinality, embedding_size], -1.0, 1.0),
                name="embed_matrix"
            )
    embed = tf.nn.embedding_lookup(embed_matrix, cat2)

    inputs_with_embed = tf.concat([x, embedding_aggregated], axis=2, name="inputs_with_embed")

    # Neural network weights

    h = tf.get_variable(name='h2', shape=[inputs_with_embed, n_hidden],
                        initializer=tf.contrib.layers.xavier_initializer())
    W_out = tf.get_variable(name='out_w', shape=[n_hidden, n_labels],
                            initializer=tf.contrib.layers.xavier_initializer())

    # Neural network operations
    #embedded_chars = tf.nn.embedding_lookup(embeddings, x)

    layer_1 = tf.matmul(inputs_with_embed,h)
    layer_1 = tf.nn.relu(layer_1)
    out_layer = tf.matmul(layer_1, W_out)

    # Define loss and optimizer
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_layer, labels=y))
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

    # Initializing the variables
    init = tf.global_variables_initializer()

    # Launch the graph
    with tf.Session() as sess:
        sess.run(init)

        for epoch in range(training_epochs):
            avg_cost = 0.

            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost],
                             feed_dict={x: X2,cat2:X1, y: Y})
    print("Optimization Finished!")

But I'm getting the following error. It seems I'm not concatenating the continuous variable and the embedding properly, but I don't understand how to fix it.

Could someone please guide me?

ValueError: Shape must be at least rank 3 but is rank 2 for 'inputs_with_embed_2' (op: 'ConcatV2') with input shapes: [?,1], [?,2], [] and with computed input tensors: input[2] = <2>.

Thanks!


Solution

  • If by embedding_aggregated you mean embed (probably a typo):

    The error is that there is no axis=2 in your case; it should be axis=1:

    inputs_with_embed = tf.concat([x, embed], axis=1, name="inputs_with_embed")

    embed has shape [None, embedding_dimension] and x has shape [None, 1].

    They are both 2-D tensors, so the only valid axes are axis=0 and axis=1 (axes are zero-indexed). Therefore, to get inputs_with_embed with shape [None, embedding_dimension + 1], you need to concatenate on axis=1.
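
To see why axis=1 is the right choice, here is a minimal shape check (a sketch assuming TensorFlow 1.x, with placeholders standing in for x and for the result of the embedding lookup):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 1])      # continuous feature, rank 2
embed = tf.placeholder(tf.float32, [None, 2])  # stands in for the lookup output, rank 2
combined = tf.concat([x, embed], axis=1)       # rows stay aligned, columns stack

print(combined.get_shape())  # (?, 3)

Asking for axis=2 on two rank-2 tensors is exactly what produces the "Shape must be at least rank 3" error above.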
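For completeness, here is a minimal end-to-end sketch of the corrected script (again assuming TensorFlow 1.x, so this is one plausible reading of the intent rather than the only fix). Besides the concat fix, two other details are adjusted as assumptions: tf.get_variable needs an integer input dimension for h (embedding_size + 1), not the tensor inputs_with_embed, and the cat2 placeholder expects a rank-1 array of integer ids, so X1 is left unreshaped:

from __future__ import print_function
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import LabelEncoder

if __name__ == '__main__':
    # 1 categorical feature, 1 continuous feature, one-hot binary label
    df = pd.DataFrame({'cat2': np.array(['o', 'm', 'm', 'c', 'c', 'c', 'o', 'm', 'm', 'm']),
                       'num1': np.random.rand(10),
                       'label': np.array([0, 0, 1, 1, 0, 0, 1, 0, 1, 1])})

    encoder = LabelEncoder()
    X1 = encoder.fit_transform(df.cat2.values)  # integer ids, shape (10,)
    X2 = df.num1.values.reshape(-1, 1)          # continuous feature, shape (10, 1)
    Y = np.zeros((len(df), 2))
    Y[np.arange(len(df)), df.label.values] = 1  # one-hot labels, shape (10, 2)

    training_epochs = 5
    learning_rate = 1e-3
    cardinality = len(np.unique(X1))  # number of distinct categories
    embedding_size = 2
    n_labels = 2
    n_hidden = 10

    cat2 = tf.placeholder(tf.int32, [None], name='cat2')
    x = tf.placeholder(tf.float32, [None, 1], name='input_x')
    y = tf.placeholder(tf.float32, [None, n_labels], name='input_y')

    embed_matrix = tf.Variable(
        tf.random_uniform([cardinality, embedding_size], -1.0, 1.0),
        name='embed_matrix')
    embed = tf.nn.embedding_lookup(embed_matrix, cat2)  # [None, embedding_size]

    # Both tensors are rank 2, so concatenate along axis=1 -> [None, embedding_size + 1]
    inputs_with_embed = tf.concat([x, embed], axis=1, name='inputs_with_embed')

    h = tf.get_variable(name='h2', shape=[embedding_size + 1, n_hidden],
                        initializer=tf.contrib.layers.xavier_initializer())
    W_out = tf.get_variable(name='out_w', shape=[n_hidden, n_labels],
                            initializer=tf.contrib.layers.xavier_initializer())

    layer_1 = tf.nn.relu(tf.matmul(inputs_with_embed, h))
    out_layer = tf.matmul(layer_1, W_out)

    cost = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=out_layer, labels=y))
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(training_epochs):
            _, c = sess.run([optimizer, cost],
                            feed_dict={x: X2, cat2: X1, y: Y})
            print('Epoch %d, cost %.4f' % (epoch + 1, c))
    print('Optimization Finished!')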