python machine-learning tensorflow word2vec

Word2vec + Tensorflow and the shape of everything

I'm looking for a solution to a simple text classification problem with TensorFlow. I built a model on the IMDB dataset to predict whether a comment is positive or negative. The data was processed with word2vec, so I now have a set of vectors to classify. I think my problem comes from the shape of the y labels: they are one-dimensional, and I want to classify them in TensorFlow with a two-class output. I may be wrong about that, though. One last detail: the model reports an accuracy of 1.0, which seems too good to be true. Thanks for the help!

X_train (called train_vecs): shape (25000, 300), dtype float64
X_test (called test_vecs): shape (25000, 300), dtype float64
y_test: shape (25000, 1), dtype int64
y_train: shape (25000, 1), dtype int64

x = tf.placeholder(tf.float32, shape = [None, 300])
y = tf.placeholder(tf.float32, shape = [None, 2])
# Input -> Layer 1
W1 = tf.Variable(tf.zeros([300, 2]))
b1 = tf.Variable(tf.zeros([2]))
#h1 = tf.nn.sigmoid(tf.matmul(x, W1) + b1)
# Calculating difference between label and output
pred = tf.nn.softmax(tf.matmul(x, W1) + b1)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred,y))
train_step = tf.train.GradientDescentOptimizer(0.3).minimize(cost)

with tf.Session() as sess:
        for i in xrange(200):
                init_op = tf.initialize_all_variables()
                sess.run(init_op)
                train_step.run(feed_dict = {x: train_vecs, y: y_train})
        correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
        # Calculate accuracy
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print "Accuracy:", accuracy.eval({x: test_vecs, y: y_test})

Solution

You are using softmax in your example. Softmax assigns a probability to each of N different classes, and those probabilities add up to 1. In other words, the model picks exactly one of the N choices. For this to make sense, N has to be at least 2. With N == 1, the probability of that single class is always 1, regardless of the input. You have two possible fixes:

Create two classes, one for "positive sentiment" and one for "negative sentiment", setting N to 2.
Use logistic regression instead of softmax. In logistic regression each class is independent. That means you have N questions, each of which gets its own "yes" or "no" answer, so it still makes sense with N == 1.
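
To make the difference concrete, here is a minimal sketch in plain Python rather than TensorFlow, so it runs without any dependencies. The helper names (softmax, sigmoid, one_hot) and the sample scores are made up for illustration and are not taken from the question's code. It shows that softmax over a single score always returns 1, what two-column labels look like for the first fix, and how a single sigmoid output behaves for the second fix.

```python
import math


def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    top = max(logits)
    # Subtracting the largest score keeps exp() from overflowing.
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Logistic function: an independent yes/no probability for one score."""
    return 1.0 / (1.0 + math.exp(-z))


def one_hot(label, num_classes=2):
    """Encode an integer label (0 or 1 here) as one entry per class."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


# With a single class, softmax ignores the score entirely.
single_class = [softmax([z])[0] for z in (-5.0, 0.0, 5.0)]

# Fix 1: two classes, so each label needs two columns as well.
two_class = softmax([2.0, 0.0])
labels = [one_hot(y) for y in (0, 1, 1)]

# Fix 2: one sigmoid output, compared directly against the 0/1 label.
single_output = [sigmoid(z) for z in (-5.0, 0.0, 5.0)]

print(single_class)
print(two_class)
print(labels)
print(single_output)
```

In the question's setup, the first fix would mean converting y_train and y_test from shape (25000, 1) to shape (25000, 2) before feeding them, while the second would mean keeping the labels as they are and using a single output column.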