
Tensorflow doesn't train: 'DataFrame' objects are mutable, thus they cannot be hashed


I want to build and train a neural network with TensorFlow (without Keras; with Keras I got it working) on the Kaggle dataset 'House Prices'. I use Python, and apart from the actual training my code runs fine. However, when training, I either get no error (but it doesn't train), or I get a TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed.

I run the script on Google Colab in an IPython notebook and I believe that the main issue is in building the feed_dict. However, I don't know what is wrong here. batch_X contains 100x10 features, and batch_Y has 100 labels. I guess that this might be the critical snippet:

train_data = { X: batch_X, Y_:batch_Y }

The train_data is what I feed to sess.run(train_step, feed_dict=train_data).

Here's my code: https://colab.research.google.com/drive/1qabmzzicZVu7v72Be8kljM1pUaglb1bY

# train and train_normalized are the training data set (DataFrame)
# train_labels_normalized are the labels only

#Start session:
with tf.Session() as sess:
  sess.run(init)

  possible_indeces = list(range(0, train.shape[0]))
  iterations = 1000
  batch_size = 100

  for step in range(0, iterations):
    #draw batch indeces:
    batch_indeces = random.sample(possible_indeces, batch_size)
    #get features and respective labels
    batch_X = np.array(train_normalized.iloc[batch_indeces])
    batch_Y = np.array(train_labels_normalized.iloc[batch_indeces])

    train_data = { X: batch_X, Y_: batch_Y}

    sess.run(train_step, feed_dict=train_data)

What I was hoping for is that it would run for a couple of minutes and return optimized weights (2 hidden layers with 48 nodes each), allowing me to make predictions. However, it simply skips over the above code or throws the error below.

Does anyone have an idea what went wrong?

TypeError Traceback (most recent call last)
<ipython-input-536-79506f90a868> in <module>()
     13     batch_Y = np.array(train_labels_normalized.iloc[batch_indeces])
     14 
---> 15     train_data = { X: batch_X, Y_: batch_Y}
     16 
     17     sess.run(train_step, feed_dict=train_data)

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __hash__(self)
   1814     def __hash__(self):
   1815         raise TypeError('{0!r} objects are mutable, thus they cannot be'
-> 1816                         ' hashed'.format(self.__class__.__name__))
   1817 
   1818     def __iter__(self):

TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed

Solution

  • The problem comes from your seventh (Test) step.

    #Set X to the test data
    X = test_normalized.astype(np.float32)
    print(type(X)) # **<class 'pandas.core.frame.DataFrame'>**
    Y1 = tf.nn.sigmoid(tf.matmul(X, W1))
    Y2 = tf.nn.sigmoid(tf.matmul(Y1, W2))
    Y3 = tf.matmul(Y2, W3)
    

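    You are setting X to a DataFrame. On the first run this does not affect anything. But when you run the sixth step again after the seventh, X no longer refers to the tensor you feed (presumably a tf.placeholder); it now refers to the DataFrame. Dictionary keys have to be hashable, so building { X: batch_X, Y_: batch_Y } with a DataFrame as the key raises the TypeError.

    Try giving the test data its own name, X_, so that X keeps pointing at the original tensor: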

    X_ = test_normalized.astype(np.float32)
    Y1 = tf.nn.sigmoid(tf.matmul(X_, W1))
    

    Also, your final eval does not work as written. Move it inside a tf.Session so that the tensors are actually evaluated.
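
To see why re-running the training cell fails only after the test cell has run, here is a minimal sketch of the name-rebinding problem in plain Python. It does not use TensorFlow or pandas: Placeholder, FakeFrame and build_feed are hypothetical stand-ins I made up for illustration (FakeFrame just refuses to be hashed, the way the traceback shows a DataFrame doing), so treat it as a model of the mechanism rather than your actual graph.

```python
class Placeholder:
    """Hypothetical stand-in for a graph placeholder; hashable like any plain object."""

    def __init__(self, name):
        self.name = name


class FakeFrame:
    """Hypothetical stand-in for a DataFrame: it refuses to be hashed."""

    def __hash__(self):
        raise TypeError("'FakeFrame' objects are mutable, thus they cannot be hashed")


def build_feed(x_key, batch):
    """Build a feed-style dict; the key is hashed when the dict is created."""
    return {x_key: batch}


# "Step six" on a fresh run: X is the placeholder, so it works as a dict key.
X = Placeholder("X")
feed = build_feed(X, [[0.1, 0.2]])
first_run_ok = X in feed

# "Step seven" as in the question: the name X is rebound to the test data.
X = FakeFrame()
try:
    build_feed(X, [[0.1, 0.2]])  # re-running "step six" now hashes the frame
    error_after_rebinding = None
except TypeError as exc:
    error_after_rebinding = str(exc)

# The fix: keep the test data under its own name (X_) and leave X alone.
X = Placeholder("X")
X_ = FakeFrame()
feed = build_feed(X, [[0.1, 0.2]])
second_run_ok = X in feed

print(first_run_ok, error_after_rebinding is not None, second_run_ok)
```

In a notebook this is easy to miss, because cells share one namespace and the result depends on the order in which you last ran them, not on the order in which they appear.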