pandas, csv, dataframe, tensorflow

Converting tensorflow dataset to pandas dataframe


I am very new to deep learning and computer vision, and I want to do a face recognition project. For that, I downloaded some images from the Internet and converted them to a TensorFlow dataset with the help of this article from the TensorFlow documentation. Now I want to convert that dataset to a pandas DataFrame so that I can write it out as CSV files. I have tried a lot but am unable to do it. Can someone help me with this? Here is the code for creating the dataset, followed by some of the incorrect code I tried.

import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


filenames = tf.constant(['al.jpg', 'al2.jpg', 'al3.jpg', 'al4.jpeg','al5.jpeg', 'al6.jpeg','al7.jpg','al8.jpeg', '5.jpg', 'hrit8.jpeg', 'Hrithik-Roshan.jpg', 'Hrithik.jpg', 'hriti1.jpeg', 'hriti2.jpg', 'hriti3.jpeg', 'hritik4.jpeg', 'hritik5.jpg', 'hritk9.jpeg', 'index.jpeg', 'sah.jpeg', 'sah1.jpeg', 'sah3.jpeg', 'sah4.jpg', 'sah5.jpg','sah6.jpg','sah7.jpg'])
labels = tf.constant([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 2, 2, 2])
dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))


def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    image_resized = tf.image.resize_images(image_decoded, [28, 28])
    return image_resized, label

dataset = dataset.map(_parse_function)
dataset = dataset.shuffle(buffer_size=100)
dataset = dataset.batch(26)
iterator = dataset.make_one_shot_iterator()
image,labels = iterator.get_next()

sess = tf.Session()

print(sess.run([image, labels]))

Initially I just tried to use df = pd.DataFrame(dataset)

Then I got the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-15-d5503ae4603d> in <module>()
----> 1 df = pd.DataFrame((dataset))

 ~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
402                                          dtype=values.dtype, copy=False)
403             else:
--> 404                 raise ValueError('DataFrame constructor not properly called!')
405 
406         NDFrame.__init__(self, mgr, fastpath=True)

ValueError: DataFrame constructor not properly called!

After that, I came across this article and realized my mistake: in TensorFlow, values exist only within a session. So I tried the following code:

with tf.Session() as sess:
    df = pd.DataFrame(sess.run(dataset))

Please pardon me if I made a stupid mistake; I wrote the above code by analogy with print(sess.run(dataset)), and got a much bigger error:

 TypeError: Fetch argument <BatchDataset shapes: ((?, 28, 28, 3), (?,)), types: (tf.float32, tf.int32)> has invalid type <class 'tensorflow.python.data.ops.dataset_ops.BatchDataset'>, must be a string or Tensor. (Can not convert a BatchDataset into a Tensor or Operation.)

Solution

  • I think you could use map like this. I assumed that you want to add a NumPy array to a data frame, as described here. But you have to append the arrays one by one, and also figure out how the whole array fits into one column of the data frame.

    import tensorflow as tf
    import pandas as pd
    
    
    filenames = tf.constant(['C:/Machine Learning/sunflower/50987813_7484bfbcdf.jpg'])
    labels = tf.constant([1])
    dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
    
    sess = tf.Session()
    
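    def convert_to_dataframe(image, label):
        # Called through tf.py_func after _parse_function, so `image` arrives
        # here as a NumPy array holding the resized image, not as a filename.
        # Flatten it to 2-D (pixels x channels) so pandas can build a frame from it.
        print(pd.DataFrame.from_records(image.reshape(-1, image.shape[-1])))
        return image, label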
    
    
    def _parse_function(filename, label):
        image_string = tf.read_file(filename)
        image_decoded = tf.image.decode_jpeg(image_string, channels=3)
        image_resized = tf.image.resize_images(image_decoded, [28, 28])
        return image_resized, label
    
    dataset = dataset.map(_parse_function)
    # At this point each element is (image, label), not (filename, label).
    dataset = dataset.map(lambda image, label: tf.py_func(convert_to_dataframe,
                                                          [image, label],
                                                          [tf.float32, tf.int32]))
    
    dataset = dataset.shuffle(buffer_size=100)
    dataset = dataset.batch(26)
    iterator = dataset.make_one_shot_iterator()
    image,labels = iterator.get_next()
    
    
    sess.run([image, labels])
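
For the CSV step itself, one possible approach (a sketch, not something the code above does) is to take the NumPy arrays that sess.run([image, labels]) returns and flatten each image into a single row, with one column per pixel value. The helper name batch_to_dataframe and the px0, px1, ... column names are my own assumptions, and random arrays stand in for the real batch so that the snippet runs without TensorFlow or the image files:

```python
import io

import numpy as np
import pandas as pd


def batch_to_dataframe(images, labels):
    """Flatten a batch of images into one DataFrame row per image.

    images: array of shape (batch, height, width, channels)
    labels: array of shape (batch,)
    """
    images = np.asarray(images)
    labels = np.asarray(labels)
    # Collapse height, width, and channels into a single axis of pixel values.
    flat = images.reshape(len(images), -1)
    # Hypothetical column names: px0, px1, ... for the flattened pixel values.
    columns = ["px{}".format(i) for i in range(flat.shape[1])]
    df = pd.DataFrame(flat, columns=columns)
    # Put the label in the first column.
    df.insert(0, "label", labels)
    return df


# Stand-in for `images, labels = sess.run([image, labels])` (assumption:
# a batch of 26 float32 images of shape 28x28x3 and int32 labels).
rng = np.random.RandomState(0)
images = rng.uniform(0, 255, size=(26, 28, 28, 3)).astype(np.float32)
labels = rng.randint(0, 3, size=26).astype(np.int32)

df = batch_to_dataframe(images, labels)

# Write to an in-memory buffer here; a file path works the same way.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Reading the CSV back and reshaping the pixel columns restores the images.
buffer.seek(0)
restored = pd.read_csv(buffer)
restored_images = (
    restored.drop(columns="label")
    .to_numpy(dtype=np.float32)
    .reshape(-1, 28, 28, 3)
)
```

With the real data, you would replace the stand-in arrays with the result of sess.run([image, labels]) and pass a file path (for example a hypothetical 'faces.csv') to to_csv. Each 28x28x3 image becomes 2352 pixel columns, so this layout is probably only practical for small images.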