TensorFlow: create a TFRecords file from CSV


I am trying to write a CSV file (all columns are floats) to a TFRecords file and then read the records back out. All the examples I have seen pack the CSV columns and feed them directly to sess.run(), but I can't figure out how to write the feature columns and the label column to a TFRecords file instead. How can I do this?


Solution

  • You will need a separate script to convert your csv file to TFRecords.

    Imagine you have a CSV with the following header:

    feature_1, feature_2, ..., feature_n, label
    

    You need to read your CSV with something like pandas, construct a tf.train.Example for each row manually, and then write it to a file with TFRecordWriter:

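    import pandas
    import tensorflow as tf

    # Load every row as a NumPy array; the last column is the label.
    csv = pandas.read_csv("your.csv").values
    # tf.io.TFRecordWriter is the current name for tf.python_io.TFRecordWriter (TensorFlow 1.x).
    with tf.io.TFRecordWriter("csv.tfrecords") as writer:
        for row in csv:
            features, label = row[:-1], row[-1]
            example = tf.train.Example()
            example.features.feature["features"].float_list.value.extend(features)
            # int64_list rejects floats, so cast; use float_list instead for a real-valued label.
            example.features.feature["label"].int64_list.value.append(int(label))
            writer.write(example.SerializeToString())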
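
The question also asks how to read the records back out, which the answer above does not cover. Below is a minimal sketch of one way to do it, assuming TensorFlow 2.x with eager execution and the same record layout as the writer above (a "features" float list and a single int64 "label"). NUM_FEATURES, parse_example, read_tfrecords and write_demo_file are hypothetical names introduced only for this illustration, and the value of NUM_FEATURES is an assumption you would replace with your own column count.

```python
import os
import tempfile

import tensorflow as tf

NUM_FEATURES = 3  # assumption: number of feature columns in each CSV row


def parse_example(serialized):
    """Decode one serialized tf.train.Example into a (features, label) pair."""
    spec = {
        "features": tf.io.FixedLenFeature([NUM_FEATURES], tf.float32),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, spec)
    return parsed["features"], parsed["label"]


def read_tfrecords(path):
    """Return a tf.data.Dataset that yields (features, label) tensors."""
    return tf.data.TFRecordDataset(path).map(parse_example)


def write_demo_file(path, rows):
    """Write a few rows in the same layout as the CSV converter (demo only)."""
    with tf.io.TFRecordWriter(path) as writer:
        for features, label in rows:
            example = tf.train.Example()
            example.features.feature["features"].float_list.value.extend(features)
            example.features.feature["label"].int64_list.value.append(int(label))
            writer.write(example.SerializeToString())


if __name__ == "__main__":
    # Round trip on a throwaway file so the sketch runs without a real CSV.
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.tfrecords")
    write_demo_file(demo_path, [([1.0, 2.0, 3.0], 0), ([4.0, 5.0, 6.0], 1)])
    for features, label in read_tfrecords(demo_path):
        print(features.numpy(), label.numpy())
```

To read your own file, call read_tfrecords("csv.tfrecords") after setting NUM_FEATURES to match your data. If the label was stored in a float_list, the "label" entry in the spec would need tf.float32 instead of tf.int64.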