Tags: python, tensorflow, tensorflow2.0, tensorflow-datasets

Saving and opening a tensorflow dataset


I have created and saved a dataset which looks like this:

# line 1
foo+++$+++faa+++$+++fee
# +++$+++ is the separator

I saved it as a .txt file and then saved it as a TensorFlow dataset with:

from tensorflow.data import TextLineDataset
from tensorflow.data.experimental import save, load
tfsaved = TextLineDataset('path_to_file.txt')
save(tfsaved, 'path_tf_dataset')

But, when I load the dataset, it looks like this:

# Line 1
foofaafee

Is there any way to tell TensorFlow that +++$+++ is my separator? If not, how can I solve this?


Solution

  • Here is a simple example of how you can read your data using pandas and pass it to tf.data.Dataset.from_tensor_slices:

    data.csv

    feature1+++$+++feature2+++$+++feature3
    foo+++$+++faa+++$+++fee
    foo+++$+++faa+++$+++fee
    foo+++$+++faa+++$+++fee
    foo+++$+++faa+++$+++fee
    foo+++$+++faa+++$+++fee
    foo+++$+++faa+++$+++fee
    foo+++$+++faa+++$+++fee
    
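    import pandas as pd
    import tensorflow as tf
    
    # A multi-character sep is interpreted as a regular expression,
    # which requires the Python parsing engine. The raw string keeps
    # the backslashes intact.
    df = pd.read_csv('data.csv', sep=r'\+\+\+\$\+\+\+', engine='python')
    ds = tf.data.Dataset.from_tensor_slices(dict(df))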
    
    for d in ds.take(3):
      tf.print(d)
    
    {'feature1': "foo", 'feature2': "faa", 'feature3': "fee"}
    {'feature1': "foo", 'feature2': "faa", 'feature3': "fee"}
    {'feature1': "foo", 'feature2': "faa", 'feature3': "fee"}
    

    Note that I had to escape the characters + and $: pandas interprets a separator longer than one character as a regular expression, and both are special regex characters.
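
If escaping the separator by hand feels error-prone, the standard library's re.escape can build the pattern for you. The following is a minimal sketch using only the re module; the names separator, pattern, line and fields are illustrative and not part of the original answer.

```python
import re

# The literal separator used in the data file.
separator = '+++$+++'

# re.escape backslash-escapes regex metacharacters such as + and $,
# producing a pattern that matches the separator literally.
pattern = re.escape(separator)
print(pattern)

# Splitting one sample line with the escaped pattern recovers the fields.
line = 'foo+++$+++faa+++$+++fee'
fields = re.split(pattern, line)
print(fields)
```

The resulting pattern string should be usable as the sep argument of pd.read_csv in the example above, in place of the hand-written regex.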