python tensorflow bluedata

How to read from and write to DataTap using TensorFlow on BlueData?


I want to use BlueData's DataTap directly from TensorFlow.

With pyspark, I can do something like this:

df.write.parquet('dtap://OtherDataTap/airline-safety_zero_incidents.parquet')

Note that I don't need to set up any libraries - it's ready to go out of the box.

What do I need to do to read and write data over DataTap from TensorFlow?


Solution

  • As per the docs: http://docs.bluedata.com/40_datatap-tensorflow-support

    import os
    
    import tensorflow as tf
    from tensorflow.python.framework.versions import CXX11_ABI_FLAG
    
    # Show the C++ ABI flag this TensorFlow build was compiled with.
    print(CXX11_ABI_FLAG)
    
    # Load BlueData's file system plugin so TensorFlow can resolve dtap:// paths.
    bdfs_file_system_library = os.path.join("/opt/bluedata", "libbdfs_file_system_shared_r1_9.so")
    tf.load_file_system_library(bdfs_file_system_library)
    
    # Write a test file to the DataTap.
    with tf.gfile.Open("dtap://TenantStorage/tmp/tensorflow/dtap.txt", 'w') as f:
        f.write("This is the dtap test file")
    
    # Read the same file back.
    with tf.gfile.Open("dtap://TenantStorage/tmp/tensorflow/dtap.txt", 'r') as f:
        content = f.read()
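
To confirm that the write and the read above really hit the same file, a small round-trip check can help. The following is only a sketch: `round_trip` and `check_round_trip` are hypothetical helper names, not part of TensorFlow or BlueData. They take the opener as a parameter, so the logic can be tried locally with Python's built-in `open` before pointing it at a `dtap://` path.

```python
import os
import tempfile


def round_trip(open_fn, path, text):
    """Write text to path with open_fn, then read it back and return it.

    open_fn is any callable with the open(path, mode) shape that returns a
    context manager exposing write() and read(). This helper is hypothetical
    and only illustrates the write-then-read pattern from the answer.
    """
    with open_fn(path, "w") as f:
        f.write(text)
    with open_fn(path, "r") as f:
        return f.read()


def check_round_trip(open_fn, path, text="This is the dtap test file"):
    """Return True if the text read back equals the text that was written."""
    return round_trip(open_fn, path, text) == text


if __name__ == "__main__":
    # Local dry run with the built-in open(); no TensorFlow or DataTap needed.
    with tempfile.TemporaryDirectory() as tmp:
        local_path = os.path.join(tmp, "dtap.txt")
        print(check_round_trip(open, local_path))
```

On a cluster where the file system library has already been loaded as in the answer, the same check would presumably be `check_round_trip(tf.gfile.Open, "dtap://TenantStorage/tmp/tensorflow/dtap.txt")`. That call assumes a TensorFlow 1.x build where `tf.gfile.Open` is available and is untested here.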