Tags: python, tensorflow, tensorflow2.0, tfrecord

How to write a wav to a tfrecord and then read it back


I'm trying to write an encoded WAV to a TFRecord and then read it back. I know I can write the audio as a plain tensor, but I'm trying to save space.

I'd like to do something like the following, but am unsure how to fill in the ellipses. In particular, I don't know if I should save as an int64 feature or a bytes feature.

def wav_feature(wav):
    value = tf.audio.encode_wav(wav, 44100)
    return tf.train.Feature(...)

example = tf.train.Example(features=tf.train.Features(feature={
    'foo': wav_feature(wav),
}))

with tf.io.TFRecordWriter(outpath) as writer:
    writer.write(example.SerializeToString())

# In parser

features = tf.io.parse_single_example(
            serialized=proto,
            features={'foo': tf.io.FixedLenFeature([], ...)})

decoded, sr = tf.audio.decode_wav(features['foo'])

Solution

  • tf.audio.encode_wav returns a scalar string tensor holding the encoded WAV bytes, so a bytes feature is the right choice:

    import tensorflow as tf

    def _bytes_feature(value):
      """Returns a bytes_list from a string / byte."""
      if isinstance(value, type(tf.constant(0))):
        value = value.numpy() # BytesList won't unpack a string from an EagerTensor.
      return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

    # Encode the audio (`wav` from the question) into a scalar string tensor.
    wav_encoded = tf.audio.encode_wav(wav, 44100)

    feature = {'foo': _bytes_feature(wav_encoded)}
    example = tf.train.Example(features=tf.train.Features(feature=feature))

    Then, in the parser:

    features = tf.io.parse_single_example(
            example.SerializeToString(),
            features={'foo': tf.io.FixedLenFeature([], tf.string)})
    # features['foo'] is a scalar string tensor holding the WAV bytes,
    # ready to be passed to tf.audio.decode_wav as in the question.
    wav_encoded = features['foo']

    The _bytes_feature helper comes from the TensorFlow "TFRecord and tf.train.Example" tutorial.
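
Putting the write and read halves together, a minimal end-to-end sketch might look like the following. It assumes TensorFlow 2.x running eagerly; the sine-wave test signal, the temporary output path, and the `parse` function name are illustrative choices that do not come from the original question.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

SAMPLE_RATE = 44100  # same rate as in the question

# Hypothetical test signal: 0.1 s of a 440 Hz sine, shape [samples, channels].
t = np.arange(SAMPLE_RATE // 10, dtype=np.float32) / SAMPLE_RATE
wav = tf.constant(0.5 * np.sin(2 * np.pi * 440.0 * t)[:, None], dtype=tf.float32)

# Encode to WAV bytes and store them in a bytes feature.
wav_encoded = tf.audio.encode_wav(wav, SAMPLE_RATE)
example = tf.train.Example(features=tf.train.Features(feature={
    'foo': tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[wav_encoded.numpy()])),
}))

# Write a single record to a temporary file (path is an assumption).
outpath = os.path.join(tempfile.mkdtemp(), 'audio.tfrecord')
with tf.io.TFRecordWriter(outpath) as writer:
    writer.write(example.SerializeToString())


def parse(proto):
    """Parse one serialized Example and decode its WAV bytes."""
    features = tf.io.parse_single_example(
        proto, features={'foo': tf.io.FixedLenFeature([], tf.string)})
    audio, sample_rate = tf.audio.decode_wav(features['foo'])
    return audio, sample_rate


# Read the record back and decode it.
dataset = tf.data.TFRecordDataset(outpath).map(parse)
decoded, sr = next(iter(dataset))
```

Because `tf.audio.encode_wav` stores samples as 16-bit PCM, `decoded` should match `wav` closely but not bit for bit; comparing with a small tolerance (for example `np.allclose(decoded, wav, atol=1e-3)`) is more appropriate than an exact equality check.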