
How to make a dataset from video datasets (TensorFlow first)


Hi everyone.

I currently have an object classification task and a dataset containing a large number of videos. In every video, only some frames are labeled (about 160 thousand frames in total), and a frame may contain multiple objects, so it may carry multiple labels.

I have some confusion about creating the dataset. My idea is to first convert the videos to frames, then store only the labeled frames in TFRecord or HDF5 format. Finally, I would write every frame's path into CSV files (training and validation) for use in my task.

My questions are:

1. Is this approach (TFRecord or HDF5) efficient enough? Should I preprocess every frame, e.g. compress it, to save storage space before creating the TFRecord or HDF5 files?
2. Is there a way to handle video datasets directly in TensorFlow or PyTorch?
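
To make question 1 concrete, here is a rough sketch of what I have in mind (the video path, the labeled_frames mapping, and the feature keys are all placeholders):

    import imageio
    import tensorflow as tf

    def frame_to_example(frame, labels):
        # JPEG-encode the raw frame so the record stays small on disk
        jpeg = tf.io.encode_jpeg(frame, quality=90).numpy()
        return tf.train.Example(features=tf.train.Features(feature={
            'image/encoded': tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[jpeg])),
            'image/labels': tf.train.Feature(
                int64_list=tf.train.Int64List(value=labels)),
        }))

    reader = imageio.get_reader('video.mp4', 'ffmpeg')  # placeholder path
    labeled_frames = {0: [3, 7], 42: [1]}  # placeholder {frame index: class ids}

    with tf.io.TFRecordWriter('train.tfrecord') as writer:
        for ix, labels in labeled_frames.items():
            frame = reader.get_data(ix)  # h x w x 3 uint8 ndarray
            writer.write(frame_to_example(frame, labels).SerializeToString())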

I want to find an efficient and conventional way to handle video datasets. I'm really looking forward to your answers.


Solution

  • I am no TensorFlow guy, so my answer won't cover that, sorry.

    Video formats generally gain compression at the cost of longer random-access times, by exploiting temporal correlations in the data. This makes sense because one usually accesses video frames sequentially, but if your access pattern is entirely random, I suggest you convert to HDF5. Otherwise, if you access sub-sequences of the video, it may make sense to stay with video formats.
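
    For illustration, here is a minimal sketch of such a conversion using h5py and imageio (file names are placeholders; chunking one frame at a time keeps random reads cheap, at some cost in compression ratio):

    import h5py
    import imageio

    reader = imageio.get_reader('video.mp4', 'ffmpeg')  # placeholder path

    with h5py.File('video.h5', 'w') as f:
        dset = None
        for i, frame in enumerate(reader):
            if dset is None:
                # create the dataset lazily, once the frame shape is known;
                # resizable along the time axis because the exact frame
                # count may not be known upfront
                h, w, c = frame.shape
                dset = f.create_dataset(
                    'frames', shape=(0, h, w, c), maxshape=(None, h, w, c),
                    dtype='uint8', chunks=(1, h, w, c), compression='gzip')
            dset.resize(i + 1, axis=0)
            dset[i] = frame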

    PyTorch does not have any "blessed" approaches to video AFAIK, but I use imageio to read videos and seek particular frames. A short wrapper makes it follow the PyTorch Dataset API. The code is rather simple, but it has one caveat, which is necessary to allow using it with a multiprocessing DataLoader.

    import imageio
    import torch
    from torch.utils.data import Dataset
    
    class VideoDataset(Dataset):
        def __init__(self, path):
            self.path = path
    
            # explained in __getitem__
            self._reader = None
    
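            # a throwaway reader, opened only to learn the frame count
            # (note: on recent imageio versions get_length() may report
            # inf for video streams; count_frames() is an alternative)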
            reader = imageio.get_reader(self.path, 'ffmpeg')
            self._length = reader.get_length()
    
        def __getitem__(self, ix):
            # Below is a workaround to allow using `VideoDataset` with
            # `torch.utils.data.DataLoader` in multiprocessing mode.
            # `DataLoader` sends copies of the `VideoDataset` object across
            # processes, which sometimes leads to bugs, as `imageio.Reader`
            # does not support being serialized. Since our `__init__` set
            # `self._reader` to None, it is safe to serialize a
            # freshly-initialized `VideoDataset` and then, thanks to the if
            # below, `self._reader` gets initialized independently in each
            # worker process.
    
            if self._reader is None:
                self._reader = imageio.get_reader(self.path, 'ffmpeg')
    
            # this is a numpy ndarray in [h, w, channel] format
            frame = self._reader.get_data(ix)
    
            # PyTorch standard layout [channel, h, w]
            return torch.from_numpy(frame.transpose(2, 0, 1))
    
        def __len__(self):
            return self._length
    

    This code can be adapted to support multiple video files, as well as to output the labels in whatever form you need.
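
    For instance, assuming a video file at video.mp4 (a placeholder path), the wrapper plays nicely with a multi-worker loader:

    from torch.utils.data import DataLoader

    dataset = VideoDataset('video.mp4')  # placeholder path
    loader = DataLoader(dataset, batch_size=8, num_workers=4, shuffle=True)

    for batch in loader:
        print(batch.shape)  # torch.Size([8, channels, h, w])
        break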