Tags: python, iterator, pytorch, dataloader

How to make PyTorch DataLoader from scratch?


Is it possible to recreate a simple version of PyTorch's DataLoader from scratch? The class should return the current batch based on the batch size.

For example, the code below only lets me return one example at a time:

import numpy as np

X = np.array([[1,2],[3,4],[5,6],[6,7]])

class DataLoader:
    def __init__(self, X, b_size):
        self.X = X
        self.b_size = b_size
    
    def __len__(self):
        return len(self.X)
    
    def __getitem__(self, index):
        return self.X[index]

But what I want to achieve is that if I specify b_size=2, it would return:

Iteration 0: [[1,2],[3,4]]
Iteration 1: [[5,6],[6,7]]

Is it possible to do something like that in Python? I can't use PyTorch's DataLoader class.


Solution

import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6], [6, 7]])

class DataLoader:
    def __init__(self, X, b_size):
        self.X = X
        self.b_size = b_size

    def __len__(self):
        # number of full batches, not number of examples
        return len(self.X) // self.b_size

    def __getitem__(self, index):
        # return the index-th batch: b_size consecutive rows
        return self.X[index * self.b_size:(index + 1) * self.b_size]

d = DataLoader(X, 2)
for i in range(len(d)):
    print(f"Iteration {i}: {d[i]}")

Output:

Iteration 0: [[1 2]
 [3 4]]
Iteration 1: [[5 6]
 [6 7]]
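Note that the solution above silently drops any leftover examples when len(X) is not divisible by b_size, and always yields batches in order. A minimal sketch of a slightly fuller loader, using Python's iterator protocol (`__iter__`) with optional `shuffle` and `drop_last` flags — the flag names are borrowed from PyTorch's DataLoader for familiarity, not part of the question:

```python
import numpy as np

class DataLoader:
    """Minimal batch loader: iterable, with optional shuffling
    and a choice of keeping or dropping a short final batch."""

    def __init__(self, X, b_size, shuffle=False, drop_last=False):
        self.X = X
        self.b_size = b_size
        self.shuffle = shuffle
        self.drop_last = drop_last

    def __len__(self):
        # number of batches; count a short trailing batch unless drop_last
        n = len(self.X) // self.b_size
        if not self.drop_last and len(self.X) % self.b_size:
            n += 1
        return n

    def __iter__(self):
        # index array lets us shuffle without touching the data itself
        order = np.arange(len(self.X))
        if self.shuffle:
            np.random.shuffle(order)
        for i in range(len(self)):
            idx = order[i * self.b_size:(i + 1) * self.b_size]
            yield self.X[idx]

X = np.array([[1, 2], [3, 4], [5, 6], [6, 7]])
for i, batch in enumerate(DataLoader(X, 2)):
    print(f"Iteration {i}: {batch.tolist()}")
```

Iterating rather than indexing matches how PyTorch's loader is typically used (`for batch in loader:`), and the index-array approach keeps shuffling O(n) without copying the data.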