
BrokenPipeError when running next(iter(train_data_loader)) in a local Jupyter notebook


I'm getting a BrokenPipeError when I try to run sentiment analysis with Hugging Face. The error is [Errno 32] Broken pipe. Is there a way to rewrite the next(iter(train_data_loader)) call to avoid it?

Link to the full code: https://colab.research.google.com/drive/1wBXKa-gkbSPPk-o7XdwixcGk7gSHRMas?usp=sharing

The code is

def create_data_loader(df, tokenizer, max_len, batch_size):
  ds = GPReviewDataset(
    reviews=df.content.to_numpy(),
    targets=df.sentiment.to_numpy(),
    tokenizer=tokenizer,
    max_len=max_len
  )
  return DataLoader(
    ds,
    batch_size=batch_size,
    num_workers=4
  )

Followed by the code below

BATCH_SIZE = 16
train_data_loader = create_data_loader(df_train, tokenizer, MAX_LEN, BATCH_SIZE)
val_data_loader = create_data_loader(df_val, tokenizer, MAX_LEN, BATCH_SIZE)
test_data_loader = create_data_loader(df_test, tokenizer, MAX_LEN, BATCH_SIZE)

Followed by

data = next(iter(train_data_loader))
data.keys()

The error is raised by the line 'data = next(iter(train_data_loader))'.

The error is BrokenPipeError: [Errno 32] Broken pipe


Solution

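  • The likely cause is the OS. On Windows, DataLoader worker processes are started with spawn rather than fork, so each worker has to re-import the code and unpickle the dataset. A class such as GPReviewDataset that is defined inside a Jupyter notebook cannot be re-imported that way, so the workers exit early and the parent process reports BrokenPipeError. The simplest fix is to leave num_workers at its default of 0 (note the parameter is num_workers, not num_worker), which loads batches in the main process. In a standalone script, multi-process loading can work on Windows if the entry-point code is wrapped in an if __name__ == '__main__': guard.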

    DataLoader(
        ds,
        batch_size=batch_size,
        num_workers=0  # 0 loads batches in the main process; avoids the error on Windows
    )
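
If the same notebook also has to run on Linux or Colab, where num_workers=4 works fine, one option is to choose the value at runtime instead of hard-coding 0. The sketch below is only an illustration: pick_num_workers is a hypothetical helper (not part of PyTorch), and checking for the ipykernel module is a heuristic assumption for "running inside Jupyter", not an official API.

```python
import platform
import sys


def pick_num_workers(requested, system=None, in_notebook=None):
    """Return a value that is safe to pass as DataLoader(num_workers=...).

    Hypothetical helper for illustration; not part of PyTorch.
    """
    if system is None:
        # platform.system() returns "Windows", "Linux", "Darwin", ...
        system = platform.system()
    if in_notebook is None:
        # Assumption: a Jupyter kernel has imported the ipykernel package.
        in_notebook = "ipykernel" in sys.modules
    # Inside a notebook on Windows, fall back to loading in the main process.
    if system == "Windows" and in_notebook:
        return 0
    # Everywhere else keep the requested value (never negative).
    return max(0, requested)
```

In create_data_loader you would then pass num_workers=pick_num_workers(4) instead of num_workers=4. The system and in_notebook arguments exist only so the behaviour can be checked without actually being on Windows.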