I'm getting a BrokenPipeError when I try to run sentiment analysis with Hugging Face. It returns [Errno 32] Broken pipe. Is there any way to rewrite the next(iter(train_data_loader)) code to avoid it?
Link to the full code: https://colab.research.google.com/drive/1wBXKa-gkbSPPk-o7XdwixcGk7gSHRMas?usp=sharing
The code is:
def create_data_loader(df, tokenizer, max_len, batch_size):
    ds = GPReviewDataset(
        reviews=df.content.to_numpy(),
        targets=df.sentiment.to_numpy(),
        tokenizer=tokenizer,
        max_len=max_len
    )
    return DataLoader(
        ds,
        batch_size=batch_size,
        num_workers=4
    )
Followed by the code below:
BATCH_SIZE = 16
train_data_loader = create_data_loader(df_train, tokenizer, MAX_LEN, BATCH_SIZE)
val_data_loader = create_data_loader(df_val, tokenizer, MAX_LEN, BATCH_SIZE)
test_data_loader = create_data_loader(df_test, tokenizer, MAX_LEN, BATCH_SIZE)
Followed by
data = next(iter(train_data_loader))
data.keys()
The error is raised by the line data = next(iter(train_data_loader)).
The error is BrokenPipeError: [Errno 32] Broken pipe
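One likely cause of this issue is the OS. On Windows, PyTorch starts DataLoader worker processes with spawn rather than fork, so setting num_workers above 0 often fails with BrokenPipeError when the code runs in a notebook, or in a script whose main code is not wrapped in an if __name__ == '__main__': guard. The simplest fix is to not set num_workers at all: its default is 0, which loads the data in the main process and works on Windows.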
DataLoader(
    ds,
    batch_size=batch_size,
    num_workers=0  # use 0 on Windows (data is loaded in the main process)
)
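If the same notebook has to run both on Windows and on Linux (for example on Colab), the worker count can be picked at runtime instead of being hard-coded. This is only a minimal sketch: pick_num_workers is a hypothetical helper name, not part of PyTorch, and it assumes that falling back to 0 workers on Windows is acceptable for your data size.

```python
import platform


def pick_num_workers(requested: int = 4) -> int:
    """Return 0 on Windows, otherwise the requested worker count.

    Hypothetical helper (not a PyTorch API): it only inspects the OS
    name reported by the standard library.
    """
    if platform.system() == "Windows":
        # Worker processes are the usual source of the BrokenPipeError
        # on Windows, so load the data in the main process instead.
        return 0
    return requested
```

In create_data_loader you would then pass num_workers=pick_num_workers(4) to DataLoader instead of num_workers=4.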