I am trying XLNet on the Jigsaw toxic comment dataset.
When I train my data with
input_ids = d["input_ids"].reshape(4,512).to(device) # batch size x seq length
it trains perfectly. But when I test the model on the test data, reshaping the input_ids in the same way, it raises a runtime error:
shape '[4, 512]' is invalid for input of size 1024
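The error itself is just a size mismatch, independent of the model; for example, reshaping a 1024-element tensor the same way reproduces it:

import torch

torch.zeros(2048).reshape(4, 512)   # fine: 4 * 512 = 2048 elements
torch.zeros(1024).reshape(4, 512)   # RuntimeError: shape '[4, 512]' is invalid for input of size 1024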
This is the method I am using for training:
import numpy as np
import torch
from sklearn import metrics
from torch import nn

def train_epoch(model, data_loader, optimizer, device, scheduler, n_examples):
    model = model.train()
    losses = []
    acc = 0
    counter = 0

    for d in data_loader:
        input_ids = d["input_ids"].reshape(4, 512).to(device)  # batch size x seq length
        attention_mask = d["attention_mask"].to(device)
        targets = d["targets"].to(device)

        outputs = model(input_ids=input_ids, token_type_ids=None, attention_mask=attention_mask, labels=targets)
        loss = outputs[0]      # classification loss
        logits = outputs[1]    # raw class scores

        _, prediction = torch.max(logits, dim=1)
        targets = targets.cpu().detach().numpy()
        prediction = prediction.cpu().detach().numpy()
        accuracy = metrics.accuracy_score(targets, prediction)

        acc += accuracy
        losses.append(loss.item())

        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
        counter = counter + 1

    return acc / counter, np.mean(losses)
This is the method I am using for evaluating my test data:
def eval_model(model, data_loader, device, n_examples):
    model = model.eval()
    losses = []
    acc = 0
    counter = 0

    with torch.no_grad():
        for d in data_loader:
            input_ids = d["input_ids"].reshape(4, 512).to(device)  # same reshape as in training
            attention_mask = d["attention_mask"].to(device)
            targets = d["targets"].to(device)

            outputs = model(input_ids=input_ids, token_type_ids=None, attention_mask=attention_mask, labels=targets)
            loss = outputs[0]
            logits = outputs[1]

            _, prediction = torch.max(logits, dim=1)
            targets = targets.cpu().detach().numpy()
            prediction = prediction.cpu().detach().numpy()
            accuracy = metrics.accuracy_score(targets, prediction)

            acc += accuracy
            losses.append(loss.item())
            counter += 1

    return acc / counter, np.mean(losses)
And when I run the eval_model method on my test data, it raises the runtime error shown above.
My model info:
I am unable to understand what I am doing wrong. Can anyone please help me out with this? Thank you.
I think the problem is that the training dataset's d['input_ids'] was of size 4 * 512 = 2048, so it could be reshaped into (4, 512). But the testing dataset's d['input_ids'] is of size 1024 (only 2 * 512), which cannot be reshaped into (4, 512).

Since you haven't given the model description, I can't say whether you should change it to (-1, 512) or (4, -1). Passing -1 to reshape tells PyTorch (just like NumPy) to figure that dimension out automatically. For example, a tensor of 2048 elements can be reshaped into (4, 512) by reshape(4, 512), reshape(-1, 512), or reshape(4, -1).
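To make the difference concrete, here is a small standalone sketch; the element counts are taken from your error message, and the actual batch shapes of course depend on your data_loader:

import torch

train_batch = torch.zeros(2048)               # 4 sequences x 512 tokens
test_batch = torch.zeros(1024)                # the failing batch: only 1024 ids

print(train_batch.reshape(4, 512).shape)      # torch.Size([4, 512])
print(train_batch.reshape(-1, 512).shape)     # torch.Size([4, 512]) - same result
print(train_batch.reshape(4, -1).shape)       # torch.Size([4, 512]) - same result

print(test_batch.reshape(-1, 512).shape)      # torch.Size([2, 512]) - batch size inferred
print(test_batch.reshape(4, -1).shape)        # torch.Size([4, 256]) - sequence length inferred
# test_batch.reshape(4, 512)                  # RuntimeError: shape '[4, 512]' is invalid for input of size 1024

If 512 is the fixed max sequence length in your setup, reshape(-1, 512) is the variant that keeps it intact and lets the batch dimension vary.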