I am trying XLNet on the Jigsaw toxic comment dataset.
When I train my data with
input_ids = d["input_ids"].reshape(4,512).to(device) # batch size x seq length
it trains perfectly. But when I test the model on the test data, reshaping the input_ids in the same way, it raises a runtime error:
shape '[4, 512]' is invalid for input of size 1024
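The error itself is just a size mismatch, independent of the model; for example, reshaping a 1024-element tensor the same way reproduces it:

import torch

torch.zeros(2048).reshape(4, 512)   # fine: 4 * 512 = 2048 elements
torch.zeros(1024).reshape(4, 512)   # RuntimeError: shape '[4, 512]' is invalid for input of size 1024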
This is the method I am using for training:
import numpy as np
import torch
from sklearn import metrics
from torch import nn

def train_epoch(model, data_loader, optimizer, device, scheduler, n_examples):
    model = model.train()
    losses = []
    acc = 0
    counter = 0

    for d in data_loader:
        input_ids = d["input_ids"].reshape(4, 512).to(device)  # batch size x seq length
        attention_mask = d["attention_mask"].to(device)
        targets = d["targets"].to(device)

        outputs = model(input_ids=input_ids, token_type_ids=None, attention_mask=attention_mask, labels=targets)
        loss = outputs[0]      # classification loss
        logits = outputs[1]    # raw class scores

        _, prediction = torch.max(logits, dim=1)
        targets = targets.cpu().detach().numpy()
        prediction = prediction.cpu().detach().numpy()
        accuracy = metrics.accuracy_score(targets, prediction)

        acc += accuracy
        losses.append(loss.item())

        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
        counter = counter + 1

    return acc / counter, np.mean(losses)
This is the method I am using for evaluating my test data:
def eval_model(model, data_loader, device, n_examples):
    model = model.eval()
    losses = []
    acc = 0
    counter = 0

    with torch.no_grad():
        for d in data_loader:
            input_ids = d["input_ids"].reshape(4, 512).to(device)  # same reshape as in training
            attention_mask = d["attention_mask"].to(device)
            targets = d["targets"].to(device)

            outputs = model(input_ids=input_ids, token_type_ids=None, attention_mask=attention_mask, labels=targets)
            loss = outputs[0]
            logits = outputs[1]

            _, prediction = torch.max(logits, dim=1)
            targets = targets.cpu().detach().numpy()
            prediction = prediction.cpu().detach().numpy()
            accuracy = metrics.accuracy_score(targets, prediction)

            acc += accuracy
            losses.append(loss.item())
            counter += 1

    return acc / counter, np.mean(losses)
And when I run the eval_model method on my test data, it raises the runtime error shown above.
My model info:
I am unable to understand what I am doing wrong. Can anyone please help me out with this? Thank you.
I think the problem is that the training dataset's d['input_ids'] was of size 4 * 512 = 2048, so it could be reshaped into (4, 512). But the testing dataset's d['input_ids'] is of size 1024 (only 2 * 512), which cannot be reshaped into (4, 512).

Since you haven't given the model description, I can't say whether you should change it to (-1, 512) or (4, -1). Passing -1 to reshape tells PyTorch (just like NumPy) to figure that dimension out automatically. For example, a tensor of 2048 elements can be reshaped into (4, 512) by reshape(4, 512), reshape(-1, 512), or reshape(4, -1).
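To make the difference concrete, here is a small standalone sketch; the element counts are taken from your error message, and the actual batch shapes of course depend on your data_loader:

import torch

train_batch = torch.zeros(2048)               # 4 sequences x 512 tokens
test_batch = torch.zeros(1024)                # the failing batch: only 1024 ids

print(train_batch.reshape(4, 512).shape)      # torch.Size([4, 512])
print(train_batch.reshape(-1, 512).shape)     # torch.Size([4, 512]) - same result
print(train_batch.reshape(4, -1).shape)       # torch.Size([4, 512]) - same result

print(test_batch.reshape(-1, 512).shape)      # torch.Size([2, 512]) - batch size inferred
print(test_batch.reshape(4, -1).shape)        # torch.Size([4, 256]) - sequence length inferred
# test_batch.reshape(4, 512)                  # RuntimeError: shape '[4, 512]' is invalid for input of size 1024

If 512 is the fixed max sequence length in your setup, reshape(-1, 512) is the variant that keeps it intact and lets the batch dimension vary.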