regression, classification, binary-data, nlp

Logistic Regression gradually tends to predict all 0s while training on mini batches


I am using mini-batches to train my model, which is as follows:

SimpleModel(
  (embed): Embedding(vocab_size, embedding_size, max_norm=2)
  (model): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=in_features, out_features=1, bias=True)
  )
  (sig): Sigmoid()
)
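
For reference, a minimal sketch of a module definition that would produce this printout; vocab_size, embedding_size, and max_len (the padded message length, so in_features = max_len * embedding_size) are placeholders, not my actual values:

import torch.nn as nn

class SimpleModel(nn.Module):
    # Sketch only: vocab_size, embedding_size and max_len are placeholders
    def __init__(self, vocab_size, embedding_size, max_len):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embedding_size, max_norm=2)
        self.model = nn.Sequential(
            nn.Flatten(start_dim=1, end_dim=-1),
            nn.Linear(max_len * embedding_size, 1),
        )
        self.sig = nn.Sigmoid()

    def forward(self, x):
        # x: (batch, max_len) token ids -> (batch, 1) probability
        return self.sig(self.model(self.embed(x)))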

These are the specifics of the model. When training with mini-batches, after 2-3 mini-batches all the outputs become 0. The training function looks like this, while the training loop is the usual one:

def trainer(train_loader, model, optimizer, criterion):
    model.train()
    it_loss = 0
    counter = 0
    for data in train_loader:
        optimizer.zero_grad()
        msgs = data['msg']
        targets = data['target']
        out = model(msgs)
        print(out)
        loss = criterion(out, targets)
        loss.backward()
        optimizer.step()
        
        # accumulate the loss weighted by batch size so the return value is the per-sample average
        it_loss += loss.item() * msgs.shape[0]
        counter += msgs.shape[0]

    return it_loss / counter
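
For completeness, the outer loop is just the usual epoch loop, roughly like this (num_epochs and the already-constructed model, optimizer, and criterion are assumed):

# Usual outer loop (sketch): num_epochs, model, optimizer and criterion are defined elsewhere
for epoch in range(num_epochs):
    epoch_loss = trainer(train_loader, model, optimizer, criterion)
    print(f"epoch {epoch}: average train loss {epoch_loss:.4f}")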

I have tried various optimizers and similar tweaks, and the data is not imbalanced, as shown below:

0    3900
1    1896
Name: count, dtype: int64
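
These counts were obtained with pandas value_counts; the DataFrame and column names here are assumptions:

# Assumed names: DataFrame df with a 'target' column
print(df['target'].value_counts())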

What could be the possible reason, and how can I solve it?

Edit:

The output of the first mini-batch looks like this:

tensor([[0.4578],
        [0.4569],
        [0.4686],
            .
            .
            .
        [0.4602],
        [0.4674],
        [0.4398]], grad_fn=<SigmoidBackward0>)

while the output of the 4th or 5th mini-batch looks like this:

tensor([[0.0057],
        [0.0058],
        [0.0058],
            .
            .
            .
        [0.0058],
        [0.0057],
        [0.0059]], grad_fn=<SigmoidBackward0>)

Furthermore, the outputs gradually become exactly 0.


Solution

  • Changed the optimizer and loss function

    I switched to the RAdam optimizer and changed the loss function to BCELoss, and it worked.
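
    A rough sketch of that change (the learning rate is a placeholder; note that BCELoss expects float targets with the same shape as the sigmoid output):

    import torch
    import torch.nn as nn

    # Sketch of the fix: the lr value is a placeholder
    optimizer = torch.optim.RAdam(model.parameters(), lr=1e-3)
    criterion = nn.BCELoss()  # the model already ends with Sigmoid, so plain BCELoss

    # BCELoss needs float targets shaped like the output, e.g. (batch, 1)
    # targets = data['target'].float().unsqueeze(1)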