At the moment my model produces 3 output tensors. I want two of them to work together: I want to pass the combination of self.dropout1(hs) and self.dropout2(cls_hs) through the self.entity_out linear layer. The issue is that these two tensors have different shapes.
Current Code
import torch.nn as nn
import transformers

class NLUModel(nn.Module):
    def __init__(self, num_entity, num_intent, num_scenarios):
        super(NLUModel, self).__init__()
        self.num_entity = num_entity
        self.num_intent = num_intent
        self.num_scenario = num_scenarios

        # config.BASE_MODEL is the project's pretrained checkpoint name
        self.bert = transformers.BertModel.from_pretrained(config.BASE_MODEL)

        self.dropout1 = nn.Dropout(0.3)
        self.dropout2 = nn.Dropout(0.3)
        self.dropout3 = nn.Dropout(0.3)

        self.entity_out = nn.Linear(768, self.num_entity)
        self.intent_out = nn.Linear(768, self.num_intent)
        self.scenario_out = nn.Linear(768, self.num_scenario)

    def forward(self, ids, mask, token_type_ids):
        out = self.bert(input_ids=ids, attention_mask=mask,
                        token_type_ids=token_type_ids)

        hs, cls_hs = out['last_hidden_state'], out['pooler_output']

        entity_hs = self.dropout1(hs)        # per-token representations
        intent_hs = self.dropout2(cls_hs)    # pooled [CLS] representation
        scenario_hs = self.dropout3(cls_hs)

        entity_hs = self.entity_out(entity_hs)
        intent_hs = self.intent_out(intent_hs)
        scenario_hs = self.scenario_out(scenario_hs)

        return entity_hs, intent_hs, scenario_hs
Required
    def forward(self, ids, mask, token_type_ids):
        out = self.bert(input_ids=ids, attention_mask=mask,
                        token_type_ids=token_type_ids)

        hs, cls_hs = out['last_hidden_state'], out['pooler_output']

        entity_hs = self.dropout1(hs)
        intent_hs = self.dropout2(cls_hs)
        scenario_hs = self.dropout3(cls_hs)

        entity_hs = self.entity_out(concat(entity_hs, intent_hs))  # concatenation (pseudocode)
        intent_hs = self.intent_out(intent_hs)
        scenario_hs = self.scenario_out(scenario_hs)

        return entity_hs, intent_hs, scenario_hs
Let's say I am successful in concatenating them... will backpropagation still work?
The shape of entity_hs (last_hidden_state) is [batch_size, sequence_length, hidden_size], while the shape of intent_hs (pooler_output) is just [batch_size, hidden_size], so putting them together directly may not make sense. It depends on what you want to do.
If, for some reason, you want to get output [batch_size, sequence_length, channels], you could tile the intent_hs tensor:
intent_hs = torch.tile(intent_hs[:, None, :], (1, entity_hs.size(1), 1))  # repeat along the sequence dimension
... = torch.cat([entity_hs, intent_hs], dim=2)
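As a self-contained sketch of this option (the sizes and the widened entity_out layer are assumptions for illustration, not part of your model):

import torch
import torch.nn as nn

batch_size, seq_len, hidden = 4, 16, 768   # hypothetical sizes
num_entity = 10                            # hypothetical label count

entity_hs = torch.randn(batch_size, seq_len, hidden)  # stands in for last_hidden_state
intent_hs = torch.randn(batch_size, hidden)           # stands in for pooler_output

# Broadcast the pooled vector across the sequence dimension, then concatenate
intent_tiled = torch.tile(intent_hs[:, None, :], (1, seq_len, 1))  # [batch, seq, hidden]
combined = torch.cat([entity_hs, intent_tiled], dim=2)             # [batch, seq, 2 * hidden]

# entity_out would then need 2 * 768 input features instead of 768
entity_out = nn.Linear(hidden * 2, num_entity)
print(entity_out(combined).shape)  # torch.Size([4, 16, 10])

Note that in your model this means declaring self.entity_out = nn.Linear(768 * 2, self.num_entity) in __init__.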
If you want to get [batch_size, channels], you can reduce the entity_hs tensor, for example by averaging:
entity_hs = torch.mean(entity_hs, dim=1)
... = torch.cat([entity_hs, intent_hs], dim=1)
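And a matching sketch for the pooled variant (again with made-up sizes; note this gives one prediction per example rather than per token, which may not suit an entity-tagging head):

import torch
import torch.nn as nn

batch_size, seq_len, hidden = 4, 16, 768   # hypothetical sizes
num_entity = 10

entity_hs = torch.randn(batch_size, seq_len, hidden)  # last_hidden_state stand-in
intent_hs = torch.randn(batch_size, hidden)           # pooler_output stand-in

entity_pooled = torch.mean(entity_hs, dim=1)             # [batch, hidden]
combined = torch.cat([entity_pooled, intent_hs], dim=1)  # [batch, 2 * hidden]

entity_out = nn.Linear(hidden * 2, num_entity)
print(entity_out(combined).shape)  # torch.Size([4, 10])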
Yes, the backward pass will propagate gradients through the concatenation (and the rest of the graph).
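A quick way to convince yourself, using toy tensors unrelated to the model:

import torch

a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, 3, requires_grad=True)

loss = torch.cat([a, b], dim=1).sum()
loss.backward()

# Both inputs receive gradients, so torch.cat does not block backpropagation
print(a.grad is not None, b.grad is not None)  # True True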