
How to change the tensor shape in middle layers?


Say I have a 2000x100 matrix. I put it through a 10-dimensional embedding layer, which gives me a 2000x100x10 tensor: 2000 examples, each a 100x10 matrix. I then pass it through a Conv1D and k-max pooling to get a 2000x24 matrix, i.e. 2000 examples, each a 24-dimensional vector.

Now I would like to recombine those examples before applying another layer: combine the first 10 examples into a group, then the next 10, and so on, and pass each group to the next layer as one unit.

Can I do that with Keras? Any idea how?


Solution

  • The idea of "samples" is that they should be independent and not relate to each other.

    This is something Keras demands from your model: if it starts with 2000 samples, it must end with 2000 samples. Ideally samples never talk to each other; you can use custom layers to hack around this in the middle of the model, but you still need to end with 2000 samples.

    I believe you want to end your model with 200 groups (2000 samples in groups of 10), so maybe you should start with shape (200, 10, 100) and use TimeDistributed wrappers:

    inputs = Input((10, 100))                      #shape (200, 10, 100)
    out = TimeDistributed(Embedding(....))(inputs) #shape (200, 10, 100, 10)
    out = TimeDistributed(Conv1D(...))(out)        #shape (200, 10, len, filters)

    #here, use the layers that should work on whole groups, without TimeDistributed
    
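A runnable sketch of the pipeline above, with made-up sizes (vocabulary of 5000 tokens, groups of 10, sequence length 100, 24 filters) and GlobalMaxPooling1D standing in for the k-max pooling, which is not a built-in Keras layer:

```python
import numpy as np
from tensorflow.keras.layers import (Input, TimeDistributed, Embedding,
                                     Conv1D, GlobalMaxPooling1D, Dense)
from tensorflow.keras.models import Model

inputs = Input((10, 100), dtype="int32")          # batch of 200 groups of 10 sequences
x = TimeDistributed(Embedding(5000, 10))(inputs)  # (200, 10, 100, 10)
x = TimeDistributed(Conv1D(24, 3))(x)             # (200, 10, 98, 24)
x = TimeDistributed(GlobalMaxPooling1D())(x)      # (200, 10, 24), stand-in for k-max pooling
out = Dense(1)(x)                                 # layers from here on see whole groups
model = Model(inputs, out)

preds = model.predict(np.random.randint(0, 5000, size=(200, 10, 100)), verbose=0)
print(preds.shape)  # (200, 10, 1) -- still 200 "samples" in and out
```

The key point is that the batch dimension stays at 200 throughout, so Keras is happy; the grouping of 10 former samples lives in the second dimension instead.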

    To reshape a tensor without changing the batch size, use the Reshape(newShape) layer, where newShape does not include the first (batch) dimension.
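A minimal illustration of this, with made-up sizes: Reshape only rearranges the per-sample dimensions, and the batch axis stays `None`.

```python
from tensorflow.keras.layers import Input, Reshape
from tensorflow.keras.models import Model

inp = Input((10, 24))        # batch dimension left implicit
out = Reshape((240,))(inp)   # flattens each sample; batch size untouched
model = Model(inp, out)
print(model.output_shape)    # (None, 240)
```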

    To reshape a tensor including the batch size, use a Lambda(lambda x: K.reshape(x,newShape)) layer, where newShape does include the first (batch) dimension. Here the warning above applies: somewhere you will need to undo this change, so that the model ends with the same batch size it started with.
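A sketch of that round trip, assuming the 2000x24 tensor from the question and groups of 10. It uses tf.reshape inside the Lambda (equivalent to K.reshape); the sizes and the Dense layer in the middle are made up for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Lambda, Dense
from tensorflow.keras.models import Model

inp = Input((24,))                                            # 2000 samples, 24 features each
grouped = Lambda(lambda x: tf.reshape(x, (-1, 10, 24)))(inp)  # fold batch: (200, 10, 24)
mixed = Dense(24)(grouped)                                    # this layer sees whole groups
flat = Lambda(lambda x: tf.reshape(x, (-1, 24)))(mixed)       # undo: back to (2000, 24)
model = Model(inp, flat)

# batch_size must cover all samples at once here: if predict split the data
# into smaller batches, a batch might not be divisible into groups of 10.
preds = model.predict(np.zeros((2000, 24), dtype="float32"), batch_size=2000, verbose=0)
print(preds.shape)  # (2000, 24)
```

Note the practical catch in the comment: once group membership depends on position in the batch, you have to feed the data in batch sizes that are a multiple of the group size.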