python, machine-learning, keras, data-science, pytorch

Is Seq2Seq the right model for my data? Any examples?


I'm trying to train a model to predict design patterns from web pages. I'm using the coordinates of bounding rects for a set of element groupings. Patterns look like this:

 [[elementId, width, height, x, y]]

so my target would be the [[x,y]] given [[elementId, width, height]].

Concretely:

 [[5, 1.0, 1.0], [4, 1.0, 1.0], [2, 175.0, 65.0], [2, 1.0, 1.0], [4, 1.0, 1.0]]
 ->
 [[0.0, 0.0], [0.0, 10.0], [3.0, 0.0], [0.0, 68.0], [0.0, 10.0]]


 [[2, 14.0, 14.0], [2, 14.0, 14.0], [2, 14.0, 14.0]]  
 ->
 [[0.0, 3.0], [0.0, 3.0], [0.0, 3.0]]

Patterns vary in size, so I've padded them with [[0, 0, 0]]. I currently have about 15k of them, but can get more.

I was told that seq2seq with attention is the right model for this job. I started with https://machinelearningmastery.com/develop-encoder-decoder-model-sequence-sequence-prediction-keras/ and got horrendous results.

Every seq2seq example I can find (searching for Keras or PyTorch) is for translation, which is categorical, and I'm struggling to find a good regression-based example.

So my questions are:

  1. Is this the right model (encoder/decoder LSTM) for what I'm trying to do?

  2. If so, are there any examples?


Solution

  • Seq2Seq/LSTM models are used when the input and output are variable-length sequences.

    Your input is of size 3 and your output is of size 2 (at least in the given examples), so you can use a simple feed-forward model with one or two hidden layers and an L2/L1 loss (for regression). Any optimizer (SGD/Adam) should be fine; Adam works well in practice.

    Also, I don't think you should use the coordinates as-is: scale them so that the largest coordinate is 1, which keeps the input/output range between 0 and 1. As an added advantage, this helps the model generalize to different screen sizes. A minimal sketch of both points is shown below.
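
    Here is a minimal sketch of that suggestion, assuming TensorFlow/Keras (the framework from the linked tutorial). The toy arrays are taken from the first example in the question; the layer sizes, epoch count, and variable names (X, Y, max_dim) are illustrative, not anything prescribed by the question.

        # Minimal sketch: scale dimensions/coordinates, then fit a small
        # feed-forward regressor with MSE (L2) loss and the Adam optimizer.
        import numpy as np
        import tensorflow as tf

        # Toy data in the question's format: [elementId, width, height] -> [x, y]
        X = np.array([[5, 1.0, 1.0], [4, 1.0, 1.0], [2, 175.0, 65.0],
                      [2, 1.0, 1.0], [4, 1.0, 1.0]], dtype="float32")
        Y = np.array([[0.0, 0.0], [0.0, 10.0], [3.0, 0.0],
                      [0.0, 68.0], [0.0, 10.0]], dtype="float32")

        # Scale widths/heights and target coordinates so the largest value is 1.
        # elementId (column 0) is left as a raw number here.
        max_dim = max(X[:, 1:].max(), Y.max())
        X[:, 1:] /= max_dim
        Y /= max_dim

        # Two-hidden-layer feed-forward model: 3 inputs -> 2 regression outputs.
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(3,)),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(2),          # predicted (x, y)
        ])
        model.compile(optimizer="adam", loss="mse")
        model.fit(X, Y, epochs=100, batch_size=32, verbose=0)

        # Rescale predictions back to the original coordinate range.
        print(model.predict(X) * max_dim)

    Note that the same max_dim used for scaling has to be kept around (ideally computed over the whole training set) so predictions can be mapped back to pixel coordinates at inference time.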