Tags: machine-learning, deep-learning, pytorch, lstm

Pytorch: Why batch is the second dimension in the default LSTM?


In the PyTorch LSTM documentation it is written:

batch_first – If True, then the input and output tensors are provided as (batch, seq, feature). Default: False

I'm wondering why they chose the batch dimension to be the second one by default and not the first. For me, it is easier to imagine my data as [batch, seq, feature] than as [seq, batch, feature]. The first one is intuitive to me, and the second feels counterintuitive.

I'm asking to find out whether there is any reason behind this choice and to get some understanding of it.
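
For reference, here is a minimal sketch of what the two layouts look like in practice (the sizes are made up):

    import torch
    import torch.nn as nn

    seq_len, batch, features, hidden = 5, 3, 10, 20

    # Default: batch_first=False, so the input is (seq, batch, feature)
    lstm = nn.LSTM(input_size=features, hidden_size=hidden)
    x = torch.randn(seq_len, batch, features)
    out, (h, c) = lstm(x)
    print(out.shape)  # torch.Size([5, 3, 20]) -> (seq, batch, hidden)

    # batch_first=True: the input is (batch, seq, feature)
    lstm_bf = nn.LSTM(input_size=features, hidden_size=hidden, batch_first=True)
    x_bf = torch.randn(batch, seq_len, features)
    out_bf, _ = lstm_bf(x_bf)
    print(out_bf.shape)  # torch.Size([3, 5, 20]) -> (batch, seq, hidden)

    # Note: h and c keep the (num_layers, batch, hidden) layout either way.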


Solution

  • As far as I know, there is no heavily justified answer. It is different from other frameworks where, as you say, the shape is more intuitive, such as Keras. Nowadays the default probably survives only for compatibility reasons: changing a default parameter that modifies the dimensions of a tensor would break half of the models out there if their maintainers updated to newer PyTorch versions.

    Probably the original idea was to put the temporal dimension first to simplify iterating over time, so you can simply write

    for t, out_t in enumerate(my_tensor):
        ...

    instead of something less readable, such as indexing with my_tensor[:, t] and iterating over range(time).
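
    Here is a quick sketch of that difference, using a random tensor standing in for an LSTM output in the default (seq, batch, hidden) layout (the sizes and the name out are made up for illustration):

    import torch

    # Stand-in for an LSTM output in the default (seq, batch, hidden)
    # layout; seq_len=5, batch=3, hidden=20 are made-up sizes.
    out = torch.randn(5, 3, 20)

    # Time-first: iterating a tensor walks dimension 0, so each step
    # yields the whole batch at time t directly.
    for t, out_t in enumerate(out):
        print(t, out_t.shape)  # torch.Size([3, 20]) -> (batch, hidden)

    # With a batch-first layout you would need explicit indexing instead:
    out_bf = out.transpose(0, 1)  # (batch, seq, hidden)
    for t in range(out_bf.size(1)):
        out_t = out_bf[:, t]      # same (batch, hidden) slice, less direct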