I've noticed that the AI community refers to various tensors as 512-d, meaning 512-dimensional, where the term 'dimension' seems to mean 512 different float values in the representation of a single data point. For example, a 512-d word embedding is a vector of 512 floats used to represent one English word, e.g. https://medium.com/@jonathan_hui/nlp-word-embedding-glove-5e7f523999f6
But those aren't 512 different dimensions; it's only a 1-dimensional vector. Why is the term `dimension` used in such a different manner than usual?
When we use the terms `conv1d` or `conv2d`, which are convolutions over 1 dimension and 2 dimensions, 'dimension' is used in the typical way it's used in math and the sciences. But in the word-embedding context, a 1-d vector is said to be a 512-d vector. Or am I missing something?
Why is the term `dimension` overloaded this way? What context determines what `dimension` means in machine learning?
In the context of word embeddings in neural networks, dimensionality reduction, and many other machine learning areas, it is indeed correct to call the vector (which is, typically, a 1D array or tensor) n-dimensional, where n is usually greater than 2. This is because we usually work in Euclidean space, where a (data) point is represented as an n-tuple of real numbers (i.e., it lives in real n-space, ℝⁿ).
Consider an example^ref of a (data) point in a 3D (Euclidean) space. To represent any point in this space, say d₁, we need a tuple of three real numbers (x₁, y₁, z₁).
Now, your confusion arises over why this point d₁ is called 3-dimensional instead of a 1-dimensional array. The reason is that it lies, or lives, in this 3D space. The same argument extends to points in any n-dimensional real space, as is done in the case of embeddings with 300d, 512d, or 1024d vectors.
However, in all nD array compute frameworks such as NumPy, PyTorch, TensorFlow, etc., these are still 1D arrays, because the extent of such a vector can be described with a single number (its length).
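As a minimal sketch of this (using NumPy, with a random vector standing in for a real learned embedding), a "512-d" embedding is still a one-axis array as far as the framework is concerned:

```python
import numpy as np

# Stand-in for one word's 512-d embedding: random values here,
# where a real model would supply learned weights.
embedding = np.random.rand(512)

print(embedding.ndim)   # 1      -> one array axis, i.e. a "1D array"
print(embedding.shape)  # (512,) -> 512 components, i.e. a point in R^512
```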
But what if you have more than one data point? Then you have to stack them in some (unique) way, and this is where the need for a second dimension arises. So, if you stack 4 of these 512d vectors vertically, you end up with a 2D array/tensor of shape `(4, 512)`. Note that here we call the array 2D because two integers are required to represent the extent (length) along each axis.
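A rough sketch of that stacking step, again with random stand-in vectors:

```python
import numpy as np

# Four hypothetical 512-d embeddings, stacked vertically (row-wise).
vectors = [np.random.rand(512) for _ in range(4)]
batch = np.stack(vectors)  # np.vstack(vectors) gives the same result here

print(batch.ndim)   # 2        -> two array axes, i.e. a "2D array"
print(batch.shape)  # (4, 512) -> 4 points, each a point in R^512
```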
To understand this better, please refer to my other answer on axis parameter visualization for nD arrays.
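Roughly, the axis parameter picks which direction of the array an operation runs along; here is a small illustrative sketch (the numbers are arbitrary):

```python
import numpy as np

# A small (4, 3) array so the output is easy to read;
# the same logic applies to a (4, 512) batch of embeddings.
batch = np.arange(12).reshape(4, 3)

print(batch.sum(axis=0))  # collapses the 4 rows    -> shape (3,)
print(batch.sum(axis=1))  # collapses the 3 columns -> shape (4,)
```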
ref: Euclidean space (Wikipedia): https://en.wikipedia.org/wiki/Euclidean_space