Tags: tensorflow, deep-learning, neural-network, pytorch, word-embedding

Why does 'dimension' mean several different things in the machine-learning world?


I've noticed that the AI community refers to various tensors as 512-d, meaning a 512-dimensional tensor, where the term 'dimension' seems to mean 512 different float values in the representation of a single data point. For example, a 512-d word embedding is a length-512 vector of floats used to represent one English word; see https://medium.com/@jonathan_hui/nlp-word-embedding-glove-5e7f523999f6

But that isn't 512 different dimensions; isn't it only a 1-dimensional vector? Why is the term dimension used so differently from its usual meaning?

When we use the terms conv1d or conv2d, which are convolutions over 1 and 2 dimensions, dimension is used in the way that is typical in math and the sciences. But in the word-embedding context, a 1-d vector is said to be a 512-d vector. Or am I missing something?

Why is the term dimension overloaded like this? What context determines what dimension means in machine learning?


Solution

  • In the context of word embeddings in neural networks, dimensionality reduction, and many other areas of machine learning, it is indeed correct to call the vector (which is typically a 1D array or tensor) n-dimensional, where n is usually greater than 2. This is because we usually work in Euclidean space, where a (data) point in an n-dimensional (Euclidean) space is represented as an n-tuple of real numbers (i.e. real n-space ℝⁿ).

    Below is an example (see ref) of a (data) point in a 3D (Euclidean) space. To represent any point in this space, say d1, we need a tuple of three real numbers (x1, y1, z1).

    [Figure: a point in 3D (Euclidean) space]

    Now, your confusion is about why this point d1 is called 3-dimensional rather than a 1-dimensional array. The reason is that it lies (or lives) in this 3D space: here "dimension" counts the coordinates needed to locate the point, not the axes of the array that stores them. The same argument extends to all points in any n-dimensional real space, as is done for embeddings with 300d, 512d, 1024d vectors, etc.

    However, in all nD array compute frameworks such as NumPy, PyTorch, TensorFlow, etc., these are still 1D arrays, because the shape of such a vector is described by a single number (its length).
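A minimal NumPy sketch of this distinction, assuming a zero-filled placeholder in place of a real embedding (the variable name `embedding` is only illustrative):

```python
import numpy as np

# Placeholder for one 512-d word embedding; real values would come from a model.
embedding = np.zeros(512, dtype=np.float32)

print(embedding.ndim)   # number of array axes: 1
print(embedding.shape)  # extent along each axis: (512,)
print(embedding.size)   # number of components, i.e. the vector-space dimension: 512
```

So "512-d" describes `size` (how many coordinates the point has), while "1D array" describes `ndim` (how many axes the container has).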

    But what if you have more than one data point? Then you have to stack them in some (unique) way, and this is where the need for a second dimension arises. Say you stack 4 of these 512d vectors vertically; you then end up with a 2D array/tensor of shape (4, 512). Note that here we call the array 2D because two integers are required to describe its shape, one for the extent/length along each axis.
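The stacking step can be sketched in NumPy as follows; the four vectors are filled with placeholder values, and the names `vectors` and `batch` are assumptions made for illustration:

```python
import numpy as np

# Four placeholder 512-d vectors; vector i is filled with the value i.
vectors = [np.full(512, i, dtype=np.float32) for i in range(4)]

# Stack along a new leading axis, one row per data point.
batch = np.stack(vectors, axis=0)

print(batch.ndim)      # 2 axes
print(batch.shape)     # (4, 512)
print(batch[2].shape)  # each row is still a single 512-d vector: (512,)
```

Each row of `batch` remains a point in 512-dimensional space; only the container gained an axis.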

    To understand this better, please refer to my other answer on axis parameter visualization for nD arrays; its visual representation is included below.

    [Figure: axis parameter visualization]


    ref: Euclidean space wiki