Search code examples
algorithmvectorcompressiondistributedcanonical-link

Is possible to denote a vector of numbers uniquely as a number?


Given a vector of numbers: V=(v1, v2, ..., vn). These n numbers don't need to be distinct or sorted.

Suppose that we have a few vectors V1, V2, ..., Vm. Is possible to use a number (integer or float number) to uniquely denote each vector, such that for any Vi not equal Vj, the corresponding numbers f(Vi) and f(Vj) are not equal either.

A simple solution is to have one number in the range from 0 to m-1 as an ID to represent a vector, however we assume that this kind of solution cannot work in the case that each vector is stored in a few distributed machines. That is, the portions of vectors in two machines might overlap, and the algorithm doesn't know the distribution of vectors globally.


Solution

  • I'm assuming the inputs are in principle unbounded and so is the output number, as it's trivial otherwise. A simple way is just concatenations the representations of n and v1, v2, .. vn in some base b. Represent them in k-bit digits, then annotate each k-bit digit with a continuation bit (0 if the next k-bit group starts a new number, 1 if it belongs to the same number). This isn't of much use for anything except equality tests, but you did not mention anything else.

    If you also care about preserving locality (i.e. nearby points p, q frequently have nearby values f(p), f(q)), some space-filling curves can be used for this purpose. The Hilbert curve is a bit complicated to generalize to higher dimensions, and the calculation is nontrivial. The Z-order curve isn't as good at preserving locality, but it's almost trivial to implement for any number of dimensions -- just interleave the bits of the binary representation.