I am using the Qdrant DB and client to embed a document as part of a PoC for building a RAG pipeline.
I see that when I use Manhattan distance to build the vector collection, I get a higher score than when I use Cosine distance. However, the text chunk returned is the same. I don't understand why this happens. I am still learning the ropes with RAG. Thanks in advance.
USER QUERY
What is DoS?
COSINE DISTANCE
response: [
ScoredPoint(id=0,
version=10,
score=0.17464592,
payload={
'chunk': "It also includes overhead bytes for operations,
administration, and maintenance (OAM) purposes.\nOptical Network Unit
(ONU)\nONU is a device used in Passive Optical Networks (PONs). It converts
optical signals transmitted via fiber optic cables into electrical signals that
can be used by end-user devices, such as computers and telephones. The ONU is
located at the end user's premises and serves as the interface between the optical
network and the user's local network."
},
vector=None, shard_key=None)
]
MANHATTAN DISTANCE
response: [
ScoredPoint(id=0,
version=10,
score=103.86209,
payload={
'chunk': "It also includes overhead bytes for operations, administration,
and maintenance (OAM) purposes.\nOptical Network Unit
(ONU)\nONU is a device used in Passive Optical Networks (PONs). It converts
optical signals transmitted via fiber optic cables into electrical signals that
can be used by end-user devices, such as computers and telephones. The ONU is
located at the end user's premises and serves as the interface between the optical
network and the user's local network."
},
vector=None, shard_key=None)
]
There are many different math functions that can be used to calculate similarity between two embedding vectors, including cosine distance, Manhattan distance, Euclidean distance, and the dot product.
Each calculates similarity in a different way. Consequently, the scores they produce fall in different ranges and are interpreted differently, so a score from one measure cannot be compared directly with a score from another.
See the table below.
Measure | Range | Interpretation |
---|---|---|
Cosine distance | [0, 2] | 0 if vectors are the same, 2 if they are diametrically opposite. |
Manhattan distance | [0, ∞) | 0 if vectors are the same, increases with the sum of absolute differences. |
Euclidean distance | [0, ∞) | 0 if vectors are the same, increases with the square root of the sum of squared differences. |
Dot product | (-∞, ∞) | Measures alignment, can be positive, negative, or zero based on vector direction. |
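
To make the table concrete, here is a minimal sketch in plain Python that computes all four measures on made-up 3-dimensional vectors (real embeddings have hundreds of dimensions, which is why a Manhattan score can easily reach values like 103). The vectors, the `docs` dictionary, and the helper names are illustrative assumptions, not part of Qdrant. As far as I know, Qdrant reports cosine similarity (higher is better) as the score for a Cosine collection and the raw distance (lower is better) for a Manhattan collection, but please verify that against the Qdrant documentation.

```python
import math

def dot_product(a, b):
    # Sum of element-wise products; unbounded, higher means more aligned.
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity; depends only on the angle between the vectors.
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return 1.0 - dot_product(a, b) / (norm_a * norm_b)

def manhattan_distance(a, b):
    # Sum of absolute differences; grows with the number of dimensions.
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean_distance(a, b):
    # Square root of the sum of squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy data: one query vector and three "document" vectors.
query = [1.0, 2.0, 3.0]
docs = {
    "A": [1.1, 2.1, 2.9],     # almost identical to the query
    "B": [3.0, 1.0, 0.0],     # points in a different direction
    "C": [-1.0, -2.0, -3.0],  # diametrically opposite
}

# For the three distances the best match has the smallest value.
distances = {
    "cosine": cosine_distance,
    "manhattan": manhattan_distance,
    "euclidean": euclidean_distance,
}
nearest = {
    name: min(docs, key=lambda k: fn(query, docs[k]))
    for name, fn in distances.items()
}
# For the dot product the best match has the largest value.
nearest["dot"] = max(docs, key=lambda k: dot_product(query, docs[k]))

for name, fn in distances.items():
    print(name, {k: round(fn(query, v), 4) for k, v in docs.items()})
print("nearest document per measure:", nearest)
```

In this toy example every measure picks the same document even though the numbers are on completely different scales, which mirrors what the question shows: the same chunk comes back, only the score changes. Different measures are not guaranteed to agree on the ranking in general.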