Search code examples
pythonpostgresqlsqlalchemyembeddingpgvector

Retrieve similarity score in Python for HNSW vector indexing in PostgreSQL


I'm having some trouble returning the score of a vector search on PostgreSQL. I have created a HNSW index on the columns answer_embedding and answer_keyword_embedding.

In Python, I've written this function to return the 5 most common documents based on a QA answer embedding. It works:

similarity_score = (
    (Document.answer_embedding.op('<->')(embedding_array)) +
    (Document.answer_keyword_embedding.op('<->')(keyword_embedding_array))
)

query = (
    select(Document.doc_id, Document.answer))
    .order_by(similarity_score)
    .limit(max_neighbors)
)
results = session.execute(query).all()

However, I also want to return the score of how relevant each document is. Whatever I do I see to get the following error message:

Session failed: 'float' object is not subscriptable

The following code snippets all give me the same error:

Session failed: 'float' object is not subscriptable

query = (
    select(Document.doc_id, Document.answer,similarity_score.label("similarity"))
    .order_by(similarity_score)
    .limit(max_neighbors)
)
query = (
    select(Document).add_columns(similarity_score.label("similarity"))
    .order_by(similarity_score)
    .limit(max_neighbors)
)

If I convert it to using a full query text instead, I get the error that the operator "<->"/"<=>" is not recognized.

When I print the query it looks like this.

Query: SELECT "Document"."DocId", "Document"."Answer", ("Document"."AnswerEmbedding" <-> :AnswerEmbedding_1) + ("Document"."AnswerKeywordEmbedding" <-> :AnswerKeywordEmbedding_1) AS similarity
FROM "Document" ORDER BY ("Document"."AnswerEmbedding" <-> :AnswerEmbedding_1) + ("Document"."AnswerKeywordEmbedding" <-> :AnswerKeywordEmbedding_1)
 LIMIT :param_1

Since I am able to retrieve the documents, I'm having a hard time understanding why I can't get the score for each document.

Providing traceback below:

File "path/to/database.py", line 142, in retrieve_documents
    results = session.execute(query).scalars().all()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "path/to/sqlalchemy/engine/result.py", line 1769, in all
    return self._allrows()
           ^^^^^^^^^^^^^^^
  File "path/to/sqlalchemy/engine/result.py", line 548, in _allrows
    rows = self._fetchall_impl()
           ^^^^^^^^^^^^^^^^^^^^^
  File "path/to/sqlalchemy/engine/result.py", line 1676, in _fetchall_impl
    return self._real_result._fetchall_impl()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "path/to/sqlalchemy/engine/result.py", line 2270, in _fetchall_impl
    return list(self.iterator)
           ^^^^^^^^^^^^^^^^^^^
  File "path/to/sqlalchemy/orm/loading.py", line 219, in chunks
    fetch = cursor._raw_all_rows()
            ^^^^^^^^^^^^^^^^^^^^^^
  File "path/to/sqlalchemy/engine/result.py", line 541, in _raw_all_rows
    return [make_row(row) for row in rows]
            ^^^^^^^^^^^^^
  File "path/to/sqlalchemy/cyextension/resultproxy.pyx", line 22, in sqlalchemy.cyextension.resultproxy.BaseRow.__init__
  File "path/to/sqlalchemy/cyextension/resultproxy.pyx", line 79, in sqlalchemy.cyextension.resultproxy._apply_processors
  File "path/to/pgvector/sqlalchemy/vector.py", line 33, in process
    return Vector._from_db(value)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "path/to/pgvector/utils/vector.py", line 71, in _from_db
    return cls.from_text(value).to_numpy().astype(np.float32)
           ^^^^^^^^^^^^^^^^^^^^
  File "path/to/pgvector/utils/vector.py", line 36, in from_text
    return cls([float(v) for v in value[1:-1].split(',')])
                                  ~~~~~^^^^^^
TypeError: 'float' object is not subscriptable

Solution

  • I solved it by changing the similarity score to this:

        similarity_score = (
            (Document.answer_embedding.cosine_distance(embedding)) +
            (Document.answer_keyword_embedding.cosine_distance(keyword_embedding))
        )