Tags: node.js, openai-api, redisearch, large-language-model, redis-stack-server

KNN vector similarity search in Redis is not returning any results


I am trying to use Redis, from Node.js, to store the embedding vectors returned by the OpenAI API and then run a similarity search on them to retrieve similar results. For testing, I currently have 10 keys in Redis, but the query never returns a record. It always returns an empty document list:

{ total: 0, documents: [] }

Schema Declaration:

const schema: RediSearchSchema = {
      '$.text': {
        type: SchemaFieldTypes.TEXT,
        AS: 'text',
      },
      '$.embedding': {
        type: SchemaFieldTypes.VECTOR,
        ALGORITHM: VectorAlgorithms.HNSW,
        TYPE: 'FLOAT32',
        DIM: 1536,
        DISTANCE_METRIC: 'COSINE',
        AS: 'embedding',
      },
    };
    
RedisClient.registerIndex({
      schema: schema,
      name: 'contexts',
      prefix: KNOWLEGE_KEYS_PREFIX,
    });

Index creation:

private static async createIndices() {
    RedisClient.indices.forEach(async (i) => {
      try {
        await RedisClient.client.ft.CREATE(i.name, i.schema, {
          ON: 'HASH',
          PREFIX: i.prefix,
        });
      } catch (err) {
        const message = `index ${i.name} already exists`;
        Logger.logError(message);
      }
    });
  }
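One thing worth checking independently of the schema: `forEach` does not await async callbacks, so `createIndices()` resolves before any `FT.CREATE` call has finished, and the `catch` block reports every failure as "already exists", which can hide a genuine creation error. A minimal sketch of a sequential alternative follows. `IndexSpec`, `createIndicesSequentially`, and the injected `create`/`logError` callbacks are hypothetical names, and matching on the text "Index already exists" is an assumption about the server's error message.

```typescript
// Hypothetical shape mirroring the question's registered indices.
interface IndexSpec {
  name: string;
  prefix: string;
}

// Creates each index in turn. Unlike forEach(async ...), the for...of loop
// awaits every call, so this resolves only after all attempts have finished.
async function createIndicesSequentially(
  indices: IndexSpec[],
  create: (spec: IndexSpec) => Promise<void>, // e.g. a wrapper around client.ft.CREATE
  logError: (message: string) => void,
): Promise<void> {
  for (const spec of indices) {
    try {
      await create(spec);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      // Assumption: RediSearch reports a duplicate index as "Index already exists".
      if (message.includes('Index already exists')) {
        logError(`index ${spec.name} already exists`);
      } else {
        // Surface the real failure instead of masking it as "already exists".
        logError(`failed to create index ${spec.name}: ${message}`);
      }
    }
  }
}
```

In the question's class, `create` could wrap `RedisClient.client.ft.CREATE(i.name, i.schema, { ON: 'HASH', PREFIX: i.prefix })`, and the caller should `await` the whole function before writing or searching.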

static registerIndex(ri: RedisIndex) {
    RedisClient.indices.push(ri);
  }

Vector addition:

 RedisClient.client.HSET(key, {
          text: e.text,
          embedding: Buffer.from(new Float32Array(e.vector).buffer),
        });
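The HSET call above stores the embedding as a raw FLOAT32 blob, which is the form a HASH vector field expects. A quick way to rule out encoding problems is to check that each blob is exactly 4 bytes per dimension (6144 bytes for DIM 1536) and decodes back to the original numbers. A minimal sketch, assuming embeddings arrive as `number[]` with 1536 components; `toFloat32Blob`, `fromFloat32Blob`, and `EMBEDDING_DIM` are hypothetical helpers.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical constant matching DIM: 1536 in the question's schema.
const EMBEDDING_DIM = 1536;

// Packs an embedding into a FLOAT32 blob: 4 bytes per component,
// in the platform's native byte order.
function toFloat32Blob(vector: number[]): Buffer {
  if (vector.length !== EMBEDDING_DIM) {
    throw new Error(`expected ${EMBEDDING_DIM} dimensions, got ${vector.length}`);
  }
  return Buffer.from(new Float32Array(vector).buffer);
}

// Decodes a blob back into numbers, e.g. to inspect a stored value.
// Copying into a fresh Uint8Array gives an ArrayBuffer starting at offset 0,
// so the Float32Array view is always 4-byte aligned. A blob whose length is
// not a multiple of 4 makes the Float32Array constructor throw a RangeError.
function fromFloat32Blob(blob: Buffer): number[] {
  const aligned = new Uint8Array(blob).buffer;
  return Array.from(new Float32Array(aligned));
}
```

Values such as 0.25 or -0.5 survive the round trip exactly; arbitrary doubles from the API are rounded to the nearest float32, which is expected.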

Code for performing vector search:

static async search(indexName: string, queryVector: Buffer, vectorFieldName = 'embedding', top = 5): Promise<any> {
    try {
      const query = `*=>[KNN ${top} @${vectorFieldName} $queryVector AS vec_score]`;
      console.log(query);
      const result = await RedisClient.client.ft.search(indexName, query, {
        PARAMS: {
          queryVector: queryVector,
        },
        DIALECT: 2,
        RETURN: ['text', 'vec_score'],
        SORTBY: 'vec_score',
        LIMIT: {
          from: 0,
          size: top,
        },
      });
      console.log(result);
      return result;
    } catch (err) {
      console.log(err);
      Logger.logError(err);
    }
  }
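Once the index is populated, the `vec_score` alias returned by this query is the distance between the query vector and each stored vector; for DISTANCE_METRIC 'COSINE' that is 1 minus the cosine similarity (0 for the same direction, 2 for opposite directions). Computing it locally can help sanity-check results: searching with the embedding of a stored text should return that key with a score at or very near 0. A minimal sketch; `cosineDistance` is a hypothetical helper.

```typescript
// Cosine distance, 1 - (a . b) / (|a| * |b|), the value reported
// as the KNN score for a COSINE vector field.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

For example, `cosineDistance([1, 0], [0, 1])` is 1, because the vectors are orthogonal.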

These snippets live in different files, but all of them are called with the expected values. I have even searched with the embedding of the exact text stored in one of the keys, and it still returns no results. Any help is much appreciated.


Solution

  • It seems like you are mixing JSON and HASH annotations. Can you run HGETALL on one of the docs to verify its structure, and include the FT.INFO output to verify the index parameters?

    The identifiers "$.text" (AS "text") and "$.embedding" (AS "embedding") suggest that you have a JSON path leading to each field and an alias for referring to it in queries. However, the index still expects to find the data under the identifiers you provided, and your HSET call writes hash fields named text and embedding, not $.text and $.embedding. Since there is no data under those names, the index cannot find anything to index and stays empty.

    Try replacing

      '$.text': {
        type: SchemaFieldTypes.TEXT,
        AS: 'text',
      },
      '$.embedding': {
        type: SchemaFieldTypes.VECTOR,
        ALGORITHM: VectorAlgorithms.HNSW,
        TYPE: 'FLOAT32',
        DIM: 1536,
        DISTANCE_METRIC: 'COSINE',
        AS: 'embedding',
      },
    

    With

      'text': {
        type: SchemaFieldTypes.TEXT,
      },
      'embedding': {
        type: SchemaFieldTypes.VECTOR,
        ALGORITHM: VectorAlgorithms.HNSW,
        TYPE: 'FLOAT32',
        DIM: 1536,
        DISTANCE_METRIC: 'COSINE',
      },
    

    If that is not the problem, I can help further once you provide the additional output I asked for above.
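The suggested fix amounts to using the hash field names (the old aliases) as the schema identifiers and dropping AS. A minimal sketch that applies that rewrite mechanically: `toHashSchema` and the `Schema` type are hypothetical, the plain strings 'TEXT', 'VECTOR', and 'HNSW' stand in for node-redis's SchemaFieldTypes and VectorAlgorithms constants, and it assumes each alias matches the field name written by HSET, as it does in the question.

```typescript
// Minimal stand-in for a RediSearch schema: identifier -> field options.
type Schema = Record<string, Record<string, unknown>>;

// Rewrites JSONPath-style identifiers ("$.text") into plain hash field names
// for an index created with ON: 'HASH'. Assumption: the AS alias is the
// hash field name; without an alias, the leading "$." is stripped instead.
function toHashSchema(schema: Schema): Schema {
  const fixed: Schema = {};
  for (const [id, options] of Object.entries(schema)) {
    if (!id.startsWith('$')) {
      fixed[id] = options; // already a hash field name, keep as-is
      continue;
    }
    const { AS, ...rest } = options;
    const name = typeof AS === 'string' ? AS : id.slice(2);
    fixed[name] = rest; // identifier now equals the field name, so AS is dropped
  }
  return fixed;
}
```

If you would rather keep the `$.` paths, the index would instead need ON: 'JSON', with documents written as JSON and the embedding stored as an array of numbers rather than a binary blob.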