How to create an index in Couchbase in order to enumerate through all the keys?


Context: We use Couchbase 6.6.6-10576-enterprise as a key value store (we store only binary data) and we are working on a migration to Redis (I won't detail the reasons why we are doing this migration). Our application is written in C#, and we use the CouchbaseNetClient NuGet version 2.7.27 to interact with Couchbase.

Goal: In the context of the migration, we want to write a simple tool that iterates through all the keys of our Couchbase instance and then writes them (along with the values they are associated with, of course) to Redis.

Problem: As I understand it, getting all the keys can be done with a query like this:

SELECT META().id FROM `my-bucket` LIMIT 1000

But when I run this query I get the following error message:

No index available on keyspace my-bucket that matches your query. Use CREATE INDEX or CREATE PRIMARY INDEX to create an index, or check that your expected index is online.

Unless I'm wrong, the index can be created with this command:

CREATE PRIMARY INDEX `my-bucket-index` ON `my-bucket`

Question: What are the consequences of creating such an index in production, and is it the appropriate approach to solve our problem?


Solution

  • I'm sorry to hear that you're moving away from Couchbase :( I would like to know more about why, if you eventually feel up to it.

    That being said:

    I would not normally recommend a PRIMARY INDEX in production on Couchbase, because it acts as a fallback index: any query that lacks a suitable secondary index can still run, but it does so as the equivalent of a full table scan, which is bad for performance on large data sets.

    However, for your case, where you want to retrieve every key, I think it's probably okay as long as:

    1. You get rid of the primary index when you're done with it.
    2. The queries used by your applications (assuming you have some) are already properly indexed.
    3. You don't plan to deploy any new queries to production that aren't properly indexed that might use the primary index unintentionally.
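
    A minimal sketch of what that temporary lifecycle could look like in N1QL, reusing the bucket and index names from the question. The `$lastKey` named parameter and the keyset-style paging (`WHERE META().id > $lastKey ORDER BY META().id`) are assumptions of this sketch, not something taken from the question; the idea is that each page continues from the last key seen instead of relying on a growing OFFSET. The page size of 1000 simply mirrors the LIMIT in the question.

    ```sql
    -- 1. Create the temporary primary index (named as in the question).
    CREATE PRIMARY INDEX `my-bucket-index` ON `my-bucket`;

    -- 2. Page through the keys. $lastKey is a hypothetical named parameter:
    --    pass "" for the first page, then the last id returned by the previous page.
    SELECT META().id
    FROM `my-bucket`
    WHERE META().id > $lastKey
    ORDER BY META().id
    LIMIT 1000;

    -- 3. Drop the index once the migration is finished
    --    (Couchbase 6.x form: keyspace first, then index name).
    DROP INDEX `my-bucket`.`my-bucket-index`;
    ```

    With this shape the query service only returns ids; each value can then be read with a regular key-value get from the SDK and written to Redis.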

    I'd still be concerned about performance whenever messing with indexes in production. It might be a safer approach to:

    A) Temporarily add an Analytics node in your cluster, and run your migration queries there.

    B) Spin up a second cluster, set up XDCR, and query the data on the second cluster.

    C) Set up a Kafka connector, and process the documents as they stream through it.

    Those might be licensing events though, since you're on Enterprise 🤷‍♂️. IANAL, YMMV.