Tags: cloudant, cloudant-sdp

How do I increase the sample size used during schema discovery to 'unlimited'?


I have encountered some errors with the SDP, and one of the potential fixes is to increase the sample size used during schema discovery to 'unlimited'.

For more information on these errors, see:

Question:

How can I set the sample size? After I have set the sample size, do I need to trigger a rescan?


Solution

  • Follow these steps to change the sample size. Be aware that a larger sample size increases the runtime of the schema discovery algorithm, and the dashboard gives no indication of progress other than the job remaining in the 'triggered' state for a while.

    1. Verify that the specific load has been stopped and that the dashboard shows its status as stopped (with or without error).

    2. Find the document at https://<account>.cloudant.com/_warehouser/<source>, where <source> matches the name of the Cloudant database you have issues with.

      Note: If the document id is not obvious, check https://<account>.cloudant.com/_warehouser/_all_docs.

    3. Replace "sample_size": null (which scans a sample of 10,000 random documents) with "sample_size": -1 (to scan all documents in your database) or "sample_size": X (to scan X documents in your database, where X is a positive integer).

    4. Save the document and trigger a rescan in the dashboard. A new schema discovery run will then execute using the sample size you defined.
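
If you would rather script the edit from steps 2 and 3 than make it by hand, the following is a minimal Python sketch, not an official tool. It assumes that "sample_size" is a top-level field of the _warehouser document, that your account accepts HTTP basic authentication, and that the document can be read and written with the standard Cloudant document API (GET, then PUT including the current _rev). The function names with_sample_size and update_warehouser_doc are hypothetical helpers made up for this example.

```python
import base64
import json
import urllib.parse
import urllib.request


def with_sample_size(doc, sample_size):
    """Return a copy of doc with "sample_size" set.

    None keeps the default sample, -1 scans all documents, and a
    positive integer scans that many documents.
    Assumption: "sample_size" is a top-level field of the document.
    """
    if sample_size is not None:
        if isinstance(sample_size, bool) or not isinstance(sample_size, int):
            raise ValueError("sample_size must be None, -1 or a positive integer")
        if sample_size != -1 and sample_size <= 0:
            raise ValueError("sample_size must be None, -1 or a positive integer")
    updated = dict(doc)  # shallow copy, so the caller's dict is left untouched
    updated["sample_size"] = sample_size
    return updated


def _request(url, user, password, method="GET", body=None):
    """Send a JSON request with basic auth and return the decoded JSON reply."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
    }
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())


def update_warehouser_doc(account, source, user, password, sample_size):
    """Fetch the _warehouser document for source and save it with a new sample size."""
    doc_id = urllib.parse.quote(source, safe="")
    url = f"https://{account}.cloudant.com/_warehouser/{doc_id}"
    doc = _request(url, user, password)  # the fetched document carries its _rev
    return _request(url, user, password, "PUT", with_sample_size(doc, sample_size))
```

For example, update_warehouser_doc("<account>", "<source>", user, password, -1) would request a scan of all documents. The script only saves the document; the load must still be stopped beforehand (step 1), and the rescan still has to be triggered in the dashboard afterwards.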