Tags: cloudant, cloudant-sdp

What steps are performed by a 'Rescan'?


Cloudant provides a schema discovery process (SDP) that automatically warehouses documents from Cloudant into dashDB. When using the SDP, there is an option called 'Rescan'.

I have used 'Rescan' a number of times, but am unclear on the steps it actually performs. What steps are performed by a 'Rescan'? E.g.

  1. Drop tables in the dashDB target schema? Which tables?
  2. Scan Cloudant source database?
  3. Recreate the target schema?
  4. ...
  5. ...

Solution

  • The steps are pretty much as you suggested. Rescan will:

    1. Inspect the previously discovered JSON schema and remove from the dashDB instance all tables that were created for that load (leaving any user-defined tables untouched)

    2. Re-discover the JSON schema using the current settings (including sample size, type of discovery algorithm, etc.)

    3. Create the new tables in the same dashDB target

    4. Load data from Cloudant into the newly created tables

    5. Subscribe to the _changes feed from Cloudant to continuously synchronize document changes with dashDB

    All steps except the first are identical for the initial load and for a rescan.
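
    The steps above can be illustrated with a small in-memory sketch. This is not the actual SDP implementation: the `warehouse` dictionary stands in for a dashDB schema, and `discover_schema` and `rescan` are hypothetical names that only mirror the order of operations described in this answer. The discovery logic (first-seen type of each top-level field in the sample) is an assumption made for illustration.

```python
def discover_schema(docs: list, sample_size: int) -> dict:
    """Infer column names and type names from the first `sample_size` documents.

    Assumption: only top-level fields are considered, and the first value
    seen for a field decides its type.
    """
    columns = {}
    for doc in docs[:sample_size]:
        for field, value in doc.items():
            columns.setdefault(field, type(value).__name__)
    return columns


def rescan(warehouse: dict, previous_tables: list, table_name: str,
           docs: list, sample_size: int) -> list:
    """Hypothetical rescan: returns the names of the tables it created."""
    # Step 1: drop only the tables created by the previous load.
    # Tables not listed in `previous_tables` (user-defined ones) are left alone.
    for name in previous_tables:
        warehouse.pop(name, None)

    # Step 2: re-discover the schema with the current settings.
    columns = discover_schema(docs, sample_size)

    # Step 3: create the new table in the same target.
    warehouse[table_name] = {"columns": columns, "rows": []}

    # Step 4: load every document into the new table.
    for doc in docs:
        warehouse[table_name]["rows"].append({c: doc.get(c) for c in columns})

    # Step 5: a real system would now follow the _changes feed to keep the
    # table in sync; that part is omitted from this sketch.
    return [table_name]


# The documents gained a "currency" field after the initial load.
warehouse = {
    "ORDERS": {"columns": {"_id": "str", "total": "int"}, "rows": []},
    "MY_REPORT": {"columns": {}, "rows": []},  # user-defined table
}
docs = [
    {"_id": "a", "total": 5, "currency": "EUR"},
    {"_id": "b", "total": 7, "currency": "USD"},
]
created = rescan(warehouse, ["ORDERS"], "ORDERS", docs, sample_size=1)
```

    In this sketch the recreated ORDERS table picks up the new `currency` column, while MY_REPORT is untouched because it was not part of the previously discovered schema.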

    The main motivation for a rescan is to support schema evolution. Whenever the document structure in a Cloudant source database changes, a user can make a conscious decision to drop and re-create the dashDB tables using the rescan function. SDP does not trigger this automatically, because doing so could break applications that depend on the existing dashDB tables.