
MarkLogic Database Synchronization


We are looking for a solution that satisfies the following requirements:

  1. Our client has one MarkLogic (ML) database and will not give us any access to it; we cannot even run read-only queries there.
  2. They want a second MarkLogic instance holding a copy of that database, kept in sync with no delay.
  3. Any queries or updates we need to perform will run against this replica database.

There are a few documented approaches for doing this:

  1. Database Backup/restore (With or without journal Archiving) http://docs.marklogic.com/guide/admin/backup_restore
  2. Flexible Replication (Uses Content Processing Framework) http://docs.marklogic.com/guide/flexrep/rep_intro
  3. Database Replication http://docs.marklogic.com/guide/database-replication/dbrep_intro
  4. Forest Backup/Restore (instead of entire database just forest) http://docs.marklogic.com/guide/admin/forests#id_76303
  5. XQSync (application-level synchronization tool) http://marklogic.github.io/xqsync/tutorial.html

But we can't decide which one is suitable in our case. Which solution would you suggest?


Solution

  • It sounds like you're setting up a development server and looking to copy data from production. You should consider:

    • the impact on the production server
    • whether you need frequent updates from the production server (does the production app update its database? do you need those in development?)
    • do you need all the data, or a subset?

    There's a good chance that the database administrators have scheduled regular backups, especially if there are frequent updates. Assuming that's the case, you should be able to get a copy of one of those backups without any impact on the prod server. So my first suggestion would be #1. You could update your dev servers with these backups as often as they are made by the admins. Note that using a database backup requires both servers to be running on the same platform. This will get you all the production data.

    On the other hand, if it's important that your dev server get frequent updates from the production server more or less as they happen, then #2 Flexible Replication or #3 Database Replication will get the job done. Understand that these require more work from the production server and are probably not something you want to do for dev purposes, but they are an option. The impact on the prod server will depend on the frequency of database updates in prod. Of the two, I don't know which would have more impact on prod (anyone else?). This could get you all the data, or could be configured to get a subset.

    Finally, if you don't need frequent (as-they-happen) updates and the admins are not doing backups, or if prod and dev are running on different platforms (Windows/Linux/Mac), you could run XQSync during off hours. I'd do this really carefully, as mixing up the parameters could send data the wrong direction. You can configure this to take a subset of data based on directories, collections, or a custom query.
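    One way to guard against sending data the wrong direction is a pre-flight check in whatever script launches the sync. The sketch below is hypothetical and not part of XQSync itself: it assumes source and destination are given as XCC-style connection URIs of the form xcc://user:password@host:port/database, and that you fill in the made-up PRODUCTION_HOSTS set with your real production hostname(s).

```python
from urllib.parse import urlparse

# Hypothetical: replace with the actual production hostname(s).
PRODUCTION_HOSTS = {"prod-ml.example.com"}


def check_sync_direction(input_uri: str, output_uri: str) -> None:
    """Raise ValueError unless data flows from production to a non-production host.

    Both arguments are assumed to be XCC-style URIs such as
    xcc://user:password@host:port/database.
    """
    source = urlparse(input_uri)
    target = urlparse(output_uri)
    # The source of a prod-to-dev copy must be a known production host.
    if source.hostname not in PRODUCTION_HOSTS:
        raise ValueError(f"input host {source.hostname!r} is not a known production host")
    # Never allow the destination to be production.
    if target.hostname in PRODUCTION_HOSTS:
        raise ValueError(f"refusing to write to production host {target.hostname!r}")
```

    Calling check_sync_direction("xcc://reader:pw@prod-ml.example.com:8010/Documents", "xcc://admin:pw@dev-ml.example.com:8010/Documents") returns quietly; swapping the two arguments raises ValueError before any data moves.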

    A related note: each of these strategies is about copying data, but none of them copy the database configuration for you. That will require some other approach. You might use the Roxy Deployer to keep your config in git/svn/whatever and bootstrap your configuration the same way in all environments. You might use the MarkLogic-provided Configuration Manager to take a snapshot of the production environment (you'll need the admins to do this for you) and import that into your dev environment.

    Suggested starting point: make sure your configuration matches, then use the backups that the admins are probably taking anyway. This will have the least impact on and lowest risk to the production system. If that doesn't meet your needs, then look at one of the other strategies.
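    The data-copying part of this advice boils down to a few yes/no questions. As a rough, hypothetical sketch, it could be written as a small decision function; the Needs fields and returned labels are made up for illustration and are not anything MarkLogic provides, and configuration (handled separately above) is left out.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    # Hypothetical summary of the questions raised in this answer.
    near_real_time_updates: bool   # must prod changes reach dev as they happen?
    admin_backups_available: bool  # are the admins already taking backups?
    same_platform: bool            # do prod and dev run on the same OS platform?


def suggest_strategy(needs: Needs) -> str:
    """Return the approach this answer leans toward for a given set of needs."""
    if needs.near_real_time_updates:
        # Options #2/#3: more work for prod, but changes arrive as they happen.
        return "flexible or database replication"
    if needs.admin_backups_available and needs.same_platform:
        # Option #1: reuse existing backups; least impact on prod.
        return "restore admin backups"
    # No backups, or different platforms: run XQSync during off hours.
    return "xqsync off hours"
```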