
Copy 13+ Million Tiny Files to New Server


Situation:

We need to replace a 10+ year old Windows 2000 2-node cluster (shared MSA SCSI storage) with a newer Windows 2003 2-node cluster (shared FC storage).

The shared storage is currently split into two drives: X (data) and Q (quorum).

The X drive holds a flat-file DB of 13 million+ files in 1.3 million+ folders. These files need to be copied from the old cluster to the new cluster with minimal downtime.

  • File Count: 13,023,328
  • Total Filesize: 8.43 GB (File Size not Size on Disk)
  • Folder Count: 1,308,153

The old Win 2000 cluster has been up for over 10 years, continually reading and writing, and is now heavily fragmented. The X drive on the Win 2000 cluster also contains 7 backups of the DB, which are created/updated via Robocopy once per day. This currently takes 4-5 hours and adds real lag to the system while it runs.

Old Cluster - 2 x HP DL380 G4 | 1 x HP MSA 500 G2 (SCSI) | RAID 5 (4 disks + spare) | Win 2k

New Cluster - 2 x HP DL380 G7 | 1 x HP StorageWorks P2000 G2 MSA (Fibre Channel) | Win 2k3

The database can be offline for 5 to 8 hours comfortably, and 15 hours at the absolute maximum, because the data it provides is time-sensitive.

Options We've Tried:

  1. Robocopy / FastCopy - Both sat at around 100-300 files copied per second, with the database offline.
  2. PeerSync - Copy from a local node backup (D: drive). This completed in 17 hours, with an average of 250 files per second.
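To put those rates against the downtime window, here is a rough back-of-the-envelope sketch. It assumes the copy rate stays constant for the whole run, which is a simplification; the function names are made up for illustration. The file count and the 8 and 15 hour limits are the figures quoted above.

```python
FILE_COUNT = 13023328  # file count quoted above


def hours_to_copy(file_count, files_per_second):
    """Hours needed to copy file_count files at a steady rate."""
    return file_count / float(files_per_second) / 3600.0


def rate_for_window(file_count, window_hours):
    """Steady files-per-second rate needed to finish within window_hours."""
    return file_count / (window_hours * 3600.0)


# Observed Robocopy / FastCopy range
for rate in (100, 300):
    print("{0} files/s -> {1:.1f} h".format(rate, hours_to_copy(FILE_COUNT, rate)))
# 100 files/s -> 36.2 h
# 300 files/s -> 12.1 h

# Rate a full copy would need to fit the offline window
for window in (8, 15):
    print("{0} h window -> {1:.1f} files/s".format(window, rate_for_window(FILE_COUNT, window)))
# 8 h window -> 452.2 files/s
# 15 h window -> 241.2 files/s
```

So a full offline copy only fits the 15 hour maximum at the very top of the observed range, and does not fit the comfortable 8 hour window at all.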

Question/Options:

  1. Block-by-Block Copy - We think this might be the fastest, but it will also copy the backups from the original X drive.
  2. Redirect Daily Backup - Redirect the daily backup from the local X drive to a network share of the new X drive. This is slow to begin with, but because it can run while the old system is live, the new copy will be at most 12 hours out of date when we come to switch over. A final sync on the move day, to 100% confirm that the old and new systems are identical, should take no more than 10 hours.
  3. Custom Copy Script - We have access to C# and Python
  4. Robocopy / FastCopy / Other File Copy Tool - Open to suggestions and settings.
  5. Disk Replace / RAID Rebuild - The risky or impossible option: replace each of the older disks with a new smaller form factor disk in an old G2 caddy, allow the RAID to rebuild, and repeat until all drives are replaced. On the day of migration, move the 4 disks to the new P2000 MSA, in the same RAID order?
  6. Give Up - And leave it running on the old hardware until it dies a fiery death.*
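For Option 3, a custom script could look roughly like the sketch below. This is not a tested migration tool, only an illustration under several assumptions: a Python 3 interpreter (3.2 or later, for `concurrent.futures`) on the machine doing the copy, both trees reachable as ordinary paths (for example the old X drive through a share), and "changed" meaning a different size or modification time. The names `needs_copy`, `copy_if_changed`, and `sync_tree`, and the worker count of 8, are made up for this example. It does not delete files that have disappeared from the source.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def needs_copy(src, dst):
    """True if dst is missing, or differs from src in size or mtime."""
    try:
        dst_stat = os.stat(dst)
    except OSError:
        return True  # destination does not exist yet
    src_stat = os.stat(src)
    # Assumption: a 2 second tolerance absorbs timestamp rounding
    # differences between file systems.
    return (src_stat.st_size != dst_stat.st_size
            or abs(src_stat.st_mtime - dst_stat.st_mtime) > 2)


def copy_if_changed(src, dst):
    """Copy one file if needed; return 1 if copied, 0 if skipped."""
    if needs_copy(src, dst):
        shutil.copy2(src, dst)  # copy2 keeps the mtime, so later passes can skip
        return 1
    return 0


def sync_tree(src_root, dst_root, workers=8):
    """Copy new or changed files from src_root to dst_root; return the count."""
    copied = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for dirpath, _dirnames, filenames in os.walk(src_root):
            rel = os.path.relpath(dirpath, src_root)
            target_dir = os.path.normpath(os.path.join(dst_root, rel))
            if not os.path.isdir(target_dir):
                os.makedirs(target_dir)
            jobs = [(os.path.join(dirpath, name), os.path.join(target_dir, name))
                    for name in filenames]
            # One directory at a time, so millions of pending jobs are never queued.
            copied += sum(pool.map(lambda job: copy_if_changed(*job), jobs))
    return copied
```

Several workers are used because with files this small the per-file overhead, rather than bandwidth, is likely the bottleneck. Whether that actually helps on the fragmented old array would need to be measured. A second run over an unchanged tree copies nothing, which is what would make a short final sync possible.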

We seem to be gravitating to Option 2, but thought we should put this to some of the best minds in the world before committing.

P.S. Backups on the new cluster are to a new (M) drive using Shadow Copy.

* Unfortunately not a real option, as we do need to move to the newer hardware: the old storage and cluster can no longer cope with demand.


Solution

  • We went with Option 2, and redirected the twice-daily backup from the original cluster to the new MSA RAID on the new cluster.

    It was run as a pull from the new cluster using PeerSync and a Windows share on the old cluster.

    We tried to use the PeerSync TCP client, which would have been faster and more efficient, but it wasn't compatible with Windows 2000. PeerSync was chosen over most other copy tools because of its compatibility and non-locking file operations, which allowed the original cluster to stay online throughout with minimal performance impact.

    This took around 13.5 hours for the initial copy, and then around 5.5 hours for each incremental diff copy. The major limiting factor was the original cluster's shared MSA RAID set: the drives were online and being accessed throughout the backups, so normal operation slowed down the backup times.

    The final sync took about 5 hours, and that was the total time the database was offline for the hardware upgrade.
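For anyone wanting an independent check after a final sync like this, a minimal sketch follows. It is not something we ran, and the function names are hypothetical. It compares the two trees by relative path and file size only, so a file whose content changed without changing size would go unnoticed. Hashing every file would catch that, at the cost of reading all the data again.

```python
import os


def tree_index(root):
    """Map each file's path relative to root to its size in bytes."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            index[os.path.relpath(full, root)] = os.path.getsize(full)
    return index


def compare_trees(old_root, new_root):
    """Return (missing, extra, size_mismatch) lists of relative paths."""
    old = tree_index(old_root)
    new = tree_index(new_root)
    missing = sorted(set(old) - set(new))   # on the old side only
    extra = sorted(set(new) - set(old))     # on the new side only
    size_mismatch = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return missing, extra, size_mismatch
```

Three empty lists mean the trees match by name and size. Note that holding an index of 13 million paths in memory is not free, so on a tree this large it may be more practical to run the comparison one top-level folder at a time.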