
What is the MongoDB archive format?


I've backed up several MongoDB databases using mongodump's archive option, but I can't simply untar the results. After decompressing one of them, it looks like the archive holds the whole database in one big file. I want to get at the files for the individual collections. Is there a way to do that?

$ tar -xvf valk.archive
tar: Unrecognized archive format
tar: Error exit delayed from previous errors.
$ file valk.archive
valk.archive: gzip compressed data, original size 13953183
$ gunzip valk.archive
gunzip: valk.archive: unknown suffix -- ignored
$ unzip valk.archive
Archive:  valk.archive
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of valk.archive or
        valk.archive.zip, and cannot find valk.archive.ZIP, period.
$ mv valk.archive valk.gz
$ gunzip valk.gz
$ open .
$ tar -xvf valk
tar: Unrecognized archive format
tar: Error exit delayed from previous errors.
$ head valk
TemplateDatametadata�{"options":{},"indexes":[{"v":2,"key":{"_id":1},"name":"_id_","ns":"valk.TemplateData"}],"uuid":"f52402b5aba24856b072d57cc3e46a72"}size-dbvalkcollectioMetricsmetadata�{"options":{"capped":true,"size":10485760,"max":1000000},"indexes":[{"v":2,"key":{"_id":1},"name":"_id_","ns":"valk.Metrics"},{"v":2,"key":{"openid":1},"name":"openid_1","ns":"valk.Metrics"}],"uuid":"43d92ff01815432c95dac5a2e05a64c0"}size�dbvalkcollection
AppConfigmetadata�{"options":{},"indexes":[{"v":2,"key":{"_id":1},"name":"_id_","ns":"valk.AppConfig"}],"uuid":"df633b0a43184de38e8b8ea7489cda3e"}size�dbvalkcollecMinibotZonesmetadata�{"options":{},"indexes":[{"v":2,"key":{"_id":1},"name":"_id_","ns":"valk.MinibotZones"}],"uuid":"095bbac0d17640be9e27dffe681b7d83"}size�dbvalkcollection    ChatLogsmetadataQ{"options":{"capped":true,"size":104857600,"max":10000000},"indexes":[{"v":2,"key":{"_id":1},"name":"_id_","ns":"valk.ChatLogs"},{"v":2,"key":{"openid":1,"createdAt":1},"name":"openid_1_createdAt_1","ns":"valk.ChatLogs"},{"v":2,"key":{"createdAt":1},"name":"createdAt_1","ns":"valk.ChatLogs"}],"uuid":"70586c82b3ae42cf8d9c47ad339ea55b"}size�dbvalkcollection

Solution

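  • The mongodump archive format is a special-purpose format, not a tar or zip container, so general-purpose tools cannot unpack it. To read it, use mongorestore --archive together with any other options that apply (for example, --gzip when the archive is gzip-compressed, as the file output in the question indicates).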

    For example, you can use the --nsInclude option (mongorestore 3.4+) to selectively restore multiple collections by namespace.

    For more information on the MongoDB archive format (and why tar wasn't suitable), see: Archiving and Compression in MongoDB Tools. The gist of this is:

    General purpose archive formats, like tar, only support contiguous file packing within the archive. Using these archive formats for mongodump and mongorestore will create an unacceptable performance degradation as data from all collections will have to be written to and read from, in order. To support the concurrent behavior of these tools, we developed a special purpose archive format that supports non-contiguous file writes. The new archiving feature provides major gains in the efficiency of backup and restore operations.
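
As a rough illustration of the advice above, the following sketch builds a mongorestore command that restores only selected collections from the gzip-compressed archive, then a mongodump command without --archive, which writes one .bson and one .metadata.json file per collection. The archive name, database name, and collection names are taken from the question; the dump output directory, the run helper, and the EXECUTE switch are assumptions made for this example. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Defaults taken from the question; OUT_DIR is an assumed directory name.
ARCHIVE="${ARCHIVE:-valk.archive}"
DB="${DB:-valk}"
OUT_DIR="${OUT_DIR:-dump}"

# Helper for this sketch: print each command, and execute it only
# when EXECUTE=1 is set in the environment.
run() {
  printf '%s\n' "$*"
  if [ "${EXECUTE:-0}" = "1" ]; then
    "$@"
  fi
}

# Restore only the named collections from the gzip-compressed archive.
# mongorestore connects to localhost:27017 unless told otherwise.
restore_collections() {
  local args=(mongorestore --gzip "--archive=${ARCHIVE}")
  local coll
  for coll in "$@"; do
    # --nsInclude may be repeated, once per namespace pattern.
    args+=("--nsInclude=${DB}.${coll}")
  done
  run "${args[@]}"
}

# Dump the restored database without --archive, so that each collection
# ends up in its own files under ${OUT_DIR}/${DB}/.
dump_per_collection() {
  run mongodump "--db=${DB}" "--out=${OUT_DIR}"
}

restore_collections ChatLogs Metrics
dump_per_collection
```

Running the script with EXECUTE=1 executes the commands instead of only printing them. Since mongorestore writes into whatever server it connects to, pointing it at a scratch mongod instance is probably the safer choice.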