objective-c macos realm osx-elcapitan

How can an RLMRealm database grow from 25 MB to 11 GB in Objective-C?


I've been using Realm.io in one of my projects. I've used it in a few iOS projects before, but this is my first Objective-C desktop app that uses it, so this question is about OS X usage. Before going any further, I think it's also worth mentioning that the test machine is running El Capitan, so I'm aware this is beta software.

Realm is loaded as a CocoaPod and everything works fine: I can run queries and have no problems initializing it. Still, something makes me think I'm not closing my calls correctly.

My objects are of two main types: one holds a 'group' and the other holds an actual 'object'. The app is a photo uploader, so it reads Apple's Photos app, indexes all the media objects and groups, then uploads them.

On the first run, or if I delete the Realm file completely so we are starting from scratch, everything flies through really quickly. On the next run, my queries seem to run slower, and the database first went from 25 MB to 50 MB; when I checked again after an hour, it was at around 11 GB.

My main use of Realm is in a singleton, but I'm also running a background queue that queues a new job for every object. When it discovers a photo, it queues another job to check whether the photo exists in the database; if not, it creates it, and if it does, it updates the existing information. Unless I obtain the Realm inside the job I get thread errors, so it is obtained each time.

Below is an example of one of my calls. Can anyone suggest what I might be doing wrong, or is there any way to control or compact the size of the database now that it's so big?

- (void)saveMediaObject:(MLMediaObject *)mediaObject mediaGroup:(MLMediaGroup *)mediaGroup
{
    [jobQueue addOperationWithBlock:
     ^{
         NSLog(@"saveMediaObject");

         lastScanResult = [NSDate date];

         RLMRealm *realm = [RLMRealm defaultRealm];

         SPMediaObject *spMediaObject = [SPMediaObject objectInRealm:realm forPrimaryKey:[mediaObject identifier]];

         if(spMediaObject == nil)
         {
             spMediaObject = [[SPMediaObject alloc] init];
             spMediaObject.identifier = [mediaObject identifier];
         }

         [realm beginWriteTransaction];

         spMediaObject.lastSeen = [NSDate date];
         spMediaObject.versionURL = [[mediaObject URL] path];
         spMediaObject.versionMimeType = [self mimeTypeForExtension:[[spMediaObject.versionURL pathExtension] lowercaseString]];
         spMediaObject.originalURL = [[mediaObject originalURL] path];
         spMediaObject.originalMimeType = [self mimeTypeForExtension:[[spMediaObject.originalURL pathExtension] lowercaseString]];

         if([mediaObject name] != nil)
         {
             spMediaObject.caption = [mediaObject name];
         }
         else
         {
             spMediaObject.caption = @"";
         }

         [realm addOrUpdateObject:spMediaObject];

         [realm commitWriteTransaction];
     }];
}

Steve.


Solution

  • Realm's docs on file size should provide some insight as to what's happening here and how to mitigate the problem:

    You should expect a Realm database to take less space on disk than an equivalent SQLite database. If your Realm file is much larger than you expect, it may be because you have a RLMRealm that is referring to an older version of the data in the database.

    In order to give you a consistent view of your data, Realm only updates the active version accessed at the start of a run loop iteration. This means that if you read some data from the Realm and then block the thread on a long-running operation while writing to the Realm on other threads, the version is never updated and Realm has to hold on to intermediate versions of the data which you may not actually need, resulting in the file size growing with each write. The extra space will eventually be reused by future writes, or may be compacted — for example by calling writeCopyToPath:error:.

    To avoid this issue, you may call invalidate to tell Realm that you no longer need any of the objects that you've read from the Realm so far, which frees us from tracking intermediate versions of those objects. The Realm will update to the latest version the next time it is accessed.

    You may also see this problem when accessing Realm using Grand Central Dispatch. This can happen when a Realm ends up in a dispatch queue's autorelease pool as those pools may not be drained for some time after executing your code. The intermediate versions of data in the Realm file cannot be reused until the RLMRealm object is deallocated. To avoid this issue, you should use an explicit autorelease pool when accessing a Realm from a dispatch queue.

    If those suggestions don't help, the Realm engineering team would be happy to profile your project to determine ways to minimize file size growth. You can email code privately to help@realm.io.
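
Applied to the code in the question, the advice about explicit autorelease pools might look like the sketch below. It is not a confirmed fix. It assumes that jobQueue (an NSOperationQueue) can delay draining its autorelease pools in the same way the docs describe for dispatch queues, and it reuses SPMediaObject, jobQueue, lastScanResult and mimeTypeForExtension: exactly as they appear in the question. The file needs #import <Realm/Realm.h>. Moving the lookup inside the write transaction is a design choice of this sketch, not something the docs require.

- (void)saveMediaObject:(MLMediaObject *)mediaObject mediaGroup:(MLMediaGroup *)mediaGroup
{
    [jobQueue addOperationWithBlock:
     ^{
         // Explicit pool: the RLMRealm obtained below is released when the
         // pool drains at the end of this job, rather than at some later
         // point chosen by the queue.
         @autoreleasepool
         {
             lastScanResult = [NSDate date];

             RLMRealm *realm = [RLMRealm defaultRealm];

             // Starting the write first means the lookup reads the latest
             // version of the data.
             [realm beginWriteTransaction];

             SPMediaObject *spMediaObject = [SPMediaObject objectInRealm:realm forPrimaryKey:[mediaObject identifier]];

             if(spMediaObject == nil)
             {
                 // Not stored yet: create an unmanaged object and set its key.
                 spMediaObject = [[SPMediaObject alloc] init];
                 spMediaObject.identifier = [mediaObject identifier];
             }

             spMediaObject.lastSeen = [NSDate date];
             spMediaObject.versionURL = [[mediaObject URL] path];
             spMediaObject.versionMimeType = [self mimeTypeForExtension:[[spMediaObject.versionURL pathExtension] lowercaseString]];
             spMediaObject.originalURL = [[mediaObject originalURL] path];
             spMediaObject.originalMimeType = [self mimeTypeForExtension:[[spMediaObject.originalURL pathExtension] lowercaseString]];
             spMediaObject.caption = ([mediaObject name] != nil) ? [mediaObject name] : @"";

             [realm addOrUpdateObject:spMediaObject];

             [realm commitWriteTransaction];
         }
     }];
}

If the singleton's Realm stays open on a thread that blocks for long periods, calling invalidate on it, as the quoted docs describe, may help as well. A file that has already grown can be compacted by writing a copy with writeCopyToPath:error:, as mentioned in the docs above.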