SOLVED - I used a combination of manual management (bypassing the garbage collector) and the mapped NSData option. iStat, it turns out, didn't have the right memory figure and Instruments indicated the expected behavior. Additionally, the CC_MD5() and CC_SHA1() calls do indeed call into CC_MD5_Update() and CC_SHA1_Update() already, so they're not causing problems either.
I'm currently working on a Cocoa application that needs to hash massive files using SHA-1 and MD5. I'm using CC_MD5 and CC_SHA1 and reading the whole file into an NSData object. However, this uses massive amounts of RAM and leaks memory like a sieve for some reason, even though the NSData object is no longer referenced… I suspect it's the garbage collector struggling to keep up.
What's the best (easiest if possible as well, but I'm not averse to doing some extra work to speed things up) way to perform MD5 and SHA-1 hashes on massive files like this?
Follow-Up
As mentioned below, mapped NSData might help, but I think I found another option. It still needs some work but seems like a much more robust solution. The idea is to use an NSFileHandle and read the file in "chunks", so maybe a maximum of 256 MB at a time. Then (for MD5, for example) call CC_MD5_Init(), followed by one CC_MD5_Update() per chunk and a final CC_MD5_Final(), to compute the hash incrementally. Combining that with manual memory management should help.
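A minimal sketch of the chunked approach described above, assuming ARC or manual retain/release rather than garbage collection. The helper name MD5HexOfFileAtPath and the 1 MB chunk size are illustrative choices, not something from the original post. It pairs CC_MD5_Init() with repeated CC_MD5_Update() calls and a closing CC_MD5_Final(), and drains an autorelease pool per chunk so each buffer can be released before the next read:

```objective-c
#import <Foundation/Foundation.h>
#import <CommonCrypto/CommonDigest.h>

// Hypothetical helper: returns the lowercase hex MD5 of the file at `path`,
// or nil if the file cannot be opened for reading.
static NSString *MD5HexOfFileAtPath(NSString *path) {
    NSFileHandle *handle = [NSFileHandle fileHandleForReadingAtPath:path];
    if (handle == nil) {
        return nil;
    }

    // 1 MB per read; an arbitrary size chosen for illustration.
    const NSUInteger chunkSize = 1024 * 1024;

    CC_MD5_CTX ctx;
    CC_MD5_Init(&ctx);

    BOOL done = NO;
    while (!done) {
        // The pool is drained at the end of every iteration, so only one
        // chunk is alive at a time.
        @autoreleasepool {
            NSData *chunk = [handle readDataOfLength:chunkSize];
            if (chunk.length == 0) {
                done = YES; // end of file
            } else {
                CC_MD5_Update(&ctx, chunk.bytes, (CC_LONG)chunk.length);
            }
        }
    }
    [handle closeFile];

    unsigned char digest[CC_MD5_DIGEST_LENGTH];
    CC_MD5_Final(digest, &ctx);

    // Format the raw digest as hex.
    NSMutableString *hex =
        [NSMutableString stringWithCapacity:CC_MD5_DIGEST_LENGTH * 2];
    for (int i = 0; i < CC_MD5_DIGEST_LENGTH; i++) {
        [hex appendFormat:@"%02x", digest[i]];
    }
    return hex;
}
```

The same loop should work for SHA-1 by swapping in CC_SHA1_CTX, CC_SHA1_Init(), CC_SHA1_Update(), CC_SHA1_Final(), and CC_SHA1_DIGEST_LENGTH. Peak memory is bounded by the chunk size rather than the file size, so a much smaller chunk than 256 MB is likely enough.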
Are you using memory-mapped files? That way you don't have to read the entire file into memory, and the OS will take care of caching what's needed:
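NSError *error = nil;
// Map the file into memory instead of reading it all in at once.
NSData *data = [NSData dataWithContentsOfFile:@"filename.dat"
                                      options:NSDataReadingMappedIfSafe
                                        error:&error];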
(You can also use NSDataReadingMappedAlways to force memory mapping.)
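To show how the mapped data could then be hashed, here is a rough sketch using SHA-1. The helper name SHA1HexOfMappedFileAtPath and the 64 MB slice size are assumptions made for illustration. It feeds the mapped bytes to CC_SHA1_Update() in slices instead of making a single CC_SHA1() call:

```objective-c
#import <Foundation/Foundation.h>
#import <CommonCrypto/CommonDigest.h>

// Hypothetical helper: returns the lowercase hex SHA-1 of the file at `path`
// using a memory-mapped NSData, or nil if the file cannot be read.
static NSString *SHA1HexOfMappedFileAtPath(NSString *path) {
    NSError *error = nil;
    NSData *data = [NSData dataWithContentsOfFile:path
                                          options:NSDataReadingMappedIfSafe
                                            error:&error];
    if (data == nil) {
        NSLog(@"Could not map %@: %@", path, error);
        return nil;
    }

    CC_SHA1_CTX ctx;
    CC_SHA1_Init(&ctx);

    const uint8_t *bytes = data.bytes;
    NSUInteger remaining = data.length;
    // 64 MB per update; arbitrary, but it must fit in a 32-bit CC_LONG.
    const NSUInteger sliceSize = 64 * 1024 * 1024;
    while (remaining > 0) {
        NSUInteger n = MIN(remaining, sliceSize);
        CC_SHA1_Update(&ctx, bytes, (CC_LONG)n);
        bytes += n;
        remaining -= n;
    }

    unsigned char digest[CC_SHA1_DIGEST_LENGTH];
    CC_SHA1_Final(digest, &ctx);

    // Format the raw digest as hex.
    NSMutableString *hex =
        [NSMutableString stringWithCapacity:CC_SHA1_DIGEST_LENGTH * 2];
    for (int i = 0; i < CC_SHA1_DIGEST_LENGTH; i++) {
        [hex appendFormat:@"%02x", digest[i]];
    }
    return hex;
}
```

The slicing matters for very large files: the length parameter of the CommonCrypto digest functions is a CC_LONG, which is 32 bits wide, so one call cannot be handed more than about 4 GB.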