ios swift hash commoncrypto

How can I hash a file on iOS using Swift 3?


I have a number of files that will live on a server. Users can create these kinds of files (plists) on-device, and the files are then uploaded to said server (CloudKit). I would like to identify them uniquely by content (the method should be unaffected by differences in creation date). My understanding is that I should hash these files in order to obtain unique file names for them. My questions are:

  1. Is my understanding correct that what I want is a hash function?
  2. Which function should I use (from CommonCrypto)?
  3. Is what I need a digest?
  4. How would I go about it in code? (I assume the hash should be computed over an NSData instance?) My understanding from googling around is that I need a bridging header include, but beyond that the use of CommonCrypto baffles me. If there is a simpler way using first-party (Apple) APIs, I am all ears (I want to avoid third-party code as much as possible).

Thanks so much!


Solution

  • Create a cryptographic hash of each file and use it for uniqueness comparisons. SHA-256 is a current hash function, and on iOS with Common Crypto it is quite fast: on an iPhone 6S, SHA-256 will process about 1 GB/second, minus the I/O time. If you need fewer bytes, just truncate the hash.

    An example using Common Crypto (Swift 3):

    For hashing a string:

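    // Returns the 32-byte SHA-256 digest of the string's UTF-8 bytes.
    func sha256(string: String) -> Data {
        let messageData = string.data(using: .utf8)!
        // Pre-allocate a buffer of the right size for the digest.
        var digestData = Data(count: Int(CC_SHA256_DIGEST_LENGTH))
    
        // CC_SHA256 writes the digest into digestBytes; its return value is not needed.
        _ = digestData.withUnsafeMutableBytes {digestBytes in
            messageData.withUnsafeBytes {messageBytes in
                CC_SHA256(messageBytes, CC_LONG(messageData.count), digestBytes)
            }
        }
        return digestData
    }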
    let testString = "testString"
    let testHash = sha256(string:testString)
    print("testHash: \(testHash.map { String(format: "%02hhx", $0) }.joined())")
    
    let testHashBase64 = testHash.base64EncodedString()
    print("testHashBase64: \(testHashBase64)")
    

    Output:
    testHash: 4acf0b39d9c4766709a3689f553ac01ab550545ffa4544dfc0b2cea82fba02a3
    testHashBase64: Ss8LOdnEdmcJo2ifVTrAGrVQVF/6RUTfwLLOqC+6AqM=

    Note: Add to your Bridging Header:

    #import <CommonCrypto/CommonCrypto.h>
    

    For hashing data:

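    // Returns the 32-byte SHA-256 digest of the given data.
    func sha256(data: Data) -> Data {
        var digestData = Data(count: Int(CC_SHA256_DIGEST_LENGTH))
    
        // CC_SHA256 writes the digest into digestBytes; its return value is not needed.
        _ = digestData.withUnsafeMutableBytes {digestBytes in
            data.withUnsafeBytes {messageBytes in
                CC_SHA256(messageBytes, CC_LONG(data.count), digestBytes)
            }
        }
        return digestData
    }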
    
    let testData: Data = "testString".data(using: .utf8)!
    print("testData: \(testData.map { String(format: "%02hhx", $0) }.joined())")
    let testHash = sha256(data:testData)
    print("testHash: \(testHash.map { String(format: "%02hhx", $0) }.joined())")
    

    Output:
    testData: 74657374537472696e67
    testHash: 4acf0b39d9c4766709a3689f553ac01ab550545ffa4544dfc0b2cea82fba02a3
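
Since the question is about files, here is a minimal sketch of hashing a file without loading it into memory all at once, using Common Crypto's incremental functions (CC_SHA256_Init, CC_SHA256_Update, CC_SHA256_Final). The function name sha256(url:) and the 64 KB chunk size are assumptions made for illustration, not part of the original answer. The `import CommonCrypto` line assumes Xcode 10 or later; with Swift 3, drop it and rely on the bridging header noted above.

```swift
import Foundation
import CommonCrypto  // assumes Xcode 10+; with Swift 3, remove this and use the bridging header

// Hypothetical helper: streams the file at `url` through SHA-256 in chunks,
// so memory use stays constant regardless of file size.
func sha256(url: URL) -> Data? {
    guard let stream = InputStream(url: url) else { return nil }
    stream.open()
    defer { stream.close() }

    var context = CC_SHA256_CTX()
    CC_SHA256_Init(&context)

    let bufferSize = 64 * 1024  // arbitrary chunk size, chosen for illustration
    var buffer = [UInt8](repeating: 0, count: bufferSize)

    while true {
        let bytesRead = stream.read(&buffer, maxLength: bufferSize)
        if bytesRead < 0 { return nil }  // read error (for example, the file is missing)
        if bytesRead == 0 { break }      // end of file
        CC_SHA256_Update(&context, buffer, CC_LONG(bytesRead))
    }

    var digest = [UInt8](repeating: 0, count: Int(CC_SHA256_DIGEST_LENGTH))
    CC_SHA256_Final(&digest, &context)
    return Data(digest)
}
```

For a file whose entire content is the UTF-8 bytes of "testString", this should produce the same digest as sha256(data:) above. For small plists, reading the file into a Data and calling sha256(data:) is likely simpler.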

    Also see Martin's link.
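
To turn a digest into a file name, and to truncate it as mentioned above, something like the following sketch may help. The helper name hexName(from:byteCount:) is hypothetical. Hex is used rather than Base64 because standard Base64 output can contain "/", which is not safe in file names. Keep in mind that a shorter hash makes accidental collisions more likely.

```swift
import Foundation

// Hypothetical helper: hex-encodes the first `byteCount` bytes of a digest.
// If `byteCount` exceeds the digest length, the whole digest is used.
func hexName(from digest: Data, byteCount: Int) -> String {
    return digest.prefix(byteCount).map { String(format: "%02hhx", $0) }.joined()
}

// The first eight bytes of the SHA-256 digest of "testString" shown above.
let sampleDigest = Data([0x4a, 0xcf, 0x0b, 0x39, 0xd9, 0xc4, 0x76, 0x67])
print(hexName(from: sampleDigest, byteCount: 4))  // prints "4acf0b39"
```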