
Which is preferable: sha1_file(f) or sha1(file_get_contents(f))?


I want to hash files that are at least 5 MB and can grow to 1-2 GB. I am stuck choosing between these two methods, although they produce exactly the same result.

Method 1: sha1_file($file)
Method 2: sha1(file_get_contents($file))

I tried both with a 10 MB file and saw no significant difference in performance. Which approach is better at larger file sizes?


Solution

  • Use the highest-level form offered unless there is a compelling reason to do otherwise.

    In this case, the correct choice is sha1_file, because it is a higher-level function that only works with files. This 'restriction' allows it to process the file as a stream [1]: only a small part of the file is ever read into memory at a time.

    The second approach guarantees that the whole file (5 MB to 2 GB here) is held in memory, because file_get_contents reads everything before the hash is computed. As file sizes increase or system resources become limited, this can have a very detrimental effect on performance, and the script can fail outright if the file does not fit within PHP's memory_limit.


    [1] The source for sha1_file can be found on GitHub. Here is an extract showing only the lines relevant to stream processing:

    PHP_FUNCTION(sha1_file)
    {
        /* Open the file as a binary read stream. */
        stream = php_stream_open_wrapper(arg, "rb", REPORT_ERRORS, NULL);
        PHP_SHA1Init(&context);
        /* Read one buffer-sized chunk at a time and feed it to the hash. */
        while ((n = php_stream_read(stream, buf, sizeof(buf))) > 0) {
            PHP_SHA1Update(&context, buf, n);
        }
        PHP_SHA1Final(digest, &context);
        php_stream_close(stream);
    }
    

    By using higher-level functions, the onus of providing a suitable implementation is placed on the developers of the library. In this case, that allowed the use of a streaming implementation that scales with file size.
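
To make the difference concrete, here is a minimal sketch in PHP of the same read-and-update loop shown in the C extract, followed by a rough memory comparison of the two methods from the question. The helper sha1_stream, the 8192-byte chunk size, and the 8 MiB temporary test file are assumptions made for illustration only; they are not part of PHP or of the original answer. In real code, sha1_file already does the streaming for you.

```php
<?php
/**
 * Hypothetical helper (not a PHP built-in): hash a file chunk by chunk,
 * mirroring the loop in the C extract. Returns the hex digest or false.
 */
function sha1_stream(string $path, int $chunkSize = 8192)
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }

    $context = hash_init('sha1');
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            fclose($handle);
            return false;
        }
        // Only this chunk is held in memory, never the whole file.
        hash_update($context, $chunk);
    }
    fclose($handle);

    return hash_final($context);
}

// Build an 8 MiB test file in 1 MiB writes so that creating it
// does not itself require an 8 MiB string (sizes are arbitrary).
$path = tempnam(sys_get_temp_dir(), 'sha');
$out = fopen($path, 'wb');
$block = str_repeat('a', 1 << 20);
for ($i = 0; $i < 8; $i++) {
    fwrite($out, $block);
}
fclose($out);
unset($block);

// Streaming approaches: the helper above and the built-in sha1_file.
$streamed = sha1_stream($path);
$builtin = sha1_file($path);
$peakStreaming = memory_get_peak_usage();

// Method 2 from the question: read the whole file, then hash the string.
$slurped = sha1(file_get_contents($path));
$peakSlurp = memory_get_peak_usage();

printf("digests match: %s\n", ($streamed === $builtin && $builtin === $slurped) ? 'yes' : 'no');
printf("peak after streaming:         %d bytes\n", $peakStreaming);
printf("peak after file_get_contents: %d bytes\n", $peakSlurp);

unlink($path);
```

All three calls should produce the same digest; what changes is the peak memory reported after the file_get_contents call, which grows with the file size. The exact numbers depend on the PHP version and configuration, so treat them as indicative rather than as a benchmark.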