I want to create a hash of a file whose size is at least 5 MB and can grow to 1-2 GB. I am facing a tough choice between these two methods, although they produce exactly the same result.
Method 1: sha1_file($file)
Method 2: sha1(file_get_contents($file))
I have tried with a 10 MB file, and there is not much difference in performance. But what about larger files? Which is the better way to go?
Use the most high-level form offered unless there is a compelling reason otherwise.
In this case, the correct choice is sha1_file, because it is a higher-level function that only works with files. This 'restriction' allows it to take advantage of the fact that the file/source can be processed as a stream [1]: only a small part of the file is ever read into memory at a time.
The second approach guarantees that 5 MB to 2 GB of memory (the size of the file) is used just to hold the data, because file_get_contents reads everything into memory before the hash is generated. As the size of the files increases and/or system resources become limited, this can have a very detrimental effect on performance, and a large enough file can exceed PHP's memory_limit so that the script fails outright.
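To make the difference concrete, here is a rough sketch that hashes the same file both ways and reports PHP's peak memory after each call. The 16 MiB size, the temporary file name prefix, and the variable names are assumptions chosen for illustration; actual numbers will vary by PHP version and configuration.

```php
<?php
// Build a 16 MiB scratch file in 1 MiB pieces so the setup itself stays small.
$file = tempnam(sys_get_temp_dir(), 'sha1demo'); // hypothetical scratch file
$handle = fopen($file, 'wb');
$chunk = str_repeat('x', 1024 * 1024);
for ($i = 0; $i < 16; $i++) {
    fwrite($handle, $chunk);
}
fclose($handle);

// Method 1: the file is hashed as a stream.
$streamed = sha1_file($file);
$peakAfterStream = memory_get_peak_usage();

// Method 2: the whole file is loaded into a string first.
$loaded = sha1(file_get_contents($file));
$peakAfterLoad = memory_get_peak_usage();

unlink($file);

var_dump($streamed === $loaded);
printf("peak after sha1_file:         %d bytes\n", $peakAfterStream);
printf("peak after file_get_contents: %d bytes\n", $peakAfterLoad);
```

Both digests should match, while the second peak figure should be larger by roughly the size of the file.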
[1] The source for sha1_file can be found on GitHub. Here is an extract showing only the lines relevant to stream processing:
PHP_FUNCTION(sha1_file)
{
    stream = php_stream_open_wrapper(arg, "rb", REPORT_ERRORS, NULL);
    PHP_SHA1Init(&context);
    while ((n = php_stream_read(stream, buf, sizeof(buf))) > 0) {
        PHP_SHA1Update(&context, buf, n);
    }
    PHP_SHA1Final(digest, &context);
    php_stream_close(stream);
}
By using higher-level functions, the onus of a suitable implementation is placed on the developers of the library. In this case it allowed the use of a scaling stream implementation.
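For comparison, the same read-a-chunk, update-the-hash loop can be sketched in userland PHP with the incremental hash API (hash_init, hash_update, hash_final). The function name sha1_file_chunked and the 8192-byte chunk size are hypothetical choices for this illustration, not anything taken from the PHP source; in practice sha1_file already does this for you.

```php
<?php
// Hypothetical helper: hash a file in fixed-size chunks, mirroring the C loop.
function sha1_file_chunked(string $path, int $chunkSize = 8192): string
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $context = hash_init('sha1');
    while (!feof($handle)) {
        // Only $chunkSize bytes are held in memory at any one time.
        $data = fread($handle, $chunkSize);
        if ($data === false) {
            fclose($handle);
            throw new RuntimeException("Read error on $path");
        }
        hash_update($context, $data);
    }
    fclose($handle);

    return hash_final($context); // lowercase hex digest, like sha1_file()
}

// Usage: hashing this script itself should agree with the built-in.
var_dump(sha1_file_chunked(__FILE__) === sha1_file(__FILE__));
```

PHP also provides hash_update_stream, which feeds an open stream into a hash context without a hand-written loop.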