I'm currently writing a Bash script that hashes each line of a text file and writes the result to a new file in the format hash:originalword. The script I have at the moment is:
cat "$originalfile" | while IFS= read -r line; do
hash="$(printf %s "$line" | $hashfunction | cut -f1 -d' ')"
echo "$hash:$line" >> "$outputlocation"
done
I originally got the code for this from a very similar question, linked here. The script works exactly as advertised; the problem is that even for very small text files (under 15 KB) it takes a very long time to run.
I would really appreciate it if someone could suggest a script which achieves exactly the same outcome but does so far more efficiently.
Thank you in advance for any help,
Kind regards, John
I'd be very wary of doing this in pure shell. Each iteration of that loop starts a subshell plus two external processes (the hash program and cut), and that per-line start-up overhead, not the hashing itself, is what makes it slow, even on small files.
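To put a rough number on it, here is a timing sketch (md5sum stands in for whatever $hashfunction is, and the figures will vary by system): hashing 1,000 short lines this way forks roughly 3,000 processes.
# each iteration forks a subshell plus two external processes
time seq 1000 | while IFS= read -r line; do
printf %s "$line" | md5sum | cut -f1 -d' ' >/dev/null
done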
How about a short bit of Perl?
perl -MDigest::MD5 -nle 'print Digest::MD5::md5_hex($_), ":", $_' <"$originalfile" >>"$outputlocation"
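For example, given a small input file (words.txt here is just an illustrative name), the output looks like this:
$ printf 'foo\nbar\n' > words.txt
$ perl -MDigest::MD5 -nle 'print Digest::MD5::md5_hex($_), ":", $_' <words.txt
acbd18db4cc2f85cedef654fccc4a4d8:foo
37b51d194a7513e45b56f6524f2d51f2:bar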
Perl has a variety of Digest modules, so it is easy to use something less broken than MD5. For SHA-256:
perl -MDigest::SHA -nle 'print Digest::SHA::sha256_hex($_), ":", $_' <"$originalfile" >>"$outputlocation"
If you want to use Whirlpool, you can install it from CPAN with
cpan install Digest::Whirlpool
and use it with
perl -MDigest -nle '$ctx = Digest->new("Whirlpool"); $ctx->add($_); print $ctx->hexdigest(), ":", $_' <"$originalfile" >>"$outputlocation"
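If this outgrows a one-liner, the same approach fits in a small standalone script. Here is a minimal sketch, assuming the algorithm name you pass is one your installed Digest backends recognize (e.g. MD5, SHA-256, or Whirlpool); the script name hashlines.pl and its argument order are made up for illustration:
#!/usr/bin/env perl
use strict;
use warnings;
use Digest;

# Usage: hashlines.pl ALGORITHM INFILE OUTFILE
my ($algo, $in, $out) = @ARGV;
die "usage: $0 ALGORITHM INFILE OUTFILE\n" unless defined $out;

open my $ifh, '<', $in or die "open $in: $!";
open my $ofh, '>>', $out or die "open $out: $!";

while (my $line = <$ifh>) {
    chomp $line;
    # one Digest object per line, so each hash covers only that line
    my $ctx = Digest->new($algo);
    $ctx->add($line);
    print {$ofh} $ctx->hexdigest, ':', $line, "\n";
}
Since the perl process starts only once, this keeps the speed of the one-liners.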