Imagine a one-way function that maps an input with infinitely many possible values to a fixed-length, “unique” output. Note that “unique” is in quotes because it is theoretically impossible to map an infinite set onto a finite set without repeating outputs (collisions). However, hash functions (whose outputs are also known as message digests) such as SHA256, SHA384 or SHA512 are collision resistant: with current computing power, it is infeasible to find a collision for them through brute force.
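To make the fixed-length idea concrete, here is a minimal Python sketch. The sample inputs are arbitrary values I picked for illustration; only the standard hashlib module is used.

```python
import hashlib

# Inputs of very different sizes (arbitrary sample values).
inputs = [b"", b"a", b"hello world", b"x" * 1_000_000]

# Every SHA256 digest is 32 bytes, i.e. 64 hexadecimal characters,
# no matter how long the input is.
digests = [hashlib.sha256(data).hexdigest() for data in inputs]

for data, digest in zip(inputs, digests):
    print(f"{len(data):>8} bytes -> {digest} ({len(digest)} hex chars)")
```

Whether the input is empty or a megabyte long, the digest always has the same length, and these four inputs all produce different values.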
This property of hash functions is really important for digital signatures, which rely on this uniqueness trait: each signed document must have its integrity proven (the document was not tampered with), so it must be infeasible to find two documents with the same hash value (collision resistance).
Since hash functions are one-way functions, there is no reverse function to trace back the input from an output. This means they work differently from symmetric or asymmetric encryption: there is no “decryption” concept in hash operations.
Hash functions are also useful for checksum or error-control purposes. Aided by their uniqueness trait, two parties can verify whether a transferred payload is unexpected, corrupted or tampered with by comparing the previously issued hash value with the hash calculated on the received payload. Hashes also play an important role in digital forensics, where they help show that the samples or evidence gathered from a system are free of tampering or manipulation.
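The payload check described above could look roughly like this in Python. This is only a sketch: the helper names sha256_hex and matches are mine, and it assumes the issued hash reached the receiver through a channel that was not tampered with.

```python
import hashlib
import hmac


def sha256_hex(payload: bytes) -> str:
    """Return the SHA256 digest of payload as a hexadecimal string."""
    return hashlib.sha256(payload).hexdigest()


def matches(payload: bytes, issued_digest: str) -> bool:
    """Check a received payload against the digest issued by the sender."""
    return hmac.compare_digest(sha256_hex(payload), issued_digest)


original = b"hello world"
issued = sha256_hex(original)  # hash value issued by the sender

unchanged_ok = matches(b"hello world", issued)  # payload arrived intact
altered_ok = matches(b"hello worle", issued)    # one character changed

print(unchanged_ok, altered_ok)
```

Changing a single character is enough to make the comparison fail, which is what makes the hash usable as a checksum.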
Therefore, in the following sections, I'm going to briefly explain some well-known hash functions and different ways to calculate them.
What is MD5 and how to get it?
MD5 is a hash algorithm that produces a 128-bit output; it was developed in 1991. However, practical collision attacks against it are known, so it is no longer collision resistant and cannot guarantee uniqueness. This means that it is not secure for digital signatures, for password handling (even with a salt) or for any other context where the uniqueness trait is important. The recommendation in these use cases is to use SHA256 or above, and to stay away from MD5 unless it is only used for checksums or error control.
In this section and the following ones, the hash operations are performed on the file hello_world.txt, created for these activities:
$ echo -n "hello world" > hello_world.txt
$ cat hello_world.txt
hello world$
Calculating the MD5 in Linux is easy with the md5sum command:
$ md5sum hello_world.txt
5eb63bbbe01eeed093cb22bb8f5acdc3 hello_world.txt
The openssl tool can do the same MD5 operation:
$ openssl dgst -md5 hello_world.txt
MD5(hello_world.txt)= 5eb63bbbe01eeed093cb22bb8f5acdc3
You can also use Python to compute the same MD5 with a one-line command (the file is opened in binary mode, which Python 3 requires for hashing):
$ python -c "import hashlib; print(hashlib.md5(open('hello_world.txt','rb').read()).hexdigest())"
5eb63bbbe01eeed093cb22bb8f5acdc3
Or you can use Perl to calculate the MD5 as well (this requires the Digest::MD5 Perl module):
$ perl -MDigest::MD5=md5_hex -le "print md5_hex <>" hello_world.txt
5eb63bbbe01eeed093cb22bb8f5acdc3
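The Python one-liner above reads the whole file into memory, which is fine for a tiny file but not for a large one. Below is a sketch of a chunked alternative. The function name file_digest_hex and the 64 KiB chunk size are my own choices, and the demo hashes a temporary copy of the sample file rather than the real hello_world.txt.

```python
import hashlib
import os
import tempfile


def file_digest_hex(path: str, algorithm: str = "md5", chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so it never has to fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:  # binary mode: hash the raw bytes
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Demo on a throwaway copy of the sample file used in this post.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "hello_world.txt")
    with open(sample_path, "wb") as handle:
        handle.write(b"hello world")
    md5_value = file_digest_hex(sample_path)

print(md5_value)  # 5eb63bbbe01eeed093cb22bb8f5acdc3
```

Passing another algorithm name, for example file_digest_hex(path, "sha256"), reuses the same loop for the hash functions covered in the next sections.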
What is SHA1 and how to get it?
Similarly to MD5, SHA1 is no longer a collision-resistant algorithm given current computing power, even though its longer output (160 bits) allows more possible values than MD5. Although SHA-1 is not secure in some cryptographic use cases, it is still an accepted hash function for checksums or for HMAC (Hash-based Message Authentication Code).
SHA1 was published in 1995, but it has been deprecated since 2011 and its use is not recommended for digital signatures, which forced many certificates based on this algorithm to be reissued with a stronger one (SHA256).
The following examples show different ways to obtain the SHA1 from the file hello_world.txt.
Get SHA1 with the Linux command sha1sum:
$ sha1sum hello_world.txt
2aae6c35c94fcfb415dbe95f408b9ce91ee846ed hello_world.txt
Obtaining SHA1 with openssl:
$ openssl dgst -sha1 hello_world.txt
SHA1(hello_world.txt)= 2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
Python script to obtain SHA1 from a file (opened in binary mode, as Python 3 requires):
$ python -c "import hashlib; print(hashlib.sha1(open('hello_world.txt','rb').read()).hexdigest())"
2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
Perl script to compute SHA1 from a file (requires Digest::SHA perl module):
$ perl -MDigest::SHA=sha1_hex -l -e "print sha1_hex <>" hello_world.txt
2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
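As mentioned at the start of this section, SHA1 is still accepted inside HMAC. A minimal sketch with Python's standard hmac module follows. The secret key and the helper name is_authentic are hypothetical values chosen only for this example.

```python
import hashlib
import hmac

# Hypothetical shared secret and message, for illustration only.
secret_key = b"example-shared-secret"
message = b"hello world"

# The sender computes the authentication tag over the message.
tag = hmac.new(secret_key, message, hashlib.sha1).hexdigest()


def is_authentic(key: bytes, received: bytes, received_tag: str) -> bool:
    """Recompute the HMAC-SHA1 tag and compare it in constant time."""
    expected = hmac.new(key, received, hashlib.sha1).hexdigest()
    return hmac.compare_digest(expected, received_tag)


print(tag)
print(is_authentic(secret_key, message, tag))         # True
print(is_authentic(secret_key, b"hello w0rld", tag))  # False: message altered
print(is_authentic(b"wrong-key", message, tag))       # False: wrong key
```

Unlike a plain hash, the tag can only be recomputed by someone who knows the key, so it authenticates the sender as well as the content.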
What is SHA256 and how to get it?
SHA256 is nowadays one of the most widely used and secure hash functions for uniqueness and integrity, because it is still collision resistant against current computing power. This hash algorithm, and others with a longer output such as SHA384 and SHA512, are therefore recommended for digital signatures and for password handling (always with a salt, and ideally through a deliberately slow construction such as PBKDF2, bcrypt, scrypt or Argon2 rather than a single hash pass).
SHA256 belongs to the SHA-2 family of hash functions, alongside SHA224, SHA384 and SHA512. It was released in 2001 and will probably remain accepted as a secure standard for the next decade. As its name states, its hash value is 256 bits long, which is longer than MD5 (128 bits) and SHA1 (160 bits).
The next examples show how to obtain the SHA256 message digest from the file hello_world.txt:
Calculate SHA256 with the Linux command sha256sum:
$ sha256sum hello_world.txt
b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9 hello_world.txt
Get the SHA256 with the openssl tool:
$ openssl dgst -sha256 hello_world.txt
SHA256(hello_world.txt)= b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
SHA256 with a Python script (the file is opened in binary mode, as Python 3 requires):
$ python -c "import hashlib; print(hashlib.sha256(open('hello_world.txt','rb').read()).hexdigest())"
b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
SHA256 with Perl (requires the Digest::SHA Perl module):
$ perl -MDigest::SHA=sha256_hex -l -e "print sha256_hex <>" hello_world.txt
b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
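For the salted password handling mentioned at the start of this section, here is one possible sketch. Note the assumptions: the post only says to use a salt, while this example uses PBKDF2-HMAC-SHA256 from Python's hashlib as one way to apply it. The iteration count, salt size and function names are illustrative choices of mine, not recommendations from any standard.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor for this demo; tune it for your hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a SHA256-based password hash, generating a random salt if needed."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt per password
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-derive the hash with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, stored)


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

Because the salt is random, hashing the same password twice gives two different stored values, so identical passwords cannot be spotted by comparing hashes.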
Calculating other hash functions
There are many other cryptographic hash functions besides the ones explained in this post, and they can be obtained with any of the methods described above.
SHA224, SHA384 and SHA512 with Linux command line:
$ sha224sum hello_world.txt
2f05477fc24bb4faefd86517156dafdecec45b8ad3cf2522a563582b hello_world.txt
$ sha384sum hello_world.txt
fdbd8e75a67f29f701a4e040385e2e23986303ea10239211af907fcbb83578b3e417cb71ce646efd0819dd8c088de1bd hello_world.txt
$ sha512sum hello_world.txt
309ecc489c12d6eb4cc40f50c902f2b4d0ed77ee511a7c7a9bcd3ca86d4cd86f989dd35bc5ff499670da34255b45b0cfd830e81f605dcf7dc5542e93ae9cd76f hello_world.txt
Whirlpool hash algorithm (512 bits long) with openssl (on OpenSSL 3.x this algorithm belongs to the legacy provider, so it may need to be enabled first):
$ openssl dgst -whirlpool hello_world.txt
whirlpool(hello_world.txt)= 8d8309ca6af848095bcabaf9a53b1b6ce7f594c1434fd6e5177e7e5c20e76cd30936d8606e7f36acbef8978fea008e6400a975d51abe6ba4923178c7cf90c802
MD2 and MD4 hashes (both 128 bits long) with Perl (requires the Digest::MD2 and Digest::MD4 Perl modules):
$ perl -MDigest::MD2=md2_hex -l -e "print md2_hex <>" hello_world.txt
d9cce882ee690a5c1ce70beff3a78c77
$ perl -MDigest::MD4=md4_hex -l -e "print md4_hex <>" hello_world.txt
aa010fbc1d14c795d86ef98c95479d17
RIPEMD-160 hash (160 bits long) with openssl:
$ openssl dgst -ripemd160 hello_world.txt
RIPEMD160(hello_world.txt)= 98c615784ccb5fe5936fbc0cbe9dfdb408d92f0f
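Python can reach these other algorithms too, through hashlib.new. This is a sketch rather than a guarantee: the SHA-2 variants are always present, but Whirlpool, MD4 and RIPEMD-160 depend on the OpenSSL library behind your Python build, so the helper below (try_digest, a name I made up) returns None when an algorithm is missing.

```python
import hashlib
from typing import Optional

data = b"hello world"


def try_digest(name: str, payload: bytes) -> Optional[str]:
    """Return the hex digest, or None if this Python build lacks the algorithm."""
    try:
        return hashlib.new(name, payload).hexdigest()
    except ValueError:  # raised for unknown or disabled algorithms
        return None


# SHA-2 variants are guaranteed to exist in every hashlib build.
sha2_digests = {name: try_digest(name, data) for name in ("sha224", "sha384", "sha512")}

# These depend on the OpenSSL library behind Python and may be missing.
optional_digests = {name: try_digest(name, data) for name in ("whirlpool", "md4", "ripemd160")}

for name, digest in {**sha2_digests, **optional_digests}.items():
    print(f"{name}: {digest if digest else 'not available in this build'}")
```

The set hashlib.algorithms_available lists the names your build knows about, although a listed algorithm can still be disabled at runtime, which is why the sketch catches the error instead of only checking the set.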
How to use the hash values
Let's say that you want to download the PHP version 8.1.7 package from a trusted source, in this case the PHP web site itself. You browse to the download page and find the download links and the hash checksums (SHA256 in this case):
An interesting use of the hash value in terms of cybersecurity is to look up the threat intelligence available on the internet for the file by searching for its hash in VirusTotal. To do so, first go to the VirusTotal web page, click on the “Search” tab and then search for the SHA256 value obtained previously from the PHP site:
After searching for the hash value, VirusTotal will display the information available about the file if it was previously scanned by other users:
As you can see in the picture, the file that you are about to download is currently considered benign (0/50 detections), which means that no evidence of malicious content has been detected in the package so far. This way you reduce the risk of downloading malicious software onto your device.
Next, you download the package and, once the download is complete, you calculate the SHA256 of the downloaded file, because that is the checksum published on the PHP web page.
$ wget --quiet https://www.php.net/distributions/php-8.1.7.tar.gz
$ ls -l
total 19256
-rw-rw-r--. 1 fse fse 19714169 Jun 7 20:50 php-8.1.7.tar.gz
$ sha256sum php-8.1.7.tar.gz
5f0b422a117633c86d48d028934b8dc078309d4247e7565ea34b2686189abdd8 php-8.1.7.tar.gz
Thus, before starting to use the package, you compare the SHA256 value computed locally with the one on the PHP page. If they are identical, you may conclude that the file you have downloaded is what the site states, which ensures that the file was not corrupted or tampered with (integrity).
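The final comparison can also be automated instead of checking the two strings by eye. The sketch below assumes a helper I named verify_sha256. The demo runs on a small stand-in file in a temporary directory, not on the real PHP tarball.

```python
import hashlib
import hmac
import os
import tempfile


def verify_sha256(path: str, published_hex: str) -> bool:
    """Return True when the file's SHA256 equals the published checksum."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):  # 1 MiB chunks
            hasher.update(chunk)
    # Normalise case and whitespace picked up when copying from a web page.
    expected = published_hex.strip().lower()
    return hmac.compare_digest(hasher.hexdigest(), expected)


# Demo with a stand-in file whose SHA256 is already known from this post.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = os.path.join(tmp_dir, "package.tar.gz")  # hypothetical file name
    with open(sample, "wb") as handle:
        handle.write(b"hello world")
    published = " B94D27B9934D3E08A52E52D7DA7DABFAC484EFE37A5380EE9088F7ACE2EFCDE9\n"
    intact = verify_sha256(sample, published)
    mismatch = verify_sha256(sample, "0" * 64)

print(intact, mismatch)  # True False
```

With the values from this post, the call would be verify_sha256("php-8.1.7.tar.gz", "5f0b422a117633c86d48d028934b8dc078309d4247e7565ea34b2686189abdd8"), run from the directory that holds the downloaded file.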