python, python-3.x, digital-signature, encode, hmac

Should I use Base64 of HMAC digest or just HMAC hex digest?


Legend

I expose an API that requires clients to sign requests by sending two headers:

Authorization: MyCompany access_key:<signature>
Unix-Timestamp: <unix utc timestamp in seconds>

To create the signature part, the client uses a secret key issued by my API service.

In Python (Py3k) it could look like:

import base64
import hmac
from hashlib import sha256
from datetime import datetime

UTF8 = 'utf-8'
AUTH_HEADER_PREFIX = 'MyCompany'

def create_signature(access_key, secret_key, message):
    new_hmac = hmac.new(bytes(secret_key, UTF8), digestmod=sha256)
    new_hmac.update(bytes(message, UTF8))
    signature_base64 = base64.b64encode(new_hmac.digest())
    return '{prefix} {access_key}:{signature}'.format(
        prefix=AUTH_HEADER_PREFIX,
        access_key=access_key,
        signature=str(signature_base64, UTF8).strip()
    )


if __name__ == '__main__':
    # datetime.now().timestamp() yields the real Unix time; a naive utcnow() would be read as local time.
    message = str(datetime.now().timestamp())
    signature = create_signature('my access key', 'my secret key',  message)
    print(
        'Request headers are',
        'Authorization: {}'.format(signature),
        'Unix-Timestamp: {}'.format(message),
        sep='\n'
    )
    # For message='1457369891.672671',
    # access_key='my access key'
    # and secret_key='my secret key' this will output:
    #
    # Request headers are
    # Authorization: MyCompany my access key:CUfIjOFtB43eSire0f5GJ2Q6N4dX3Mw0KMGVaf6plUI=
    # Unix-Timestamp: 1457369891.672671

I wondered whether I could skip encoding the raw digest bytes to Base64 and just call HMAC.hexdigest() to get a string directly. My function would then change to:

def create_signature(access_key, secret_key, message):
    new_hmac = hmac.new(bytes(secret_key, UTF8), digestmod=sha256)
    new_hmac.update(bytes(message, UTF8))
    signature = new_hmac.hexdigest()
    return '{prefix} {access_key}:{signature}'.format(
        prefix=AUTH_HEADER_PREFIX,
        access_key=access_key,
        signature=signature
    )

But then I found that Amazon uses an approach similar to my first code snippet:

Authorization = "AWS" + " " + AWSAccessKeyId + ":" + Signature;

Signature = Base64( HMAC-SHA1( YourSecretAccessKeyID, UTF-8-Encoding-Of( StringToSign ) ) );

Seeing that Amazon doesn't use the hex digest, I held back from switching to it, because maybe they know something I don't.
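For context, the server side of this scheme looks the same whichever encoding is chosen: recompute the HMAC and compare. The following is only a minimal sketch, assuming the hex variant of create_signature; the SECRET_KEYS dictionary and the verify_signature name are hypothetical and not part of the API described above.

```python
import hmac
from hashlib import sha256

# Hypothetical key store mapping access keys to secret keys.
SECRET_KEYS = {'my access key': 'my secret key'}
AUTH_HEADER_PREFIX = 'MyCompany'


def verify_signature(authorization, timestamp):
    """Return True if the header value carries a valid hex HMAC of timestamp."""
    # Header value format: '<prefix> <access_key>:<signature>'
    prefix, _, credentials = authorization.partition(' ')
    # Split on the last colon; a hex signature never contains one.
    access_key, _, signature = credentials.rpartition(':')
    secret_key = SECRET_KEYS.get(access_key)
    if prefix != AUTH_HEADER_PREFIX or secret_key is None:
        return False
    expected = hmac.new(
        secret_key.encode('utf-8'), timestamp.encode('utf-8'), sha256
    ).hexdigest()
    # Constant-time comparison on bytes avoids leaking timing information.
    return hmac.compare_digest(
        expected.encode('utf-8'), signature.encode('utf-8')
    )
```

With Base64 instead of hex, only the line that computes expected would change.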


Update

I measured the performance and found the hex digest to be faster:

import base64
import hmac
from hashlib import sha256


UTF8 = 'utf-8'
MESSAGE = '1457369891.672671'
SECRET_KEY = 'my secret key'


def create_hmac():
    new_hmac = hmac.new(bytes(SECRET_KEY, UTF8), digestmod=sha256)
    new_hmac.update(bytes(MESSAGE, UTF8))
    return new_hmac


# Created after create_hmac() is defined; calling it earlier raises NameError.
NEW_HMAC = create_hmac()


def base64_digest():
    return base64.b64encode(NEW_HMAC.digest())


def hex_digest():
    return NEW_HMAC.hexdigest()



if __name__ == '__main__':
    from timeit import timeit
    
    print(timeit('base64_digest()', number=1000000,
                  setup='from __main__ import base64_digest'))
    print(timeit('hex_digest()', number=1000000,
                 setup='from __main__ import hex_digest'))

Results (total seconds for 1,000,000 calls each, Base64 first, then hex):

3.136568891000934
2.3460130329913227

Question #1

Does anyone know why they stick to Base64 of the raw digest instead of just using the hex digest? Is there a solid reason to prefer that approach over the hex digest?

Question #2

According to RFC 2617, the format of the Authorization header value when using Basic Authentication is:

Authorization: Basic Base64(username:password)

So basically you wrap two values (the user's id and password), separated by a colon, in Base64.

As you can see in my code snippet and in Amazon's documentation, neither Amazon nor I do that for our custom Authorization header values. Would it be better style to wrap the whole pair as Base64(access_key:signature) to stick closer to this RFC, or does it not matter at all?
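To make the comparison concrete, here is a small sketch of how a Basic header value is built next to the custom scheme from the first snippet. The function names basic_auth_value and custom_auth_value are made up for illustration.

```python
import base64


def basic_auth_value(username, password):
    """Build a Basic Authorization value: the whole 'user:password' pair is Base64-encoded."""
    credentials = '{}:{}'.format(username, password).encode('utf-8')
    return 'Basic ' + base64.b64encode(credentials).decode('ascii')


def custom_auth_value(access_key, signature, prefix='MyCompany'):
    """Build the custom scheme from the question: the pair is sent as plain text."""
    return '{} {}:{}'.format(prefix, access_key, signature)


if __name__ == '__main__':
    print('Authorization: ' + basic_auth_value('Aladdin', 'open sesame'))
    print('Authorization: ' + custom_auth_value('my access key', '<signature>'))
```

In Basic, Base64 only keeps the credentials transportable in a header; it hides nothing, so wrapping access_key:signature the same way would not add any protection.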


Solution

  • Amazon does use the hex digest in Signature Version 4.

    Authorization: AWS4-HMAC-SHA256 Credential=AKIDEXAMPLE/20150830/us-east-1/iam/aws4_request, SignedHeaders=content-type;host;x-amz-date, Signature=5d672d79c15b13162d9279b0855cfba6789a8edb4c82c400e06b5924a6f2b5d7

    http://docs.aws.amazon.com/general/latest/gr/sigv4-add-signature-to-request.html

    Your example is from Signature Version 2, the older algorithm, which does use Base-64 encoding for the signature (and which also is not supported in the newest AWS regions).

    So, your concern that AWS knows something you don't is misplaced, since their newer algorithm uses hex encoding.

    In the Authorization: header, it really doesn't make a difference other than a few extra octets.

    Where Base-64 gets messy is when the signature is passed in the query string, because + and (depending on who you ask) / and = require special handling. Either they are url-escaped ("percent-encoded") as %2B, %2F, and %3D respectively, or the server has to accommodate the possible variations, or you have to require a non-standard Base-64 alphabet in which + / = become - ~ _, the way CloudFront does it. (This particular alphabet is only one of several non-standard options, all "solving" the same problem of magic characters in URLs with Base-64.)

    Go with hex-encoding.

    You will almost inevitably run into would-be consumers of your API who find Base-64 to be "difficult."
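The size and URL-safety points above can be checked with a short sketch that encodes the same HMAC-SHA256 digest three ways. The helper name digest_encodings is hypothetical; the key and message are the sample values from the question.

```python
import base64
import hmac
from hashlib import sha256
from urllib.parse import quote


def digest_encodings(secret_key, message):
    """Return one HMAC-SHA256 digest in hex, standard Base64 and URL-safe Base64."""
    raw = hmac.new(
        secret_key.encode('utf-8'), message.encode('utf-8'), sha256
    ).digest()
    return {
        'hex': raw.hex(),
        'base64': base64.b64encode(raw).decode('ascii'),
        # Python's URL-safe alphabet swaps + and / for - and _ but keeps '=' padding.
        'base64_urlsafe': base64.urlsafe_b64encode(raw).decode('ascii'),
    }


if __name__ == '__main__':
    encodings = digest_encodings('my secret key', '1457369891.672671')
    for name, value in encodings.items():
        # quote(..., safe='') shows what the value looks like inside a query string.
        print(name, len(value), quote(value, safe=''))
```

A 32-byte digest is 64 characters in hex and 44 in Base64, so hex costs 20 extra octets per request. The hex form passes through percent-encoding unchanged, while both Base64 forms end in a padding = that becomes %3D.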