How to run a transformation on JSON data using Python


I am familiar with transforming data in CSV files using Python, but I am new to transforming data in JSON format.

I have streaming JSON data and I want to apply the MD5 algorithm to generate a hash for each record. I have written the code that generates the MD5 hash, but I do not know how to apply it to the JSON data.

Here is my MD5 hash generation script:

import hashlib

# initializing string
Password = "kureen2022!"
AccountName = "[email protected]"


# Convert password to bytes
pw_byte = Password.encode()

# Generate Salt
salt = AccountName.lower().encode()  

# Concatenate salt with password, apply md5
result = hashlib.md5(salt+pw_byte)
pw_hash = result.hexdigest()

# printing Hash value.
print(pw_hash)

Here is my JSON script:

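import json

# Open the file in a context manager so it is closed after loading
with open('my_json.Json') as f:
    data = json.load(f)

# Each item under "details" is a dictionary
for i in data['details']:
    print(i)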

Here is some sample JSON data:

{
  "details": [
    {
      "AccountName": "dojoujre",
      "Password": "password123"
    },
    {
      "AccountName": "dojoujre",
      "Password": "password007"
    }
  ]
}

The objective is to apply the MD5 script to the JSON file and generate a hash for each entry, like this:

{
  "details": [
    {
      "AccountName": "dojoujre",
      "Password": "password123",
      "Hash": "93837373930"
    },
    {
      "AccountName": "dojoujre",
      "Password": "password007",
      "Hash": "eer3er5t6t6y"
    }
  ]
}

I would be glad for some direction. I would also appreciate pointers to good materials for learning how to perform data transformations on JSON.


Solution

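  • In Python, JSON objects are handled just like dictionaries (json.load returns them as dict objects). Hence, assuming you wrap your hashing code in a function (called compute_md5 here), your code should look something like: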

    for x in data['details']:
      x['Hash'] = compute_md5(x['Password'])
    

    See the official Python documentation on dictionaries. Also check out the documentation for the json module, which has useful functions for serializing and deserializing JSON.

    A short note about MD5 hashing: I'd discourage you from using MD5 to compute "security" hashes in a production and/or critical environment. The MD5 algorithm is cryptographically broken: practical collision attacks are well known and run in seconds on an ordinary computer. You should rely on a much more secure algorithm such as SHA-3. For passwords specifically, a dedicated key-derivation function such as hashlib.scrypt or hashlib.pbkdf2_hmac is a better fit than any plain hash.
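
To tie the pieces together, here is a minimal sketch of the whole transformation. It assumes the salting scheme from the question (lower-cased AccountName prepended to Password). The names compute_hash and add_hashes, and the algorithm parameter, are illustrative and do not come from the original posts. The sample data is inlined so the sketch runs without a file.

```python
import hashlib
import json


def compute_hash(account_name: str, password: str, algorithm: str = "md5") -> str:
    """Hash the password, salted with the lower-cased account name.

    The salting scheme mirrors the script in the question; the function
    name and the `algorithm` parameter are assumptions for illustration.
    """
    salt = account_name.lower().encode()
    return hashlib.new(algorithm, salt + password.encode()).hexdigest()


def add_hashes(data: dict, algorithm: str = "md5") -> dict:
    """Add a "Hash" key to every record under data["details"], in place."""
    for record in data["details"]:
        record["Hash"] = compute_hash(
            record["AccountName"], record["Password"], algorithm
        )
    return data


# Inline sample; with a real file, use json.load(f) inside a `with open(...)` block.
raw = """
{
  "details": [
    {"AccountName": "dojoujre", "Password": "password123"},
    {"AccountName": "dojoujre", "Password": "password007"}
  ]
}
"""

data = add_hashes(json.loads(raw))
print(json.dumps(data, indent=2))  # serialize the transformed data back to JSON

# The same transformation with SHA-3 instead of MD5
sha3_data = add_hashes(json.loads(raw), algorithm="sha3_256")
```

Passing the algorithm name to hashlib.new keeps the loop unchanged when moving away from MD5. To write the result to disk, json.dump(data, f, indent=2) on a file opened for writing should work the same way.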