Search code examples
pythonpandaspython-polars

How to replace a specific field inside a JSON string in each row of a csv file in Python with a random value?


I have a csv file named input.csv with the following columns:

row_num start_date_time id json_message
120 2024-02-02 00:01:00.001+00 1020240202450 {'amount': 10000, 'currency': 'NZD','seqnbr': 161 }
121 2024-02-02 00:02:00.001+00 1020240202451 {'amount': 20000, 'currency': 'AUD','seqnbr': 162 }
122 2024-02-02 00:03:00.001+00 1020240202452 {'amount': 30000, 'currency': 'USD','seqnbr': None }
123 2024-02-02 00:04:00.001+00 1020240202455 {'amount': 40000, 'currency': 'INR','seqnbr': 163 }

I'm using python3 to read this csv file and i need to replace seqnbr field under json_message column with a random integer digit for each row. If the seqnbr contains None then that row should not be replaced with random seqnbr rather it should be retained as is. Delimiter for my csv file is pipe (|) symbol. I'm using the below python code to replace the file with the random generated integer value but still it doesn't overwrite it and here is my code:

def update_seqnbr(cls):
    filename = 'input.csv'
    seqnbr_pattern = "'seqnbr': ([\s\d]+)"
    with open(filename, 'r') as csvfile:
        datareader = (csv.reader(csvfile, delimiter="|"))
        next(datareader, None)  # skip the headers
        for row in datareader:
            json_message = row[3]
            match = re.findall(seqnbr_pattern, json_message)
            if len(match) != 0:
                replaced_json_message = json_message.replace(match[0], str(random.randint(500, 999)))
                row[3] = replaced_json_message
                x = open(filename, "a")
                x.writelines(row)
                x.close()

Below is how my file should look like:

row_num start_date_time id json_message
120 2024-02-02 00:01:00.001+00 1020240202450 {'amount': 10000, 'currency': 'NZD','seqnbr': 555 }
121 2024-02-02 00:02:00.001+00 1020240202451 {'amount': 20000, 'currency': 'AUD','seqnbr': 897 }
122 2024-02-02 00:03:00.001+00 1020240202452 {'amount': 30000, 'currency': 'USD','seqnbr': None }
123 2024-02-02 00:04:00.001+00 1020240202455 {'amount': 40000, 'currency': 'INR','seqnbr': 768 }

Can someone please help me on this?


Solution

  • I made a few changes to your code, using csv to write the new csv too, and most importantly unidenting the write part, that was ignoring the None lines:

    def update_seqnbr(cls):
        filename = 'input.csv'
        seqnbr_pattern = "'seqnbr': ([\s\d]+)"
        with open(FILENAME, 'a', newline='') as targetfile:
            targetcsv = csv.writer(targetfile, delimiter='|')
            with open(filename, 'r') as csvfile:
                datareader = (csv.reader(csvfile, delimiter="|"))
                targetcsv.writerow(next(datareader, None))  # skip the headers
                for row in datareader:
                    json_message = row[3]
                    match = re.findall(seqnbr_pattern, json_message)
                    if len(match) != 0:
                        replaced_json_message = json_message.replace(match[0], str(random.randint(500, 999)))
                        row[3] = replaced_json_message
                    targetcsv.writerow(row)
    

    Here's the output:

    row_num|start_date_time|id|json_message
    120|2024-02-02 00:01:00.001+00|1020240202450|{'amount': 10000, 'currency': 'NZD','seqnbr': 961}
    121|2024-02-02 00:02:00.001+00|1020240202450|{'amount': 20000, 'currency': 'NZD','seqnbr': 863}
    122|2024-02-02 00:03:00.001+00|1020240202450|{'amount': 30000, 'currency': 'NZD','seqnbr': None }
    123|2024-02-02 00:04:00.001+00|1020240202450|{'amount': 40000, 'currency': 'NZD','seqnbr': 666}