Search code examples
pythoncsvdata-cleaning

Apply a function to each line of a CSV file in python


I have a regular expression that I want to apply to each line of a CSV file.

Here's the function which basically removes all comma's encountered before a single digit number. The function is working perfectly fine for the string.

Input : text = "52A, XYZ Street, ABC District, 2, M, Brown
Function : re.sub('(?<!\s[\dA-Z]),(?!\s+\d,?)', '', text)
Output : '52A XYZ Street ABC District, 2, M, Brown'

However, I have a CSV file containing hundreds of such lines. For instance

1, 5273249, 1061/72, 150-CF, S/O:XVZ, 1, ABX, 45, 0, Husband, 9213
1, 5272849, 1063/36, 150-AS, S/O:XVZ, 1, ABX, 45, 0, Wife, 9253
1, 5274549, 10626/12, 150-RT, S/O:XVZ, 1, ABX, 45, 0, Son, 9214

I tried to read it using CSV reader and apply the function but unfortunately, it's not producing any output. What did I do wrong here:

def myFunction(text):
    return re.sub('(?<!\s[\dA-Z]),(?!\s+\d,?)', '', text)

import csv
with open('temp1.csv', 'r') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        l = ','.join(row)    
        myFunction(l)

Solution

  • There's a handy module called fileinput that allows you to easily edit files line by line.

    Take a look at:

    import re
    import fileinput
    
    # For each line
    for line in fileinput.input(files='example.csv', inplace=True, backup='.bak'):
        # Replace it with the regex you provided
        print(re.sub('(?<!\s[\dA-Z]),(?!\s+\d,?)', '', line), end='')
    

    Output:

    Before:
    1,  5273249, 1061/72, 150-CF, S/O:XVZ, 1, ABX, 45, 0, Husband, 9213
    1,  5272849, 1063/36, 150-AS, S/O:XVZ, 1, ABX, 45, 0, Wife, 9253
    1,  5274549, 10626/12, 150-RT, S/O:XVZ, 1, ABX, 45, 0, Son, 9214
    
    After:
    1,  5273249, 1061/72, 150-CF S/O:XVZ, 1, ABX, 45, 0, Husband, 9213
    1,  5272849, 1063/36, 150-AS S/O:XVZ, 1, ABX, 45, 0, Wife, 9253
    1,  5274549, 10626/12, 150-RT S/O:XVZ, 1, ABX, 45, 0, Son, 9214
    

    This also creates a handy example.csv.bak file for you in case you need it!