Search code examples
pythoncsvdata-manipulationdata-munging

"Doing work" on csv DictReader fails


I am writing a script where I need to read a CSV into a DictReader, do some work on the fields (data munging), then output the DictReader to a csv via DictWriter.

If I read the CSV then write the Dict, the process works.

#Create the sample file
headers = ['Symbol', 'Price', 'Date', 'Time', 'Change', 'Volume']
rows = [{'Symbol':'AA', 'Price':39.48, 'Date':'6/11/2007',
          'Time':'9:36am', 'Change':-0.18, 'Volume':181800},
        {'Symbol':'AIG', 'Price': 71.38, 'Date':'6/11/2007',
          'Time':'9:36am', 'Change':-0.15, 'Volume': 195500},
        {'Symbol':'AXP', 'Price': 62.58, 'Date':'6/11/2007',
          'Time':'9:36am', 'Change':-0.46, 'Volume': 935000},
        ]

#Open sample file
with open('stocks.csv','w') as f:
    f_csv = csv.DictWriter(f, headers)
    f_csv.writeheader()
    f_csv.writerows(rows)

#Output the dict    
with open('stocks.csv', 'r') as file:
    csvread = csv.DictReader(file, delimiter=',')

    with open('out.csv', 'w') as out:
        headertowrite = ['Time', 'Symbol', 'NewColumn']
        writer = csv.DictWriter(out, headertowrite, extrasaction='ignore')
        writer.writeheader()
        writer.writerows(csvread)
#Works!

However - if I add a new column, it appears that I lose all the data in the DictReader:

headers = ['Symbol', 'Price', 'Date', 'Time', 'Change', 'Volume']
rows = [{'Symbol':'AA', 'Price':39.48, 'Date':'6/11/2007',
          'Time':'9:36am', 'Change':-0.18, 'Volume':181800},
        {'Symbol':'AIG', 'Price': 71.38, 'Date':'6/11/2007',
          'Time':'9:36am', 'Change':-0.15, 'Volume': 195500},
        {'Symbol':'AXP', 'Price': 62.58, 'Date':'6/11/2007',
          'Time':'9:36am', 'Change':-0.46, 'Volume': 935000},
        ]

with open('stocks.csv','w') as f:
    f_csv = csv.DictWriter(f, headers)
    f_csv.writeheader()
    f_csv.writerows(rows)

with open('stocks.csv', 'r') as file:
    csvread = csv.DictReader(file, delimiter=',')


    for row in csvread:
        row['NewColumn'] = '1'

    with open('out.csv', 'w') as out:
        headertowrite = ['Time', 'Symbol', 'NewColumn']
        writer = csv.DictWriter(out, headertowrite, extrasaction='ignore')
        writer.writeheader()
        writer.writerows(csvread)
#Out.csv is blank!

Is there a way I can perform work on the DictReader before I write it?


Solution

  • As csvread has been entirely consumed by your first for loop, our with statement will output a blank file, as you may notice.

    You should either:

    --> save rows into a variable

    import csv
    headers = ['Symbol', 'Price', 'Date', 'Time', 'Change', 'Volume']
    rows = [{'Symbol':'AA', 'Price':39.48, 'Date':'6/11/2007',
             'Time':'9:36am', 'Change':-0.18, 'Volume':181800},
            {'Symbol':'AIG', 'Price': 71.38, 'Date':'6/11/2007',
             'Time':'9:36am', 'Change':-0.15, 'Volume': 195500},
            {'Symbol':'AXP', 'Price': 62.58, 'Date':'6/11/2007',
             'Time':'9:36am', 'Change':-0.46, 'Volume': 935000}]
    
    with open('stocks.csv','w') as f:
        f_csv = csv.DictWriter(f, headers)
        f_csv.writeheader()
        f_csv.writerows(rows)
    
    with open('stocks.csv', 'r') as file:
        csvread = csv.DictReader(file, delimiter=',')
    
        rows = []
        for row in csvread:
            row['NewColumn'] = '1'
            rows.append(row)
    
        with open('out.csv', 'w') as out:
            headertowrite = ['Time', 'Symbol', 'NewColumn']
            writer = csv.DictWriter(out, headertowrite, extrasaction='ignore')
            writer.writeheader()
            writer.writerows(rows)
    

    or

    --> make your modification within the with statement that will output the file (you read one line, modify it and write it.

    import csv
    headers = ['Symbol', 'Price', 'Date', 'Time', 'Change', 'Volume']
    rows = [{'Symbol':'AA', 'Price':39.48, 'Date':'6/11/2007',
             'Time':'9:36am', 'Change':-0.18, 'Volume':181800},
            {'Symbol':'AIG', 'Price': 71.38, 'Date':'6/11/2007',
             'Time':'9:36am', 'Change':-0.15, 'Volume': 195500},
            {'Symbol':'AXP', 'Price': 62.58, 'Date':'6/11/2007',
             'Time':'9:36am', 'Change':-0.46, 'Volume': 935000}]
    
    with open('stocks.csv','w') as f:
        f_csv = csv.DictWriter(f, headers)
        f_csv.writeheader()
        f_csv.writerows(rows)
    
    with open('stocks.csv', 'r') as file:
        csvread = csv.DictReader(file, delimiter=',')
    
        with open('out.csv', 'w') as out:
            headertowrite = ['Time', 'Symbol', 'NewColumn']
            writer = csv.DictWriter(out, headertowrite, extrasaction='ignore')
            writer.writeheader()
            for row in csvread:
                row['NewColumn'] = '1'
                writer.writerow(row)
    

    Take care! It is writerow not writerows in the second solution!

    Note that I also highly recommend the second solution which is more scalable, in terms of memory consumption.