Search code examples
pythonarcgis

How to convert a csv-file to a dictionnary of lists with python?


I'm trying to have this kind of result :

Output_Screenshot

Here is the csv-file :

OsmID,NewName,IdLocal

1020287758,NN1,Id0001

1021229973,NN2,Id0002

1025409497,NN3,Id0003

I'm using the code below:

import csv

input = r'C:\Users\_M92\csvFiles\csv0001.csv'

fileRead = open(input, 'r')

with open(input, 'r') as csv:
    headerLine = fileRead.readline()
    header = headerLine.split(",")  
    #print(header)
    nameIndex = header.index("OsmID")    
    output = {}
    for line in fileRead.readlines():
        values = line.split(",")
        output[values[nameIndex]] = values

print(output)

And it results in the following error:

File "c:\Users\_M92\Scripts\CsvToDict.py", 
    
    line 19, in <module>
        nameIndex = header.index("OsmID")

  ValueError: 'OsmID' is not in list

Solution

  • Instead of manually splitting each line by commas, use the CSV module that you've imported. This module contains a DictReader class that will yield dictionaries for each row. Then, you just need to add this to your output dictionary.

    # Create an empty dictionary
    # We will add keys to this as needed
    output = {}
    # Keep track of number of rows, so we can add an empty column if needed
    row_count = 0
    
    # This function adds a row to the output dictionary
    def add_row(row_dict):
        global row_count # Need to declare this as global because we're assigning to the variable in this function
        if not row_dict: return # If row is empty, do nothing
        for k, v in row_dict.items():
            # Loop over all key-value pairs in the row to add
            if k not in output: # If the output doesn't contain this column, create a blank column
                output[k] = [None] * row_count 
       
            output[k].append(v) # Append the value to the correct column in output
    
        row_count += 1 
    
    input_file = r'C:\Users\_M92\csvFiles\csv0001.csv'
    with open(input_file, 'r') as fh:
        reader = csv.DictReader(fh) # Create a DictReader
        for row in reader:
            add_row(row) # Add every row to the output
    

    This gives the following output:

    {'OsmID': ['1020287758', '1021229973', '1025409497'], 
     'NewName': ['NN1', 'NN2', 'NN3'], 
     'IdLocal': ['Id0001', 'Id0002', 'Id0003']}
    

    Note: I removed the blank lines in the input csv you provided, but it doesn't make a difference to the program, since a blank line will yield an empty dictionary from DictReader, and add_row doesn't do anything with empty dicts

    Note 2: You could discard the row_count variable if you dynamically count the number of rows like so:

    def add_row(row_dict):
        row_count = 0
        for first_key, first_val in output.items():
            row_count = len(first_val)
            break # We can just break out here because all keys should have the same number of values
        
        # Create keys that do not yet exist in output but do exist in the new row
        existing_keys = set(output.keys())
        new_row_keys = set(row_dict.keys())
        keys_to_create = new_row_keys - existing_keys 
        for key in keys_to_create:
            output[key] = [None] * row_count
    
        # Append to each column in output
        for key in output:
            output[key].append(row_dict.get(key, None)) # If the key doesn't exist in the current row, append None