Search code examples
pythonpandascsv

Pandas does not separate columns of imported csv file


I want to import a csv file as a dataframe using pandas. The file's structure looks as provided in the screenshot. (https://i.sstatic.net/N91d7.png) However, for some reason, using

df = pd.read_csv("Test.csv", delimiter = ',')

does not work. The resulting dataframe contains all content in one single column.

How can I separate the columns correctly? Thanks in advance.

I already played around with different options for the "read_csv" prompt, however, I did not yet find a solution.


Solution

  • I'm not sure how you got a csv file with ambiguous commas (the values are unquoted and contain commas, and the separator is also a comma), but the good news is ...

    Assuming the entire file follows the format in your question (specifically, if every second column is empty), this should do what you want:

    import pandas as pd
    text='Fact,Fact Note,' + ','.join(f'{x},Value Note for {x}' for x in ['California','Arkansas','Arizona','Alaska','Alabama','United States'])
    text += '''
    Population Estimates, July 1, 2022, (V2022),,39,029,342,,3,045,637,,7,359,197,,733,583,,5,074,296,,333,287,557,
    Population Estimates, July 1, 2021, (V2021),,39,142,991,,3,028,122,,7,264,877,,734,182,,5,049,846,,332,031,554,'''
    print(text,'\n')
    
    from io import StringIO
    i = 0
    rows = []
    
    with StringIO(text) as f:
        for line in f:
            if not i:
                columns = line[:-1].split(',') #-1 is to avoid newline
                print(columns)
            else:
                vals = line[:-1].split(',,')
                row = [vals[0],''] + [x for z in zip([int(val.replace(',', '')) for val in vals[1:]], ['']*(len(vals)-1)) for x in z]
                rows.append(row)
            i += 1
    print(rows)
    df = pd.DataFrame(rows, columns=columns)
    print('','',df,sep='\n')
    

    Sample input:

    Fact,Fact Note,California,Value Note for California,Arkansas,Value Note for Arkansas,Arizona,Value Note for Arizona,Alaska,Value Note for Alaska,Alabama,Value Note for Alabama,United States,Value Note for United States
    Population Estimates, July 1, 2022, (V2022),,39,029,342,,3,045,637,,7,359,197,,733,583,,5,074,296,,333,287,557,
    Population Estimates, July 1, 2021, (V2021),,39,142,991,,3,028,122,,7,264,877,,734,182,,5,049,846,,332,031,554
    

    Output:

                                              Fact Fact Note  California Value Note for California  ...  Alabama Value Note for Alabama  United States Value Note for United States
    0  Population Estimates, July 1, 2022, (V2022)              39029342                            ...  5074296                             333287557
    1  Population Estimates, July 1, 2021, (V2021)              39142991                            ...  5049846                             332031554
    
    [2 rows x 14 columns]
    

    Transposed output (easier to read):

                                                                            0                                            1
    Fact                          Population Estimates, July 1, 2022, (V2022)  Population Estimates, July 1, 2021, (V2021)
    Fact Note
    California                                                       39029342                                     39142991
    Value Note for California
    Arkansas                                                          3045637                                      3028122
    Value Note for Arkansas
    Arizona                                                           7359197                                      7264877
    Value Note for Arizona
    Alaska                                                             733583                                       734182
    Value Note for Alaska
    Alabama                                                           5074296                                      5049846
    Value Note for Alabama
    United States                                                   333287557                                    332031554
    Value Note for United States
    

    Note that I have used a string and StringIO class instead of a text file, for ease of creating the example.