Search code examples
pandasdataframecsvtext

pandas read textfile as dataframe


I have a textfile with hundreds of rows where each row looks like the following:

"LastName, FirstName MiddleName", 222555,X-150,2023,0.15,0.20,0.5,"1, 2, 10",--,1.5,5.10,report

The separators are normally commas except when within quotes. The above row each needs to be separated into the following columns:

LastName, FirstName MiddleName
222555
X-150
2023
0.15
0.20
0.5
1,2,10
--
1.5
5.10
report

Solution

  • Try something like this if you want to use Pandas. Just using read_csv should still work fine on your data, then you can split the columns that are in quotes into their own columns. Last, remove the columns with the quotes:

    #### Making mock csv/text file
    from io import StringIO
    csv_file = StringIO('\n'.join(['"LastName, FirstName MiddleName", 222555,X-150,2023,0.15,0.20,0.5,"1, 2, 10",--,1.5,5.10,report']*3))
    #########################
    
    # Load in the file as a CSV
    df = pd.read_csv(csv_file, header = None)
    
    # Split the names into separate name columns
    df[['Last', 'First', 'Middle']] = [x for x in df[0].str.replace(',', '').str.split(' ')]
    
    # Split the numbers in quotes into 3 separate columns
    df[['num1', 'num2', 'num3']] = [x for x in df[7].str.split(', ')]
    
    # Remove the columns that you split
    df = df.drop(df.columns[[0, 7]], axis = 1)
    

    Output:

             1     2       3       4      5   6   8   9  10     11  Last            First       Middle  num1 num2 num3
    0   222555  X-150   2023    0.15    0.2 0.5 --  1.5 5.1 report  LastName    FirstName   MiddleName  1   2   10
    1   222555  X-150   2023    0.15    0.2 0.5 --  1.5 5.1 report  LastName    FirstName   MiddleName  1   2   10
    2   222555  X-150   2023    0.15    0.2 0.5 --  1.5 5.1 report  LastName    FirstName   MiddleName  1   2   10