Search code examples
pythonpandassplitconcatenation

Flattening column data with split then merging df with Pandas


Using names = df['Name and Location'].str.split(',', expand=True) I am able to split this dense data at delimiters like colons.

I'm stuck on how to recombine the data into a flatter record. I've tried:

pd.concat([df, names])

Records end at "complaint #", and begin at date: which is in another column.

**Last_Name , First_Name**
City: City_Name
County: OUT_OF_STATE
Zip Code: 00000
License #: AA0000000
Complaint # AA00000000000

**Company:** Company_Name,_INC
City: City_Name
County: County_Name
Zip Code: 00000
Company: Company_Name LIC AA0000
City: City_Name
County: County_Name
Zip Code: 00000
License: string_or_int
Complaint # AA00000000000

**Last_Name**, First_Name
Company: Company_Name
City: City_Name
County: County_Name
Zip Code: 00000
License #: AA00000000000
Complaint # AA00000000000

Ideally, each "record" would be flat, like:

First Name Last Name Company City County Zip Code License Complaint Date The String Why the String 

Last_name_1 First_name_1 Company_Name_1 City_1 County_1 00001 AA000000 string_1 why_string_1

Solution

  • To split at a delimiter, and create and combine a new column with the existing df, use:

    df = pd.concat((df, df['Column_to_Split'].str.split('String_to_Go:', expand=True)), axis=1, ignore_index=True)
    

    Any delimiter can be used, including an empty string. The key here is expand = True as it creates a new column, which was the goal.