Search code examples
pythonpandasdataframenumpysas

Pandas - Insert a column at a certain character index


I have a df like the one below:

Account_num First_Name  Last_Name   Zipcode  Amount
AAA111      AAA         BBB         12345    784.23
AAA112      AAB         BBA         44546    2145.32
AAA113      AAC         BBC         75452    6563.24
AAA114      AAD         BBD         45484    9532.21

I need to format this dataframe so that the columns are in a certain position from the beginning of the line. For example:

I need the account_num to start at char 5 instead of being at char 1 of the line. Since the account_nums are always 6 characters long, I need the first_name to start at 13th character of the line.(including a space) and so on.

I have the example in sas and want to rewrite it in Panda. How can this be done?

put @5(acct_num) @13(first_name)($char32. -l)@20(last_Name) ($char32. -l)

enter image description here

enter image description here

1234567891112131415161718192021222324252627282930 
    AAA111   AAA           BBB   12345   784.23 
    AAA112   AAB           BBA   44546   2145.32 
    AAA113   AAC           BBC   75452   6563.24 
    AAA114   AAD           BBD   45484   9532.21

In the above, the first line denotes the character count and my output needs to be that the account_num column starts at char 5 and First_Name starts at 13, last_Name at 20 and so on.

enter image description here

I tried this but is not working as desired:

def format_columns(df): # Define column widths and positions
widths = [6, 10, 20, 8, 10]
positions = [5, 13, 33, 53, 63]
for col, width, pos in zip(df.columns, widths, positions): df[col] = df[col].astype(str).apply(lambda x: x.ljust(width))
df[col] = df[col].apply(lambda x: ’ ’ * (pos - len(x)) + x if len(x) < pos else x)
return df # Format DataFrame columns
formatted_df = format_columns(df)
print(formatted_df)

Can someone help understand what the issue is with my code?


Solution

  • This is what finally worked for me.

    import pandas as pd # Sample DataFrame 
    data = { 
        'Account_num': ['AAA111', 'AAA112', 'AAA113', 'AAA114'], 
        'First_Name': ['AAA', 'AAB', 'AAC', 'AAD'], 
        'Last_Name': ['BBB', 'BBA', 'BBC', 'BBD'], 
        'Zipcode': [12345, 44546, 75452, 45484], 
        'Amount': [784.23, 2145.32, 6563.24, 9532.21]
         }
         
    df = pd.DataFrame(data)
    if df.index.name is None:
        set
        account_start = 5
    else: 
        account_start = 1 
    
    
    df['Account_num'] = df['Account_num'].apply(lambda x: x.rjust(account_start + 5))
    df['First_Name'] = df['First_Name'].apply(lambda x:x.rjust(1))
    df['Last_Name'] = df['Last_Name'].apply(lambda x:x.rjust(6))
    
    df.to_csv('output.txt', sep='\t', index=False,header= False)