
Python/Pandas DataFrame with leapfrog assigned columns


I would like to ask for help with a problem. I am currently revisiting an old issue and decided to work on some improvements.

I am building a DataFrame for later analysis from multiple Excel files. Each file always contains multiple columns, which are transposed to rows and then appended to the DataFrame. So far so good.

However, once I start generating additional columns, with column names based on entries read from Excel, I have trouble finding an elegant solution for what is surely a trivial problem.

Example:

import pandas as pd

dat1 = {'A': [3, 1], 'B': [4, 1]}
df_1 = pd.DataFrame(data=dat1)


This is DataFrame df_1 (values are shown with their column letter as a prefix so their origin is easy to follow):
    A   B
0  a3  b4
1  a1  b1


dat2 = {'U': [9, 9], 'Y': [2, 2]}
df_2 = pd.DataFrame(data=dat2)


This is DataFrame df_2 (values are shown with their column letter as a prefix):
    U   Y
0  u9  y2
1  u9  y2

The desired output is to assign values to a DataFrame by column name for multiple entries (that is, to assign one complete DataFrame into another):

dat3 = {'A':[], 'U':[], 'B':[], 'Y':[]}
df_3 = pd.DataFrame(data = dat3)


This is the desired DataFrame df_3:
    A   U   B   Y
0  a3  u9  b4  y2
1  a1  u9  b1  y2


At the moment I am experimenting with the join/merge/concat functions, but none of them seems able to do this by itself. I could create a new DataFrame or assign values by some index, but that seems like overkill for this. The main column name list is built separately, in its own function.

Is there a simple way that I am missing?

Many thanks in advance for your time, consideration and help.

Best regards, Jan


Solution

  • You should use the concat method to concatenate the DataFrames, as follows.
    First, create the DataFrames:

    import pandas as pd
    dat1 = {'A':[3, 1], 'B':[4, 1]}
    df_1 = pd.DataFrame(data = dat1) 
    df_1
    

    output:

        A   B
    0   3   4
    1   1   1
    
    
    dat2 = {'U':[9,9], 'Y':[2,2]}
    df_2 = pd.DataFrame(data = dat2)
    df_2

    output:

        U   Y
    0   9   2
    1   9   2
    

    Then use the concat method with axis=1 to place the DataFrames side by side:

    df_3 = pd.concat([df_1, df_2], axis=1)
    df_3
    

    output:

        A   B   U   Y
    0   3   4   9   2
    1   1   1   9   2
    

    The last step is to rearrange the columns of df_3 so that they match the order shown in your question. Select them with a list of column names in the desired order:

    df_3 = df_3[['A', 'U', 'B', 'Y']]
    df_3
    

    output:

        A   U   B   Y
    0   3   9   4   2
    1   1   9   1   2
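
Since the question mentions that the main column name list is built in a separate function, the hard-coded list in the last step could be replaced by that list. The following is only a minimal sketch under that assumption: `main_columns` is a hypothetical stand-in for whatever that function returns, and the `zip`-based alternative assumes both DataFrames have the same number of columns.

```python
import pandas as pd

df_1 = pd.DataFrame({'A': [3, 1], 'B': [4, 1]})
df_2 = pd.DataFrame({'U': [9, 9], 'Y': [2, 2]})

# Hypothetical stand-in for the column name list that the question
# says is produced by a separate function.
main_columns = ['A', 'U', 'B', 'Y']

# Place both frames side by side (rows are matched on the index),
# then select the columns in the order given by the list.
df_3 = pd.concat([df_1, df_2], axis=1)[main_columns]

# Alternative when no such list exists: alternate the columns of the
# two frames. zip stops at the shorter input, so this assumes both
# frames have the same number of columns.
interleaved = [name
               for pair in zip(df_1.columns, df_2.columns)
               for name in pair]
df_alt = pd.concat([df_1, df_2], axis=1)[interleaved]

print(df_3)
```

Note that concat with axis=1 aligns rows on the index. If the DataFrames come from different files and their indexes differ, calling reset_index(drop=True) on each one before concatenating avoids rows filled with NaN.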