Search code examples
pythondataframeindexingrefer

Two dataframes connected by index


I have two dataframes.

[2500 rows x 4 columns]
                   Kundenname              Adresse                Ort      PLZ
0          Amt Nortorfer Land        Niedernstraße 6            Nortorf  24539.0
1                Nord    GmbH         Heilbadstr. 85            Münster  24529.0
2               Vertrieb GmbH              Straße  4             Kassel  31117.0
.......


[1900 rows x 38 columns]
   0     1      2       3     4     5   ...    32    33    34    35    36    37
0  (0   118   1999   2117)  None  None  ...  None  None  None  None  None  None
1  (1   2000) None   None   None ....
....

The result should be like this:

              Kundenname          Adresse      Ort      PLZ
0     Amt Nortorfer Land  Niedernstraße 6  Nortorf  24589.0
118   Amt Nortorfer Land  Niedernstraße 6  Nortorf  24589.0
1999  Amt Nortorfer Land  Niedernstraße 6  Nortorf  24589.0
2117  Amt Nortorfer Land  Niedernstraße 6  Nortorf  24589.0

1       ......
2000    ......

etc.

I just did it with df.loc[[9,118,1999,2117]] but I need a loop or something that I don't have to type in manually.


Solution

  • When df1 is your dataframe with your address data, and df2 is your index dataframe as such:

    df2 = pd.DataFrame({0:[0,1], 1:[118, 2000], 2:[1999, None], 3:[2117, None], 4:[None, None]})
    

    You can rewrite your index_dataframe (df2) using melt:

    index_values = pd.melt(df2.reset_index(), id_vars='index').set_index('index')[['value']].dropna()
    

    This will give you the following result:

            value
    index   
    0       0
    1       1
    0       118
    1       2000
    0       1999
    0       2117
    

    You can use this to merge with your df1:

    index_values.merge(df1, left_index=True, right_index=True).set_index('value')
    

    Result:

            Kundenname          Adresse         Ort     PLZ
    value               
    0.0     Amt Nortorfer Land  Niedernstraße 6 Nortorf 24539.0
    118.0   Amt Nortorfer Land  Niedernstraße 6 Nortorf 24539.0
    1999.0  Amt Nortorfer Land  Niedernstraße 6 Nortorf 24539.0
    2117.0  Amt Nortorfer Land  Niedernstraße 6 Nortorf 24539.0
    1.0     Nord GmbH           Heilbadstr. 85  Münster 24529.0
    2000.0  Nord GmbH           Heilbadstr. 85  Münster 24529.0
    

    If df2 really contains parentheses, as Mr. T asked, you should remove these first of course. Assuming all your df2-values are string, this would mean doing something like:

    index_values.value = index_values.value.str.replace('(', '').str.replace(')', '').astype(float)