Rename duplicates accordingly in two Pandas DataFrames

I have two Pandas DataFrames: df_header and df_species.

The first contain for each row the information about a vegetation plot (longitude, latitude, country, ...) and the second one contain for each row the information about a species (name, family, genus, ...). They both have a column named RELEVE_NR, which is the ID of the vegetation plot and indicates which species was found in which vegetation plot. Here is a small example of the two DataFrames to see what they look like, with a reduced number of column for simplification:

df_header:

RELEVE_NR	Longitude	Latitude
12345	42.23	23.43
12346	36.85	39.25
...	...	...

df_species:

RELEVE_NR	Species	Family
12345	Agrostis stolonifera	Poaceae
12345	Paspalum vaginatum	Poaceae
12345	Scirpus lacustris	Cyperaceae
12346	Phragmites australis	Poaceae
...	...	...

As you can see, in a perfect world, I should have one different value in the column RELEVE_NR per row of df_header, but several time the same value in the column RELEVE_NR per row of df_species (we can imagine that each vegetation plot has a unique ID, but that each vegetation plot contains several observations of species).

However, it happens that we have several time the same value in the column df_header for several rows (I won't go into the details of why, but it can and it does happen). For these cases, here is what I have to do but can't find how:

except for the first time it appears, rename the duplicated values of RELEVE_NR in df_header (with values that don't exist in the dataset obviously). For example, if we have three rows that have the value 12345 in the column RELEVE_NR, we could rename the second one 5001 and the second 5002 (if we are sure that no other vegetation plots have the IDs 5001 nor 5002).
duplicate each row that has this duplicated value in df_species the same number of time the value appeared. For example, if the vegetation plot 12345 had 10 values associated in the df_species DataFrame, we should duplicate each of them twice in order to have 30 values. We should let the first 10 with the ID 12345, then put the number 5001 in the column RELEVE_NR for the following ten and finally the number 5002 for the last ten.

Quick example: This df_header:

RELEVE_NR	Longitude	Latitude
12345	42.23	23.43
12346	36.85	39.25
12346	36.85	39.25
12347	29.34	13.45
...	...	...

Should become this df_header (we changed the value of the column RELEVE_NR of the second row having the value 12346 by a new number that isn't attributed to another row):

RELEVE_NR	Longitude	Latitude
12345	42.23	23.43
12346	36.85	39.25
12348	36.85	39.25
12347	29.34	13.45
...	...	...

And this associated df_species:

RELEVE_NR	Species	Family
12345	Sorbus aucuparia	Rosaceae
12346	Agrostis stolonifera	Poaceae
12346	Paspalum vaginatum	Poaceae
12346	Scirpus lacustris	Cyperaceae
12347	Phragmites australis	Poaceae
12347	Paspalum vaginatum	Adoxaceae
...	...	...

Should become this df_species (we duplicated the 3 rows having the value 12346 in the column RELEVE_NR and we changed the values of the column RELEVE_NR for the duplicated rows to 12348):

RELEVE_NR	Species	Family
12345	Sorbus aucuparia	Rosaceae
12346	Agrostis stolonifera	Poaceae
12346	Paspalum vaginatum	Poaceae
12346	Scirpus lacustris	Cyperaceae
12348	Agrostis stolonifera	Poaceae
12348	Paspalum vaginatum	Poaceae
12348	Scirpus lacustris	Cyperaceae
12347	Phragmites australis	Poaceae
12347	Paspalum vaginatum	Adoxaceae
...	...	...

I hope it is clear. Thanks for your help!

Solution

Example Code

import pandas as pd
data1 = {'RELEVE_NR': [12345, 12346, 12346, 12347],
         'Longitude': [42.23, 36.85, 36.85, 29.34],
         'Latitude': [23.43, 39.25, 39.25, 13.45]}
data2 =  {'RELEVE_NR': [12345, 12346, 12346, 12346, 12347, 12347], 
          'Species': ['Sorbus aucuparia', 'Agrostis stolonifera', 'Paspalum vaginatum', 'Scirpus lacustris', 'Phragmites australis', 'Paspalum vaginatum'], 
          'Family': ['Rosaceae', 'Poaceae', 'Poaceae', 'Cyperaceae', 'Poaceae', 'Adoxaceae']}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

For convenience, I will set df1 and df2.

Code

Step1

If you take the cumulative sum of the duplicated part and add it to the maximum value, the duplicated number will not be assigned.

cond1 = df1['RELEVE_NR'].duplicated()
s1 = df1['RELEVE_NR'].mask(cond1, cond1.cumsum().add(df1['RELEVE_NR'].max()))
df1.assign(RELEVE_NR=s1)

output:

RELEVE_NR   Longitude   Latitude
0   12345   42.23       23.43
1   12346   36.85       39.25
2   12348   36.85       39.25
3   12347   29.34       13.45

Step2

First, concatenate df1['RELEVE_NR'] and s1 horizontally to create a single DataFrame (df3). Then, merge df2 with df3. The remaining code is minor code to create an accurate output.

df3 = pd.concat([df1['RELEVE_NR'], s1], axis=1).set_axis(['RELEVE_NR', 'key'], axis=1)
df2.merge(df3, how='left')\
   .assign(RELEVE_NR=lambda x: x.pop('key'))\
   .sort_values('RELEVE_NR', key=lambda x: x.map(pd.Series(s1.index, index=s1.values)))\
   .reset_index(drop=True)

output:

RELEVE_NR   Species                 Family
0   12345   Sorbus aucuparia        Rosaceae
1   12346   Agrostis stolonifera    Poaceae
2   12346   Paspalum vaginatum      Poaceae
3   12346   Scirpus lacustris       Cyperaceae
4   12348   Agrostis stolonifera    Poaceae
5   12348   Paspalum vaginatum      Poaceae
6   12348   Scirpus lacustris       Cyperaceae
7   12347   Phragmites australis    Poaceae
8   12347   Paspalum vaginatum      Adoxaceae