Search code examples
pythonpandasdataframeleft-join

(Pandas) How to replace certain values of a column from a different dataset but leave other values in the dataset untouched?


Let's say I have the dataset:

df1 = pd.DataFrame()
df1['number'] = [0,0,0,0,0]
df1["decade"] = ["1970", "1980", "1990", "2000", "2010"]`

print(df1)

#output:
   number decade
0       0   1970
1       0   1980
2       0   1990
3       0   2000
4       0   2010

and I want to merge it with another dataset:

df2 = pd.DataFrame()
df2['number'] = [1,1]
df2["decade"] = ["1990", "2010"]

print(df2)

#output:
   number decade
0       1   1990
1       1   2010

such that it get's values only from the decades from df2 that have values in them and leaves the others untouched, yielding:

   number decade
0       0   1970
1       0   1980
2       1   1990
3       0   2000
4       1   2010

how must one go about doing that in pandas? I've tried stuff like join, merge, and concat but they all seem to either not give the desired result or not work because of the different dimensions of the 2 datasets. Any suggestions regarding which function I should be looking at?

Thank you so much!


Solution

  • You can use pandas.DataFrame.merge with pandas.Series.fillna :

    out = (
            df1[["decade"]]
                .merge(df2, on="decade", how="left")
                .fillna({"number": df1["number"]}, downcast="infer")
          )
    

    # Output :

    print(out)
    
      decade  number
    0   1970       0
    1   1980       0
    2   1990       1
    3   2000       0
    4   2010       1