python pandas gspread

New dataframe carrying changes to original in pandas/gspread script


I am writing a script to read data from Google Sheets using the gspread module.

First I read the spreadsheet and store the values in a variable called df. Afterwards, I create a variable called df2 from df to apply some transformations (string to numeric) while keeping df, the original data, intact. However, the transformations made in df2 are carried over to df. I did not expect this: the change should occur only in df2.

Does anyone know why this is happening?

Please see the code below:

import gspread
import pandas as pd

sa = gspread.service_account(filename = "keys.json") 
sheet = sa.open("chupacabra") 
worksheet = sheet.worksheet("vaca_loca")

df = pd.DataFrame(worksheet.get("B2:I101"))

df

[df loaded](https://i.sstatic.net/lV3GJ.png)

df2 = df

df2["Taxa"] = df2["Taxa"].str.replace(",",".")
df2["Taxa"] = df2["Taxa"].str.replace("%","")
df2["Taxa"] = pd.to_numeric(df2["Taxa"])
df2["Taxa"] = df2["Taxa"]/100

df2

[df2 after string transformation](https://i.sstatic.net/cFWOg.png)

df 

[df carrying the transformation changes made in df2](https://i.sstatic.net/KsSsa.png)

I wanted the transformation to apply only to df2, while df remains intact.


Solution

  • In your script, df2 = df does not create a new DataFrame. It only binds the name df2 to the same object that df refers to, so every change made through df2 is also visible through df. To work on an independent copy, how about the following modification?

    From:

    df2 = df
    

    To:

    df2 = df.copy()
    
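    • With this modification, df2 is an independent copy of df (DataFrame.copy() makes a deep copy by default), so the transformations applied to df2 leave df unchanged.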
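
The difference between plain assignment and copy() can be checked with a minimal sketch. It assumes a small hypothetical DataFrame in place of the Google Sheets data; the sample values and the name alias are made up for illustration and are not part of the original script.

```python
import pandas as pd

# Hypothetical stand-in for the data read from the worksheet
df = pd.DataFrame({"Taxa": ["12,5%", "3,0%", "7,25%"]})

# Plain assignment: both names refer to the same DataFrame object
alias = df
print(alias is df)  # True

# copy() returns a new, independent DataFrame (deep copy by default)
df2 = df.copy()
print(df2 is df)  # False

# Apply the string-to-numeric transformation to the copy only
df2["Taxa"] = (
    df2["Taxa"]
    .str.replace(",", ".", regex=False)
    .str.replace("%", "", regex=False)
)
df2["Taxa"] = pd.to_numeric(df2["Taxa"]) / 100

# The original still holds the untouched strings
print(df["Taxa"].tolist())  # ['12,5%', '3,0%', '7,25%']
print(df2["Taxa"].dtype)    # float64
```

Had the transformation been applied through alias instead of df2, df would show the converted values as well, which is the behaviour described in the question.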