Tags: python, pandas, dataframe, data-cleaning

Dropping rows that contain a specific string wrapped in square brackets?


I'm trying to drop rows whose Comments value is a string wrapped in square brackets. Specifically, I want to drop every row that contains '[removed]' or '[deleted]'. My df looks like this:

  Comments

1 The main thing is the price appreciation of the token (this determines the gains or losses more 
  than anything). Followed by the ecosystem for the liquid staking asset, the more opportunities 
  and protocols that accept the asset as collateral, the better. Finally, the yield for staking 
  comes into play.

2 [deleted]

3 [removed]

4 I could be totally wrong, but sounds like destroying an asset and claiming a loss, which I 
  believe is fraudulent. Like someone else said, get a tax guy - for this year anyway and then 
  you'll know for sure. Peace of mind has value too.

I have tried df[df["Comments"].str.contains("removed")==False], but when I save the DataFrame afterwards, the rows are still there.

EDIT: My full code

    import pandas as pd
    sol2020 = pd.read_csv("Solana_2020_Comments_Time_Adjusted.csv")
    sol2021 = pd.read_csv("Solana_2021_Comments_Time_Adjusted.csv")
    df = pd.concat([sol2021, sol2020], ignore_index=True, sort=False)
    df[df["Comments"].str.contains("deleted")==False]
    df[df["Comments"].str.contains("removed")==False]
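Each of the two filter lines above builds a new, filtered DataFrame and then discards it, because the result is never assigned; df itself is left unchanged. A minimal sketch of the same filtering with the result assigned back is shown here. It assumes a small hypothetical stand-in for the concatenated CSV data and writes to an in-memory buffer instead of a real file. It also swaps ==False for ~ (logical NOT) and passes na=False so that missing comments count as non-matches and are kept.

```python
import io

import pandas as pd

# Hypothetical stand-in for the concatenated sol2021/sol2020 data.
df = pd.DataFrame(
    {
        "Comments": [
            "The main thing is the price appreciation of the token",
            "[deleted]",
            "[removed]",
            "Get a tax guy for this year anyway",
        ]
    }
)

# Boolean indexing returns a NEW DataFrame, so assign the result back to df.
# na=False treats missing comments as "no match", so ~ keeps them.
df = df[~df["Comments"].str.contains("deleted|removed", na=False)]

# Save the filtered frame; an in-memory buffer stands in for the real CSV file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Writing df.to_csv(...) only after the reassignment is what makes the saved file reflect the filter.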

Solution

  • Try this

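    I created a DataFrame with a Comments column and filled it with my own sample comments, but the same approach should work for your data. Note that df[...] returns a new, filtered DataFrame and leaves df unchanged, so you need to assign the result (for example, df = df[...]) before saving it. That is most likely why the rows still appear in your saved file.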

    import pandas as pd
    
    sample_data = { 'Comments': ['first comment whatever','[deleted]','[removed]','last comments whatever']}
    
    df = pd.DataFrame(sample_data)
    
    data = df[df["Comments"].str.contains("deleted|removed")==False]
    
    print(data)
    

    Output:

                     Comments
    0  first comment whatever
    3  last comments whatever
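str.contains("deleted|removed") matches those words anywhere in a comment, so a genuine comment that merely mentions "removed" would be dropped as well. If only cells whose whole value is [deleted] or [removed] should go, Series.isin gives an exact, non-regex match. The sketch below compares both approaches on hypothetical sample data; the placeholder list and the strip() call (to tolerate stray whitespace) are assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample: the third comment mentions "removed" but is a real comment.
df = pd.DataFrame(
    {
        "Comments": [
            "first comment whatever",
            "[deleted]",
            "I removed my tokens from the exchange",
            "[removed]",
            "last comments whatever",
        ]
    }
)

# Placeholder values to drop, compared as whole strings (brackets included).
placeholders = ["[deleted]", "[removed]"]

# Substring/regex match: also drops the real comment containing "removed".
by_substring = df[~df["Comments"].str.contains("deleted|removed", na=False)]

# Exact match: drops only cells whose entire (stripped) value is a placeholder.
by_exact = df[~df["Comments"].str.strip().isin(placeholders)]

print(by_substring)
print(by_exact)
```

Because isin does plain equality checks, the square brackets need no regex escaping, which is convenient when the placeholders themselves contain regex metacharacters.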