Search code examples
pythonpandasdataframestock

How to drop rows in pandas dataframe, when there is similar values?


I have a python pandas dataframe of stock data, and I'm trying to filter some of those tickers. There are companies that have 2 or more tickers (different types of shares when a share is preferred and the other not). I want to drop the lines of those additional share values, and let just the share with the higher volume. In the dataframe I also have the company name, so maybe there is a way of using it to make some condition and then drop it when comparing the volume of the same company? How can I do this? Dataframe


Solution

  • Use groupby and idxmax:

    Suppose this dataframe:

    >>> df
      ticker  volume
    0  CEBR3     123
    1  CEBR5     456
    2  CEBR6     789  # <- keep for group CEBR
    3  GOAU3      23  # <- keep for group GOAU
    4  GOAU4      12
    5  CMIN3     135  # <- keep for group CMIN3
    
    >>> df.loc[df.groupby(df['ticker'].str.extract(r'^(.*)\d', expand=False), 
                          sort=False)['volume'].idxmax().tolist()]
    
      ticker  volume
    2  CEBR6     789
    3  GOAU3      23
    5  CMIN3     135