
Two Pandas DataFrames Word Count


I'm working on a personal project with Reddit comments from a thread in a subreddit. I have those comments in a pandas DataFrame, comm, with the comment text in the body column. In a separate DataFrame, ticker_symbols, the ACT Symbol column contains stock ticker symbols.

[Image: the first few entries of each of my dataframes]

With this in mind, is there a way to use the tickers in ticker_symbols as a dictionary and then output the three most mentioned tickers in comm.body?


Solution

  • You could join all the comments into one string and then, for each ticker symbol, use re.findall() to count how many times it appears:

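    import re
    
    # Join every comment into one string so each ticker is searched only once
    comments = ' '.join(comm['body'].astype(str))
    # \b...\b counts whole-word matches only (so "A" does not match inside "AMD");
    # re.escape keeps symbols containing "." from being read as regex syntax
    ticker_symbols.assign(num=ticker_symbols['ACT Symbol'].apply(
        lambda s: len(re.findall(r'\b' + re.escape(s) + r'\b', comments))))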
    

    If you just want a list of the three most-mentioned tickers, you can do the following:

    import re
    
    comments = ' '.join(comm['body'].astype(str))
    result = list(
        ticker_symbols
        # Count whole-word mentions of each (regex-escaped) ticker
        .assign(num=ticker_symbols['ACT Symbol'].apply(
            lambda s: len(re.findall(r'\b' + re.escape(s) + r'\b', comments))
        ))
        # Keep the three rows with the highest counts
        .nlargest(3, 'num')['ACT Symbol']
    )
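A plain re.findall(symbol, text) search counts substrings, so a one-letter ticker such as A also matches inside AMD. The self-contained check below, on made-up sample data (the comments and tickers are purely illustrative and only mirror the column names comm['body'] and ticker_symbols['ACT Symbol']), compares that naive count with a whole-word count that wraps the regex-escaped symbol in \b boundaries. The helper count_mentions is hypothetical, written just for this comparison:

```python
import re

import pandas as pd

# Hypothetical sample data; column names mirror the ones in the question
comm = pd.DataFrame({'body': ['AMD to the moon', 'Holding AMD and GME', 'A quiet day']})
ticker_symbols = pd.DataFrame({'ACT Symbol': ['A', 'AMD', 'GME']})

comments = ' '.join(comm['body'].astype(str))

def count_mentions(symbol, whole_word):
    # Escape the symbol; optionally wrap it in \b so it only matches as a whole word
    pattern = re.escape(symbol)
    if whole_word:
        pattern = r'\b' + pattern + r'\b'
    return len(re.findall(pattern, comments))

# Series.apply forwards extra keyword arguments to the function
naive = ticker_symbols['ACT Symbol'].apply(count_mentions, whole_word=False)
exact = ticker_symbols['ACT Symbol'].apply(count_mentions, whole_word=True)
print(ticker_symbols.assign(naive=naive, exact=exact))
```

Here the naive count credits A with 3 mentions (two of them are really AMD), while the whole-word count finds the single standalone A.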
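Since the question asks about using the tickers as a dictionary, here is a sketch of an alternative that does that directly: split each comment into word-like tokens once, keep only the tokens found in a set of ticker symbols, and count them. The sample data and the token pattern (letters and digits, optionally joined by dots so a symbol like BRK.A stays whole) are assumptions for illustration. Matching is case-sensitive, so lowercase mentions such as "amd" are not counted:

```python
import pandas as pd

# Hypothetical sample data, shaped like comm and ticker_symbols in the question
comm = pd.DataFrame({'body': ['AMD to the moon', 'Holding AMD and GME', 'GME, TSLA or AMD?']})
ticker_symbols = pd.DataFrame({'ACT Symbol': ['A', 'AMD', 'GME', 'TSLA', 'NIO']})

# A set gives fast membership checks, playing the role of the "dictionary"
tickers = set(ticker_symbols['ACT Symbol'])

# Split each comment into tokens, then put one token per row
tokens = (
    comm['body'].astype(str)
    .str.findall(r'[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*')
    .explode()
)

# Keep only tokens that are known tickers, count them, and take the top three
top3 = tokens[tokens.isin(tickers)].value_counts().head(3)
print(top3)
```

This scans each comment once instead of once per ticker, which should scale better when ticker_symbols holds thousands of symbols; list(top3.index) gives the plain list of the three tickers.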