python-3.x pandas dataframe twitter

Twitter data analysis - Extract handles and hashtags


I have a DataFrame with Tweets and Date columns. I'd like to create a function that extracts the @ handles, looks each one up in a dictionary (e.g. '@CityofCTAlerts' : 'Cape Town'), and puts the dictionary value (i.e. 'Cape Town') into a new column. I then need to extract all hashtags into a separate column.

This is what I've tried:

def extract_municipality_hashtags(df):
    twitter_df['municipality'] = twitter_df['Tweets'].map(lambda x: (i[1:] for i in x.split() if i.startswith('@')))
    twitter_df['municipality'] = twitter_df['municipality'].map(mun_dict)
    twitter_df['hashtags']=twitter_df['Tweets'].str.findall(r'#.*?(?=\s|$)')
    return df

I then run the function:

extract_municipality_hashtags(twitter_df.copy())

But I get output that starts with "bound method NDFrame.copy of".

I need it to return a proper DataFrame.
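
For context, text beginning with "bound method NDFrame.copy of" is what Python prints for the `copy` method object itself, which typically happens when `.copy` is referenced without parentheses. Whether that is what happened here is an assumption, since the call shown above does include them. A minimal sketch of the difference, using a tiny stand-in frame rather than the real data:

```python
import pandas as pd

# Tiny stand-in frame (hypothetical); the real data comes from pd.read_csv(twitter_url).
df = pd.DataFrame({"Tweets": ["hello @CityPowerJhb #loadshedding"]})

method_repr = repr(df.copy)   # no parentheses: the method object itself
copied = df.copy()            # parentheses: an independent DataFrame

print(method_repr.splitlines()[0])  # first line of the "bound method ..." text
print(type(copied).__name__)        # DataFrame
```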

Here is the original dataframe:

import pandas as pd

twitter_url = 'https://raw.githubusercontent.com/Explore-AI/Public-Data/master/Data/twitter_nov_2019.csv'
twitter_df = pd.read_csv(twitter_url)
twitter_df.head()

Dictionary used:

mun_dict = {
    '@CityofCTAlerts' : 'Cape Town',
    '@CityPowerJhb' : 'Johannesburg',
    '@eThekwiniM' : 'eThekwini' ,
    '@EMMInfo' : 'Ekurhuleni',
    '@centlecutility' : 'Mangaung',
    '@NMBmunicipality' : 'Nelson Mandela Bay',
    '@CityTshwane' : 'Tshwane'
}

Solution

  • OK, it looks like I eventually figured it out:

    import numpy as np

    def extract_municipality_hashtags(df):
        # Work on a copy of the argument so the caller's DataFrame is not modified.
        out = df[['Tweets', 'Date']].copy()
        # Municipality of the first word that is a key of mun_dict (keys include the '@'); NaN if none.
        out['municipality'] = out['Tweets'].map(
            lambda x: next((mun_dict[w] for w in x.split() if w in mun_dict), np.nan))
        # All hashtags, lowercased; NaN when a tweet has none.
        out['hashtags'] = out['Tweets'].str.lower().str.findall(r'#.*?(?=\s|$)')
        out['hashtags'] = out['hashtags'].apply(lambda y: np.nan if len(y) == 0 else y)
        return out
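
The same two steps can also be sketched per tweet with only the standard `re` module, which makes the matching rules easy to test in isolation. This is an illustrative variant, not the code above: the helper names `first_municipality` and `hashtags` are hypothetical, the patterns `@\w+` and `#\w+` are swapped in for the split-based lookup and the `#.*?(?=\s|$)` regex (so trailing punctuation is not kept), and only two dictionary entries are copied from the question.

```python
import re

# Two entries copied from the question's dictionary, for brevity.
mun_dict = {
    '@CityofCTAlerts': 'Cape Town',
    '@CityPowerJhb': 'Johannesburg',
}

# Assumed patterns: a handle or hashtag is the marker plus letters, digits or underscores.
HANDLE_RE = re.compile(r'@\w+')
HASHTAG_RE = re.compile(r'#\w+')

def first_municipality(tweet, mapping=mun_dict):
    """Return the municipality for the first known @handle, or None."""
    for handle in HANDLE_RE.findall(tweet):
        if handle in mapping:
            return mapping[handle]
    return None

def hashtags(tweet):
    """Return the tweet's hashtags lowercased, or None when there are none."""
    tags = [t.lower() for t in HASHTAG_RE.findall(tweet)]
    return tags or None

sample = "Power is back, thanks @CityPowerJhb! #Eskom #LoadShedding."
print(first_municipality(sample))  # Johannesburg
print(hashtags(sample))            # ['#eskom', '#loadshedding']
```

Assuming the DataFrame from the question, these helpers could be applied with `twitter_df['Tweets'].map(first_municipality)` and `twitter_df['Tweets'].map(hashtags)`.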