I have a DataFrame with Tweets and Date columns. I'd like to write a function that extracts the @ handles, converts them according to a dictionary (e.g. '@CityofCTAlerts' : 'Cape Town'), and puts the dictionary value (i.e. Cape Town) into a new column. I then need to extract all hashtags into a separate column.
This is what I've tried:
def extract_municipality_hashtags(df):
    twitter_df['municipality'] = twitter_df['Tweets'].map(lambda x: (i[1:] for i in x.split() if i.startswith('@')))
    twitter_df['municipality'] = twitter_df['municipality'].map(mun_dict)
    twitter_df['hashtags'] = twitter_df['Tweets'].str.findall(r'#.*?(?=\s|$)')
    return df
I then run the function:
extract_municipality_hashtags(twitter_df.copy())
But I get "bound method NDFrame.copy of" instead.
I need it to return a proper DataFrame.
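A likely cause, and this is an assumption since the call shown above does include the parentheses: that text is how Python displays the method `copy` itself, which is what a function ending in `return df` hands back if it was called with `twitter_df.copy` instead of `twitter_df.copy()`. A minimal sketch of the difference, using a made-up one-row frame and a hypothetical stand-in function `return_argument`:

```python
import pandas as pd

# Hypothetical one-column frame, used only to show the difference.
frame = pd.DataFrame({'Tweets': ['hello @CityTshwane']})

def return_argument(df):
    # Stand-in for any function whose last line is `return df`.
    return df

# No parentheses: the method object itself is passed in and returned.
method_result = return_argument(frame.copy)
# With parentheses: copy() runs first, so a new DataFrame is passed in.
frame_result = return_argument(frame.copy())

print(type(method_result).__name__)
print(type(frame_result).__name__)
```

Only the second call gives back something that behaves like a DataFrame.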
Here is the original dataframe:
import pandas as pd
twitter_url = 'https://raw.githubusercontent.com/Explore-AI/Public-Data/master/Data/twitter_nov_2019.csv'
twitter_df = pd.read_csv(twitter_url)
twitter_df.head()
Dictionary used:
mun_dict = {
    '@CityofCTAlerts': 'Cape Town',
    '@CityPowerJhb': 'Johannesburg',
    '@eThekwiniM': 'eThekwini',
    '@EMMInfo': 'Ekurhuleni',
    '@centlecutility': 'Mangaung',
    '@NMBmunicipality': 'Nelson Mandela Bay',
    '@CityTshwane': 'Tshwane'
}
OK, it looks like I eventually figured it out:
def extract_municipality_hashtags(df):
    # Work on a copy of the argument instead of the global twitter_df.
    out = df[['Tweets', 'Date']].copy()
    # Look up each word of the tweet in mun_dict and keep the first match
    # (the keys include the leading '@', so it must not be stripped off).
    out['municipality'] = out['Tweets'].map(
        lambda x: next((mun_dict[w] for w in x.split() if w in mun_dict), np.nan))
    # Lower-cased hashtags as a list; NaN when a tweet has none (needs numpy imported as np).
    out['hashtags'] = out['Tweets'].str.lower().str.findall(r'#.*?(?=\s|$)')
    out['hashtags'] = out['hashtags'].apply(lambda y: np.nan if len(y) == 0 else y)
    return out
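One thing worth checking with this kind of lookup: if the lambda returns a generator expression, the column ends up holding generator objects rather than handle strings, and mapping those through a dictionary finds no keys, so every row comes back as NaN. A small sketch of the effect, assuming a made-up two-row sample and a two-entry subset of mun_dict (the names `sample` and `lookup` are only for illustration):

```python
import pandas as pd

# Hypothetical two-row sample and a subset of mun_dict, for illustration only.
sample = pd.DataFrame({'Tweets': ['@CityPowerJhb power is out #loadshedding',
                                  'no handle here #Eskom']})
lookup = {'@CityPowerJhb': 'Johannesburg', '@CityTshwane': 'Tshwane'}

# The parentheses build a generator per row, so the column holds generator
# objects and the dictionary lookup matches nothing.
lazy = sample['Tweets'].map(lambda x: (w for w in x.split() if w.startswith('@')))
all_missing = lazy.map(lookup).isna().all()

# Reducing each tweet to a single handle string (the first one found, or None)
# gives the dictionary something it can actually match.
first_handle = sample['Tweets'].map(
    lambda x: next((w for w in x.split() if w.startswith('@')), None))
municipality = first_handle.map(lookup)

print(all_missing)
print(municipality.tolist())
```

Keeping only the first handle per tweet is a design choice here; tweets that mention several municipalities would need a list-valued column instead.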