Tags: python, pandas, dictionary, twitter, list-comprehension

How to produce a correct bar plot using Pandas and Matplotlib.pyplot, from a list of dictionaries


I'm trying to create a bar plot, but it isn't rendering correctly.

I have a list of dictionaries.

The list holds thousands of tweets from Twitter, one dictionary per tweet. Each dictionary stores the tweet's attributes as key-value pairs, including the tweet content, the screen name of the person tweeting, the language of the tweet, the country of origin of the tweet, and more.
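For illustration, here is a minimal sketch of that shape, assuming flat keys. Only 'lang' appears in my real code; 'text' and 'screen_name' and all the values are hypothetical, and real Twitter payloads contain many more fields. When pandas is given a list of dictionaries, it creates one row per dictionary and one column per key, so the language column can be read directly.

```python
import pandas as pd

# Hypothetical sample: three tweets as a list of dictionaries.
# Only 'lang' comes from the original code; the other keys are illustrative.
tweets_data = [
    {'text': 'Hello world', 'lang': 'en', 'screen_name': 'alice'},
    {'text': 'Hola mundo', 'lang': 'es', 'screen_name': 'bob'},
    {'text': 'Good morning', 'lang': 'en', 'screen_name': 'carol'},
]

# Each dictionary becomes a row; each key becomes a column
tweets_df = pd.DataFrame(tweets_data)

print(tweets_df.shape)             # (3, 3)
print(tweets_df['lang'].tolist())  # ['en', 'es', 'en']
```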

To create my bar plot for the language attribute, I use map() to pull the language out of every tweet into a column of a Pandas DataFrame, count how often each language occurs with value_counts(), and plot the result as a bar chart with 5 bars, one for each of the top 5 most used languages in my list of tweets.
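The top-5 step relies on value_counts() returning frequencies sorted from most to least common, so the first five entries are the five most used languages (this is what tweets_by_lang[:5] does in the code below; .head(5) is an equivalent spelling). A minimal sketch with made-up language codes; the sample uses distinct counts because I would not rely on the relative order of tied languages.

```python
import pandas as pd

# Hypothetical language codes for 21 tweets, each language with a distinct count
langs = ['en'] * 6 + ['es'] * 5 + ['ja'] * 4 + ['fr'] * 3 + ['pt'] * 2 + ['de']

# value_counts() sorts by frequency, highest first
tweets_by_lang = pd.Series(langs).value_counts()

# The first five entries are the five most common languages
top5 = tweets_by_lang.head(5)
print(top5.index.tolist())  # ['en', 'es', 'ja', 'fr', 'pt']
print(top5.tolist())        # [6, 5, 4, 3, 2]
```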

Here is my code for the language bar plot (note that my list of dictionaries containing each tweet is called tweets_data):

import pandas as pd
import matplotlib.pyplot as plt

tweets_df = pd.DataFrame()

tweets_df['lang'] = map(lambda tweet: tweet['lang'], tweets_data)

tweets_by_lang = tweets_df['lang'].value_counts()

fig, ax = plt.subplots()
ax.tick_params(axis='x', labelsize=15)
ax.tick_params(axis='y', labelsize=10)
ax.set_xlabel('Languages', fontsize=15)
ax.set_ylabel('Number of tweets', fontsize=15)
ax.set_title('Top 5 languages', fontsize=15, fontweight='bold')
tweets_by_lang[:5].plot(ax=ax, kind='bar', color='red')
plt.show()

As I said, I should be getting 5 bars, one for each of the top five languages in my data. Instead, I get the graph shown below.

[Image: the incorrect bar plot]


Solution

  • Your problem is here:

    tweets_df['lang'] = map(lambda tweet: tweet['lang'], tweets_data)
    

    The issue, as your comment suggests, comes down to a change between Python 2 and 3. In Python 2, map() returns a list, but in Python 3 it returns an iterator. The clue is that tweets_df['lang'].value_counts() holds only one value, and that value is the <map ...> iterator object itself.

    In either Python 2 or 3, you can use a list comprehension instead:

    tweets_by_lang = pd.Series([tweet['lang'] for tweet in tweets_data]).value_counts()
    

    Or in Python 3, you can follow @Triptych's advice from the answer linked above and wrap map() in a list():

    tweets_df['lang'] = list(map(lambda tweet: tweet['lang'], tweets_data))
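A quick way to confirm the Python 3 behaviour described in this answer is to call map() directly and consume it twice. This is a minimal sketch using only the standard library; the three-tweet tweets_data is a made-up stand-in.

```python
# Hypothetical stand-in for tweets_data
tweets_data = [{'lang': 'en'}, {'lang': 'es'}, {'lang': 'en'}]

langs = map(lambda tweet: tweet['lang'], tweets_data)
print(type(langs).__name__)  # map  (a lazy iterator, not a list)

first_pass = list(langs)   # consumes the iterator
second_pass = list(langs)  # nothing is left to yield
print(first_pass)   # ['en', 'es', 'en']
print(second_pass)  # []
```

Because the iterator is single-use and has no length, wrapping it in list() (or using a list comprehension) hands pandas an ordinary sequence of values instead of one opaque object.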
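Putting the list-comprehension fix together with the plotting code from the question gives the sketch below. The tweets_data list is made up (21 tweets in six hypothetical languages), and the matplotlib.use('Agg') line is an assumption added only so the script runs without a display; drop it and call plt.show() to open a window. .head(5) is used in place of [:5]; both take the first five entries.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs headless; remove to display a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data standing in for the real tweets_data
tweets_data = (
    [{'lang': 'en'}] * 6 + [{'lang': 'es'}] * 5 + [{'lang': 'ja'}] * 4
    + [{'lang': 'fr'}] * 3 + [{'lang': 'pt'}] * 2 + [{'lang': 'de'}]
)

# Build a real list (not a map iterator) before handing it to pandas
tweets_by_lang = pd.Series([tweet['lang'] for tweet in tweets_data]).value_counts()

fig, ax = plt.subplots()
ax.tick_params(axis='x', labelsize=15)
ax.tick_params(axis='y', labelsize=10)
ax.set_xlabel('Languages', fontsize=15)
ax.set_ylabel('Number of tweets', fontsize=15)
ax.set_title('Top 5 languages', fontsize=15, fontweight='bold')
tweets_by_lang.head(5).plot(ax=ax, kind='bar', color='red')

# plt.show()  # uncomment when using an interactive backend
```

With this sample data the axes contain five red bars with heights 6, 5, 4, 3 and 2, one per language.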