skipping Attribute error while importing twitter data into pandas

I have almost 1 gb file storing almost .2 mln tweets. And, the huge size of file obviously carries some errors. The errors are shown as AttributeError: 'int' object has no attribute 'items'. This occurs when I try to run this code.

 raw_data_path = input("Enter the path for raw data file: ")
 tweet_data_path = raw_data_path



 tweet_data = []
 tweets_file = open(tweet_data_path, "r", encoding="utf-8")
 for line in tweets_file:
   try:
    tweet = json.loads(line)
    tweet_data.append(tweet)
   except:
    continue


    tweet_data2 = [tweet for tweet in tweet_data if isinstance(tweet, 
   dict)]



   from pandas.io.json import json_normalize    
tweets = json_normalize(tweet_data2)[["text", "lang", "place.country",
                                     "created_at", "coordinates", 
                                     "user.location", "id"]]

Can a solution be found where those lines where such error occurs can be skipped and continue for the rest of the lines.

Solution

The issue here is not with lines in data but with tweet_data itself. If you check your tweet_data, you will find one more elements which are of 'int' datatype (assuming your tweet_data is a list of dictionaries as it only expects "dict or list of dicts").

You may want to check your tweet data to remove values other that dictionaries.

I was able to reproduce with below example for json_normalize document:

Working Example:

from pandas.io.json import json_normalize
data = [{'state': 'Florida',
         'shortname': 'FL',
         'info': {
              'governor': 'Rick Scott'
         },
         'counties': [{'name': 'Dade', 'population': 12345},
                     {'name': 'Broward', 'population': 40000},
                     {'name': 'Palm Beach', 'population': 60000}]},
        {'state': 'Ohio',
         'shortname': 'OH',
         'info': {
              'governor': 'John Kasich'
         },
         'counties': [{'name': 'Summit', 'population': 1234},
                      {'name': 'Cuyahoga', 'population': 1337}]},
       ]
json_normalize(data)

Output:

Displays datarame

Reproducing Error:

from pandas.io.json import json_normalize
data = [{'state': 'Florida',
         'shortname': 'FL',
         'info': {
              'governor': 'Rick Scott'
         },
         'counties': [{'name': 'Dade', 'population': 12345},
                     {'name': 'Broward', 'population': 40000},
                     {'name': 'Palm Beach', 'population': 60000}]},
        {'state': 'Ohio',
         'shortname': 'OH',
         'info': {
              'governor': 'John Kasich'
         },
         'counties': [{'name': 'Summit', 'population': 1234},
                      {'name': 'Cuyahoga', 'population': 1337}]},
       1  # *Added an integer to the list*
       ]
result = json_normalize(data)

Error:

AttributeError: 'int' object has no attribute 'items'

How to prune "tweet_data": Not needed, if you follow update below

Before normalising, run below:

tweet_data = [tweet for tweet in tweet_data if isinstance(tweet, dict)]

Update: (for foor loop)

for line in tweets_file:
    try:
        tweet = json.loads(line)
        if isinstance(tweet, dict): 
            tweet_data.append(tweet)
    except:
        continue