
How to organize Twitter data from a CSV file in phpMyAdmin


I'm working on an application where I need to store a collection of tweets along with attributes such as Tweet ID, Date of Tweet, Language and Username inside of a MySQL database.

This is an image of what I'm aiming for: https://i.sstatic.net/VWpyx.png

To do this, I wrote a Python program that collects 100+ tweets from Twitter into a JSON file. I then converted the JSON file to a CSV file using Microsoft Excel. After this I imported the CSV file into phpMyAdmin as a table and got the following outcome: https://i.sstatic.net/V11mK.png (10 rows x 185 columns).

The problem with the above is that some tweets carry extra data such as media, which causes the data to spread across many columns.

How do I quickly clean this table so that it contains only my desired attributes? Do I need to start from scratch and work from my Python code, or can I clean the table/CSV file directly?


Solution

  • If the tweets are stored as JSON and you need only some of the fields, I recommend using the json module to parse the fields you need and pandas to convert them into a structured table that can be written to MySQL. For example:

    import json
    import pandas as pd

    # Open the text file that holds the tweets, one JSON object per line
    with open('tweets.txt') as f:
        tweets = f.readlines()
    # Parse each non-empty line into a Python dict
    tweets_json = [json.loads(tweet) for tweet in tweets if tweet.strip()]
    # Convert the list of tweets into a structured DataFrame
    df = pd.DataFrame(tweets_json)
    # Finally, choose the attributes you need ("..." stands for the remaining column names)
    df = df[['created_at', 'id', ...]]
    # Write the table to MySQL (fill in the table name and connection)
    df.to_sql(...)
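
If pandas is not available, or you would rather keep importing a CSV through phpMyAdmin, the same idea can be sketched with the standard library alone. This is a minimal sketch, not a definitive implementation. It assumes one JSON tweet per line and the classic Twitter API layout, where the language is stored under "lang" and the username under the nested "user" object as "screen_name". The output column names and the helper functions extract_fields and tweets_to_csv are hypothetical and only serve the illustration.

```python
import csv
import io
import json

# Column names for the cleaned table (hypothetical; rename to match your schema).
COLUMNS = ["tweet_id", "created_at", "lang", "username"]


def extract_fields(tweet):
    """Keep only the four wanted attributes from one tweet object.

    Assumes the classic Twitter API layout, where the author is a nested
    "user" object with a "screen_name" key.
    """
    user = tweet.get("user") or {}
    return {
        "tweet_id": tweet.get("id"),
        "created_at": tweet.get("created_at"),
        "lang": tweet.get("lang"),
        "username": user.get("screen_name"),
    }


def tweets_to_csv(lines, out):
    """Read one JSON tweet per line and write a four-column CSV to `out`."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    count = 0
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines between tweets
            continue
        writer.writerow(extract_fields(json.loads(line)))
        count += 1
    return count


# Small in-memory demonstration with made-up tweets.
sample = [
    '{"id": 1, "created_at": "Mon Jan 01 00:00:00 +0000 2018", "lang": "en", '
    '"user": {"screen_name": "alice"}, "entities": {"media": [{"type": "photo"}]}}',
    "",
    '{"id": 2, "created_at": "Tue Jan 02 00:00:00 +0000 2018", "lang": "fr", '
    '"user": {"screen_name": "bob"}}',
]
buffer = io.StringIO()
rows_written = tweets_to_csv(sample, buffer)
print(buffer.getvalue())
```

To run this on real data, pass an open tweets.txt as the first argument and a file opened with newline='' as the second. Because the extra media fields are dropped before the CSV is written, every tweet ends up with the same four columns, whether or not it carries media.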