python list twitter split

Splitting a list of Twitter data


I have a file full of hundreds of un-separated tweets all formatted like so:

{"text": "Just posted a photo @ Navarre Conference Center", "created_at": "Sun Nov 13 01:52:03 +0000 2016", "coordinates": [-86.8586, 30.40299]}

I am trying to split them up so I can assign each part to a variable.

  1. The text

  2. The timestamp

  3. The location coordinates

I was able to split the tweets up using .split('{}'), but I don't know how to split each tweet into the three parts that I want.

My basic idea that didn't work:

file = open('tweets_with_time.json' , 'r')
line = file.readline()

    for line in file:


        line = line.split(',')

        message = (line[0])
        timestamp = (line[1])
        position = (line[2])

        #just to test if it's working
        print(position)

Thanks!


Solution

  • I just downloaded your file, and it's not as bad as you said: each tweet is on a separate line. It would be nicer if the file were a JSON list, but we can still parse it fairly easily, line by line. Here's an example that extracts the first 10 tweets.

    import json
    
    fname = 'tweets_with_time.json'
    with open(fname) as f:
        for i, line in enumerate(f, 1):
            # Convert this JSON line into a Python dict
            data = json.loads(line)
    
            # Extract the data
            message = data['text']
            timestamp = data['created_at']
            position = data['coordinates']
    
            # Print it
            print(i)
            print('Message:', message)
            print('Timestamp:', timestamp)
            print('Position:', position)
            print()
    
            # Only print the first 10 tweets
            if i == 10:
                break
    

    Unfortunately, I can't show the output of this script: Stack Exchange won't allow me to put those shortened URLs into a post.
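Regarding the remark that a JSON list would be nicer: if you would rather have all the tweets in memory at once, here is a minimal sketch that builds such a list from the line-per-tweet file. The helper name `load_tweets` is made up for illustration, and the UTF-8 encoding and the skipping of blank lines are assumptions about the file, not something I have checked against it.

```python
import json


def load_tweets(path):
    """Read a file holding one JSON object per line into a list of dicts."""
    tweets = []
    # Assumption: the file is UTF-8 encoded
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                # Skip blank lines instead of letting json.loads raise
                continue
            tweets.append(json.loads(line))
    return tweets
```

With that in place, `tweets = load_tweets('tweets_with_time.json')` should give a list where `tweets[0]['text']` is the first message, `tweets[0]['created_at']` its timestamp, and so on.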


    Here's a modified version that cuts off each message at the URL.

    import json
    
    fname = 'tweets_with_time.json'
    with open(fname) as f:
        for i, line in enumerate(f, 1):
            # Convert this JSON line to a Python dict
            data = json.loads(line)
    
            # Extract the data
            message = data['text']
            timestamp = data['created_at']
            position = data['coordinates']
    
            # Remove the URL from the message
            idx = message.find('https://')
            if idx != -1:
                message = message[:idx]
    
            # Print it
            print(i)
            print('Message:', message)
            print('Timestamp:', timestamp)
            print('Position:', position)
            print()
    
            # Only print the first 10 tweets
            if i == 10:
                break
    

    Output

    1
    Message: Just posted a photo @ Navarre Conference Center 
    Timestamp: Sun Nov 13 01:52:03 +0000 2016
    Position: [-86.8586, 30.40299]
    
    2
    Message: I don't usually drink #coffee, but I do love a good #Vietnamese drip coffee with condense milk… 
    Timestamp: Sun Nov 13 01:52:04 +0000 2016
    Position: [-123.04437109, 49.26211779]
    
    3
    Message: #bestcurryπŸ’₯πŸ‘£πŸ‘ŒπŸ½πŸ˜ŽπŸ€‘πŸ‘πŸ½πŸ‘πŸΌπŸ‘ŠπŸΌβ˜πŸ½πŸ™ŒπŸΌπŸ’ͺπŸΌπŸŒ΄πŸŒΊπŸŒžπŸŒŠπŸ·πŸ‰πŸπŸŠπŸΌπŸ„πŸ½πŸ‹πŸ½πŸŒβœˆοΈπŸ’ΈβœπŸ’―πŸ†’πŸ‡ΏπŸ‡¦πŸ‡ΊπŸ‡ΈπŸ™πŸΌ#johanvanaarde #kauai #rugby #surfing… 
    Timestamp: Sun Nov 13 01:52:04 +0000 2016
    Position: [-159.4958861, 22.20321232]
    
    4
    Message: #thatonePerezwedding πŸ’πŸ’ @ Scenic Springs 
    Timestamp: Sun Nov 13 01:52:05 +0000 2016
    Position: [-98.68685568, 29.62182898]
    
    5
    Message: Miami trends now: Heat, Wade, VeteransDay, OneLetterOffBands and TheyMightBeACatfishIf. 
    Timestamp: Sun Nov 13 01:52:05 +0000 2016
    Position: [-80.19240081, 25.78111669]
    
    6
    Message: Thank you family for supporting my efforts. I love you all!… 
    Timestamp: Sun Nov 13 01:52:05 +0000 2016
    Position: [-117.83012, 33.65558157]
    
    7
    Message: If you're looking for work in #HONOLULU, HI, check out this #job: 
    Timestamp: Sun Nov 13 01:52:05 +0000 2016
    Position: [-157.7973653, 21.2868901]
    
    8
    Message: Drinking a L'Brett d'Apricot by @CrookedStave @ FOBAB — 
    Timestamp: Sun Nov 13 01:52:05 +0000 2016
    Position: [-87.6455, 41.8671]
    
    9
    Message: Can you recommend anyone for this #job? Barista (US) - 
    Timestamp: Sun Nov 13 01:52:05 +0000 2016
    Position: [-121.9766823, 38.350109]
    
    10
    Message: He makes me happy @ Frank and Bank 
    Timestamp: Sun Nov 13 01:52:05 +0000 2016
    Position: [-75.69360487, 45.41268776]
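
Since the goal was to get each part into its own variable, you may also want the timestamp as a real `datetime` and the two coordinates as separate numbers. Below is a rough sketch of that. The function name `parse_tweet_fields` is hypothetical, the format string is inferred from the sample tweet in the question, and reading the pair as [longitude, latitude] is my assumption from the values in the output above (a latitude cannot be -123), so double-check both against your data.

```python
from datetime import datetime

# Assumed layout of 'created_at', inferred from the sample tweet:
# 'Sun Nov 13 01:52:03 +0000 2016'
TIMESTAMP_FORMAT = '%a %b %d %H:%M:%S %z %Y'


def parse_tweet_fields(data):
    """Return (message, when, longitude, latitude) from one tweet dict."""
    message = data['text']
    # Convert the timestamp string into a timezone-aware datetime
    when = datetime.strptime(data['created_at'], TIMESTAMP_FORMAT)
    # Assumption: the pair is ordered [longitude, latitude]
    longitude, latitude = data['coordinates']
    return message, when, longitude, latitude
```

Inside the loop above you would call it as `message, when, lon, lat = parse_tweet_fields(data)`. Note that `%a` and `%b` match English day and month names only under the default locale, which is what Python uses unless the program changes it.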