I have a file full of hundreds of un-separated tweets all formatted like so:
{"text": "Just posted a photo @ Navarre Conference Center", "created_at": "Sun Nov 13 01:52:03 +0000 2016", "coordinates": [-86.8586, 30.40299]}
I am trying to split them up so I can assign each part to a variable:
The text
The timestamp
The location coordinates
I was able to split the tweets up using .split('{}'), but I don't really know how to split the rest into the three things that I want.
My basic idea that didn't work:
file = open('tweets_with_time.json' , 'r')
line = file.readline()
for line in file:
    line = line.split(',')
    message = (line[0])
    timestamp = (line[1])
    position = (line[2])
    #just to test if it's working
    print(position)
Thanks!
I just downloaded your file, and it's not as bad as you said: each tweet is on a separate line. It would be nicer if the file were a JSON list, but we can still parse it fairly easily, line by line. Here's an example that extracts the first 10 tweets.
import json

fname = 'tweets_with_time.json'

with open(fname) as f:
    for i, line in enumerate(f, 1):
        # Convert this JSON line into a Python dict
        data = json.loads(line)

        # Extract the data
        message = data['text']
        timestamp = data['created_at']
        position = data['coordinates']

        # Print it
        print(i)
        print('Message:', message)
        print('Timestamp:', timestamp)
        print('Position:', position)
        print()

        # Only print the first 10 tweets
        if i == 10:
            break
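To see why json.loads is a better fit here than splitting on commas, here is a minimal sketch that uses the sample tweet from the question as a plain string, so it runs without the file. The variable names are only for illustration.

```python
import json

# Sample line copied from the question
line = ('{"text": "Just posted a photo @ Navarre Conference Center", '
        '"created_at": "Sun Nov 13 01:52:03 +0000 2016", '
        '"coordinates": [-86.8586, 30.40299]}')

# Splitting on commas also cuts the coordinates list in two,
# and every piece still carries its JSON key, quotes and braces
parts = line.split(',')
print(len(parts))

# json.loads understands the structure and returns a dict
data = json.loads(line)
message = data['text']
timestamp = data['created_at']
position = data['coordinates']

print(message)
print(timestamp)
print(position)
```

Note that position comes back as a real Python list of floats, so position[0] and position[1] can be used directly as numbers.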
Unfortunately, I can't show the output of this script: Stack Exchange won't allow me to put those shortened URLs into a post.
Here's a modified version that cuts off each message at the URL.
import json

fname = 'tweets_with_time.json'

with open(fname) as f:
    for i, line in enumerate(f, 1):
        # Convert this JSON line to a Python dict
        data = json.loads(line)

        # Extract the data
        message = data['text']
        timestamp = data['created_at']
        position = data['coordinates']

        # Remove the URL from the message
        idx = message.find('https://')
        if idx != -1:
            message = message[:idx]

        # Print it
        print(i)
        print('Message:', message)
        print('Timestamp:', timestamp)
        print('Position:', position)
        print()

        # Only print the first 10 tweets
        if i == 10:
            break
output
1
Message: Just posted a photo @ Navarre Conference Center
Timestamp: Sun Nov 13 01:52:03 +0000 2016
Position: [-86.8586, 30.40299]
2
Message: I don't usually drink #coffee, but I do love a good #Vietnamese drip coffee with condense milk…
Timestamp: Sun Nov 13 01:52:04 +0000 2016
Position: [-123.04437109, 49.26211779]
3
Message: #bestcurry #johanvanaarde #kauai #rugby #surfing…
Timestamp: Sun Nov 13 01:52:04 +0000 2016
Position: [-159.4958861, 22.20321232]
4
Message: #thatonePerezwedding @ Scenic Springs
Timestamp: Sun Nov 13 01:52:05 +0000 2016
Position: [-98.68685568, 29.62182898]
5
Message: Miami trends now: Heat, Wade, VeteransDay, OneLetterOffBands and TheyMightBeACatfishIf.
Timestamp: Sun Nov 13 01:52:05 +0000 2016
Position: [-80.19240081, 25.78111669]
6
Message: Thank you family for supporting my efforts. I love you all!…
Timestamp: Sun Nov 13 01:52:05 +0000 2016
Position: [-117.83012, 33.65558157]
7
Message: If you're looking for work in #HONOLULU, HI, check out this #job:
Timestamp: Sun Nov 13 01:52:05 +0000 2016
Position: [-157.7973653, 21.2868901]
8
Message: Drinking a L'Brett d'Apricot by @CrookedStave @ FOBAB
Timestamp: Sun Nov 13 01:52:05 +0000 2016
Position: [-87.6455, 41.8671]
9
Message: Can you recommend anyone for this #job? Barista (US) -
Timestamp: Sun Nov 13 01:52:05 +0000 2016
Position: [-121.9766823, 38.350109]
10
Message: He makes me happy @ Frank and Bank
Timestamp: Sun Nov 13 01:52:05 +0000 2016
Position: [-75.69360487, 45.41268776]
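If you want to keep the three values rather than just print them, the same logic can be wrapped in a generator. This is only a sketch: parse_tweets is a hypothetical helper name, the io.StringIO object stands in for the real file, the URL in the sample is a made-up placeholder, and the strptime format is an assumption based on the timestamps shown above.

```python
import io
import json
from datetime import datetime, timezone
from itertools import islice

def parse_tweets(lines):
    """Yield (message, timestamp, position) for each non-blank JSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        data = json.loads(line)

        # Cut the message off at the first URL, as in the script above
        message = data['text']
        idx = message.find('https://')
        if idx != -1:
            message = message[:idx]

        # Assumed layout, e.g. "Sun Nov 13 01:52:03 +0000 2016"
        timestamp = datetime.strptime(data['created_at'],
                                      '%a %b %d %H:%M:%S %z %Y')
        yield message, timestamp, data['coordinates']

# Stand-in for open('tweets_with_time.json'); the URL is a placeholder
sample = io.StringIO(
    '{"text": "Just posted a photo @ Navarre Conference Center https://example.invalid/abc", '
    '"created_at": "Sun Nov 13 01:52:03 +0000 2016", '
    '"coordinates": [-86.8586, 30.40299]}\n'
)

# islice stops after at most 10 tweets without reading the rest
tweets = list(islice(parse_tweets(sample), 10))
for message, timestamp, position in tweets:
    print(message, timestamp.isoformat(), position)
```

With the real file you would pass the open file object to parse_tweets instead of sample. Converting the timestamp to a datetime is optional; if you only need the original string, yield data['created_at'] unchanged.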