My JSON file looks like this (one object per line, with many more lines like these):

```json
{"text": "Home - Homepage des Kunstvereins Pro Ars Lausitz e.V.\nKunst. Und so weiter.", "timestamp": "2018-01-20T18:56:35Z", "url": "http://proarslausitz.de/1.html"}
{"text": "Bildnummer: 79800031\nVektorgrafikSkalieren Sie ohne Aufl\u00f6sungsverlust auf jede beliebige. Ende.", "url": "http://www.shutterstock.com/de/pic.mhtml?id=79800031&src=lznayUu4-IHg9bkDAflIhg-1-15"}
```
I want to create a `.txt` file containing just the values of the `text` field. So it would be just:
Home - Homepage des Kunstvereins Pro Ars Lausitz e.V.\nKunst. Und so weiter. Bildnummer: 79800031\nVektorgrafikSkalieren Sie ohne Aufl\u00f6sungsverlust auf jede beliebige. Ende.
No keys, no quotes, nothing else. I think the encoding (because of the umlauts) will be easy to solve afterwards. But for the text extraction, I know I can do:
```python
import json

json_object = json.loads(json_object_string)
print(json_object["text"])
```
But that only handles a single line. Do I need to iterate over the lines? How can I merge all the texts into a single `.txt` file?
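A quick check on the encoding concern, assuming each line of the file is a standalone JSON object: `json.loads` already decodes the `\u00f6` escape to `ö` and `\n` to a real newline, so what remains is opening the output file with `encoding="utf-8"`. The sketch below uses the second record from the question, written as a raw string so the escapes reach `json.loads` exactly as they appear in the file:

```python
import json

# Second record from the question, as a raw string so the \n and \u00f6
# escapes are passed to json.loads exactly as they appear on disk
line = r'{"text": "Bildnummer: 79800031\nVektorgrafikSkalieren Sie ohne Aufl\u00f6sungsverlust auf jede beliebige. Ende.", "url": "http://www.shutterstock.com/de/pic.mhtml?id=79800031&src=lznayUu4-IHg9bkDAflIhg-1-15"}'

record = json.loads(line)
text = record["text"]

# json.loads has decoded the escapes: \u00f6 is now "ö", \n a real newline
print("ö" in text, "\n" in text)  # True True
```

Because the escapes are decoded, the extracted text will contain real line breaks rather than the literal `\n` shown in the desired output above.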
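```python
import json

with open("file.json", encoding="utf-8") as json_file:  # hypothetical input filename
    js = json.load(json_file)

with open("file.txt", "w", encoding="utf-8") as txt_file:
    for item in js['...']:  # '...' = top-level key holding the list of records
        txt_file.write(item['text'] + "\n")  # the with block closes the file
```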
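Replace `'...'` with the name of the top-level key in your JSON file that holds the list of records. This assumes the whole file is a single JSON document; a file with one JSON object per line (as in the question) cannot be loaded as one document and has to be parsed line by line instead.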
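For the one-object-per-line layout shown in the question, a minimal sketch, assuming UTF-8 input, reads the file line by line and joins the `text` values. The function name `jsonl_to_txt`, its parameters, and the single-space separator (chosen to match the desired output in the question) are illustrative choices, not an existing API:

```python
import json

def jsonl_to_txt(jsonl_path, txt_path, separator=" "):
    """Write the "text" field of every JSON line in jsonl_path into one .txt file.

    jsonl_to_txt and its parameters are illustrative names, not a library API.
    """
    texts = []
    with open(jsonl_path, encoding="utf-8") as src:  # assumes UTF-8 input
        for line in src:
            line = line.strip()
            if not line:               # skip blank lines
                continue
            record = json.loads(line)  # one JSON object per line
            if "text" in record:       # skip records without a "text" field
                texts.append(record["text"])
    # Explicit UTF-8 so characters such as "ö" are written correctly
    with open(txt_path, "w", encoding="utf-8") as dst:
        dst.write(separator.join(texts))
    return texts
```

Usage would look like `jsonl_to_txt("input.jsonl", "output.txt")`, with hypothetical filenames. Collecting the texts first and joining them avoids a trailing separator; for very large files, writing each text as it is read would keep memory use flat.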