Hello, I have a large text file containing various kinds of information. I'd like to extract only the e-mail IDs and phone numbers with a Python program or a tool.
HTTP/1.1 200 OK
{"id":"269","first_name":"N S","last_name":"","balance":"0","phonecode":null,"mobile":null,"email":"[email protected]","verified":"0","password":""}
HTTP/1.1 200 OK
{"id":"303","first_name":"Devi","last_name":"Baruah","balance":"0","phonecode":null,"mobile":null,"email":"[email protected]","verified":"0","password":""}
HTTP/1.1 200 OK
{"id":"306","first_name":"Rashmi","last_name":"Kumari","balance":"24","phonecode":"91","mobile":"9xxxxxxx","email":"[email protected]","verified":"1","password":"xxxx"}
HTTP/1.1 200 OK
{"id":"308","first_name":"ashwini","last_name":"gokhale","balance":"7","phonecode":"1","mobile":"61xxxx","email":"[email protected]","verified":"1","password":"xxxxxxx"}
HTTP/1.1 200 OK
{"id":"307","first_name":"Rama","last_name":"De","balance":"0","phonecode":"91","mobile":"73xxxxxx","email":"[email protected]","verified":"1","password":"xxxx"}
That looks like a log from a web server. If possible, try to get a cleaner file in the first place.
Anyhow:
import json

mandatory_keys = ['email', 'mobile']
out = []

with open('test') as fd:
    # Keep only the JSON lines; this skips the "HTTP/1.1 200 OK" status lines.
    file_str = [x.rstrip('\n') for x in fd if x.startswith('{')]

for j_str in file_str:
    try:
        j = json.loads(j_str)
    except json.JSONDecodeError as exc:
        raise ValueError(f'Something wrong with the json: {j_str!r}') from exc
    # Fail with the names of the missing keys rather than a generic message.
    missing = [k for k in mandatory_keys if k not in j]
    assert not missing, f'missing mandatory_keys: {missing}'
    out.append({k: j[k] for k in mandatory_keys})

print(out)
You may also want to use a JSON schema validator such as 'jsonschema' in place of the assert line to get clearer error messages.
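As a rough sketch of that idea, assuming the third-party jsonschema package is installed (pip install jsonschema): the schema and the check_record helper below are made up for illustration, and the sample lines are invented rather than taken from your file.

```python
import json
from jsonschema import validate, ValidationError

# Hypothetical schema: every record must be an object that has both keys.
# "required" only checks presence, so a null mobile still passes.
schema = {
    "type": "object",
    "required": ["email", "mobile"],
}

def check_record(line):
    """Parse one JSON line and validate it; return (record, error_message)."""
    record = json.loads(line)
    try:
        validate(instance=record, schema=schema)
    except ValidationError as exc:
        # exc.message names the property that failed validation
        return record, exc.message
    return record, None

# Invented sample lines, one valid and one with "mobile" missing
good = '{"email": "a@example.com", "mobile": null}'
bad = '{"email": "a@example.com"}'

print(check_record(good)[1])
print(check_record(bad)[1])
```

The schema could be tightened further (for example requiring string types) if null values should also be rejected.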
By changing the mandatory_keys list you can easily change which fields end up in the output.
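To make that easier to play with, here is a small sketch that wraps the same filtering in a function taking the key list as a parameter. The name extract_fields and the sample lines are only for illustration, and unlike the assert version it returns None for keys that are absent instead of failing.

```python
import json

def extract_fields(lines, keys):
    """Return one dict per JSON line, restricted to the requested keys."""
    records = []
    for line in lines:
        line = line.strip()
        if not line.startswith('{'):
            continue  # skip status lines such as "HTTP/1.1 200 OK"
        record = json.loads(line)
        # .get() gives None for an absent key instead of raising KeyError
        records.append({k: record.get(k) for k in keys})
    return records

# Invented sample in the same shape as the log in the question
sample = [
    'HTTP/1.1 200 OK',
    '{"id":"1","phonecode":"91","mobile":"5550100","email":"a@example.com"}',
]

print(extract_fields(sample, ['email', 'mobile']))
print(extract_fields(sample, ['email', 'phonecode', 'mobile']))
```

For a real file you could pass the open file object directly, e.g. extract_fields(fd, ['email', 'mobile']) inside the with block.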