Sample first row of the event log file. I have successfully extracted everything apart from the "attributes" key-value pair near the end:
{"event_type":"ActionClicked","event_timestamp":1451583172592,"arrival_timestamp":1451608731845,"event_version":"3.0",
"application":{"app_id":"7ffa58dab3c646cea642e961ff8a8070",
"cognito_identity_pool_id":"us-east-1:4d9cf803-0487-44ec-be27-1e160d15df74","package_name":"com.think.vito",
"sdk":{"name":"aws-sdk-android","version":"2.2.2"},"title":"Vito","version_name":"1.0.2.1","version_code":"3"},
"client":{"client_id":"438b152e-5b7c-4e99-9216-831fc15b0c07",
"cognito_id":"us-east-1:448efb89-f382-4975-a1a1-dd8a79e1dd0c"},"device":{"locale":{"code":"en_GB","country":"GB",
"language":"en"},"make":"samsung","model":"GT-S5312","platform":{"name":"ANDROID","version":"4.1.2"}},
"session":{"session_id":"c15b0c07-20151231-173052586","start_timestamp":1451583052586},"attributes":{"OfferID":"20186",
"Category":"40000","CustomerID":"304"},"metrics":{}}
Hello everyone, I am trying to extract content from an event log file, as shown in the attached image. The requirement is to fetch the customer ID, offer ID, and category; these are the important variables I need to extract from this event log file. It is a CSV-formatted file. I tried a regular expression, but it isn't working because, as you can observe, the format of every column is different. As you can see, the first row has a category, customer ID, and offer ID, while the second row is totally blank. In that case a regular expression won't work, and beyond this we have to consider all possible conditions. We have 14,000 samples in the event log file.
#JSON #Parsing #Python #Pandas
This might not be the most efficient way to convert nested JSON records in a line-delimited text file to a DataFrame object, but it kind of does the job.
import json
import pandas as pd

# Parse one JSON object per line, skipping blank lines.
with open('path_to_your_text_file.txt', 'r') as f:
    records = [json.loads(line) for line in f if line.strip()]

# Flatten nested keys into dotted column names such as "attributes.CustomerID".
# pd.json_normalize needs pandas 1.0 or later; older versions expose it as
# pandas.io.json.json_normalize.
e = pd.json_normalize(records)
print(e.head())
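
If only the three fields from the question are needed, a smaller sketch is to read each line with json.loads and pick the keys out of "attributes" directly. This assumes one JSON object per line and that "attributes" can be empty or missing for some events. The names WANTED and extract_attributes, and the two shortened sample lines, are made up for illustration.

```python
import json

# Attribute names taken from the sample row in the question.
WANTED = ("CustomerID", "OfferID", "Category")


def extract_attributes(lines):
    """Return one dict per event holding the wanted attributes (None if absent)."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the file
        event = json.loads(line)
        # "attributes" may be missing or an empty object for some events.
        attrs = event.get("attributes") or {}
        rows.append({name: attrs.get(name) for name in WANTED})
    return rows


# Two shortened, illustrative events: one with attributes, one without.
sample = [
    '{"event_type":"ActionClicked","attributes":{"OfferID":"20186",'
    '"Category":"40000","CustomerID":"304"},"metrics":{}}',
    '{"event_type":"ActionClicked","attributes":{},"metrics":{}}',
]
rows = extract_attributes(sample)
print(rows)
```

An open file object can be passed in place of sample, since iterating over a file yields its lines. Passing rows to pd.DataFrame(rows) should then give a three-column frame in which events without attributes show up as missing values.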