python, pandas, text-parsing, string-parsing, text-extraction

Extracting required Variables from Event Log file using Python


[Image: screenshot of the event log file]

Sample first row of the event log file. I have successfully extracted everything apart from the last key-value pair, which is "attributes":

{"event_type":"ActionClicked","event_timestamp":1451583172592,"arrival_timestamp":1451608731845,"event_version":"3.0",
  "application":{"app_id":"7ffa58dab3c646cea642e961ff8a8070",
    "cognito_identity_pool_id":"us-east-1:4d9cf803-0487-44ec-be27-1e160d15df74","package_name":"com.think.vito","sdk":{"name":"aws-sdk-android","version":"2.2.2"}
    ,"title":"Vito","version_name":"1.0.2.1","version_code":"3"},"client":{"client_id":"438b152e-5b7c-4e99-9216-831fc15b0c07",
      "cognito_id":"us-east-1:448efb89-f382-4975-a1a1-dd8a79e1dd0c"},"device":{"locale":{"code":"en_GB","country":"GB",
        "language":"en"},"make":"samsung","model":"GT-S5312","platform":{"name":"ANDROID","version":"4.1.2"}},
  "session":{"session_id":"c15b0c07-20151231-173052586","start_timestamp":1451583052586},"attributes":{"OfferID":"20186",
    "Category":"40000","CustomerID":"304"},"metrics":{}}

Hello everyone, I am trying to extract the content from an event log file, as shown in the attached image. As per the requirement, I have to fetch the customer ID, offer ID, and category; these are the important variables I need to extract from this event log file. It is a CSV-formatted file. I tried regular expressions, but they aren't working because, as you can observe, the format of every column is different. As you can see, the first row has category, customer ID, and offer ID, while the second row is totally blank; in that case a regular expression won't work. Apart from this, we have to consider all possible conditions. We have 14,000 samples in the event log file. #JSON #Parsing #Python #Pandas


Solution

  • This might not be the most efficient way to convert nested JSON records in a text file (one record per line) to a DataFrame object, but it does the job.

    import json

    import pandas as pd

    # Each line of the file holds one JSON record; skip blank lines.
    with open('path_to_your_text_file.txt') as f:
        records = [json.loads(line) for line in f if line.strip()]

    # Flatten nested keys into dotted column names, e.g. 'attributes.OfferID'.
    # pd.json_normalize needs pandas >= 1.0; older versions expose it as
    # pandas.io.json.json_normalize.
    e = pd.json_normalize(records)
    print(e.head())
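
If only the three attribute fields from the question are needed, a plain `json` pass without pandas may be enough. The sketch below assumes one JSON object per line and that rows with a missing or empty `attributes` entry should yield `None` for each field; the helper name `extract_attributes` and the shortened sample lines are made up for illustration.

```python
import json

# The three fields the question asks for.
WANTED = ("CustomerID", "OfferID", "Category")


def extract_attributes(lines):
    """Return one dict per non-blank line holding the wanted attribute fields.

    Assumes every non-blank line is a JSON object; missing fields become None.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # 'attributes' may be absent or empty, so fall back to an empty dict.
        attributes = record.get("attributes") or {}
        rows.append({key: attributes.get(key) for key in WANTED})
    return rows


# Hypothetical sample lines, shortened from the log format in the question.
sample = [
    '{"event_type":"ActionClicked","attributes":{"OfferID":"20186","Category":"40000","CustomerID":"304"},"metrics":{}}',
    '{"event_type":"ActionClicked","attributes":{},"metrics":{}}',
]
rows = extract_attributes(sample)
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` if a DataFrame is wanted afterwards.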