Tags: python-3.x, pandas, pandas-groupby, sklearn-pandas

Unstructured data to find a column count


I have unstructured data from my perf logs and would like to capture the service details from it. I can split the lines on a delimiter, but I am not able to count or print the column, since the data doesn't have a header.

Kindly help me figure out this issue.

import pandas as pd

df = pd.read_csv(r'/Users/Myhome/Documents/Py_Learning/log.csv', sep='|', skipinitialspace=True)
# df = pd.read_csv(r'/Users/Myhome/Documents/Py_Learning/log.csv', sep=':|,|[|]', engine='python', header=None)  # ---> the multi-separator version gives an error


# df.groupby("CLIENT")
SERVICE = df.columns[4]
print(SERVICE)

How can I find the unique service names across all lines and get the count for each? I would like to present the result as a graph of last week's data.

Sample data :

2019-10-22 15:35|Where:CARD|SERVICE:Dell|VERSION:1.0|CLIENT:HDD|OPERATION:boverdue|RESPONSETIME:0034|STATUS:100000:ERR_TRANSACTION_TIMED_OUT|SEVERITY:ERROR|STATUSCODE:SOAP-FAULT|STATUSMESSAGE:NA
2019-10-22 15:35|Where:Digital|SERVICE:Laptop|VERSION:1.0|CLIENT:mouse|OPERATION:connet|RESPONSETIME:3456|STATUS:NO_RECORDS_MATCH_SELECTION_CRITERIA|SEVERITY:INFO|STATUSCODE:1120|STATUSMESSAGE:NA

Solution

  • I do not know exactly what your dataset looks like, but you can get the unique values and their counts by using value_counts.

    df_unique = (df['SERVICE'].value_counts()
                  .rename_axis('service')
                  .reset_index(name='COUNT'))
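
The snippet above assumes a column named SERVICE already exists, which is not the case for a header-less file. The following is a minimal sketch of one way to get there, assuming every line has the same eleven pipe-separated fields as the sample data, one record per line. The column names in COLUMNS and the helpers load_log, service_counts and last_week are hypothetical names chosen for illustration and are not part of the original question.

```python
import io
import pandas as pd

# The two sample records from the question, one per line (no header row).
SAMPLE = """\
2019-10-22 15:35|Where:CARD|SERVICE:Dell|VERSION:1.0|CLIENT:HDD|OPERATION:boverdue|RESPONSETIME:0034|STATUS:100000:ERR_TRANSACTION_TIMED_OUT|SEVERITY:ERROR|STATUSCODE:SOAP-FAULT|STATUSMESSAGE:NA
2019-10-22 15:35|Where:Digital|SERVICE:Laptop|VERSION:1.0|CLIENT:mouse|OPERATION:connet|RESPONSETIME:3456|STATUS:NO_RECORDS_MATCH_SELECTION_CRITERIA|SEVERITY:INFO|STATUSCODE:1120|STATUSMESSAGE:NA
"""

# Assumed column names, taken from the KEY part of each KEY:value field.
COLUMNS = ["TIMESTAMP", "WHERE", "SERVICE", "VERSION", "CLIENT", "OPERATION",
           "RESPONSETIME", "STATUS", "SEVERITY", "STATUSCODE", "STATUSMESSAGE"]


def load_log(source):
    """Read a header-less, pipe-separated log and strip the 'KEY:' prefixes."""
    df = pd.read_csv(source, sep="|", header=None, names=COLUMNS,
                     dtype=str, keep_default_na=False)
    # Split only on the first colon so values such as
    # '100000:ERR_TRANSACTION_TIMED_OUT' keep their inner colon.
    for col in COLUMNS[1:]:
        df[col] = df[col].str.split(":", n=1).str[1]
    df["TIMESTAMP"] = pd.to_datetime(df["TIMESTAMP"], format="%Y-%m-%d %H:%M")
    return df


def service_counts(df):
    """Return one row per unique service with its number of occurrences."""
    return (df["SERVICE"].value_counts()
              .rename_axis("service")
              .reset_index(name="COUNT"))


def last_week(df, now):
    """Keep only the rows from the seven days before 'now'."""
    return df[df["TIMESTAMP"] >= now - pd.Timedelta(days=7)]


df = load_log(io.StringIO(SAMPLE))
counts = service_counts(df)
recent = service_counts(last_week(df, pd.Timestamp("2019-10-25")))
print(counts)
```

For a real file, pass its path to load_log instead of the StringIO object. If matplotlib is installed, counts.plot.bar(x="service", y="COUNT") should draw the counts as a bar chart; use recent instead of counts to restrict the chart to the last seven days.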