Tags: java, hive, websphere, apache-pig

Parsing Java Server Exception logs


I have a requirement wherein I have to read all "history" exception logs from a WebSphere server and load them into Hive. Below is what a typical log entry looks like, though the message portion sometimes extends over 4-5 lines. I do not really care about the stack trace, but I definitely need the Timestamp, ThreadId, Short name, Event Type, and full error message in their individual columns.

[5/20/16 22:35:39:841 CDT] 00233723 SystemOut     O 22:35:39,840 ERROR [com.xxx.app.yyy.hms.jms.receivers.impl.B2bTonnn278InReceiverImpl] 
xxxRuntimeException{errorVO=com.xxx.app.yyy.nnn.mmm.data.mmmCompleteIntakeErrorVO(diagnosesMessagesExist:false, mmmMessagesExist:false, incrementedKey:null, numPagesWithMessages:1, primaryKeyFields:[], providersMessagesExist:false, requiredFields:[], servicesMessagesExist:true, changeDateTime:05-20-2016 10:35:39:840 PM CDT, changeUserID:SYSTEM, createDateTime:null, createUserID:null, dataSecured:false, dataSecurityTypeList:null, globalMessages:[], historyID:0, messages:{procedureUnitCount=[Field For Label: procedureUnitCount Message ID: 'ERR0010', Message Arguments: '[]']}, trackChanges:false, updateVersion:-1, messages={procedureUnitCount=[Field For Label: procedureUnitCount Message ID: 'ERR0010', Message Arguments: '[]']})}
    at com.xxx.app.yyy.nnn.mmm.businesslogic.impl.mmmImpl.completemmm(mmmImpl.groovy:612)
    at sun.reflect.GeneratedMethodAccessor4988.invoke(Unknown Source)

I tried reading one line at a time and parsing with a regex, which failed miserably (only about 20% of the data matched the regex, and even that was of poor quality). I really do not know how to proceed here, or what delimiter to choose to break that exception string into columns (\t has already been tried and does not work either).

Any help or a pointer in the right direction here?


Solution

  • Use Logstash to read and parse the WebSphere logs and ship them into Elasticsearch for further processing (i.e. use the ELK stack).

    Read related discussion here.

    With Logstash, you can use Grok to parse any crappy unstructured log data into something structured and queryable.
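    For example, a minimal Logstash pipeline sketch could tail SystemOut.log, stitch the multi-line entries back together, and grok out the fields you listed (timestamp, thread id, short name, event type, message). This is only a sketch: the file path is hypothetical, and the grok pattern is an assumption based on the single sample entry above, so expect to tune it against your real logs.

    input {
      file {
        path => "/opt/websphere/logs/SystemOut.log"   # hypothetical path, point at your server's log
        start_position => "beginning"
        codec => multiline {
          # a new event starts with "[", so any other line belongs to the previous event
          pattern => "^\["
          negate => true
          what => "previous"
        }
      }
    }

    filter {
      grok {
        # assumed layout based on the sample entry above:
        # [timestamp] threadId shortName eventType message...
        match => {
          "message" => "\[%{DATA:timestamp}\] %{WORD:thread_id} %{WORD:short_name}\s+%{WORD:event_type} %{GREEDYDATA:error_message}"
        }
      }
    }

    output {
      elasticsearch {
        hosts => ["localhost:9200"]   # assumed local Elasticsearch instance
      }
    }

    If the end goal is still Hive rather than Elasticsearch, you could keep the same multiline codec and grok filter but swap the output for Logstash's file output, then load the resulting delimited files into a Hive table.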