Tags: java, arrays, regex, hashmap, regex-group

Regex for extracting KEY=VALUE pairs from a log string in Java


I have a log string like this:

String s0 = "DC696,\"/xi/ajax/remoting/call/plaincall/adhocReportBuilderControllerProxy.getRortList.dwr\",\"2222-11-10 08:32:22,351               PLV=REQ CIP=9.9.9.7 CMID=syairp CMN=\"\"Dub Airport Corporation Limited\"\" SN=sfv4_APM180885. DPN=dbPool66HFT01 UID=3862D04108 UN=91F6025D47F01D IUID=1931 LOC=en_GB EID=\"\"EVENT-UNKNOWN-UNKNOWN-ob55abe0118-201110083217-396080\"\" AGN=\"\"[Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36 Edg/107.0.1418.35]\"\" RID=REQ-[7274545]  MTD=POST URL=\"\"/xi/ajax/remoting/call/plaincall/adhocRrtBuilderCoollerProxy.getRtList.dwr\"\" RQT=2835 MID=ADIN PID=ADMIN PQ=ADIN_PAGE SUB=0 MEM=2331036 CPU=2410 UCPU=2300 SCPU=110 FRE=10 FWR=0 NRE=2281 NWR=218 SQLC=43 SQLT=142 RPS=200 SID=60826A3FAB005A8A9B930177C5******.pc6bc1029 GID=e262dde6d0e040070b58afd4c8 HSID=ddc665538db779508d3213c0bb63bcb1c49fe8236d5f0884ae975915728e61 CSL=CRITICAL CCON=0 CSUP=0 CLOC=0 CEXT=0 CREM=0 STK={\"\"n\"\":\"\"/xi/ajax/remoting/call/plaincall/adhocReportBuilderControllerProxy.getrtList.dwr\"\",\"\"i\"\":1,\"\"t\"\":2835,\"\"slft\"\":2679,\"\"sub\"\":[{\"\"n\"\":\"\"SQL:select * from sfv4_HOUA180885.REPORT_DEF WHERE REPORT_DEF_ID IN (SELECT REPORT_DEF_ID FROM sfv4_HA80885.REPORT_DTASET WHERE REPORT_ID=?) AND DELETED=? ORDER BY REPORT_DEF_ID asc NULLS LAST\"\",\"\"i\"\":17,\"\"t\"\":40,\"\"slft\"\":40,\"\"st\"\":337,\"\"m\"\":220958,\"\"nr\"\":154,\"\"rt\"\":0,\"\"rn\"\":22,\"\"fs\"\":0}]}   \",\"2022-11-09T21:32:22.351+0000\",p66cf1029,\"dc606_ss_application\",1,\"/app/tomcat/logs/pef.log\",\"perf_log_yxx\",swsskix13";

I want to extract the KEY=VALUE pairs, e.g. {PLV=REQ, CIP=9.9.9.7, CMN="Dub Airport Corporation Limited", STK={...}}, into a Map<String, String>.

I tried the following, which does not work:

String[] str1 = s0.split("\\s(?=(([^\"]*\"))*[^\"]*$)\\s*");
System.out.println("Value of split string is " + Arrays.toString(str1));

Any input would be a great help.


Solution

  • You can use this solution:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s0 = "DC696,\"/xi/ajax/remoting/call/plaincall/adhocReportBuilderControllerProxy.getRortList.dwr\",\"2222-11-10 08:32:22,351               PLV=REQ CIP=9.9.9.7 CMID=syairp CMN=\"\"Dub Airport Corporation Limited\"\" SN=sfv4_APM180885. DPN=dbPool66HFT01 UID=3862D04108 UN=91F6025D47F01D IUID=1931 LOC=en_GB EID=\"\"EVENT-UNKNOWN-UNKNOWN-ob55abe0118-201110083217-396080\"\" AGN=\"\"[Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36 Edg/107.0.1418.35]\"\" RID=REQ-[7274545]  MTD=POST URL=\"\"/xi/ajax/remoting/call/plaincall/adhocRrtBuilderCoollerProxy.getRtList.dwr\"\" RQT=2835 MID=ADIN PID=ADMIN PQ=ADIN_PAGE SUB=0 MEM=2331036 CPU=2410 UCPU=2300 SCPU=110 FRE=10 FWR=0 NRE=2281 NWR=218 SQLC=43 SQLT=142 RPS=200 SID=60826A3FAB005A8A9B930177C5******.pc6bc1029 GID=e262dde6d0e040070b58afd4c8 HSID=ddc665538db779508d3213c0bb63bcb1c49fe8236d5f0884ae975915728e61 CSL=CRITICAL CCON=0 CSUP=0 CLOC=0 CEXT=0 CREM=0 STK={\"\"n\"\":\"\"/xi/ajax/remoting/call/plaincall/adhocReportBuilderControllerProxy.getrtList.dwr\"\",\"\"i\"\":1,\"\"t\"\":2835,\"\"slft\"\":2679,\"\"sub\"\":[{\"\"n\"\":\"\"SQL:select * from sfv4_HOUA180885.REPORT_DEF WHERE REPORT_DEF_ID IN (SELECT REPORT_DEF_ID FROM sfv4_HA80885.REPORT_DTASET WHERE REPORT_ID=?) AND DELETED=? ORDER BY REPORT_DEF_ID asc NULLS LAST\"\",\"\"i\"\":17,\"\"t\"\":40,\"\"slft\"\":40,\"\"st\"\":337,\"\"m\"\":220958,\"\"nr\"\":154,\"\"rt\"\":0,\"\"rn\"\":22,\"\"fs\"\":0}]}   \",\"2022-11-09T21:32:22.351+0000\",p66cf1029,\"dc606_ss_application\",1,\"/app/tomcat/logs/pef.log\",\"perf_log_yxx\",swsskix13";
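    // Group 1 is the key and group 2 is the raw value. Groups 3 and 4 are helpers
    // for the brace-balancing branch, group 5 is the text inside ""..."",
    // and group 6 is a plain run of non-whitespace characters.
    String regex = "(\\w+)=((?=\\{)(?:(?=.*?\\{(?!.*?\\3)(.*\\}(?!.*\\4).*))(?=.*?\\}(?!.*?\\4)(.*)).)+?.*?(?=\\3)[^{]*(?=\\4$)|\"{2}(.*?)\"{2}|(\\S+))";
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(s0);
    Map<String, String> res = new HashMap<>();
    while (m.find()) {
        String val = m.group(2);      // default: the whole value, e.g. the {...} block
        if (m.group(5) != null) {
            val = m.group(5);         // quoted value without the surrounding ""
        }
        if (m.group(6) != null) {
            val = m.group(6);         // unquoted value
        }
        res.put(m.group(1), val);
        System.out.println(m.group(1) + " => " + val + "\n----");
    }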
    

    Output:

    PLV => REQ
    ----
    CIP => 9.9.9.7
    ----
    CMID => syairp
    ----
    CMN => Dub Airport Corporation Limited
    ----
    SN => sfv4_APM180885.
    ----
    DPN => dbPool66HFT01
    ----
    UID => 3862D04108
    ----
    UN => 91F6025D47F01D
    ----
    IUID => 1931
    ----
    LOC => en_GB
    ----
    EID => EVENT-UNKNOWN-UNKNOWN-ob55abe0118-201110083217-396080
    ----
    AGN => [Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36 Edg/107.0.1418.35]
    ----
    RID => REQ-[7274545]
    ----
    MTD => POST
    ----
    URL => /xi/ajax/remoting/call/plaincall/adhocRrtBuilderCoollerProxy.getRtList.dwr
    ----
    RQT => 2835
    ----
    MID => ADIN
    ----
    PID => ADMIN
    ----
    PQ => ADIN_PAGE
    ----
    SUB => 0
    ----
    MEM => 2331036
    ----
    CPU => 2410
    ----
    UCPU => 2300
    ----
    SCPU => 110
    ----
    FRE => 10
    ----
    FWR => 0
    ----
    NRE => 2281
    ----
    NWR => 218
    ----
    SQLC => 43
    ----
    SQLT => 142
    ----
    RPS => 200
    ----
    SID => 60826A3FAB005A8A9B930177C5******.pc6bc1029
    ----
    GID => e262dde6d0e040070b58afd4c8
    ----
    HSID => ddc665538db779508d3213c0bb63bcb1c49fe8236d5f0884ae975915728e61
    ----
    CSL => CRITICAL
    ----
    CCON => 0
    ----
    CSUP => 0
    ----
    CLOC => 0
    ----
    CEXT => 0
    ----
    CREM => 0
    ----
    STK => {""n"":""/xi/ajax/remoting/call/plaincall/adhocReportBuilderControllerProxy.getrtList.dwr"",""i"":1,""t"":2835,""slft"":2679,""sub"":[{""n"":""SQL:select * from sfv4_HOUA180885.REPORT_DEF WHERE REPORT_DEF_ID IN (SELECT REPORT_DEF_ID FROM sfv4_HA80885.REPORT_DTASET WHERE REPORT_ID=?) AND DELETED=? ORDER BY REPORT_DEF_ID asc NULLS LAST"",""i"":17,""t"":40,""slft"":40,""st"":337,""m"":220958,""nr"":154,""rt"":0,""rn"":22,""fs"":0}]}
    ----
    

    See the regex demo.
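For comparison, here is a minimal sketch of a different technique, not the regex used above: a small regex finds each `KEY=`, and the value is then read by hand, counting braces for `{...}` values. The class name `KeyValueScanSketch`, the helpers `parse` and `endOfBraces`, and the shortened sample string are all hypothetical. The sketch assumes that braces in a value are balanced and that no stray `{` or `}` appears inside a quoted string within the braces.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueScanSketch {
    // Matches only the key and the '=' sign; the value is read by hand.
    private static final Pattern KEY = Pattern.compile("(\\w+)=");

    static Map<String, String> parse(String s) {
        Map<String, String> res = new LinkedHashMap<>(); // keeps insertion order
        Matcher m = KEY.matcher(s);
        int pos = 0;
        while (m.find(pos)) {
            int start = m.end(); // index of the first character of the value
            int end;             // index just past the consumed value
            String val;
            if (s.startsWith("{", start)) {
                // Brace-delimited value: keep everything up to the matching '}'.
                end = endOfBraces(s, start);
                val = s.substring(start, end);
            } else if (s.startsWith("\"\"", start)) {
                // Value wrapped in doubled quotes: strip the surrounding "" pair.
                int close = s.indexOf("\"\"", start + 2);
                if (close < 0) {
                    val = s.substring(start + 2);
                    end = s.length();
                } else {
                    val = s.substring(start + 2, close);
                    end = close + 2;
                }
            } else {
                // Plain value: read up to the next whitespace (may be empty).
                end = start;
                while (end < s.length() && !Character.isWhitespace(s.charAt(end))) {
                    end++;
                }
                val = s.substring(start, end);
            }
            res.put(m.group(1), val);
            pos = end; // resume the key search after the value
        }
        return res;
    }

    // Returns the index just past the '}' that closes the '{' at index open,
    // or s.length() if the braces never balance.
    static int endOfBraces(String s, int open) {
        int depth = 0;
        for (int i = open; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    return i + 1;
                }
            }
        }
        return s.length();
    }

    public static void main(String[] args) {
        // Shortened, made-up sample in the same shape as the log line.
        String sample = "PLV=REQ CMN=\"\"Dub Airport\"\" "
                + "STK={\"\"n\"\":\"\"a=b\"\",\"\"sub\"\":[{\"\"i\"\":1}]} END=x";
        parse(sample).forEach((k, v) -> System.out.println(k + " => " + v));
    }
}
```

Because the scan skips past each value before looking for the next key, text such as `a=b` inside the braces is not picked up as a separate pair. The trade-off is that the value rules live in ordinary code rather than in one pattern, which may be easier to adjust if the log format changes.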

    Regex details:

    • (\w+) - Group 1: one or more word chars
    • = - a = char
    • ((?=\{)(?:(?=.*?\{(?!.*?\3)(.*\}(?!.*\4).*))(?=.*?\}(?!.*?\4)(.*)).)+?.*?(?=\3)[^{]*(?=\4$)|\"{2}(.*?)\"{2}|(\S+)) - Group 2: