java regex guava

Guava Splitter to key value map with splitter character included in strings


I am trying to parse a log file using Guava's Splitter. The log file looks like this:

appName=XXX clientIp=X.X.X timestamp="2017-06-05T13:22:12-07:00" request="POST /forward HTTP/1.1" statusCode=204 bytesOut=1167 totalTime=0.062 bytesIn=1289 sourceHost=XXXX connId=49936598 connReqs=9 upInstance=XXX:104:XXX-XXX:8664:17F34 upConnectSec=0.052 upAddr="XX.XX.XX:123" upHost="vcv08it-cvcv2801:8464" upHdrTimeSec=0.058 upRespTimeSec=0.058 pid=32561  upStatusCode=204 message="Access Log" corrKey=GMIFCDIKRZR2T4VZQXJA2IT6 upCached=- length=0 partition=XXX location="= /v1/tXXXX" xff="XX.XX.XX.XX" referer="-" user-agent="Apache-HttpAsyncClient/4.1.1 (Java/1.8.0_131)\" rateLimitCurrentValues="--" rateLimitTimeMs=\"-:-"

I used this code to parse it:

Map<String, String> parserMap;
parserMap = Splitter.onPattern("\\s(?=([^\\\"]*\\\"[^\\\"]*\\\")*[^\\\"]*$)")
.omitEmptyStrings()
.withKeyValueSeparator(Splitter.onPattern("="))
.split(line);

My problem is the location="= /v1/tXXXX" field, which has an '=' inside the quoted string, so the current withKeyValueSeparator can't parse it. How should I change the patterns to get all the fields correctly?


Solution

  • Your code throws java.lang.IllegalArgumentException: Chunk [location="= /v1/tXXXX"] is not a valid entry because the keyValueSeparator matches more than once within that chunk. You can tighten the keyValueSeparator so that it only matches an equals sign that is followed by your value pattern, e.g.:

    final String keyPattern = "\\S+";
    final String valuePattern = "(\\S+|\"[^\"]*\")";
    parserMap = Splitter.onPattern("\\s(?=" + keyPattern + "=" + valuePattern + ")")
            .omitEmptyStrings()
            .withKeyValueSeparator(Splitter.onPattern("=(?=" + valuePattern + ")"))
            .split(line);
    

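    Note that this won't work if a quoted value itself contains something that looks like a key-value pair, e.g. key="key=value", because the inner '=' is also followed by a valid value and the chunk again splits into more than two parts.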
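
If quoted values may contain key=value text, one option is to skip Splitter and scan the line with java.util.regex instead. The following is only a minimal sketch, not part of Guava: the class name KeyValueLogParser and its parse method are made up for illustration, and it assumes that quoted values never contain an escaped double quote. For each field it tries a double-quoted value first and falls back to a run of non-whitespace characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Guava class): scans a log line for key=value pairs.
final class KeyValueLogParser {

    // key:   one or more characters that are neither whitespace nor '='
    // value: a double-quoted string (tried first), otherwise a run of non-whitespace
    private static final Pattern ENTRY =
            Pattern.compile("([^\\s=]+)=(\"[^\"]*\"|\\S*)");

    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps the field order
        Matcher m = ENTRY.matcher(line);
        while (m.find()) {
            // The surrounding quotes stay in the value, as in the Splitter version.
            // A repeated key overwrites the earlier value.
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "statusCode=204 location=\"= /v1/tXXXX\" "
                + "key=\"key=value\" pid=32561  upCached=-";
        parse(line).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Unlike the Splitter-based version, which throws on a malformed chunk or a duplicate key, this sketch silently skips tokens that have no '=' and lets a repeated key overwrite the earlier value. Whether that is acceptable depends on your log format.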