
Apache Camel Split by start and end characters SOH and ETX


I have a Spring Boot application that loads a routes.xml on startup.

In routes.xml, I have an MQ queue source whose messages look like this sample:

SOH{123}{345}{4
5
6
}ETXSOH{111}{222}{3
3
3
}ETX

where SOH = \u0001 and ETX = \u0003

When I receive this message, I want to split it into two messages:

{123}{345}{4
5
6
}

and

{111}{222}{3
3
3
}

Currently I am trying to split using:

<split>
  <tokenize token="(?s)(?&lt;=\u0001)(.*?)(?=\u0003)" regex="true"/>
  <to uri="jms:queue:TEST.OUT.Q" />
</split>

I tested this regex in an online regex tester, where it matched as expected: https://regex101.com/r/fU5VVj/1

But when running the code, what I get is:

#1

SOH

#2

ETXSOH

#3

ETX

I also tried token and endToken, but that did not work for my case:

<tokenize token="\u0001" endToken="\u0003" />

Is my case possible using Camel route XML? If yes, can you point me to the correct regex or the correct start and end tokens?

Thanks


Solution

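  • It seems Camel does not apply the regex the way a Java Matcher does. The output in the question (SOH, ETXSOH, ETX) is what is left over when the matched text is used as the delimiter rather than returned as the token. I ended up creating a new processor using the sample code below: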

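        import java.util.LinkedList;
        import java.util.List;
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        // `items` is assumed to be the raw message body as a String.
        // (?s) lets '.' match newlines; the lookarounds keep SOH and ETX out of each match.
        Pattern p = Pattern.compile("(?s)(?<=\\u0001).*?(?=\\u0003)");
        Matcher m = p.matcher(items);
        List<String> tokens = new LinkedList<>();

        // Each find() returns the text between one SOH/ETX pair.
        while (m.find()) {
            String token = m.group();
            System.out.println("item = " + token);
            tokens.add(token);
        }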
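The snippet above can be wrapped in a small helper class so it is easy to call from a route and to test. The following is only a sketch: the class name SohEtxSplitter and its method names are hypothetical, not part of Camel. It also includes a second method that uses the same regex as a delimiter via Pattern.split, which reproduces the three fragments reported in the question and illustrates the assumption that the tokenizer treats the regex as a separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
public class SohEtxSplitter {

    // Same pattern as in the answer: lazily match everything between SOH and ETX,
    // with (?s) so that '.' also matches the newlines inside a frame.
    private static final Pattern FRAME =
            Pattern.compile("(?s)(?<=\\u0001).*?(?=\\u0003)");

    // Returns the text found between each SOH/ETX pair.
    public static List<String> extract(String message) {
        List<String> frames = new ArrayList<>();
        Matcher m = FRAME.matcher(message);
        while (m.find()) {
            frames.add(m.group());
        }
        return frames;
    }

    // Uses the same regex as a delimiter, so the matched frames are discarded
    // and only the text around them (the control characters) is returned.
    public static String[] splitAsDelimiter(String message) {
        return FRAME.split(message);
    }

    public static void main(String[] args) {
        // Sample message from the question, with SOH and ETX as real control characters.
        String message = "\u0001{123}{345}{4\n5\n6\n}\u0003"
                + "\u0001{111}{222}{3\n3\n3\n}\u0003";

        for (String frame : extract(message)) {
            System.out.println("item = " + frame);
        }
        System.out.println("fragments when used as delimiter = "
                + splitAsDelimiter(message).length);
    }
}
```

For the sample message, extract returns the two wanted bodies, while splitAsDelimiter returns three fragments containing only SOH, ETX followed by SOH, and ETX. Camel's Splitter can also take a bean method as its expression, so registering such a class as a bean (under a hypothetical name such as sohEtxSplitter) and referencing it with a method expression inside <split> may be an alternative to a separate processor; that wiring is an assumption and has not been verified against the setup in the question.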