Tags: java, regex, rtf

Extract string content from an RTF string in Java


I have the following RTF string: \af31507 \ltrch\fcs0 \insrsid6361256 Study Title: {Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000}}{\rtlch\fcs1 \af31507 \ltrch\fcs0 \insrsid12283827 and I want to extract the content of Study Title, i.e. (Study Title: {Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000}). Below is my code:

String[] arr = value.split("\\s+");
//System.out.println(arr.length);
for(int j=0; j<arr.length; j++) {
    if(isNumeric(arr[j])) {
         arr[j] = "\\?" + arr[j];
    }
}

In the code above, I split the string on whitespace and iterate over the resulting array to check whether each token is a number. However, the isNumeric function cannot handle the 8000 that follows \u8805, because that token arrives as 8000}}{\rtlch\fcs1. How can I find the Study Title and its content using a regex?
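
The tokenization problem described above can be reproduced with a short sketch. The class name SplitDemo and both helper methods are hypothetical; in particular, isNumeric here is only a stand-in that accepts digit-only tokens, since the original implementation is not shown.

```java
public class SplitDemo {
    // Splits on runs of whitespace, as in the snippet above.
    static String[] tokens(String value) {
        return value.split("\\s+");
    }

    // Assumed stand-in for the isNumeric helper: true only for digit-only tokens.
    static boolean isNumeric(String token) {
        return token.matches("\\d+");
    }

    public static void main(String[] args) {
        String value = "Study Title: {Test for 14431 Testing2 14432 \\u8805 8000}}{\\rtlch\\fcs1 \\af31507";
        for (String token : tokens(value)) {
            // The last number stays glued to the closing braces and the next
            // control words because no whitespace separates them.
            System.out.println("[" + token + "] numeric=" + isNumeric(token));
        }
    }
}
```

Because the closing braces follow 8000 without any whitespace, the number never appears as a token of its own, which is why a whitespace split alone cannot isolate it.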


Solution

  • The pattern Study Title: {[^}]*} will match what you expect. Demo: https://regex101.com/r/FZl1WL/1

        String s = "{\\af31507 \\ltrch\\fcs0 \\insrsid6361256 Study Title: {Test for 14431 process\\'27s \\u8805 1000 Testing2 14432 \\u8805 8000}}{\\rtlch\\fcs1 \\af31507 \\ltrch\\fcs0 \\insrsid12283827";
        Pattern p = Pattern.compile("Study Title: \\{[^}]*\\}");
        Matcher m = p.matcher(s);
        while (m.find()) {
            System.out.println(m.group());
        }
    

    output:

    Study Title: {Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000}
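
    Since [^}]* consumes everything up to the first closing brace, the match ends at the first } after "Study Title: {". The following is a minimal sketch that wraps the same pattern in a reusable helper and shows that behaviour. The class name StudyTitleMatch and the method find are hypothetical, and the nested-brace input is made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StudyTitleMatch {
    // Same pattern as above, compiled once and reused.
    private static final Pattern TITLE = Pattern.compile("Study Title: \\{[^}]*\\}");

    // Returns the first "Study Title: {...}" span, or an empty Optional if none is found.
    static Optional<String> find(String rtf) {
        Matcher m = TITLE.matcher(rtf);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String s = "\\insrsid6361256 Study Title: {Test for 14431 Testing2 8000}}{\\rtlch\\fcs1";
        System.out.println(find(s).orElse("<no match>"));

        // Made-up input with a nested group: the match stops at the first
        // closing brace, so " tail}" is not part of the result.
        String nested = "Study Title: {outer {inner} tail}";
        System.out.println(find(nested).orElse("<no match>"));
    }
}
```

    If a title can contain nested RTF groups, this pattern will truncate it, so it is only suitable when the title text itself contains no braces.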
    

    Update, as per the OP's request, to return only the text inside the braces:

        String s = "{\\af31507 \\ltrch\\fcs0 \\insrsid6361256 Study Title: {Test for 14431 process\\'27s \\u8805 1000 Testing2 14432 \\u8805 8000}}{\\rtlch\\fcs1 \\af31507 \\ltrch\\fcs0 \\insrsid12283827";
        Pattern p = Pattern.compile("(?<=Study Title: \\{)[^}]*(?=\\})");
        Matcher m = p.matcher(s);
        while (m.find()) {
            System.out.println(m.group());
        }

    output:

    Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000
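
    As an alternative to the lookarounds, a capturing group can return the same inner text. This is only a sketch: the class name StudyTitleContent and the method content are hypothetical, and it assumes the title contains no nested braces.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StudyTitleContent {
    // Group 1 captures the text between the braces that follow "Study Title: ".
    private static final Pattern TITLE = Pattern.compile("Study Title: \\{([^}]*)\\}");

    // Returns the text inside the braces, or an empty Optional if there is no match.
    static Optional<String> content(String rtf) {
        Matcher m = TITLE.matcher(rtf);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String s = "{\\af31507 \\ltrch\\fcs0 \\insrsid6361256 Study Title: {Test for 14431 process\\'27s \\u8805 1000 Testing2 14432 \\u8805 8000}}{\\rtlch\\fcs1 \\af31507 \\ltrch\\fcs0 \\insrsid12283827";
        // Prints only the title text, without the label or the braces.
        System.out.println(content(s).orElse("<no match>"));
    }
}
```

    Returning an Optional makes the "no title found" case explicit for the caller instead of relying on an empty loop.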