java regex string substring phrase

Retrieve string given search criteria. Brackets, quotations and more


I have the following string:

NoticeText:
    NoticeType [str] = USER_TYPING_ON
    Text [str] = "user is typing"
    EventInfo:
        PartyId [int] = 2
        EventType [str] = MESSAGE
        UserNickname [str] = "Michael"
        EventId [int] = 4
        Text [str] = "Hey, how are you?"
        MsgCheck [str] = NONE
        TimeOffset [int] = 23
        UserType [str] = AGENT
NoticeText:
    NoticeType [str] = USER_TYPING_ON
    EventInfo:
    PartyId [int] = 1
        EventType [str] = MESSAGE
        UserNickname [str] = "Bob Smith"
        EventId [int] = 6
        Text [str] = "I'm good, how are you?"
        MsgCheck [str] = NONE
        TimeOffset [int] = 28
        UserType [str] = CLIENT
        MessageType [str] = "text"

I need to be able to retrieve the sentence "I'm good, how are you?". I am completely stumped.

I tried retrieving the phrase after "Text [str] =", which does return what I need, but it also returns every other sentence that follows a "Text [str] =".

One detail that might help is the PartyId [int] field: 1 corresponds to the client, whose message is the one I need.

I just don't know how to narrow it down by that.

Please help!


Solution

  • Description

    ^NoticeText:(?:(?!\nNoticeText:).)*\n\s+EventInfo(?:(?!\nNoticeText:).)*\n\s+Text\s*\[str\]\s*=\s*"([^"]*)"(?:(?!\nNoticeText:).)*\nNoticeText:(?:(?!\nNoticeText:).)*\n\s+EventInfo(?:(?!\nNoticeText:).)*\n\s+Text\s*\[str\]\s*=\s*"([^"]*)"(?:(?!\nNoticeText:).)*

    Example

    Live Demo

    https://regex101.com/r/tD6uV9/1

    Sample text

    NoticeText:
        NoticeType [str] = USER_TYPING_ON
        Text [str] = "user is typing"
        EventInfo:
            PartyId [int] = 2
            EventType [str] = MESSAGE
            UserNickname [str] = "Michael"
            EventId [int] = 4
            Text [str] = "Hey, how are you?"
            MsgCheck [str] = NONE
            TimeOffset [int] = 23
            UserType [str] = AGENT
    NoticeText:
        NoticeType [str] = USER_TYPING_ON
        EventInfo:
        PartyId [int] = 1
            EventType [str] = MESSAGE
            UserNickname [str] = "Bob Smith"
            EventId [int] = 6
            Text [str] = "I'm good, how are you?"
            MsgCheck [str] = NONE
            TimeOffset [int] = 28
            UserType [str] = CLIENT
            MessageType [str] = "text"
    

    Sample Matches

    • Capture group 0 gets both NoticeText blocks
    • Capture group 1 gets the Text [str] value that follows EventInfo in the first NoticeText
    • Capture group 2 gets the Text [str] value that follows EventInfo (and PartyId) in the second NoticeText
    MATCH 1
    Capture Group 1.    [246-263]   `Hey, how are you?`
    Capture Group 2.    [566-588]   `I'm good, how are you?`
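
Since the question is tagged java, here is a minimal sketch of how the expression above might be applied with java.util.regex. The class name, the method name and the trimmed sample text are made up for illustration. Pattern.DOTALL is an assumption on my part: without it `.` cannot cross line breaks, so the expression would not get from NoticeText: down to the EventInfo line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoBlockDemo {
    // Trimmed copy of the sample text; only lines the expression inspects are kept.
    static final String SAMPLE = String.join("\n",
        "NoticeText:",
        "    NoticeType [str] = USER_TYPING_ON",
        "    Text [str] = \"user is typing\"",
        "    EventInfo:",
        "        PartyId [int] = 2",
        "        Text [str] = \"Hey, how are you?\"",
        "        UserType [str] = AGENT",
        "NoticeText:",
        "    NoticeType [str] = USER_TYPING_ON",
        "    EventInfo:",
        "    PartyId [int] = 1",
        "        Text [str] = \"I'm good, how are you?\"",
        "        UserType [str] = CLIENT",
        "        MessageType [str] = \"text\"");

    // One NoticeText block: header, EventInfo, then a quoted Text [str] value.
    private static final String BLOCK =
        "NoticeText:(?:(?!\\nNoticeText:).)*\\n\\s+EventInfo(?:(?!\\nNoticeText:).)*"
        + "\\n\\s+Text\\s*\\[str\\]\\s*=\\s*\"([^\"]*)\"(?:(?!\\nNoticeText:).)*";

    // The expression from the answer is two such blocks back to back.
    // DOTALL (assumed) lets '.' match line breaks.
    static final Pattern TWO_BLOCKS =
        Pattern.compile("^" + BLOCK + "\\n" + BLOCK, Pattern.DOTALL);

    // Returns {group 1, group 2}, or an empty array when nothing matches.
    static String[] extractBoth(String input) {
        Matcher m = TWO_BLOCKS.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        for (String text : extractBoth(SAMPLE)) {
            System.out.println(text);
        }
    }
}
```

Because the expression hard-codes exactly two blocks, this only helps when the input always has that shape.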
    

    Explanation

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
      ^                        the beginning of the string
    ----------------------------------------------------------------------
      NoticeText:              'NoticeText:'
    ----------------------------------------------------------------------
      (?:                      group, but do not capture (0 or more times
                               (matching the most amount possible)):
    ----------------------------------------------------------------------
        (?!                      look ahead to see if there is not:
    ----------------------------------------------------------------------
          \n                       '\n' (newline)
    ----------------------------------------------------------------------
          NoticeText:              'NoticeText:'
    ----------------------------------------------------------------------
        )                        end of look-ahead
    ----------------------------------------------------------------------
        .                        any character (including \n; needs dot-all)
    ----------------------------------------------------------------------
      )*                       end of grouping
    ----------------------------------------------------------------------
      \n                       '\n' (newline)
    ----------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      EventInfo                'EventInfo'
    ----------------------------------------------------------------------
      (?:                      group, but do not capture (0 or more times
                               (matching the most amount possible)):
    ----------------------------------------------------------------------
        (?!                      look ahead to see if there is not:
    ----------------------------------------------------------------------
          \n                       '\n' (newline)
    ----------------------------------------------------------------------
          NoticeText:              'NoticeText:'
    ----------------------------------------------------------------------
        )                        end of look-ahead
    ----------------------------------------------------------------------
        .                        any character (including \n; needs dot-all)
    ----------------------------------------------------------------------
      )*                       end of grouping
    ----------------------------------------------------------------------
      \n                       '\n' (newline)
    ----------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      Text                     'Text'
    ----------------------------------------------------------------------
      \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      \[                       '['
    ----------------------------------------------------------------------
      str                      'str'
    ----------------------------------------------------------------------
      \]                       ']'
    ----------------------------------------------------------------------
      \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      =                        '='
    ----------------------------------------------------------------------
      \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      "                        '"'
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        [^"]*                    any character except: '"' (0 or more
                                 times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      "                        '"'
    ----------------------------------------------------------------------
      (?:                      group, but do not capture (0 or more times
                               (matching the most amount possible)):
    ----------------------------------------------------------------------
        (?!                      look ahead to see if there is not:
    ----------------------------------------------------------------------
          \n                       '\n' (newline)
    ----------------------------------------------------------------------
          NoticeText:              'NoticeText:'
    ----------------------------------------------------------------------
        )                        end of look-ahead
    ----------------------------------------------------------------------
        .                        any character (including \n; needs dot-all)
    ----------------------------------------------------------------------
      )*                       end of grouping
    ----------------------------------------------------------------------
      \n                       '\n' (newline)
    ----------------------------------------------------------------------
      NoticeText:              'NoticeText:'
    ----------------------------------------------------------------------
      (?:                      group, but do not capture (0 or more times
                               (matching the most amount possible)):
    ----------------------------------------------------------------------
        (?!                      look ahead to see if there is not:
    ----------------------------------------------------------------------
          \n                       '\n' (newline)
    ----------------------------------------------------------------------
          NoticeText:              'NoticeText:'
    ----------------------------------------------------------------------
        )                        end of look-ahead
    ----------------------------------------------------------------------
        .                        any character (including \n; needs dot-all)
    ----------------------------------------------------------------------
      )*                       end of grouping
    ----------------------------------------------------------------------
      \n                       '\n' (newline)
    ----------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      EventInfo                'EventInfo'
    ----------------------------------------------------------------------
      (?:                      group, but do not capture (0 or more times
                               (matching the most amount possible)):
    ----------------------------------------------------------------------
        (?!                      look ahead to see if there is not:
    ----------------------------------------------------------------------
          \n                       '\n' (newline)
    ----------------------------------------------------------------------
          NoticeText:              'NoticeText:'
    ----------------------------------------------------------------------
        )                        end of look-ahead
    ----------------------------------------------------------------------
        .                        any character (including \n; needs dot-all)
    ----------------------------------------------------------------------
      )*                       end of grouping
    ----------------------------------------------------------------------
      \n                       '\n' (newline)
    ----------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      Text                     'Text'
    ----------------------------------------------------------------------
      \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      \[                       '['
    ----------------------------------------------------------------------
      str                      'str'
    ----------------------------------------------------------------------
      \]                       ']'
    ----------------------------------------------------------------------
      \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      =                        '='
    ----------------------------------------------------------------------
      \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      "                        '"'
    ----------------------------------------------------------------------
      (                        group and capture to \2:
    ----------------------------------------------------------------------
        [^"]*                    any character except: '"' (0 or more
                                 times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of \2
    ----------------------------------------------------------------------
      "                        '"'
    ----------------------------------------------------------------------
      (?:                      group, but do not capture (0 or more times
                               (matching the most amount possible)):
    ----------------------------------------------------------------------
        (?!                      look ahead to see if there is not:
    ----------------------------------------------------------------------
          \n                       '\n' (newline)
    ----------------------------------------------------------------------
          NoticeText:              'NoticeText:'
    ----------------------------------------------------------------------
        )                        end of look-ahead
    ----------------------------------------------------------------------
        .                        any character (including \n; needs dot-all)
    ----------------------------------------------------------------------
      )*                       end of grouping
    ----------------------------------------------------------------------
    

    Alternatively

    If you have a long list of these NoticeText blocks, you can parse them all with this simplified version of the same expression, which matches one block at a time.

    ^NoticeText:(?:(?!\nNoticeText:)[\s\S])*\n\s+Text\s*\[str\]\s*=\s*"([^"]*)"(?:(?!\nNoticeText:)[\s\S])*

    This version uses the global flag and the multiline flag, so ^ matches at the start of every line and each NoticeText block comes back as its own match.

    Example

    With the same sample text as above, capture group 0 gets a single NoticeText block, and capture group 1 gets only the last Text [str] value in that block.

    Sample Matches

    MATCH 1
    Capture Group 1.    [246-263]   `Hey, how are you?`
    
    MATCH 2
    Capture Group 1.    [566-588]   `I'm good, how are you?`
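
In Java there is no global flag; the usual equivalent is to call Matcher.find() in a loop. A rough sketch of the per-block expression under that assumption follows. The class name, method name and trimmed sample text are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PerBlockDemo {
    // Trimmed copy of the sample text; only lines the expression inspects are kept.
    static final String SAMPLE = String.join("\n",
        "NoticeText:",
        "    Text [str] = \"user is typing\"",
        "    EventInfo:",
        "        PartyId [int] = 2",
        "        Text [str] = \"Hey, how are you?\"",
        "        UserType [str] = AGENT",
        "NoticeText:",
        "    EventInfo:",
        "    PartyId [int] = 1",
        "        Text [str] = \"I'm good, how are you?\"",
        "        MessageType [str] = \"text\"");

    // MULTILINE makes ^ match at the start of each line, so every block can start a match.
    // [\s\S] matches any character including line breaks, so DOTALL is not needed here.
    static final Pattern ONE_BLOCK = Pattern.compile(
        "^NoticeText:(?:(?!\\nNoticeText:)[\\s\\S])*"
        + "\\n\\s+Text\\s*\\[str\\]\\s*=\\s*\"([^\"]*)\""
        + "(?:(?!\\nNoticeText:)[\\s\\S])*",
        Pattern.MULTILINE);

    // Collects the last Text [str] value of every NoticeText block, in input order.
    static List<String> lastTextPerBlock(String input) {
        List<String> texts = new ArrayList<>();
        Matcher m = ONE_BLOCK.matcher(input);
        while (m.find()) {          // each find() plays the role of one "global" match
            texts.add(m.group(1));
        }
        return texts;
    }

    public static void main(String[] args) {
        lastTextPerBlock(SAMPLE).forEach(System.out::println);
    }
}
```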
    

    Live Demo

    https://regex101.com/r/uW6cV6/1
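
Neither expression filters on PartyId, which is what the question asks for. One possible way to narrow the results, sketched here as an extension rather than as part of the expressions above, is to match one block at a time and keep only blocks that contain a `PartyId [int] = 1` line. The class, the partyLine helper and the textsForParty method are hypothetical names, and the sample text is trimmed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClientTextDemo {
    // Trimmed copy of the sample text; only lines the expressions inspect are kept.
    static final String SAMPLE = String.join("\n",
        "NoticeText:",
        "    Text [str] = \"user is typing\"",
        "    EventInfo:",
        "        PartyId [int] = 2",
        "        Text [str] = \"Hey, how are you?\"",
        "        UserType [str] = AGENT",
        "NoticeText:",
        "    EventInfo:",
        "    PartyId [int] = 1",
        "        Text [str] = \"I'm good, how are you?\"",
        "        MessageType [str] = \"text\"");

    // The per-block expression: one NoticeText block per match,
    // group 1 holding the last Text [str] value in that block.
    static final Pattern ONE_BLOCK = Pattern.compile(
        "^NoticeText:(?:(?!\\nNoticeText:)[\\s\\S])*"
        + "\\n\\s+Text\\s*\\[str\\]\\s*=\\s*\"([^\"]*)\""
        + "(?:(?!\\nNoticeText:)[\\s\\S])*",
        Pattern.MULTILINE);

    // Hypothetical helper: a whole line reading "PartyId [int] = <id>".
    static Pattern partyLine(int id) {
        return Pattern.compile(
            "^\\s*PartyId\\s*\\[int\\]\\s*=\\s*" + id + "\\s*$", Pattern.MULTILINE);
    }

    // Keeps the captured text only for blocks that mention the given PartyId.
    static List<String> textsForParty(String input, int partyId) {
        Pattern party = partyLine(partyId);
        List<String> texts = new ArrayList<>();
        Matcher m = ONE_BLOCK.matcher(input);
        while (m.find()) {
            if (party.matcher(m.group()).find()) {   // m.group() is the whole block
                texts.add(m.group(1));
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        // PartyId 1 is the client, according to the question.
        textsForParty(SAMPLE, 1).forEach(System.out::println);
    }
}
```

The check accepts a PartyId line anywhere in the block and still returns the last Text [str] value, so it assumes one EventInfo per NoticeText, as in the sample.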