Regex to extract Content-Type


How can I extract the lines with the Content-Type info? In some mails, these headers can span 2, 3, or even 4 lines, depending on how the mail was sent. This is one example:

Content-Type: text/plain;
    charset="us-ascii"
Content-Transfer-Encoding: 7bit

Lorem ipsum dolor sit amet, consectetur adipisicing elit, 
sed do eiusmod tempor incididunt ut labore et dolore magna 
aliqua. Ut enim ad minim veniam, quis nostrud exercitation 
ullamco laboris nisi ut aliquip ex ea commodo consequat. 
Duis aute irure dolor in reprehenderit in voluptate velit 
esse cillum dolore eu fugiat nulla pariatur. Excepteur sint 
occaecat cupidatat non proident, sunt in culpa qui officia 
deserunt mollit anim id est laborum.

I tried this regex: ^(Content-.*:(.|\n)*)* but it grabs everything.

How should I phrase my regex in Java to get only this part:

Content-Type: text/plain;
    charset="us-ascii"
Content-Transfer-Encoding: 7bit

Solution

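  • You can try this regex. It matches lazily from Content-Type up to the first blank (or whitespace-only) line. DOTALL lets . cross line breaks, and MULTILINE makes ^ and $ match at the start and end of each line: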

    Pattern regex = Pattern.compile("Content-Type.*?(?=^\\s*\n?\r?$)", 
                                     Pattern.DOTALL | Pattern.MULTILINE);
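
As a rough sketch of how the pattern above might be used, the following applies it to a shortened copy of the sample message from the question. The class name ContentHeaderExtractor and the method extractContentHeaders are hypothetical names chosen for illustration, and trimming the trailing line break is an added convenience, not part of the original answer. It assumes the headers are followed by a blank line, as in the example mail; without one, the pattern finds nothing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class ContentHeaderExtractor {

    // Same pattern as in the answer: lazy match from "Content-Type"
    // up to the first blank (or whitespace-only) line.
    private static final Pattern CONTENT_HEADERS = Pattern.compile(
            "Content-Type.*?(?=^\\s*\n?\r?$)",
            Pattern.DOTALL | Pattern.MULTILINE);

    // Returns the Content-* header block, or null if no blank line follows it.
    static String extractContentHeaders(String message) {
        Matcher m = CONTENT_HEADERS.matcher(message);
        // trim() drops the line break that precedes the blank line (assumed convenience).
        return m.find() ? m.group().trim() : null;
    }

    public static void main(String[] args) {
        String message = "Content-Type: text/plain;\n"
                + "    charset=\"us-ascii\"\n"
                + "Content-Transfer-Encoding: 7bit\n"
                + "\n"
                + "Lorem ipsum dolor sit amet, consectetur adipisicing elit,\n"
                + "sed do eiusmod tempor incididunt ut labore et dolore magna\n";
        System.out.println(extractContentHeaders(message));
    }
}
```

Note that the match starts at Content-Type, so any Content-* header that appears before it in the message would not be included.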