Java regex to extract filename from multipart field not working


I am using the following regex to extract the filename from an RFC 822 multipart email.

private static Pattern filenamePattern = Pattern.compile("(?<=filename=\").*?(?=\")");

This extracts filenames that are quoted, as in:

Content-Type : application/pdf; name="Key.Enrollment_Final.pdf"

but cannot extract filenames that are not quoted, like:

Content-Type : application/octet-stream;    name=.config

I cannot quite figure out how to get both. For the first quote, I think I can check for (?<=filename=\"?), but how should I check for a space or an end of line or a quote?


Solution

  • I have only seen the filename attribute specified in the Content-Disposition header, not in the Content-Type header.

    Either way, this is a regex that correctly matches the filename attribute according to RFC 1806 (which references RFC 1521 and RFC 822).

    "filename=(?:([\\x21-\\x7E&&[^\\Q()<>[]@,;:\\\"/?=\\E]]++)|\"((?:(?:(?:\r\n)?[\t ])+|[^\r\"\\\\]|\\\\[\\x00-\\x7f])*)\")"
    

    Matching is only half the job. In the first case (capture group 1, an unquoted token) the file name can be used as is. In the second case (capture group 2, the body of a quoted string) you still have to process the file name, at least to unescape backslash-escaped characters. You also need to collapse linear-white-space, (?:(?:\r\n)?[\t ])+ as defined in RFC 822, to a single space, and replace non-printable characters.
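
The matching and post-processing steps described above could be combined roughly as follows. This is only a sketch, not part of the original answer: the class name FilenameExtractor, the methods extractFilename and cleanQuoted, and the choice of '_' as the replacement for control characters are all assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class FilenameExtractor {

    // Pattern from the answer. Group 1: unquoted token. Group 2: body of a quoted string.
    private static final Pattern FILENAME = Pattern.compile(
            "filename=(?:([\\x21-\\x7E&&[^\\Q()<>[]@,;:\\\"/?=\\E]]++)"
            + "|\"((?:(?:(?:\r\n)?[\t ])+|[^\r\"\\\\]|\\\\[\\x00-\\x7f])*)\")");

    // Either a quoted-pair (backslash plus one ASCII char, captured in group 1)
    // or a run of linear-white-space.
    private static final Pattern ESCAPE_OR_LWSP = Pattern.compile(
            "\\\\([\\x00-\\x7f])|(?:(?:\r\n)?[\t ])+");

    /** Returns the cleaned file name, or null if no filename parameter is found. */
    static String extractFilename(String header) {
        Matcher m = FILENAME.matcher(header);
        if (!m.find()) {
            return null;
        }
        if (m.group(1) != null) {
            return m.group(1); // unquoted token: nothing to unescape
        }
        return cleanQuoted(m.group(2));
    }

    // Single pass over the quoted body: unescape quoted-pairs and
    // collapse each linear-white-space run to one space.
    private static String cleanQuoted(String raw) {
        StringBuilder out = new StringBuilder();
        Matcher t = ESCAPE_OR_LWSP.matcher(raw);
        int last = 0;
        while (t.find()) {
            out.append(raw, last, t.start());
            if (t.group(1) != null) {
                out.append(t.group(1)); // keep the escaped character itself
            } else {
                out.append(' ');        // folded whitespace becomes one space
            }
            last = t.end();
        }
        out.append(raw, last, raw.length());
        // Assumption: any remaining control character is replaced with '_'.
        return out.toString().replaceAll("\\p{Cntrl}", "_");
    }
}
```

With this sketch, a header containing filename=.config yields .config through group 1, and filename="Key.Enrollment_Final.pdf" yields Key.Enrollment_Final.pdf through group 2. Unescaping and whitespace collapsing happen in one pass so that an escaped space or backslash is not mistaken for folding whitespace. If the parameter appears as name= in a Content-Type header, as in the question, the literal prefix could presumably be relaxed to (?:file)?name=, though that goes beyond what the answer's pattern covers.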