According to the HTML::Entities documentation, the second argument to encode_entities:
The unsafe characters is specified using the regular expression character class syntax (what you find within brackets in regular expressions).
The default set of characters to encode are control chars, high-bit chars, and the <, &, >, ' and " characters.
However, the page doesn't give an example of the equivalent argument for the default set. I'd like to make a minor adjustment to the set of unsafe characters without regressing.
What regex character class would be equivalent to «control chars, high-bit chars, and the <, &, >, ' and "» which I can use as a starting point?
According to the module source, it looks like:
/([^\n\r\t !\#\$%\(-;=?-~])/
From this bit in encode_entities:
# Encode control chars, high bit chars and '<', '&', '>', ''' and '"'
$$ref =~ s/([^\n\r\t !\#\$%\(-;=?-~])/$char2entity{$1} || num_entity($1)/ge;
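As a rough sketch of how that class might serve as a starting point: the second argument takes only the contents of the brackets, so the default can be passed as a single-quoted string and then adjusted. The variable names and the choice to leave apostrophes unencoded are illustrative assumptions, not something taken from the module.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Contents of the brackets from the module's default substitution.
# Single quotes keep the backslashes intact for the regex engine.
my $default_safe = '^\n\r\t !\#\$%\(-;=?-~';

# Hypothetical adjustment: also treat the apostrophe as safe.
# Appending to a negated class removes that character from the unsafe set.
my $adjusted_safe = $default_safe . q{'};

my $input = qq{<a href="x">Tom & Jerry's</a>};

my $by_default  = encode_entities($input);                  # built-in default set
my $by_explicit = encode_entities($input, $default_safe);   # same set, spelled out
my $by_adjusted = encode_entities($input, $adjusted_safe);  # apostrophe left alone

print "$by_default\n$by_explicit\n$by_adjusted\n";
```

Because the class is negated, making a character safe means appending it, while making another character unsafe means removing it from the string (or narrowing one of the ranges).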
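A non-negated class (equivalent for code points up to \xff; the negated form above also matches characters beyond that):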
/([\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\xff<&>'"])/
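One way to check that the two spellings agree is to compare them code point by code point. The following is a minimal sketch using only core Perl; the variable names are illustrative, and it tests the two classes directly rather than going through encode_entities.

```perl
use strict;
use warnings;

# The two classes quoted above, compiled once.
my $negated  = qr/[^\n\r\t !\#\$%\(-;=?-~]/;
my $explicit = qr/[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\xff<&>'"]/;

# Collect every code point in 0x00..0xFF where the two classes disagree.
my @mismatch = grep {
    my $ch = chr $_;
    ($ch =~ $negated ? 1 : 0) != ($ch =~ $explicit ? 1 : 0);
} 0 .. 0xFF;

printf "mismatches in 0x00-0xFF: %d\n", scalar @mismatch;

# Beyond 0xFF the negated class still matches, but the explicit range stops at \xff.
my $wide = chr 0x263A;
printf "U+263A: negated=%d explicit=%d\n",
    ($wide =~ $negated  ? 1 : 0),
    ($wide =~ $explicit ? 1 : 0);
```

If the input can contain characters above U+00FF, extending the upper range (for example to \x{10FFFF}) would be one way to keep the explicit class in step with the negated one.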