I am trying to count the number of punctuation symbols in a dataframe column using the string.punctuation constant in Python, but I cannot find a way to get the opening parenthesis counted. Python apparently does not treat it as a plain string.
I am working on Linux with Jupyter Notebook and Python 3.8.
import string
import pandas as pd

df = pd.DataFrame()
df['password'] = data
df['sign'] = 0
for i in string.punctuation:
    print(i)
    print(type(i))
    df['sign'] += df['password'].str.count(i)
df['sign'].iloc[:100]
This gives me:
!
<class 'str'>
"
<class 'str'>
#
<class 'str'>
$
<class 'str'>
%
<class 'str'>
&
<class 'str'>
'
<class 'str'>
(
<class 'str'>
and afterwards the exception:
/opt/conda/lib/python3.8/sre_parse.py in _parse(source, state, verbose, nested, first)
834 p = _parse_sub(source, state, sub_verbose, nested + 1)
835 if not source.match(")"):
--> 836 raise source.error("missing ), unterminated subpattern",
837 source.tell() - start)
838 if group is not None:
error: missing ), unterminated subpattern at position 0
Thank you.
Example dataframe:
df = pd.DataFrame({'text': ['hel\\l\'o', 'hellO()world']})
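Series.str.count interprets its argument as a regular expression, and parentheses are part of the regex syntax, so you need to escape them (use a raw string so the backslash reaches the regex engine unchanged):
df['text'].str.count(r'\(')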
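To see why the bare parenthesis fails, here is a minimal sketch using only the standard re module. The variable names and the sample string are just for illustration; the point is that "(" on its own is not a valid pattern, while re.escape turns it into one.

```python
import re

text = "hellO()world"  # sample string, same as in the example dataframe

# A bare "(" opens a group in regex syntax, so compiling it raises re.error.
try:
    re.compile("(")
    compiled = True
except re.error:
    compiled = False

# re.escape backslash-escapes regex metacharacters, giving the pattern \(
escaped = re.escape("(")
open_paren_count = len(re.findall(escaped, text))

print(compiled, escaped, open_paren_count)
```

This is the same error the loop in the question hits once it reaches "(".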
To cover all of string.punctuation, you can use:
df['text'].str.count(f'[{re.escape(string.punctuation)}]')
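Applied to the setup from the question, this could look roughly like the sketch below. The sample passwords are made up to stand in for the question's data, and the column names sign and sign_loop are only illustrative. It shows both the single character-class pattern and the original loop with each character escaped.

```python
import re
import string

import pandas as pd

# Hypothetical sample data standing in for the question's `data`.
df = pd.DataFrame({"password": ["abc(123)", "p@ss!word", "plain"]})

# Option 1: one regex character class covering every punctuation character.
pattern = f"[{re.escape(string.punctuation)}]"
df["sign"] = df["password"].str.count(pattern)

# Option 2: keep the loop, but escape each character before counting.
df["sign_loop"] = 0
for ch in string.punctuation:
    df["sign_loop"] += df["password"].str.count(re.escape(ch))

print(df)
```

Both columns should agree. The character class makes a single pass over the column instead of one pass per punctuation character, so it is probably the better choice on a large dataframe.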