Here is the regex code:
pattern="""
(?P<host>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
(\ \-\ )
(?P<user_name>[a-z]{1,100}\d{4}|\-{1})
( \[)(?P<time>\d{2}\/[A-Za-z]{3}\/\d{4}\:\d{2}\:\d{2}\:\d{2}\ -\d{4})
(\] ")
(?P<request>.+)
(")
"""
for item in re.finditer(pattern, text, re.VERBOSE):
    # We can get the dictionary returned for the item with .groupdict()
    print(item.groupdict())
I run this code in Jupyter Notebook.
The test text is
146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554
The main issue is that you did not escape every literal space in your pattern. When using re.X / re.VERBOSE, any whitespace in the pattern (outside a character class) is treated as formatting whitespace and is ignored when the pattern is compiled. In a Python re pattern, [ ] will always match a literal space, but this is not guaranteed in other regex flavors, so the best way to match a space in a pattern compiled with an re.X-like flag is to escape the space.
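To see the effect in isolation, here is a minimal sketch. The sample string and the variable names are made up for illustration; only re.search and re.VERBOSE come from the standard library.

```python
import re

sample = "a b"  # illustrative input containing one literal space

# Unescaped space: re.VERBOSE drops it, so the pattern is effectively "ab".
unescaped = re.search(r"a b", sample, re.VERBOSE)

# Escaped space: kept as a literal space even in verbose mode.
escaped = re.search(r"a\ b", sample, re.VERBOSE)

# Space inside a character class: also kept by Python's re in verbose mode.
in_class = re.search(r"a[ ]b", sample, re.VERBOSE)

print(unescaped is None)   # True
print(escaped.group())     # a b
print(in_class.group())    # a b
```

This is the same reason the ( \[) and (\] ") groups in the question's pattern never match: their spaces are discarded before matching starts.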
Besides, there are other things to note:
{1} is always redundant, so remove it.
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} can be shortened to \d{1,3}(?:\.\d{1,3}){3}.
There is no need to escape / and : (anywhere in the pattern) or - (when outside a character class) in a Python re regex.
Thus, you can use
pattern = r'''(?P<host>\d{1,3}(?:\.\d{1,3}){3})
(\ -\ )
(?P<user_name>[a-z]{1,100}\d{4}|-)
(\ \[)(?P<time>\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}\ -\d{4})
(\]\ ")
(?P<request>.+)
(")'''
Here is a complete Python demo:
import re
text = '''146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554'''
pattern = r'''(?P<host>\d{1,3}(?:\.\d{1,3}){3})
(\ -\ )
(?P<user_name>[a-z]{1,100}\d{4}|-)
(\ \[)(?P<time>\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}\ -\d{4})
(\]\ ")
(?P<request>.+)
(")'''
for item in re.finditer(pattern, text, re.VERBOSE):
    print(item.groupdict())  # We can get the dictionary returned for the item with .groupdict()
Output:
{'host': '146.204.224.152', 'user_name': 'feest6811', 'time': '21/Jun/2019:15:45:24 -0700', 'request': 'POST /incentivize HTTP/1.1'}
{'host': '197.109.77.178', 'user_name': 'kertzmann3129', 'time': '21/Jun/2019:15:45:25 -0700', 'request': 'DELETE /virtual/solutions/target/web+services HTTP/2.0'}
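The simplifications mentioned above (the shorter host pattern and the unescaped /, : and -) can be checked with a quick sketch like the one below. The names long_ip, short_ip, escaped_time and plain_time, and the extra sample strings, are only illustrative.

```python
import re

# Long and short forms of the dotted-quad host pattern.
long_ip = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")
short_ip = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

# Two hosts from the log plus two strings that should be rejected.
samples = ["146.204.224.152", "197.109.77.178", "1.2.3", "1234.5.6.7"]
ip_results = [
    (bool(long_ip.fullmatch(s)), bool(short_ip.fullmatch(s))) for s in samples
]
print(ip_results)  # [(True, True), (True, True), (False, False), (False, False)]

# Timestamp pattern with and without the unnecessary escapes.
escaped_time = re.compile(
    r"\d{2}\/[A-Za-z]{3}\/\d{4}\:\d{2}\:\d{2}\:\d{2}\ \-\d{4}", re.VERBOSE
)
plain_time = re.compile(
    r"\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}\ -\d{4}", re.VERBOSE
)
stamp = "21/Jun/2019:15:45:24 -0700"
time_results = (
    bool(escaped_time.fullmatch(stamp)),
    bool(plain_time.fullmatch(stamp)),
)
print(time_results)  # (True, True)
```

Note that neither host pattern checks that each number is in the 0-255 range; both only check the shape of the address, which is usually enough for parsing log lines.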