
Python Regex for Equity Option not Matching


I'm trying to create a regex to find option symbols in broker data. Per Wikipedia the format is:

  1. Root symbol of the underlying stock or ETF, padded with spaces to 6 characters
  2. Expiration date, 6 digits in the format yymmdd
  3. Option type, either P or C, for put or call
  4. Strike price, as the price x 1000, front padded with 0s to 8 digits
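
To make that layout concrete, here is a minimal sketch that assembles a symbol from the four parts listed above. The helper name build_option_symbol and its arguments are hypothetical, chosen only for illustration; they do not come from any broker API or library.

```python
from datetime import date


def build_option_symbol(root, expiry, option_type, strike):
    """Assemble a 21-character option symbol (hypothetical helper)."""
    strike_thousandths = round(strike * 1000)  # strike price x 1000
    return (
        f"{root:<6}"                 # root symbol, space-padded to 6 characters
        f"{expiry:%y%m%d}"           # expiration date as yymmdd
        f"{option_type.upper()}"     # P or C
        f"{strike_thousandths:08d}"  # front-padded with 0s to 8 digits
    )


symbol = build_option_symbol("AAPL", date(2017, 8, 18), "C", 155.0)
print(repr(symbol))  # 'AAPL  170818C00155000'
```

Note that a 4-letter root such as AAPL ends up followed by two spaces of padding.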

So I created this regex:

option_regex = re.compile(r'''(
(\w{1,6})            # beginning ticker, 1 to 6 word characters
(\s)?                # optional separator
(\d{6})              # 6 digits for yymmdd
([cp])               # C or P for call or put
(\d{8})              # 8 digits for strike price
)''', re.VERBOSE | re.IGNORECASE)

But when I test it out I get an error:

import re

option_regex = re.compile(r'''(
(\w{1,6})            # beginning ticker, 1 to 6 word characters
(\s)?                # optional separator
(\d{6})              # 6 digits for yymmdd
([cp])               # C or P for call or put
(\d{8})              # 8 digits for strike price
)''', re.VERBOSE | re.IGNORECASE)

result = option_regex.search('AAPL  170818C00155000')

result.group()
Traceback (most recent call last):

  File "<ipython-input-4-0273c989d990>", line 1, in <module>
    result.group()

AttributeError: 'NoneType' object has no attribute 'group'

Solution

  • From the Python documentation on re.search():

    Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

    Your code throws this exception because the search didn't find anything, so re.search() returned None. Basically, you are trying to call .group() on None. It would be a good idea to guard against that:

    if result is None:
        ...  # Pattern didn't match the string
    else:
        print(result.group())
    

    Your pattern doesn't match the string because the separator is longer than the pattern allows: AAPL is padded to 6 characters, so the string has two spaces, while (\s)? matches at most one whitespace character. You can fix that by adding a + ("one or more") to the rule:

    (\s+)?                # optional separator
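
Putting the two points together, the following is a sketch rather than a definitive implementation: it uses the pattern from the question with the separator fix applied, and wraps the search in a helper that returns None when nothing matches. The function name parse_option and the dictionary keys are assumptions made for illustration.

```python
import re

option_regex = re.compile(r'''(
(\w{1,6})            # beginning ticker, 1 to 6 word characters
(\s+)?               # optional separator, one or more whitespace characters
(\d{6})              # 6 digits for yymmdd
([cp])               # C or P for call or put
(\d{8})              # 8 digits for strike price
)''', re.VERBOSE | re.IGNORECASE)


def parse_option(text):
    """Return the option's parts as a dict, or None if nothing matches (hypothetical helper)."""
    result = option_regex.search(text)
    if result is None:
        return None  # pattern didn't match the string
    # Groups: whole symbol, ticker, separator, date, type, strike
    _, root, _, expiry, option_type, strike = result.groups()
    return {
        "root": root,
        "expiry": expiry,
        "type": option_type.upper(),
        "strike": int(strike) / 1000,  # undo the x 1000 scaling
    }


print(parse_option('AAPL  170818C00155000'))
print(parse_option('not an option symbol'))
```

With the test string from the question, the first call yields root 'AAPL', expiry '170818', type 'C' and strike 155.0; the second call returns None instead of raising AttributeError.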