Using regex (from the re python library) inside of a pydantic model


I am using re to parse a string into its composite parts. The problem is that pydantic 2 does NOT like this.

Example:

import re

from pydantic import RootModel, field_validator


class MyClass(RootModel[str]):
    root: str

    _FREQUENCY_PATTERN = re.compile(r"^(\d+)\s*/\s*(\d+)([YMWD])$")

    @classmethod
    def _parse(cls, s: str) -> tuple[int, int, str]:
        match = cls._FREQUENCY_PATTERN.search(s)
        if match is None:
            raise ValueError("must be a number over a period (D|W|M|Y). e.g. 5/1W")
        n = int(match.group(1))
        t = int(match.group(2))
        u = match.group(3)
        return n, t, u

    @field_validator("root")
    @classmethod
    def _check_format(cls, v: str) -> str:
        cls._parse(v) # use _parse to validate incoming data
        return v

I need to keep the _parse method, as it is used in data storage methods. That is, we receive the data in this particular format, and we break it apart to store it using the _parse method.
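For reference, this is what the pattern extracts on its own, outside of any pydantic model. It is a minimal sketch using only the standard library; the function name parse_frequency is hypothetical and simply mirrors the body of _parse above.

```python
import re

# Same pattern as in the model: "<count>/<period><unit>", e.g. "5/1W".
FREQUENCY_PATTERN = re.compile(r"^(\d+)\s*/\s*(\d+)([YMWD])$")


def parse_frequency(s: str) -> tuple[int, int, str]:
    """Split a frequency string into (count, period, unit)."""
    match = FREQUENCY_PATTERN.search(s)
    if match is None:
        raise ValueError("must be a number over a period (D|W|M|Y). e.g. 5/1W")
    return int(match.group(1)), int(match.group(2)), match.group(3)


print(parse_frequency("5/1W"))     # (5, 1, 'W')
print(parse_frequency("10 / 2M"))  # (10, 2, 'M')
```

So the regex itself behaves as expected; the failure only shows up once the compiled pattern lives on the model class.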

When I run the code that tests the example, I get the error:

    def __getattr__(self, item: str) -> Any:
        """This function improves compatibility with custom descriptors by ensuring delegation happens
        as expected when the default value of a private attribute is a descriptor.
        """
        if item in {'__get__', '__set__', '__delete__'}:
            if hasattr(self.default, item):
                return getattr(self.default, item)
>       raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}')
E       AttributeError: 'ModelPrivateAttr' object has no attribute 'search'

../../../../Library/Caches/pypoetry/virtualenvs/triple-models-4xhdxIhn-py3.11/lib/python3.11/site-packages/pydantic/fields.py:890: AttributeError

It seems like pydantic is overwriting the work that re should be doing.

This code did work with Pydantic 1.x.x

Two questions:

  1. What is going on?
  2. How can I parse the incoming data using regex to accomplish the same thing I had working in pydantic 1, but in pydantic 2? (I need to be able to access each of the 3 elements individually for serialization.)

Solution

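  • Pydantic 2 does some metaprogramming under the hood when the class is created: any class attribute whose name starts with an underscore is treated as a private attribute and wrapped in a ModelPrivateAttr object. So cls._FREQUENCY_PATTERN returns that wrapper rather than the compiled pattern, which is why it has no search attribute. From your example I cannot see a reason the compiled regex needs to be defined inside the Pydantic subclass. Either move _FREQUENCY_PATTERN to module scope, or define it inside _parse and use it locally. For example, at module scope: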

    import re

    from pydantic import RootModel, field_validator

    _FREQUENCY_PATTERN = re.compile(r"^(\d+)\s*/\s*(\d+)([YMWD])$")


    class MyClass(RootModel[str]):
        root: str
    
        @classmethod
        def _parse(cls, s: str) -> tuple[int, int, str]:
            match = _FREQUENCY_PATTERN.search(s)
            if match is None:
                raise ValueError("must be a number over a period (D|W|M|Y). e.g. 5/1W")
            n = int(match.group(1))
            t = int(match.group(2))
            u = match.group(3)
            return n, t, u
    
        @field_validator("root")
        @classmethod
        def _check_format(cls, v: str) -> str:
            cls._parse(v) # use _parse to validate incoming data
            return v
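If you would rather keep the pattern on the class, another option that should work is to annotate it with typing.ClassVar, which tells Pydantic to leave it as a plain class attribute instead of converting it to a private attribute. This is a sketch assuming a reasonably recent Pydantic 2 release and Python 3.9+; it is not tested against every 2.x version.

```python
import re
from typing import ClassVar

from pydantic import RootModel, field_validator


class MyClass(RootModel[str]):
    # ClassVar marks this as a plain class attribute, so Pydantic
    # does not wrap it in ModelPrivateAttr.
    _FREQUENCY_PATTERN: ClassVar[re.Pattern[str]] = re.compile(
        r"^(\d+)\s*/\s*(\d+)([YMWD])$"
    )

    @classmethod
    def _parse(cls, s: str) -> tuple[int, int, str]:
        match = cls._FREQUENCY_PATTERN.search(s)
        if match is None:
            raise ValueError("must be a number over a period (D|W|M|Y). e.g. 5/1W")
        return int(match.group(1)), int(match.group(2)), match.group(3)

    @field_validator("root")
    @classmethod
    def _check_format(cls, v: str) -> str:
        cls._parse(v)  # reuse _parse to validate incoming data
        return v


freq = MyClass("5/1W")
n, t, u = MyClass._parse(freq.root)
print(n, t, u)  # 5 1 W
```

Either way, _parse stays available for your storage code, and the three parts can be read individually from its return value.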