python python-3.x validation pydantic-v2

Pydantic type conversion on assignment

I want to use Pydantic to convert a plaintext string password to a hashed value of type bytes on assignment using a specific hash function.

This is a minimal example that shows my current (non-working) approach. I don't have a very deep understanding of Pydantic yet, though.

import bcrypt
from pydantic import BaseModel, field_validator

def hash_password(password: str) -> bytes:
    return bcrypt.hashpw(password.encode('utf-8'), salt = bcrypt.gensalt())

class Password(BaseModel):
    hash_value: bytes
    
    @field_validator("hash_value")
    @classmethod
    def set_password(cls, plain_password: str) -> bytes:
        return hash_password(plain_password)
    
class Settings:
    DEFAULT_PASSWORD = "my_plain_password"
    
settings = Settings()
    
password_doc = Password(
    hash_value = settings.DEFAULT_PASSWORD
)

At first I accidentally declared hash_value as str, not realizing that the return value of hashpw is of type bytes. This somehow worked: the hash_password function was called on assignment. However, the implicit type conversions that followed invalidated my hashed password.

The problem now is that Pydantic expects a bytes value on assignment and implicitly converts the string settings.DEFAULT_PASSWORD to bytes before passing it to the set_password method, even though that method expects a str.

My error message:

Traceback (most recent call last):
  File "xxx", line 20, in <module>
    password_doc = Password(
                   ^^^^^^^^^
  File "xxx", line 176, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
  File "xxx", line 13, in set_password
    return hash_password(plain_password)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxx", line 5, in hash_password
    return bcrypt.hashpw(password.encode('utf-8'), salt = bcrypt.gensalt())
                         ^^^^^^^^^^^^^^^
AttributeError: 'bytes' object has no attribute 'encode'. Did you mean: 'decode'?

Edit:

Thanks a lot, Dunes, for your answer; this fixes most of my issue. However, I noticed that the set_password method is called more often than I expected.

I am using a Beanie model to store a Config document with the Password document linked to it:

class Password(base.Password, Document):
    hash_value: bytes
    
    @field_validator("hash_value", mode="before")
    @classmethod
    def set_password(cls, plain_password: str | bytes) -> bytes:
        ...  # new implementation

class Config(base.Config, Document):
    password: Link[Password]
    
    def get_password(self) -> Link[Password]:
        return self.password

Just retrieving the Config document while fetching all links calls the set_password method: models.Config.find_one(fetch_links=True). In this case the method is called with the stored binary hash_value, so I have to return the value unchanged when it is already bytes.

Is this what is supposed to happen, or is it a bug in my code?

Edit 2:

This second validation when loading the hashed password from the database is probably performed automatically by Beanie. It also makes sense, as the database can be altered from outside the Python application.

I initially didn't want to allow binary arguments for the set_password method, because Pydantic was implicitly converting my inputs. With the "before" mode this is no longer the case, and I should allow binary arguments so that values loaded from the database can be validated.
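To illustrate the conclusion above, here is a minimal sketch of a "before" validator that hashes str input and passes bytes through unchanged, on the assumption that bytes are an already-stored hash. It uses a plain BaseModel rather than a Beanie Document, and hash_password is a stand-in built on hashlib.pbkdf2_hmac instead of bcrypt.hashpw so that the sketch depends only on Pydantic; both choices are assumptions made for illustration, not the original code.

```python
import hashlib
import os

from pydantic import BaseModel, field_validator


def hash_password(password: str) -> bytes:
    # Stand-in for bcrypt.hashpw (assumption, for illustration only):
    # a random 16-byte salt followed by a PBKDF2-HMAC-SHA256 digest.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt + digest


class Password(BaseModel):
    hash_value: bytes

    @field_validator("hash_value", mode="before")
    @classmethod
    def set_password(cls, value: object) -> bytes:
        if isinstance(value, bytes):
            # Assumed to be an existing hash, e.g. loaded from the database.
            return value
        if isinstance(value, str):
            # Plaintext password: hash it before it is stored on the model.
            return hash_password(value)
        raise ValueError("password must be str (plaintext) or bytes (stored hash)")


fresh = Password(hash_value="my_plain_password")
reloaded = Password(hash_value=fresh.hash_value)  # simulates a database round trip
```

The trade-off of this design is that the model cannot tell a stored hash from arbitrary bytes, so any bytes passed in by a caller are accepted as a hash without further checks.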

Solution

The issue is that, by default, field_validator creates an "after" validator, that is, one that runs after Pydantic's own internal validation. Pydantic knows you want bytes and that it has been passed a str. It knows how to convert a str to bytes, so it does the encoding automatically before passing the result to your validator.
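A small sketch that makes this visible: the validator below only records what it is handed. The received list and the inspect method name are hypothetical helpers introduced for this illustration.

```python
from pydantic import BaseModel, field_validator

received = []  # records the values the validator is actually handed


class Password(BaseModel):
    hash_value: bytes

    @field_validator("hash_value")  # no mode given, so this is an "after" validator
    @classmethod
    def inspect(cls, value):
        received.append(value)
        return value


Password(hash_value="my_plain_password")
print(type(received[0]).__name__, received[0])
```

Although a str was passed in, the validator sees bytes, which is why calling .encode() on the value fails in the question's code.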

If you add mode='before', then your validator runs first. However, it must then be able to accept any input, and must not crash if it is given an int or a list or whatever.

For example:

@field_validator("hash_value", mode="before")
@classmethod
def set_password(cls, plain_password: str | bytes) -> bytes:
    if isinstance(plain_password, bytes):
        try:
            plain_password = plain_password.decode("utf8")
        except UnicodeDecodeError as ex:
            # Not strictly necessary, as UnicodeDecodeError subclasses ValueError,
            # but this shows that you must handle possible errors and raise
            # ValueError when an input is invalid, and how to provide a
            # supplementary error message.
            raise ValueError("password is not a valid utf8 byte-string") from ex
    elif not isinstance(plain_password, str):
        raise ValueError("password must be a str or bytes")
    return hash_password(plain_password)
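As a rough sketch of why the validator raises ValueError rather than letting another exception escape: Pydantic collects a ValueError raised inside a validator and reports it as a ValidationError for that field. The check_type validator below is a hypothetical, stripped-down version that only does the type check and leaves out the hashing.

```python
from pydantic import BaseModel, ValidationError, field_validator


class Password(BaseModel):
    hash_value: bytes

    @field_validator("hash_value", mode="before")
    @classmethod
    def check_type(cls, value):
        # A "before" validator sees the raw input, so reject unsupported types here.
        if not isinstance(value, (str, bytes)):
            raise ValueError("password must be str or bytes")
        return value


try:
    Password(hash_value=12345)
except ValidationError as exc:
    errors = exc.errors()
    print(errors[0]["loc"], errors[0]["type"], errors[0]["msg"])
```

The caller gets one structured error entry that names the field and carries the validator's message, instead of an unhandled exception from inside the model.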