I want to use Pydantic to convert a plaintext string password to a hashed value of type bytes on assignment using a specific hash function.
Here is a minimal example that shows my current (non-working) approach. I don't have a very deep understanding of Pydantic yet.
```python
import bcrypt
from pydantic import BaseModel, field_validator


def hash_password(password: str) -> bytes:
    return bcrypt.hashpw(password.encode('utf-8'), salt = bcrypt.gensalt())


class Password(BaseModel):
    hash_value: bytes

    @field_validator("hash_value")
    @classmethod
    def set_password(cls, plain_password: str) -> bytes:
        return hash_password(plain_password)


class Settings:
    DEFAULT_PASSWORD = "my_plain_password"


settings = Settings()
password_doc = Password(
    hash_value = settings.DEFAULT_PASSWORD
)
```
At first I accidentally declared hash_value as str, not realizing that hashpw returns bytes. Somehow this worked: the hash_password function was called on assignment. However, all the implicit type conversions that occurred invalidated my hashed password.
The problem now is that Pydantic expects a bytes value on assignment, so it implicitly converts the string settings.DEFAULT_PASSWORD to bytes before passing it to the set_password method, even though that method expects a str.
My error message:
```
Traceback (most recent call last):
  File "xxx", line 20, in <module>
    password_doc = Password(
                   ^^^^^^^^^
  File "xxx", line 176, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
  File "xxx", line 13, in set_password
    return hash_password(plain_password)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxx", line 5, in hash_password
    return bcrypt.hashpw(password.encode('utf-8'), salt = bcrypt.gensalt())
                         ^^^^^^^^^^^^^^^
AttributeError: 'bytes' object has no attribute 'encode'. Did you mean: 'decode'?
```
Thanks a lot, Dunes, for your answer; it fixes most of my issue. However, I noticed that the set_password method is called more often than I expected.
I am using Beanie models to store a Config document with the Password document linked to it:
```python
from beanie import Document, Link
from pydantic import field_validator


class Password(base.Password, Document):
    hash_value: bytes

    @field_validator("hash_value", mode="before")
    @classmethod
    def set_password(cls, plain_password: str | bytes) -> bytes:
        ...  # new implementation


class Config(base.Config, Document):
    password: Link[Password]

    def get_password(self) -> Link[Password]:
        return self.password
```
Just retrieving the Config document while fetching all links, with models.Config.find_one(fetch_links=True), calls the set_password method.
In this case the method is called with the actual stored binary hash_value, so I have to return the value unchanged when it is bytes.
Is this supposed to happen, or is it a bug in my code?
This second validation, which happens when the hashed password is loaded from the database, is probably performed automatically by Beanie, and it makes sense, since the database can be altered from outside the Python application.
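The re-validation can be reproduced without Beanie or MongoDB. This minimal sketch assumes that loading a document amounts to building the model again from the stored data; model_dump followed by model_validate stands in for the database round trip (this is not Beanie's actual code path). The calls list is purely illustrative.

```python
from pydantic import BaseModel, field_validator

# Illustrative only: records every input the validator receives.
calls: list[object] = []


class Password(BaseModel):
    hash_value: bytes

    @field_validator("hash_value", mode="before")
    @classmethod
    def set_password(cls, value: object) -> object:
        calls.append(value)  # record the raw input, then pass it through
        return value


stored = Password(hash_value=b"stored-hash").model_dump()  # "save" the document
reloaded = Password.model_validate(stored)  # "load" it again: validators run again
print(len(calls))  # 2
```

Building a model from stored data goes through validation again, so the validator sees the already-stored bytes a second time.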
I initially didn't want to allow binary arguments for the set_password method, because Pydantic was implicitly converting my inputs. With "before" mode this no longer happens, so I should allow binary arguments for the database validation.
The issue is that, by default, field_validator creates an "after" validator; that is, it runs after Pydantic's own internal validation. Pydantic knows you want bytes and has been passed a str. It knows how to convert between str and bytes, so it does the encoding automatically before passing the result to your validator.
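The coercion is easy to observe with a validator that only records what it receives. This sketch uses a hypothetical model AfterDemo and a module-level list seen_types purely for illustration:

```python
from pydantic import BaseModel, field_validator

# Illustrative only: the types the validator was called with.
seen_types: list[type] = []


class AfterDemo(BaseModel):
    hash_value: bytes

    @field_validator("hash_value")  # mode defaults to "after"
    @classmethod
    def record_type(cls, value: bytes) -> bytes:
        seen_types.append(type(value))
        return value


demo = AfterDemo(hash_value="my_plain_password")
print(seen_types)       # [<class 'bytes'>]
print(demo.hash_value)  # b'my_plain_password'
```

Although a str was passed in, the "after" validator only ever sees the bytes that Pydantic produced, which is why calling .encode() on it fails.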
If you add mode='before', your validator runs first instead. However, it then receives the raw input, so it must be able to accept anything and raise a ValueError, rather than crash, if it is given an int, a list, or anything else.
For example:
@field_validator("hash_value", mode="before")
@classmethod
def set_password(cls, plain_password: str | bytes) -> bytes:
if isinstance(plain_password, bytes):
try:
plain_password = plain_password.decode("utf8")
except UnicodeDecodeError as ex:
# not strictly necessary as UnicodeDecodeError subclasses ValueError
# but shows that you must handle possible errors and raise ValueErrors
# when an input is invalid. And how to provide a supplementary
# error message.
raise ValueError("password is not a valid utf8 byte-string")
elif not isinstance(plain_password, str):
raise ValueError
return hash_password(plain_password)