Other way to ask my question is: "Should I keep the data types coming from database as simple and raw as I would ask them from my REST endpoint"
Imagine this case class that I want to store in the database as a row:
case class Product(id: UUID,name: String, price: BigInt)
It clearly isn't and shouldn't be what it says it is because The type signatures of name
and price
are a lie.
so what we do is create custom data types that better represent what things are such as: (For the sake of simplicity imagine our only concern is the price
data type)
case class Price(value: BigInt) {
require(value > BigInt(0))
}
object Price {
def validate(amount: BigInt): Either[String,Price] =
Try(Price(amount)).toOption.toRight("invalid.price")
}
//As a result my Product class is now:
case class Product(id: UUID,name: String,price: Price)
So now the process of taking user input for product data would look like this:
//this class would be parsed from i.e a form:
case class ProductInputData(name: String, price: BigInt)
def create(input: ProductInputData) = {
for {
validPrice <- Price.validate(input.price)
} yield productsRepo.insert(
Product(id = UUID.randomUUID,name = input.name,price = ???)
)
}
look at the triple question marks (???
). this is my main point of concern from an entire application architecture perspective; If I had the ability to store a column as Price
in the database (for example slick
supports these custom data types) then that means I have the option to store the price as either price : BigInt = validPrice.value
or price: Price = validPrice
.
I see so many pros and cons in both of these decisions and I can't decide. here are the arguments that I see supporting each choice:
Store data as simple database types (i.e. BigInt
) because:
performance: simple assertion of x > 0
on the creation of Price
is trivial but imagine you want to validate a Custom Email
type with a complex regex. it would be detrimental upon retrieval of collections
Tolerance against Corruption: If BigInt
is inserted as negative value it would't explode in your face every time your application tried to simply read the column and throw it out on to the user interface. It would however cause problem if it got retrieved and then involved in some domain layer processing such as purchase.
Store data as it's domain rich type (i.e. Price
) because:
//two terrible variations of a calculateDiscount method:
//this version simply trusts that price is already valid and came from db:
def calculateDiscount(price: BigInt): BigInt = {
//apply some positive coefficient to price and hopefully get a positive
//number from it and if it's not positive because price is not positive then
//it'll explode in your face.
}
//this version is even worse. It does retain function totality and purity
//but the unforgivable culture it encourages is the kind of defensive and
//pranoid programming that causes every developer to write some guard
//expressions performing duplicated validation All over!
def calculateDiscount(price: BigInt): Option[BigInt] = {
if (price <= BigInt(0))
None
else
Some{
//Do safe processing
}
}
//ideally you want it to look like this:
def calculateDiscount(price: Price): Price
The source of all this mess that I see is the database. if data was coming from the user it'd be easy: You simply never trust it to be valid. you ask for simple data types cast them to domain types with validation and then proceed. But not the db. Does the modern layered architecture address this issue in some definitive or at least mitigating way?
The overhead of the require
call when retrieving from the DB is negligible. If you really think it's an issue, provide 2 constructors, one for the data coming from the user (performs validation) and one that assumes the data is good (meant to be used by the database code).
I love exceptions when they point to a bug (data corruption because of insufficient validation on the way in).
That said, I regularly leave requires
in code to help catch bugs in more complex validation (maybe data coming from multiple tables combined in some invalid way). The system still crashes (as it should), but I get a better error message.