Search code examples
databasescalaarchitecturedomain-driven-designtrust

How much trust should I put in the validity of retrieved data from database?


Other way to ask my question is: "Should I keep the data types coming from database as simple and raw as I would ask them from my REST endpoint"

Imagine this case class that I want to store in the database as a row:

case class Product(id: UUID,name: String, price: BigInt)

It clearly isn't and shouldn't be what it says it is because The type signatures of nameand price are a lie.

so what we do is create custom data types that better represent what things are such as: (For the sake of simplicity imagine our only concern is the price data type)

case class Price(value: BigInt) {
  require(value > BigInt(0))
}

object Price {
  def validate(amount: BigInt): Either[String,Price] = 
    Try(Price(amount)).toOption.toRight("invalid.price")
}

//As a result my Product class is now:
case class Product(id: UUID,name: String,price: Price)

So now the process of taking user input for product data would look like this:

//this class would be parsed from i.e a form:
case class ProductInputData(name: String, price: BigInt)

def create(input: ProductInputData) = {
  for {
    validPrice <- Price.validate(input.price)
  } yield productsRepo.insert(
      Product(id = UUID.randomUUID,name = input.name,price = ???)
    )
} 

look at the triple question marks (???). this is my main point of concern from an entire application architecture perspective; If I had the ability to store a column as Price in the database (for example slick supports these custom data types) then that means I have the option to store the price as either price : BigInt = validPrice.value or price: Price = validPrice.

I see so many pros and cons in both of these decisions and I can't decide. here are the arguments that I see supporting each choice:

Store data as simple database types (i.e. BigInt) because:

  • performance: simple assertion of x > 0 on the creation of Price is trivial but imagine you want to validate a Custom Email type with a complex regex. it would be detrimental upon retrieval of collections

  • Tolerance against Corruption: If BigInt is inserted as negative value it would't explode in your face every time your application tried to simply read the column and throw it out on to the user interface. It would however cause problem if it got retrieved and then involved in some domain layer processing such as purchase.

Store data as it's domain rich type (i.e. Price) because:

  • No implicit reasoning and trust: Other method some place else in the system would need the price to be valid. For example:
//two terrible variations of a calculateDiscount method:

//this version simply trusts that price is already valid and came from db:
def calculateDiscount(price: BigInt): BigInt = {
  //apply some positive coefficient to price and hopefully get a positive 
  //number from it and if it's not positive because price is not positive then 
  //it'll explode in your face.
}

//this version is even worse. It does retain function totality and purity
//but the unforgivable culture it encourages is the kind of defensive and 
//pranoid programming that causes every developer to write some guard 
//expressions performing duplicated validation All over!
def calculateDiscount(price: BigInt): Option[BigInt] = {
  if (price <= BigInt(0)) 
    None
  else 
  Some{
   //Do safe processing
  }
} 

//ideally you want it to look like this:
def calculateDiscount(price: Price): Price
  • No Constant conversion of domain types to simple types and vice versa: for representation, storage,domain layer and such; you simply have one representation in the system to rule them all.

The source of all this mess that I see is the database. if data was coming from the user it'd be easy: You simply never trust it to be valid. you ask for simple data types cast them to domain types with validation and then proceed. But not the db. Does the modern layered architecture address this issue in some definitive or at least mitigating way?


Solution

    1. Protect the integrity of the database. Just as you would protect the integrity of the internal state of an object.
    2. Trust the database. It doesn't make sense to check and re-check what has already been checked going in.
    3. Use domain objects for as long as you can. Wait till the very last moment to give them up (raw JDBC code or right before the data is rendered).
    4. Don't tolerate corrupt data. If the data is corrupt, the application should crash. Otherwise it's likely to produce more corrupt data.

    The overhead of the require call when retrieving from the DB is negligible. If you really think it's an issue, provide 2 constructors, one for the data coming from the user (performs validation) and one that assumes the data is good (meant to be used by the database code).

    I love exceptions when they point to a bug (data corruption because of insufficient validation on the way in).

    That said, I regularly leave requires in code to help catch bugs in more complex validation (maybe data coming from multiple tables combined in some invalid way). The system still crashes (as it should), but I get a better error message.