Search code examples
architecturesoftware-design

How to abstract the database access layer?


So I'm a beginner in software development, everytime I have to program something that gets its data from a database I have problems structuring that layer, I was wondering if I'm overthinking.

I know we should make our systems easy to change, and I figure the database vendor is one of the things that is likely to change. SQL databases usually remain the same or the language already provides some form of abstraction, like JDBC.

But what happens if the new source of data is another type of database, should I provide a single DataSource abstraction? I was also thinking that to do this I would need some kind of object to parse the data into, similiar to a ResultSet in Java, or even something as simple as a String Map and parse to the type I need elsewhere. But I also think doing this could hurt complexity times, passing the data of one data structure to another more-general one.

I haven't used an ORM, so I don't know if they provide the abstraction I'm thinking of, do they help solve this issue?

So how is this usually approached? Or is the abstraction between different types of databases not needed?


Solution

  • Are you familiar with the Repository pattern? See https://martinfowler.com/eaaCatalog/repository.html and https://www.baeldung.com/java-dao-vs-repository.

    I would make a mental distinction between a data source (e.g. a file, a database, an API, etc) and a data set (e.g. a bunch of specific records, such as orders).

    As an example - let's say I have a very simple eCommerce system where I have two different datasources: a NoSQL DB (let's say Mongo) that stores user profiles and a relational SQL DB (let's say Postgres) that stores information on orders. Given that your referred to an ORM, I'm approaching this in an OO paradigm like Java or C#. Therefore, I create two domain classes - UserProfile and Order. In addition, I create two repository interfaces - UserProfileRepository and OrderRepository. All of the code that operates on orders or user profiles will do so via the repositories - so persistOrder(), getOrders(), etc.

    I then create two interface implementations - a MongoUserProfileRepository and a PostgresOrderRepository. All of the code that uses these repositories depends only on the abstractions (the interface). My MongoUserProfileRepository uses a Mongo datasource and knows how to read from Mongo and map the results an object that the rest of my code can work with. The same applies to my PostgresOrderRepository - it just uses a Postgres datasource.

    If I ever want to move my user profiles to Postgres as well, I only have to create a PostgresUserProfileRepository and pass that to all the code that depends on the UserProfileRepository interface. All of that change lives in one place - only in my repository layer. Nothing else is affected by the change, as long as the data model still looks the same.

    To answer your other question - yes, there is some overhead in mapping from one data structure (a SQL table) to another (an object), but I've never experienced that as a major issue (see https://www.techopedia.com/definition/32462/impedance-mismatch).