Search code examples
javadatabaseooprelational

Java object structure most convenient for representing relational database


Suppose you can read a bunch of data in CSV files, containing some tables related to each other. What Java object structure, which uses standard Java collections available with JDK would be most adequate to represent relational database in anticipation of servicing the requests analogous to most common database queries, including standard selects, as well as count(*), select distinct(), group by, and other most common queries. The amount of data is relatively small, so that no memory issues will be anticipated, if we keep everything in memory.

Just want to emphasize, that this is not a question about SQL parsing, or about using in-memory SQL databases. This is just about most effective back-end relational database representation in Java, when you don't know all specific queries upfront.


Solution

  • If your question is "What data structures do databases, like SQLite, use internally?" the answer is far from simple, and is generally speaking implementation dependant. A major selling point of different databases is how they choose to structure their data, after all.

    In the case of SQLite, the online documentation provides a lot of detail about its inner workings. Other databases like H2 similarly provide implementation notes in their documentation, but in general the data structures used are usage-specific and custom.

    Speaking more theoretically, many database implement indexes as B+ Trees (among many other structures), but this is neither a requirement nor necessarily the "right" way to do it. The data structures they use are a result of many years of trial-and-error, benchmarking, and hard work.


    In response to your comment:

    Let's say you just have one CSV file, just one table, and it is not worth even going to SQLlite and H2 and it all can be fit within memory, so amount of data is not that big. What is the best way to represent this table as java object, anticipating queries you'll normally present to database table, but not knowing exactly upfront how you would need to query it?

    I would use SQLite, or H2. Both can be run in-memory, and let you run arbitrary SQL queries against their data. If you knew what you intended to query you could parse your CSV into a more special-purpose data structure, but if you don't know how you intend to query it, there is no way you can hope to create a more efficient data structure than existing database tools. Databases are specifically designed for the use case of "I don't know what queries I'll be running against this data".