In our data warehouse, we have a huge table that is one of our most used and important tables; it stores account data (account ids, saldo for each operational day, etc.). Approximately 90 million rows are loaded into this table daily. The table is composite partitioned, with monthly interval partitions on column OPER_DAY and list subpartitions on column FILIAL. The table has 9 indexes, 6 of which are bitmap indexes.
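For context, the table layout is roughly like this (a simplified sketch; names, columns and subpartition values are illustrative, not our real DDL):

    -- Simplified sketch: monthly interval partitions on OPER_DAY,
    -- list subpartitions on FILIAL (values are placeholders).
    CREATE TABLE account_saldo (
        account_id  NUMBER        NOT NULL,
        filial      VARCHAR2(10)  NOT NULL,
        oper_day    DATE          NOT NULL,
        saldo       NUMBER(20,2)
    )
    PARTITION BY RANGE (oper_day) INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
    SUBPARTITION BY LIST (filial)
    SUBPARTITION TEMPLATE (
        SUBPARTITION sp_f001  VALUES ('F001'),
        SUBPARTITION sp_f002  VALUES ('F002'),
        SUBPARTITION sp_other VALUES (DEFAULT)
    )
    (PARTITION p_initial VALUES LESS THAN (DATE '2021-01-01'));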
Loading this amount of data already takes a long time, around 2 hours. We do a bulk collect and use the APPEND_VALUES hint to take advantage of direct-path insert. We also specify NOLOGGING for the table, but since force logging is enabled this has no effect.
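The load is roughly like this (simplified; the staging table and batch size are placeholders):

    -- Simplified sketch of the load; APPEND_VALUES asks for direct-path
    -- insert on the FORALL statement (staging table name is made up).
    DECLARE
        TYPE t_rows IS TABLE OF account_saldo%ROWTYPE;
        l_rows t_rows;
        CURSOR c_src IS SELECT * FROM stg_account_saldo;
    BEGIN
        OPEN c_src;
        LOOP
            FETCH c_src BULK COLLECT INTO l_rows LIMIT 10000;
            EXIT WHEN l_rows.COUNT = 0;

            FORALL i IN 1 .. l_rows.COUNT
                INSERT /*+ APPEND_VALUES */ INTO account_saldo
                VALUES l_rows(i);

            COMMIT;  -- direct path requires a commit before the segment can be read again
        END LOOP;
        CLOSE c_src;
    END;
    /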
Currently the table holds 3 years of data. Reading is also becoming an issue: if an OBIEE user queries this table for one day (the sum of saldo for one day, for example), it works fast, but if the user wants to see a month or two, it gets much slower.
Is there any way I can split the table into multiple tables, each storing only one year's data, and include them all in a new view with UNION ALLs? Each SELECT in the union would have a condition so that if a user queries data for 2023, only one SELECT of the union is executed; if they query 2022 and 2023, two SELECTs are executed, and so on. That way there would be no need to scan the whole table; only the required table(s) after splitting would be read.
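Something along these lines is what I have in mind (table names are illustrative):

    -- Sketch of the idea: one physical table per year, stitched together in a view.
    CREATE OR REPLACE VIEW v_account_saldo AS
    SELECT * FROM account_saldo_2021
    WHERE  oper_day >= DATE '2021-01-01' AND oper_day < DATE '2022-01-01'
    UNION ALL
    SELECT * FROM account_saldo_2022
    WHERE  oper_day >= DATE '2022-01-01' AND oper_day < DATE '2023-01-01'
    UNION ALL
    SELECT * FROM account_saldo_2023
    WHERE  oper_day >= DATE '2023-01-01' AND oper_day < DATE '2024-01-01';

The hope is that when a query against the view filters on OPER_DAY, the optimizer only touches the branches whose year range overlaps the filter.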
The main problem we face with this approach is that we don't know what calendar filter an OBIEE user will set in an analysis. We had an idea to use SYS_CONTEXT and set context parameters for the session, but we couldn't find a way to set the context from OBIEE filters; we only managed it for session variables set from a dashboard prompt. Is that possible, or what could be another approach?
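The SYS_CONTEXT idea looked roughly like this (context and package names are made up for illustration):

    -- Rough sketch of the SYS_CONTEXT idea (names are made up).
    CREATE OR REPLACE CONTEXT report_ctx USING report_ctx_pkg;

    CREATE OR REPLACE PACKAGE report_ctx_pkg AS
        PROCEDURE set_year_range(p_from IN NUMBER, p_to IN NUMBER);
    END report_ctx_pkg;
    /

    CREATE OR REPLACE PACKAGE BODY report_ctx_pkg AS
        PROCEDURE set_year_range(p_from IN NUMBER, p_to IN NUMBER) IS
        BEGIN
            DBMS_SESSION.SET_CONTEXT('REPORT_CTX', 'YEAR_FROM', p_from);
            DBMS_SESSION.SET_CONTEXT('REPORT_CTX', 'YEAR_TO',   p_to);
        END set_year_range;
    END report_ctx_pkg;
    /

    -- Each branch of the view would then be guarded with something like:
    --   WHERE TO_NUMBER(SYS_CONTEXT('REPORT_CTX', 'YEAR_FROM')) <= 2023
    --     AND TO_NUMBER(SYS_CONTEXT('REPORT_CTX', 'YEAR_TO'))   >= 2023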
Is there any way I can split a table into multiple tables, each of which will store only one year's data
This is what partitioning does (at the database level), and you are already using partitioning at month level. Assuming OPER_DAY is filtered in user queries, you should already be getting partition elimination, which cuts down the amount of data being read.
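If you want to confirm pruning is happening, check the Pstart/Pstop columns of the plan for a date-filtered query; a rough sketch (table name taken from the examples above):

    -- Pstart/Pstop should show a narrow partition range, not 1 - N (all partitions).
    EXPLAIN PLAN FOR
    SELECT SUM(saldo)
    FROM   account_saldo   -- your table name here
    WHERE  oper_day BETWEEN DATE '2023-05-01' AND DATE '2023-05-31';

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, NULL, 'BASIC +PARTITION'));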
If you are loading 90 million records a day then you have a lot of data, so it is going to take some time to load and query. There are plenty of things you can do about loading the data, but performance tuning is dependent on your system and code; it is impossible to give you anything particularly useful here, only generic advice.
First off, I am assuming that at that volume you have hardware that can cope? Are you running things in parallel and on RAC? You haven't mentioned it, so I assume you are not using Exadata.
I am guessing that at 90 million records a day the data is primarily (only?) new, i.e. insert rather than update? If so, your first bottleneck will be that all the data is new, so you are hitting the same segments for all of it. Depending on the distribution of FILIAL, that sub-partitioning could be hindering you rather than helping.
If it is insert only, check that the execution plan is actually doing a direct-path insert. You can put in all the hints you like, but if direct path isn't possible you aren't going to get the benefits.
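Two quick ways to confirm direct path, as a rough sketch (table name is from the examples above):

    -- 1. In the same transaction as the insert, BEFORE committing, a query on the
    --    table should fail with ORA-12838 if the insert really was direct path.
    SELECT COUNT(*) FROM account_saldo;

    -- 2. Or pull the actual plan of the insert from the cursor cache and look for
    --    LOAD AS SELECT (direct path) rather than LOAD TABLE CONVENTIONAL.
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id', NULL, 'BASIC'));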
Depending on what you are licensed for, advanced compression may help too.
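As a sketch, and only if you are licensed for the Advanced Compression option (12c syntax shown; check the exact clause for your version, 11g uses COMPRESS FOR OLTP):

    -- Compress future partitions, and move an existing one (names are illustrative).
    ALTER TABLE account_saldo ROW STORE COMPRESS ADVANCED;
    ALTER TABLE account_saldo MOVE PARTITION p_2023_01
        ROW STORE COMPRESS ADVANCED UPDATE INDEXES ONLINE;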
Indexes don't always help. If they aren't being used, you are just introducing the bottleneck of maintaining the indexes for the sake of it.
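You can check whether the indexes actually get used before deciding anything, for example (index and schema names are placeholders):

    -- Pre-12.2: switch on usage monitoring for a while, then check V$OBJECT_USAGE.
    ALTER INDEX ix_account_saldo_filial MONITORING USAGE;
    SELECT index_name, used, monitoring FROM v$object_usage;

    -- 12.2 and later: DBA_INDEX_USAGE tracks usage automatically.
    SELECT name, total_access_count, last_used
    FROM   dba_index_usage
    WHERE  owner = 'DWH';   -- your schema here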
Ultimately you need to look at what the database is doing and where the bottlenecks are, and then look for solutions.
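If you are licensed for the Diagnostics Pack, a quick look at ASH during the load window shows where the time goes; a rough sketch:

    -- Requires the Diagnostics Pack licence.
    -- Top wait events over the last two hours (adjust the window to your load).
    SELECT   NVL(event, 'ON CPU') AS event, COUNT(*) AS samples
    FROM     v$active_session_history
    WHERE    sample_time > SYSDATE - 2/24
    GROUP BY NVL(event, 'ON CPU')
    ORDER BY samples DESC;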
There are lots of things you can do, but I suspect trying to segment things will only cause you problems elsewhere. If you really want to do it, then look at fragmentation (fragmented logical table sources in the OBIEE repository).
I suspect the reports do not need all the data; they are aggregating up and dropping columns. The best option for reporting speed is to create aggregate tables and add these as separate logical table sources. This is all invisible to the user, but if their query can be satisfied by an existing aggregate, that aggregate will be used and will massively reduce response times.
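As an illustration, a daily aggregate like the one below (names are made up) could be mapped as an additional logical table source in the RPD, or you could let OBIEE's Aggregate Persistence wizard build an equivalent one:

    -- Illustrative daily aggregate; map it as an extra logical table source.
    CREATE MATERIALIZED VIEW mv_saldo_by_day
    BUILD IMMEDIATE
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE
    AS
    SELECT oper_day, filial, SUM(saldo) AS total_saldo, COUNT(*) AS row_cnt
    FROM   account_saldo
    GROUP  BY oper_day, filial;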
Beyond that you are back to performance tuning, which sounds like it will mostly be database tuning again: looking at the explain plans of the physical queries and seeing where they can be improved.