We have an API server running that serves around 500.000 requests a day. We want to keep all these reguests in a database to be able to analyze the data. We log things like:
We want to keep these logs for 3 months, something which will result in about 45.000.000 records in that database. When records are older than 3 months they are deleted.
Storing these 45 million records in a sql database is possible, but then it is really slow to perform any analysis on these data. We would like to do extensive analysis like - how many request did a specific user do today, compared to the same day last week? How many percent of requests failed today compared to any other day? See a trend diagram showing if the number of request are going up or down. See the top 10 resources being asked for at a given time. You get it - we want to be able to do all kind of analysis like this.
Can you give any advise on where to store these logs to be able to do analysis like this in realtime (or near realtime)? Any nosql database that could be good for this? Azure? I see there is something called azure sql datawarehouse, could that be used for this? I have looked at Microsoft Power Bi which will probably be great for doing the analysis on these data, but where do I store the data.
I would really appreciate if someone have some suggestions for me.
Power BI is potentially a good solution for you. It actually spins up a SQL Server Analysis Services instance in memory, which is effectively an "OLAP data warehouse". Infrastructure requirements are minimal as you design in a free PBI Desktop tool and publish to Microsoft's cloud for PBI Web users.
There are limits to the data that can be published - see link below. Note that PBI uses the very effective Vertipac compression so datasets are typically a lot smaller than your raw data. I often see 10k - 50k rows per MB, so 45m should be achievable with a single Pro license. Ruthlessly filter your column list in PBI Desktop to optimise this.
With PBI Pro license you can refresh Hourly, up to 8 times a day:
https://powerbi.microsoft.com/en-us/documentation/powerbi-refresh-data/
Building SQL databases and OLAP/SSAS solutions has been a good career for me over the last 20 years. That is still the "Rolls Royce" solution if you have the time and money. But after 20 years I am still learning as it is a technically challenging area. If you don't already have those skills, I suggest Power BI would be a more productive path.