I would like to write a custom web analytics and action tracking solution and use it as a feedback mechanism eg. for searches or content suggestion.
If it were only for short lived data, I would use some NOSQL engine with a limited data retention. But ideally I would like to keep a long history.
One nice aproach I have seen in the past, was to use mysql for storage, one table per month, with older tables being converted to MySQL ARCHIVE format. For browsing the archives and aggregate data, MySQL views were implemented.
My question: How does something like Google Analytics store its data? In a structured database or something else. Which way would you suggest to avoid a long-term memory hog, while keeping query possibilities flexible?
(I am not concerned about writing speeds into the DB, this will happen in async batches and not in realtime)
Google uses its own Big Table implementation to store its data. If you are interested in big data solutions and utilizing big data you should take a look at this. For an open source implementation built off of google's Big Table check out Hbase/Hadoop. I will post back some links in a minute.
The analytics themselves done on this type of data utilize map/reduce operations.