I'm working on a star schema for analysis of posted form data. The site that the form data will be posted to is actually external to the site hosting the form, so only the data in the form will be available. I'm going to give the option to include some extra useful information with hidden fields, original referrer, session id etc.
I'll be able to use regular expressions to match certain data types and pull them out to specific dimensions e.g. Postcodes.
I have a solution to dealing with the arbitrary nature of the dimensions, its not a great one but it will work.
The problem that I have is that I have no idea what is going to be in my fact table, its not like there is a nice numerical value that I can aggregate. Apart from the fact that "yes there is a form post" that satisfies these criteria.
I'm wondering if I'm approaching this in the right way? Am I using the wrong tool for the job? Or am I just missing something?
Simon.
Further detail:
There are two areas of functionality, filtering the form posts dependant on criteria e.g. between two timestamps. But pretty much anything is up for grabs in terms of filtering. The selected form posts will then be used to generate a csv file for export.
The other main area is analytics, studying the conversion of ad spend into customer leads is an obvious starting point. Also somewhat open ended and depends on the form data.
You aren't designing a star schema. You're designing an Entity-Attribute-Value table, which has all the problems you're identifying.
If you really have no idea what your data will look like, i.e. what form fields exist and what data types should be used for each one, then a relational database is not the right tool to persist the information. Try XML or YAML or JSON. Those are structured, but dynamic, formats. You can establish metadata on the fly. You can store the whole form instance in a file or in a BLOB in your database.
Another emerging technology that can manage dynamic metadata is RDF, with the query language SPARQL. Sesame is an example of a semantic data engine.