I am currently designing a Survey system (where a Survey has many questions, a question has many answers, and a Response belongs_to a user, survey, question and answer).
I will have a lot of demographic data in the User model and expect hundreds of thousands of responses to various questions.
Eventually we will want to analyze the responses, for example: 80% of males like bananas, 20% of females own a Ford, and so on.
I am looking into statistical languages like R, SAS and SPSS, and am wondering whether my data will need to be structured in any specific way in order to be used by these programs. Or do they all accept CSV files?
Is there any advice that you have in terms of statistical data, and structuring data models for it?
Finally, how much do SAS, SPSS and Stata cost?
CSV files are more than enough. R is powerful enough to manage all your data as long as it is arranged in rows and columns.
For example, you can arrange the columns of the CSV as variables/responses with headers and use the rows for the data, or vice versa.
It doesn't matter as long as everything is arranged in rows and columns. Comma- or space-delimited columns in CSV files can be handled easily. Not that you have to be that specific: you can use any delimiter, and R has powerful regular expression matching.
My only suggestion is to make a separate CSV file for each data set to keep things simple; each one can then be imported into a data frame easily.
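As a rough illustration of the rows-and-columns layout described above, here is a minimal sketch in Python (standard library only) that flattens responses into one row per response, with the user's demographics copied into each row. The field names (`gender`, `age`) and the in-memory `users` and `responses` records are assumptions made up for the example; they stand in for whatever the real User and Response models contain.

```python
import csv
import io

# Hypothetical stand-ins for User and Response records (not from the question).
users = {
    1: {"gender": "male", "age": 34},
    2: {"gender": "female", "age": 29},
}
responses = [
    {"user_id": 1, "survey_id": 10, "question_id": 100, "answer_id": 1000},
    {"user_id": 2, "survey_id": 10, "question_id": 100, "answer_id": 1001},
    {"user_id": 1, "survey_id": 10, "question_id": 101, "answer_id": 1010},
]

# Column order of the exported file; the first row of the CSV is this header.
FIELDS = ["user_id", "gender", "age", "survey_id", "question_id", "answer_id"]


def flatten(responses, users):
    """Yield one flat dict per response, joined with the user's demographics."""
    for r in responses:
        u = users[r["user_id"]]
        yield {
            "user_id": r["user_id"],
            "gender": u["gender"],
            "age": u["age"],
            "survey_id": r["survey_id"],
            "question_id": r["question_id"],
            "answer_id": r["answer_id"],
        }


def to_csv_text(responses, users):
    """Return the flattened responses as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(flatten(responses, users))
    return buf.getvalue()


csv_text = to_csv_text(responses, users)
print(csv_text)
```

Written to a file (say, a hypothetical `responses.csv`), this can be loaded in R with `read.csv("responses.csv")`, which returns a data frame with one column per header. Keeping one observation per row makes grouping by a demographic column straightforward later on.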
Once that is done, you are free to unleash the power of R.