mysql statistics analytics data-mining

Analyzing MySQL data


I'm totally new to data analytics and was wondering if anyone had any suggestions on how to start.

Here's the problem I'm trying to solve. I have a MySQL database that gets anywhere from 20 rows to several million rows added per day (depending on the data source), and I want to analyze it for relationships. Basically, the data consists of possible combinations of values (red = 2, blue = 5, black = 5, etc.), and I want a tool to analyze it day by day to see which combinations are most likely (for example, if I add constraints such as only 5% of the total value being allowed to change, or only 5 colors being chosen).

I think this is going to be complex, but I'm new and totally willing to learn. For a problem like the one above (and related kinds of analysis problems), what would you suggest I do? I'm looking for a tool (open source please... I'm a poor student), a book suggestion, a how-to doc, etc. I want a good foundation, and this is not production (it's a learning environment I set up so I can experiment).

I'm learning Python and Java, and was considering using them to do the analytics, but a friend suggested I use a tool designed for this or follow a tried-and-tested method instead.


Solution

  • You're pretty light on the actual details. But if you're looking for open-source statistical analysis packages, I would suggest starting with R, Weka, or KNIME. Of course, this is a pretty significant subject.

    Depending on your level of understanding, there is an awful lot one could do using SQL as well, but without knowing anything about your data structure, what it represents, and what you're trying to accomplish, there isn't much use in trying to explain what you 'could' do.

    As for learning Python and Java, that will be beneficial as well. Both have several open-source stats packages available and/or can tie into the programs I listed above.
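
As a rough illustration of combining SQL with Python, here is a minimal sketch. The question does not describe the schema, so the table name `observations`, its columns (`day`, `color`, `value`), and the sample rows are all assumptions made up for this example. The standard-library `sqlite3` module stands in for MySQL so the code runs without a server; a similar `GROUP BY` query should work against MySQL through a driver of your choice.

```python
import sqlite3
from collections import Counter

# Hypothetical schema: one row per (day, color, value) observation.
# An in-memory SQLite database stands in for the real MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (day TEXT, color TEXT, value INTEGER)")

# Made-up sample data, only for illustration.
rows = [
    ("2024-01-01", "red", 2), ("2024-01-01", "blue", 5), ("2024-01-01", "black", 5),
    ("2024-01-02", "red", 2), ("2024-01-02", "blue", 5), ("2024-01-02", "black", 4),
    ("2024-01-03", "red", 2), ("2024-01-03", "blue", 5), ("2024-01-03", "black", 5),
]
conn.executemany("INSERT INTO observations VALUES (?, ?, ?)", rows)


def daily_totals(conn):
    """Let the database aggregate: total value per day."""
    cur = conn.execute(
        "SELECT day, SUM(value) FROM observations GROUP BY day ORDER BY day"
    )
    return cur.fetchall()


def combination_counts(conn):
    """Count how many days each full (color, value) combination occurs on."""
    cur = conn.execute(
        "SELECT day, color, value FROM observations ORDER BY day, color"
    )
    per_day = {}
    for day, color, value in cur:
        per_day.setdefault(day, []).append((color, value))
    # Each day's combination becomes a hashable tuple so it can be counted.
    return Counter(tuple(combo) for combo in per_day.values())


totals = daily_totals(conn)
counts = combination_counts(conn)
most_common_combo, occurrences = counts.most_common(1)[0]

print(totals)
print(most_common_combo, occurrences)
```

Simple aggregates like the daily totals are usually cheapest to compute in the database itself, while the combination counting is easier to express in Python. For days with millions of rows, pulling every row into memory as done here may not be practical, so this is only a starting point for experimenting.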