Tags: google-analytics, statistics, analytics, web-analytics, bigdata

How to get started in Big Data and Web Analytics


I'm interested in working in and studying big data analytics and web analytics, but I don't know how or where to get started. I tried looking on the Internet, but much of what I found is too advanced for me. Are there any skills or knowledge in statistics and mathematics that I need before going this route?

My current plan is to attend online courses every weekend, since I work as an Associate Software Engineer during the week, and to practice the programming languages needed for big data, like R. I already have a degree in Computer Science, so familiarity with some statistical and mathematical methods is not a problem. Any suggestions and comments are much appreciated!

For those who already have experience, how has it been, and what do you work with most?


Solution

  • I am in a similar boat as you. I work in a web development department as a business analyst. I do some software development, data mining, and data visualization, but I am constantly improving my skills because it's all pretty interesting to me, and it makes me an extremely versatile employee.

    Web Analytics/Big Data
    See if you can get read access to your company's Google Analytics account, assuming they have a website. The API is really good, and pre-built packages in R make it really easy to get large amounts of data out. If their website is big enough, you can easily create your own real data sets. While these probably won't be "big" as in "big data", they're definitely awesome for practicing data visualizations. I'd suggest learning Shiny and R Markdown. You can easily create web stats visualizations you can share with your company.

    If you end up running into issues with the amount of data you're trying to process (i.e., if they have a huge web presence), then you might look into Spark for processing big data. Coursera has a specialization focusing on Big Data - https://www.coursera.org/specializations/big-data. You can take all the classes for free if you just "audit" them. You won't get a certificate or anything, but you get access to all the course material. They apparently go through Spark, Hadoop, Pig, and Hive. I haven't taken it, but the UCSD Coursera classes I have taken have been pretty good.
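
    As a rough illustration of the kind of practice a web stats export allows, here is a minimal Python sketch that totals sessions per page. It does not call the Google Analytics API; the column names (`date`, `page_path`, `sessions`) and the sample rows are made up for illustration, and a real export may look different.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export: column names and values are invented for illustration.
SAMPLE_EXPORT = """date,page_path,sessions
2024-01-01,/home,120
2024-01-01,/pricing,30
2024-01-02,/home,150
2024-01-02,/pricing,45
"""


def sessions_by_page(csv_text):
    """Sum the sessions column for each page path."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["page_path"]] += int(row["sessions"])
    return dict(totals)


def top_pages(totals, n=5):
    """Return up to n (page, sessions) pairs, largest total first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


totals = sessions_by_page(SAMPLE_EXPORT)
for page, count in top_pages(totals):
    print(f"{page}: {count}")
```

    The same totals could then feed whatever charting tool you prefer.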

    Obviously Coursera isn't the be-all and end-all. Also check out edx.org, Pluralsight, Udemy, etc. You can get a free Pluralsight membership for a year - just Google it. Mine was through Microsoft somehow. My favorite Pluralsight courses have been the ones on Ethical Hacking (unrelated to data/analytics). Udemy often has amazing deals on HUGE courses - like 21 hours of lectures about Python for data analysis and stuff like that. Just sign up for the service, and you'll get a "special offer" in a week or two. They're usually $10-20. https://www.brighttalk.com/ is also a good place for webinars and talks related to data science/analytics.

    Databases
    My company uses SQL Server (Microsoft), so I also took some database classes on MVA (Microsoft Virtual Academy). They have a bunch of classes, from complete beginner level to brushing up on existing skills: MVA Database Stuff.
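
    If you want to practice querying before you have access to a real database server, here is a small sketch using Python's built-in `sqlite3` module. SQLite stands in for SQL Server here purely as an assumption for convenience, and the `visits` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (country, page, duration_seconds).
SAMPLE_VISITS = [
    ("US", "/home", 30),
    ("US", "/pricing", 90),
    ("DE", "/home", 60),
]


def visits_per_country(rows):
    """Load rows into an in-memory table and aggregate them with GROUP BY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE visits "
            "(country TEXT, page TEXT, duration_seconds INTEGER)"
        )
        conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT country, COUNT(*) AS visit_count, "
            "AVG(duration_seconds) AS avg_duration "
            "FROM visits GROUP BY country "
            "ORDER BY visit_count DESC, country ASC"
        )
        return cursor.fetchall()
    finally:
        conn.close()


for country, visit_count, avg_duration in visits_per_country(SAMPLE_VISITS):
    print(country, visit_count, avg_duration)
```

    The SELECT / GROUP BY / ORDER BY pattern carries over to other SQL databases, although each one has its own dialect details.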

    Data Sets
    If you find yourself needing big data sets, join Kaggle. They often have great data sets for machine learning, but you can use them yourself to mine and do visualizations. I'd look for labelled data sets in particular. Many of the bigger sets are completely anonymized - no labels, no nothin'. That's not very fun if you're just digging around. Additionally, someone has compiled a bunch of public data sources here: https://github.com/caesar0301/awesome-public-datasets. Finally, NYC Open Data is one of my favorite places to get neat data sets. Some are super boring, but there have been some cool analyses done on parking tickets and the like.
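
    A first step when digging around in a labelled data set is often just counting how frequently each label appears. Below is a minimal sketch, assuming a CSV with a hypothetical `violation` column; the sample rows are invented and do not come from NYC Open Data or Kaggle. It reads one row at a time, so the same function should also work on a file handle for a file too large to load at once.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: column names and rows are invented for illustration.
SAMPLE_TEXT = """ticket_id,violation
1,No parking
2,Expired meter
3,No parking
4,Fire hydrant
5,No parking
"""


def count_labels(lines, label_column):
    """Count how often each value of label_column occurs, one row at a time."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[label_column]] += 1
    return counts


counts = count_labels(io.StringIO(SAMPLE_TEXT), "violation")
for label, n in counts.most_common():
    print(f"{label}: {n}")
```

    For a real download, you could pass `open(path, newline="")` in place of the `io.StringIO` wrapper.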

    More...
    If you're just looking for more classes to take or books to read, check out https://www.metacademy.org/. They have a few suggested paths to learn deep learning, machine learning, Bayesian stats, and other stuff like that. I think machine learning is an excellent next step - once you're versed in software development, database management/creation/querying, and visualization.

    Even more...
    Just immerse yourself. There are TONS of data blogs, podcasts, meetup groups, conferences, and news sources out there. Do all you can to get in there and figure out what's going on and who's doing what. It's super interesting anyway. Two of my favorite things I follow: datatau (hacker news for data science) and I Quant NY (linked above, for the parking ticket analyses).