java python hadoop bigdata

How to start learning Big Data? Which modules should I concentrate on as a developer?


I'm planning to learn Big Data. I have gone through some tutorials, but I'm a little confused about which modules I need to concentrate on from a developer's perspective. I currently work with Java. I hope your answers will help me take the next step in my Big Data journey.


Solution

  • First, I'd suggest getting familiar with the term itself: "Big Data" is a fuzzy and much-debated one, more a marketing catchphrase than a technical specification, and it covers a huge range of technologies.

    Starting from there, I'd try to determine which aspect (IoT, building/running data centers, ETL/data integration/warehousing, analytics/statistics/machine learning...) or which field of application (retail, bioinformatics...) interests you, and which of these offers realistic job prospects. I'd also think about the tech stack you'd like to work with (Scala, Python...).

    Reverse-engineering job postings could actually be a good way to get that information.

    The Data Scientist profile (ETL + machine learning + visualization) has gained broad acceptance and covers a recognizable skill set. Big Data Analyst and Big Data Engineer roles also exist, though arguably with less clearly defined profiles.

    Nowadays you can get a whole MSc in data science (here's a personal evaluation of it), but you may be able to get your foot in the door by a less fancy route too. Training courses vary in quality: I found Andrew Ng's machine learning and deep learning (large neural networks) MOOCs outstanding, and everything from the EPFL Scala side (if you want to go down that road) is technically excellent and reasonably well presented (I took Big Data Analysis with Scala and Spark).
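Whichever stack you pick, the classic first exercise in Hadoop and Spark tutorials is a word count built on the map/shuffle/reduce pattern. The sketch below is a minimal single-process illustration of that pattern using only the Python standard library; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and this is not Hadoop's or Spark's API, which distribute the same phases across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every whitespace-separated word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the grouped values for each key into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    # A real framework runs each phase in parallel on many machines;
    # this hypothetical helper runs all three sequentially in one process.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    docs = ["Big data is big", "Data is everywhere"]
    print(word_count(docs))  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Once this pattern feels familiar, the Hadoop `WordCount` example and Spark's `flatMap`/`reduceByKey` versions read as the same three steps spread over a cluster.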