rhadoop

Manipulate a data set column in RHadoop


I have a data set with a date column (1/10/2015, 1/10/2016, 1/10/2017). I want to change its format so that only the year remains (2015, 2016, 2017). I need to do this using Hadoop.


Solution

  • Use a regular expression to extract the required value.

    A good tutorial with examples can be found in this blog post: Extract date in required formats from hive tables

    If you want only the year and month, in the format ‘yyyy-MM’ (from a ‘yyyy-MM-dd’ value), use regexp_extract(column_datetime,'(.*\-.*)\-.*',1)


    EDIT: This was originally posted as a comment, but I turned it into an answer so other people can find it more quickly.
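The snippet below is a minimal Python sketch of how the extraction behaves; it is not Hive itself. The helper regexp_extract is a hypothetical stand-in that mimics Hive's regexp_extract(subject, pattern, index) by returning the requested capture group of the first match (returning '' when nothing matches is this sketch's own choice). It checks the answer's 'yyyy-MM' pattern, plus an assumed year-only pattern, ([0-9]{4})$, for the question's d/M/yyyy dates.

```python
import re


def regexp_extract(subject: str, pattern: str, index: int) -> str:
    """Hypothetical Python stand-in for Hive's regexp_extract.

    Returns capture group `index` of the first match of `pattern` in
    `subject`, or '' if there is no match (an assumption of this sketch).
    """
    match = re.search(pattern, subject)
    if match is None:
        return ""
    return match.group(index) or ""


# Pattern from the answer: keep 'yyyy-MM' from a 'yyyy-MM-dd' value.
YEAR_MONTH = r"(.*\-.*)\-.*"

# Assumed pattern for the question's d/M/yyyy values: the trailing four digits.
YEAR_ONLY = r"([0-9]{4})$"

if __name__ == "__main__":
    print(regexp_extract("2015-10-01", YEAR_MONTH, 1))
    for value in ["1/10/2015", "1/10/2016", "1/10/2017"]:
        print(regexp_extract(value, YEAR_ONLY, 1))
```

Under the same assumption, the Hive query would be something like SELECT regexp_extract(date_col, '([0-9]{4})$', 1) FROM my_table, where date_col and my_table are hypothetical names. Anchoring on the last four digits avoids parsing the day and month, which are ambiguous in values like 1/10/2015.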