Search code examples
rexport-to-csvdata-extractiondata-processing

How to extract part of ID feature from one column in R


I've got data with around 10,000 rows like this:

HH_ID         DEMAND
201101010000  35090
201101010030  35612
201101010100  35245
201101010130  34161
201101010200  33156
201101010230  32556
...           ...

where HH_ID refers to the half hourly ID, e.g 201101010230 refers to Year2011, Data0101, and time 0230.

I would like extract the Year and Date from HH_ID, and to have the data like:

HH_ID         Year_ID  Date_ID    DEMAND
201101010000    2011    0101       35090
201101010030    2011    0101       35612
201101010100    2011    0101       35245
201101010130    2011    0101       34161
...           ...

Does anyone know how to do this?


Solution

  • We can use substring

    Year_ID <- substring(df1$HH_ID, 1, 4)
    Date_ID <-  substring(df1$HH_ID, 5, 8)
    cbind(df1[1], Year_ID, Date_ID, df1[2])
    #         HH_ID Year_ID Date_ID DEMAND
    #1 201101010000    2011    0101  35090
    #2 201101010030    2011    0101  35612
    #3 201101010100    2011    0101  35245
    #4 201101010130    2011    0101  34161
    #5 201101010200    2011    0101  33156
    #6 201101010230    2011    0101  32556
    

    Or another option is sub with read.table

    cbind(df1, read.csv(text=sub("(.{4})(.{4}).*", "\\1,\\2", df1$HH_ID), 
      col.names = c("Year_ID", "Date_ID"), 
       colClasses = c("numeric", "character"), header=FALSE))