Search code examples
rplotaggregationdata-analysis

How to add a new column and aggregate values in R


I am completely new to gnuplot and am only trying this because I need to learn it. I have a values in three columns where the first represents the filename (date and time, one hour interval) and the remaining two columns represent two different entities Prop1 and Prop2.

Datetime             Prop1        Prop2

20110101_0000.txt     2            5
20110101_0100.txt     2            5
20110101_0200.txt     2            5
...
20110101_2300.txt     2            5
20110201_0000.txt     2            5
20110101_0100.txt     2            5
...
20110201_2300.txt     2            5
...

I need to aggregate the data by the hour of the day (the **_0100) which is the last four numeric digits. So, I want to create another column called hour which tells me the hour of the day. It means 0000 = 0h, 0100 = 1h, ...... 2200 = 22h etc.

I then want to get the sum of Prop1 and Prop2 for each hour, so in the end get something like.

Hour  Prop1   Prop2
0h     120     104
1h     230     160
...
10h    90      110
...
23h    100    200 

and the get a line plot of Prop1 and Prop2.


Solution

  • A general solution with gsub :

    Data$Hour <- gsub(".+_(\\d+).txt","\\1",Data$Datetime)
    

    EDIT :

    You can use Data$Hour <- substr(Data$Hour,1,2) to get just the hour. As said in the comments, if you always have exactly the same structure in Datetime, you could use substr() immediately:

    Data$Hour <- substr(Data$Datetime,10,11)
    

    Then you can use aggregate, tapply, by, ... whatever to do what you want. To sum both Prop1 and Prop2, you can use aggregate, eg:

    aggregate(Data[2:3],list(Data$Hour),sum)
    

    with the dataset :

    zz<-textConnection("Datetime             Prop1        Prop2
    20110101_0000.txt     2            5
    20110101_0100.txt     2            5
    20110101_0200.txt     2            5
    20110101_2300.txt     2            5
    20110201_0000.txt     2            5
    20110201_0100.txt     2            5
    20110201_0200.txt     2            5
    20110201_2300.txt     2            5")
    Data <- read.table(zz,header=T,as.is=T)