r, dataframe, time-series, subset, as.date

Subset a dataframe based on numerical values of a string inside a variable


I have a data frame containing a time series of meteorological measurements at monthly resolution from 1961 to 2018. I am interested in the variable that holds the monthly average temperature, because I need the multi-annual average temperature for the summers.

To do this I must extract the fifth and sixth digits, which encode the month, from the "DateVariable" column. The values in that column are formatted like "19610701", so here I need the 07 (July) that follows 1961.

I started by coding this for a single month for other purposes, so I have not tried anything worth mentioning. I guess that grepl could do the job, but I do not know how the matching operator works.

So I started with this code, which works:

summersmonth <- Df[DateVariable %like% "19610101" | DateVariable %like% "19610201"]

I am expecting code like this:

summermonths <- Df[DateVariable %like% "**06**" | DateVariable %like% "**07**" ...]

The goal is that all entries with a month value from 06 to 09 are saved in the new data frame summermonths.

Thanks in advance for any reply or feedback regarding my question.

Update

Thanks to your answers I got the first part working: converting the variable with as.Date and extracting the month as a name (class character). Now I need to select the months from June to September. A horrible way to get the result I want is to create several subsets and rbind them afterwards:

Sommer1<-subset(Df, MonthVar == "Mai")
Sommer2<-subset(Df, MonthVar == "Juli")
Sommer3<-subset(Df, MonthVar == "September")

SummerTotal<-rbind(Sommer1,Sommer2,Sommer3)

I would be very glad to see this written in a tidy way.

Update 2 - Solution

Here is the tidy way, following "Using multiple criteria in subset function and logical operators":

Veg_Seas<-subset(Df, subset = MonthVar %in% c("Mai","Juni","Juli","August","September"))

Solution

  • You can convert your date variable to class Date and extract the month (month() is provided by packages such as lubridate or data.table):

    allmonths <- month(as.Date(Df$DateVariable, format="%Y%m%d"))
    

    Note that if your column was originally imported as a factor, you need to convert it to character first:

    allmonths <- month(as.Date(as.character(Df$DateVariable), format="%Y%m%d"))
    

    Then you can check whether each row falls in a summer month:

    summersmonth <- Df[allmonths %in% 6:9, ]
    

    Example:

    as.Date("20190702", format="%Y%m%d")
    [1] "2019-07-02"
    
    month(as.Date("20190702", format="%Y%m%d"))
    [1] 7
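
Since the question asks how pattern matching with grepl would work, here is a minimal base R sketch that selects the summer rows without converting to Date at all. The sample data and the column name Temp are assumptions made up for illustration; only DateVariable and its "YYYYMMDD" layout come from the question.

```r
# Hypothetical sample data; the values and the Temp column are assumptions.
Df <- data.frame(
  DateVariable = c("19610501", "19610601", "19610701", "19611201",
                   "19620601", "19620901", "19621001"),
  Temp = c(12.1, 16.4, 18.9, 1.3, 17.0, 14.2, 9.8),
  stringsAsFactors = FALSE
)

# Option 1: take characters 5 and 6 (the month) and compare them as integers.
month_num <- as.integer(substr(Df$DateVariable, 5, 6))
summer_substr <- Df[month_num %in% 6:9, ]

# Option 2: a regular expression with grepl().
#   ^         anchors the match at the start of the string
#   [0-9]{4}  any four digits (the year)
#   0[6-9]    the month, 06 through 09
summer_grepl <- Df[grepl("^[0-9]{4}0[6-9]", Df$DateVariable), ]

# Multi-annual mean over all selected summer months.
summer_mean <- mean(summer_substr$Temp)
```

Anchoring the pattern with ^ and the four-digit year matters: an unanchored pattern such as "06" would also match a day or part of a year (for example "19600106"). Both options should return the same rows; the Date-based approach above is still the more robust choice if the dates are later needed for anything else.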