Subsetting odd rows in R using seq


I hope this is not too much of a newbie question.

I am trying to subset rows from the GDP UK dataset that can be downloaded from here: http://www.ons.gov.uk/ons/site-information/using-the-website/time-series/index.html

The data frame looks more or less like this:

       X    ABMI
1   1948    283297
2   1949    293855
3   1950    304395

....

300 2013 Q2 381318
301 2013 Q3 384533
302 2013 Q4 387138
303 2014 Q1 390235

For my analysis I only need the data for the years 2004-2013, with one result per year, so I want every fourth row of the dataset between rows 263 and 303.

On the basis of the following websites:

https://stat.ethz.ch/pipermail/r-help/2008-June/165634.html (plus a few that I cannot quote due to the link limit)

I tried the following, each time getting some error message:

> GDPUKodd <- seq(GDPUKsubset[263:302,], by = 4)
    Error in seq.default(GDPUKsubset[263:302, ], by = 4) : 
  argument 'from' must be of length 1

> OddGDPUK <- GDPUK[seq(263, 302, by = 4)]
    Error in `[.data.frame`(GDPUK, seq(263, 302, by = 4)) : 
  undefined columns selected

> OddGDPUKprim <- GDPUK[seq(263:302), by = 4]
Error in `[.data.frame`(GDPUK, seq(263:302), by = 4) : 
  unused argument (by = 4)

> OddGDPUK <- GDPUK[seq(from=263, to=302, by = 4)]
Error in `[.data.frame`(GDPUK, seq(from = 263, to = 302, by = 4)) : 
  undefined columns selected

> OddGDPUK <- GDPUK[seq(from=GDPUK[263,] to=GDPUK[302,] by = 4)]
Error: unexpected symbol in "OddGDPUK <- GDPUK[seq(from=GDPUK[263,] to"

> GDPUK[seq(1,nrows(GDPUK),by=4),]
Error in seq.default(1, nrows(GDPUK), by = 4) : 
  could not find function "nrows"

To cut a long story short: help!


Solution

  • Instead of trying to extract data based on row ids, you can use the subset function with appropriate filters based on the values.

    For example, if your data frame has a year column with values 1948..2014 and a quarter column with values Q1..Q4, then you can get the right subset with:

    subset(data, year >= 2004 & year <= 2013 & quarter == 'Q1')
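
    For comparison, the row-index attempts in the question were close. A data frame is indexed as `df[rows, cols]`, so a row selection needs a trailing comma, and the row-count function is `nrow`, not `nrows`. Here is a minimal sketch on a made-up data frame (`toy` and its values are illustrative stand-ins, not the ONS data):

    ```r
    # Hypothetical stand-in for the GDP data: 12 quarterly rows, made-up values.
    toy <- data.frame(
      X    = paste(rep(2011:2013, each = 4), paste0("Q", 1:4)),
      ABMI = seq(100, by = 10, length.out = 12),
      stringsAsFactors = FALSE
    )

    # Row positions 1, 5, 9: every fourth row, starting from the first.
    idx <- seq(1, nrow(toy), by = 4)

    # The comma after idx selects rows; without it, idx is read as column numbers.
    every_fourth <- toy[idx, ]
    print(every_fourth)
    ```

    On the real data the same pattern would presumably be `GDPUK[seq(263, 302, by = 4), ]`, assuming rows 263 onward are consecutive quarters. That relies on the row layout never changing, which is why filtering on values is the more robust choice.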
    

    UPDATE

    I see your source data is dirty, with no proper year and quarter columns. You can clean it like this:

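    # Read the ABMI series; column X holds labels such as "1948" or "2013 Q2".
    x <- read.csv('http://www.ons.gov.uk/ons/datasets-and-tables/downloads/csv.csv?dataset=pgdp&cdid=ABMI')
    # Convert ABMI to numeric (via character, in case it was read as a factor).
    x$ABMI <- as.numeric(as.character(x$ABMI))
    # Year: strip everything from the first non-digit onward.
    x$year <- as.numeric(gsub('[^0-9].*', '', x$X))
    # Quarter: keep only the Q1..Q4 part; annual rows are left unchanged.
    x$quarter <- gsub('[0-9]{4} (Q[1-4])', '\\1', x$X)
    subset(x, year >= 2004 & year <= 2013 & quarter == 'Q1')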
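
    To try the cleaning steps without downloading anything, here is a small offline sketch. The sample rows are made up and only mimic the X/ABMI layout shown in the question, so treat it as an illustration rather than a check against the real file:

    ```r
    # Hypothetical sample mixing annual and quarterly labels; values are invented.
    x <- data.frame(
      X    = c("2003", "2004", "2003 Q4", "2004 Q1", "2004 Q2", "2013 Q1", "2014 Q1"),
      ABMI = c("10", "20", "30", "40", "50", "60", "70"),
      stringsAsFactors = FALSE
    )

    x$ABMI <- as.numeric(x$ABMI)
    # "2004 Q1" -> 2004; a plain "2004" stays 2004.
    x$year <- as.numeric(gsub("[^0-9].*", "", x$X))
    # "2004 Q1" -> "Q1"; annual labels such as "2004" do not match and stay unchanged.
    x$quarter <- gsub("[0-9]{4} (Q[1-4])", "\\1", x$X)

    # Annual rows drop out because their quarter value is not "Q1".
    result <- subset(x, year >= 2004 & year <= 2013 & quarter == "Q1")
    print(result)
    ```

    Only the "2004 Q1" and "2013 Q1" rows survive the filter: the annual "2004" row falls in the year range but has no quarter label, and "2014 Q1" is outside the range.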