R: Deleting a row doesn't renumber the remaining rows for a .csv import, but does for an .xls import


I was interested in looking at the GDP of a few states over a span of 4 years. After I imported the .csv file, I renamed the columns and then removed the irrelevant rows. As a result, the row numbering skips 10: it goes from 1 to 9 and then continues at 11.

When I tried the same thing with a similar data frame imported from a .xls file, the row numbering does not skip 10.

library(dplyr)  # provides %>% and rename()

gdp <- read.csv("GDP_per.csv", skip = 4)
gdp <- gdp %>%
  rename(
    "2014" = X2013.2014,
    "2015" = X2014.2015,
    "2016" = X2015.2016,
    "2017" = X2016.2017,
    "2018" = X2017.2018
  )
gdp <- gdp[-c(10, 53:64), ]  # drop row 10 and rows 53 to 64


library(readxl)  # provides read_excel()

gdp2 <- read_excel("GDP_dol.xls", skip = 5)
gdp2 <- gdp2[, c(2, 20:24)]
gdp2 <- gdp2[-c(10, 53:64), ]  # drop row 10 and rows 53 to 64

9 Delaware 10.7 5.5 -0.7 2.5 3.9

11 Florida 4.9 6.5 5.0 4.4 5.8

vs.

9 Delaware 67178.9 70896.2 70379.8 72167.2 74973.3

10 Florida 839706.0 894044.0 938370.3 979464.6 1036323.2


Solution

  • The read.csv function returns a data.frame, while read_excel returns a tibble. The two are not the same and do not always behave the same way. A data frame keeps its original row names when rows are removed, until you reset them, e.g.

    (x <- data.frame(V1=1:10, V2=11:20))
    (x2 <- x[-5, ])                # Row name 5 is missing
    rownames(x2) <- NULL
    x2                             # Row names 1 - 9
    

    A tibble does not keep row names, so the remaining rows are renumbered automatically:

    library(tibble)
    xt <- as_tibble(x)
    (xt[-5, ])                     # Rows are numbered 1 - 9
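
Applied to the question's situation, the fix is to reset the row names after dropping rows. The following is a minimal sketch in base R; `gdp_demo` and its values are made-up placeholders standing in for the imported file, not the real GDP data.

```r
# Toy stand-in for the imported data; names and values are placeholders.
gdp_demo <- data.frame(
  state  = c("A", "B", "C", "D"),
  growth = c(1.1, 2.2, 3.3, 4.4)
)

# Drop the second row: the remaining rows keep their original labels.
gdp_demo <- gdp_demo[-2, ]
rownames(gdp_demo)          # "1" "3" "4"

# Reset the row names so the rows are numbered consecutively again.
rownames(gdp_demo) <- NULL
rownames(gdp_demo)          # "1" "2" "3"
```

The row names are only labels. The data are contiguous either way, so positional indexing such as `gdp_demo[2, ]` returns the second remaining row before and after the reset.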